Proteomics and Glycoproteomics (Bio-)Informatics of Protein Isoforms Nathan Edwards Department of Biochemistry and Molecular & Cellular Biology Georgetown University Medical Center


Proteomics and Glycoproteomics (Bio-)Informatics of Protein Isoforms

Nathan Edwards
Department of Biochemistry and Molecular & Cellular Biology
Georgetown University Medical Center

Outline

Tandem mass-spectrometry of peptides

Detection of alternative splicing protein isoforms

Phyloproteomics using top-down mass-spec.

Characterization of glycoprotein microheterogeneity by mass-spectrometry


Mass Spectrometer

Sample → Ionizer → Mass Analyzer → Detector

Ionizer:
• MALDI
• Electrospray Ionization (ESI)

Mass Analyzer:
• Time-of-Flight (TOF)
• Quadrupole
• Ion Trap

Detector:
• Electron Multiplier (EM)

Mass Spectrum

Mass is fundamental

Sample Preparation for MS/MS

Enzymatic Digest and Fractionation

Single Stage MS

MS

Tandem Mass Spectrometry (MS/MS)

Precursor selection

Tandem Mass Spectrometry (MS/MS)

Precursor selection + collision-induced dissociation (CID)

MS/MS

Why Tandem Mass Spectrometry?

MS/MS spectra provide evidence for the amino-acid sequence of functional proteins.

Key concepts:
• Spectrum acquisition is unbiased
• Direct observation of amino-acid sequence
• Sensitive to small sequence variations
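To illustrate why MS/MS spectra carry sequence evidence, the sketch below computes the singly charged b- and y-ion m/z ladder expected from CID of a peptide. It is a minimal illustration rather than the search software used in the talk: the function name `fragment_ions` is hypothetical, the masses are standard monoisotopic values, and modifications and higher charge states are ignored.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O, monoisotopic
PROTON = 1.00728   # mass of a proton


def fragment_ions(peptide):
    """Return (b_ions, y_ions): singly charged m/z values for each ladder.

    b_ions[i] contains the first i+1 residues (N-terminal fragment);
    y_ions[i] contains the last i+1 residues (C-terminal fragment).
    Hypothetical helper for illustration only.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:            # N-terminal prefixes
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):   # C-terminal suffixes
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("LPTQNITFQTE")
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} {bi:10.4f}   y{i} {yi:10.4f}")
```

A single residue substitution shifts every b- or y-ion that contains the changed position, which is why the ladder is sensitive to small sequence variations. Leucine and isoleucine share the same mass, so they remain indistinguishable here.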


Unannotated Splice Isoform

• Human Jurkat leukemia cell line
• Lipid-raft extraction protocol, targeting T cells
• von Haller, et al. MCP 2003.

LIME1 gene:
• LCK interacting transmembrane adaptor 1

LCK gene:
• Leukocyte-specific protein tyrosine kinase
• Proto-oncogene
• Chromosomal aberration involving LCK in leukemias.

Multiple significant peptide identifications


Translation start-site correction

• Halobacterium sp. NRC-1
• Extreme halophilic Archaeon, insoluble membrane and soluble cytoplasmic proteins
• Goo, et al. MCP 2003.

GdhA1 gene:
• Glutamate dehydrogenase A1

• Multiple significant peptide identifications
• Observed start is consistent with Glimmer 3.0 prediction(s)

Halobacterium sp. NRC-1 ORF: GdhA1

• K-score E-value vs PepArML @ 10% FDR
• Many peptides inconsistent with annotated translation start site of NP_279651

[Figure: GdhA1 ORF plot; axis ticks from 0 to 440 in steps of 40]

What if there is no "smoking gun" peptide…


HER2/Neu Mouse Model of Breast Cancer

• Paulovich, et al. JPR, 2007
• Study of normal and tumor mammary tissue by LC-MS/MS
• 1.4 million MS/MS spectra

Peptide-spectrum assignments:
• Normal samples (Nn): 161,286 (49.7%)
• Tumor samples (Nt): 163,068 (50.3%)

• 4270 proteins identified in total
• 2-unique generalized protein parsimony
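When no single "smoking gun" peptide exists, one option is to compare how often a peptide's spectra appear in tumor versus normal samples. The slides do not say which statistical test was used, so the sketch below is an assumption: a one-sided exact binomial test in which each peptide-spectrum assignment falls in the tumor set with probability Nt / (Nn + Nt). The function names are hypothetical.

```python
from math import comb

# Totals from the slide: peptide-spectrum assignments per condition.
N_NORMAL = 161_286
N_TUMOR = 163_068


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def tumor_enrichment_pvalue(tumor_count, normal_count):
    """One-sided p-value that a peptide is over-represented in tumor samples.

    Assumed null hypothesis (not stated in the source): each spectrum for
    this peptide lands in the tumor set with probability
    N_TUMOR / (N_NORMAL + N_TUMOR).
    """
    n = tumor_count + normal_count
    p = N_TUMOR / (N_NORMAL + N_TUMOR)
    return binomial_upper_tail(tumor_count, n, p)


if __name__ == "__main__":
    # Hypothetical counts for one peptide: 30 tumor spectra, 4 normal spectra.
    print(f"{tumor_enrichment_pvalue(30, 4):.3g}")
```

Because the two sample sets are nearly the same size, the null probability is close to 0.5. With unbalanced sets the same normalization keeps the test fair.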

Nascent polypeptide-associated complex subunit alpha

7.3 × 10^-8

Pyruvate kinase isozymes M1/M2

2.5 × 10^-5

Phyloproteomics

Fragment intact proteins (top-down MS)

Match the spectra to protein sequences

Place the organism phylogenetically

Works even for unknown microorganisms without any available sequences

[Figure: total ion chromatograms (relative abundance vs. time in minutes, RT 19.04 to 25.39) for run yr_inclusion, and the averaged FTMS CID MS/MS spectrum (relative abundance vs. m/z, 200 to 2000; scans 1937-2437, RT 19.45-24.36). Annotation: 756.70 +8 MW 6044.11]

CID Protein Fragmentation Spectrum from Y. rohdei


Match to Y. pestis 50S Ribosomal Protein L32

Exact match sequence…

Phylogeny: Protein vs DNA

[Figure panels: Protein Sequence; 16S-rRNA Sequence]

What about mixtures?

Identified E. herbicola proteins

• DNA-binding protein HU-alpha: m/z 732.71, z 13+, E-value 7.5e-26, Δ -14.128
• Eight proteins identified with "large" |Δ|
• Extract N- and C-terminus sequence supported by at least 3 b- or y-ions

E. herbicola protein sequences

Phylogenetic placement of E. herbicola

[Figure panels: Phylogram; Cladogram] phylogeny.fr – "One-Click"

Glycoprotein Microheterogeneity

• Glycosylation is important, but our analytic tools are rather rudimentary
• Detach glycans (PNGase-F) and analyze glycans
• Detach glycans (PNGase-F) and analyze peptides
• Get glycan structures, but no association with protein or protein site, or
• Get glycosylation sites, but no association with glycan structures.
• We analyze glycopeptides directly…
• Challenges all facets of glycoproteomics

Altered N-Glycosylation in Cancer

[Figure: N-glycan attached at an NXS/T site of a protein drawn from NH3+ to COO-. Legend: GalNAc, Sialic Acid, Gal, GlcNAc, Man. Credit: K. Chandler]

• Fut-VIII (α1-6 Fuc): Comunale, 2010
• GnT-V (β1-6 GlcNAc): Wang, 2007
• ST-VI Gal1 (α2-6 NeuAc): Hedlund, 2008
• Fut-VI (α1-3 Fuc): Higai, 2008

Glycosyltransferase Expression or Glycan Analyses

The informatics challenge

• Identify glycopeptides in large-scale tandem mass-spectrometry datasets
  • Many glycopeptide enriched fractions
  • Many tandem mass-spectra / fraction
• Good, but not great, instrumentation
  • QStar Elite – CID, good MS1/MS2 resolution
• Strive for hypothesis-generating analysis
  • Site-specific glycopeptide characterization
  • Glycoform occupancy in differentiated samples

CID Glycopeptide Spectrum


Observations

• Oxonium ions (204, 366) help distinguish glycopeptides from peptides…
  …but do little to identify the glycopeptide
• Few peptide b/y-ions to identify peptides…
  …but intact peptide fragments are common
• If the peptide can be guessed, then…
  …the glycan's mass can be determined
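The two steps in these observations can be sketched as follows: flag a spectrum as a likely glycopeptide when both oxonium ions are present, then subtract a guessed peptide mass from the precursor's neutral mass to obtain the glycan mass. This is a minimal sketch, not the pipeline from the talk; the function names and the 0.02 Da tolerance are assumptions, and the oxonium m/z values are the standard monoisotopic values for HexNAc (204) and Hex-HexNAc (366).

```python
PROTON = 1.00728  # mass of a proton (Da)

# Singly charged oxonium ions, monoisotopic m/z.
OXONIUM_IONS = {"HexNAc": 204.0867, "HexHexNAc": 366.1395}


def has_oxonium_ions(peaks_mz, tol=0.02):
    """True if every oxonium ion has a peak within tol (Da, assumed value)."""
    return all(
        any(abs(mz - ion) <= tol for mz in peaks_mz)
        for ion in OXONIUM_IONS.values()
    )


def neutral_mass(mz, charge):
    """Neutral mass of an ion observed at m/z with the given positive charge."""
    return (mz - PROTON) * charge


def glycan_mass(precursor_mz, charge, peptide_mass):
    """Mass added by the glycan, given a guessed peptide neutral mass.

    The result is the sum of the monosaccharide residue masses, which can
    then be matched against candidate glycan compositions.
    """
    return neutral_mass(precursor_mz, charge) - peptide_mass


if __name__ == "__main__":
    # Hypothetical spectrum: peak list plus a 3+ precursor.
    peaks = [204.09, 366.14, 512.20, 1001.01]
    if has_oxonium_ions(peaks):
        print(round(glycan_mass(875.2011, 3, 1000.0), 3))
```

The peptide mass is passed in directly because, as the slide notes, the peptide usually has to be guessed from intact peptide fragments rather than read from a b/y-ion ladder.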

Haptoglobin Standard

Haptoglobin (HPT_HUMAN) glycopeptides:
• NLFLNHSE*NATAK
• MVSHHNLTTGATLINE
• VVLHPNYSQVDIGLIK

Legend: N-glycosylation motif (NX/ST); * Site of GluC cleavage

Pompach et al. Journal of Proteome Research 11.3 (2012): 1728–1740.
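The N-glycosylation motif marked on these peptides can be located with a short pattern match. The sketch below uses the usual form of the sequon, N-X-S/T with X being any residue except proline; the proline exclusion is standard but is not stated on the slide. The function name `sequon_sites` is hypothetical, and the `*` cleavage marker is stripped before matching.

```python
import re

# N, then any residue except P, then S or T. The lookahead lets
# overlapping sites (for example "NNSS") be reported separately.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_sites(peptide):
    """Return 1-based positions of the Asn in each N-X-S/T sequon."""
    sequence = peptide.replace("*", "")  # drop the GluC cleavage marker
    return [match.start() + 1 for match in SEQUON.finditer(sequence)]


if __name__ == "__main__":
    for pep in ("NLFLNHSE*NATAK", "MVSHHNLTTGATLINE", "VVLHPNYSQVDIGLIK"):
        print(pep, sequon_sites(pep))
```

Positions are relative to the peptide, not to the full haptoglobin sequence. Mapping them to protein coordinates would require the peptide's offset in HPT_HUMAN.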

Tuning the filters…

We estimate the number of false positives…
…so that the user can tune the search parameters

Application of Exoglycosidases to locate Fucose

At ITIH4 site N517

[Figure: four spectrum panels, each labeled LPTQNITFQTE. Credit: K. Chandler]

NVVFVIDK ITIH4 Glycopeptide

[Figure credit: K. Chandler]

Similar Glycopeptide Spectra (mass Δ ~ +162 Da)

[Figure labels: MVSHHNLTTGATLINE; ?; +162 Da]
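A mass difference of about +162 Da corresponds to one hexose residue (monoisotopic 162.0528 Da), so an unidentified spectrum can be linked to an identified one when their precursor masses differ by that amount. The sketch below finds such pairs in a list of precursor neutral masses. It is an illustration under stated assumptions: the function name `hexose_pairs` and the 0.02 Da tolerance are hypothetical, and a real analysis would also compare the fragment spectra before linking two glycopeptides.

```python
HEXOSE = 162.05282  # monoisotopic mass of one hexose residue (Da)


def hexose_pairs(masses, tol=0.02):
    """Return (i, j) index pairs where masses[j] - masses[i] is one hexose.

    masses: precursor neutral masses (Da), in any order.
    tol: absolute mass tolerance in Da (assumed value).
    """
    order = sorted(range(len(masses)), key=lambda i: masses[i])
    pairs = []
    for rank, i in enumerate(order):
        for j in order[rank + 1:]:
            delta = masses[j] - masses[i]
            if delta > HEXOSE + tol:
                break  # remaining masses are even heavier
            if abs(delta - HEXOSE) <= tol:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    # Hypothetical precursor masses for four spectra.
    observed = [3000.0, 3162.0528, 3324.1056, 2500.0]
    for light, heavy in hexose_pairs(observed):
        print(f"spectrum {light} + Hex -> spectrum {heavy}")
```

Chaining such pairs gives a graph over spectra, along which a glycan annotation on one spectrum can suggest the annotation of its neighbors.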

Fragmented Glycopeptides (mass Δ ~ +162 Da)

[Figure labels: MVSHHNLTTGATLINE; ?; +162 Da; MVSHHNLTTGATLINE]

Propagating Annotations

• MVS+A1G1
• MVS+A1G1
• MVS+A2G2
• MVS+A2G2
• MVS+A2G2
• VVL+A1G1
• VVL+A2G2

[Figure credit: G. Berry]

Summary

Mass-spectrometry coupled with protein chemistry and good informatics can look beyond the obvious to the unexpected...

…and there is plenty to find!


Acknowledgements

Edwards lab:
• Kevin Chandler
• Gwenn Berry

Fenselau lab (UMD):
• Colin Wynne
• Avantika Dhabaria

Goldman lab (GU):
• Kevin Chandler
• Petr Pompach

NSF Graduate Fellowship (Chandler)

Funding: NCI