Thanks to: George Church (Harvard GTL & CEGS Centers) 5-Jan-2006 HPCGG Landsdowne 2 PM Personal Genomics meets Quantitative Proteomics NHGRI Seq Tech 2004: Agencourt, 454, Microchip, 2005: Nanofluidics, Network, VisiGen

Page 1:

Thanks to:

George Church (Harvard GTL & CEGS Centers) 5-Jan-2006 HPCGG Landsdowne 2 PM

Personal Genomics meets Quantitative Proteomics

NHGRI Seq Tech 2004: Agencourt, 454, Microchip
2005: Nanofluidics, Network, VisiGen
Affymetrix, Helicos, Solexa-Lynx

Page 2:

"Open-source" Personal Genome Project (PGP)

• Harvard Medical School IRB Human Subjects protocol submitted 16-Sep-2004, approved Aug 31, 2005.

• Gradual plan. Start with "highly-informed" individuals consenting to non-anonymous genomes & extensive phenotypes (medical records, imaging, omics).

• Cell lines in Coriell NIGMS Repository

• Diploid genome subsets at $0.1/kb, <3E-7 FP Errors

How? Polony bead Sequencing-by-Ligation (SbL)

Page 3:

Analyses of single chromosomes (single cells, RNAs, particles)

(1) When we only have one cell as in Preimplantation Genetic Diagnosis (PGD) or environmental samples

(2) Candidate chromosome region sequencing

(3) Prioritizing or pooling (rare) species based on an initial DNA screen.

(4) Multiple chromosomes in a cell or virus

(5) RNA splicing

(6) Cell-cell interactions (predator-prey, symbionts, commensals, parasites)

Page 4:

CD44 Exon Combinatorics (Zhu & Shendure)

• Alternatively Spliced Cell Adhesion Molecule

• Specific variable exons are up-or-down-regulated in various cancers (>2000 papers)

• v6 & v7 enable direct binding to chondroitin sulfate, heparin…

Zhu J, et al. Science 301:836-8.

Page 5:

Zhu J, Shendure J, Mitra RD, Church GM. Science 301:836-8. Single molecule profiling of alternative pre-mRNA splicing.

EXON PATTERN          Eph4  Eph4bDD  TOTAL   RATIO  P-VALUE
------------7-8-9-10   609      764   1373    1.17  1E-4
--------------8-9-10   320      390    710    1.13  3E-2
----------6-7-8-9-10   431      251    682   -1.85  4E-18
------4-5-6-7-8-9-10   218      216    434   -1.08  2E-1
----------------9-10    68      143    211    1.96  7E-7
--------5-6-7-8-9-10    86       39    125   -2.37  2E-6
----3-4-5-6-7-8-9-10    40       56     96    1.30  9E-2
------4-5---7-8-9-10    16       74     90    4.30  2E-9
--2-3-4-5-6-7-8-9-10    44       28     72   -1.69  1E-2
1-2-3-4-5-6-7-8-9-10    22        5     27   -4.73  3E-4
--------5---7-8-9-10     5       19     24    3.53  3E-3
----3-4-5---7-8-9-10     1       15     16   13.95  4E-4
--2-3-4-5---7-8-9-10     1       10     11    9.30  5E-3

Eph4 = murine mammary epithelial cell line

Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic)

CD44 RNA isoforms
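The slide does not say how the RATIO column was computed. One reading that comes close to the listed values is a signed fold change of Eph4bDD over Eph4 after scaling each count by its sample total, with ratios below 1 shown as negative reciprocals. The sketch below assumes that reading; the function name `signed_fold` and the choice to normalize over only the 13 listed isoforms are assumptions, which may be why the results land near, but not exactly on, the slide's numbers.

```python
# (Eph4 count, Eph4bDD count) for the 13 isoforms listed on the slide.
COUNTS = [
    (609, 764), (320, 390), (431, 251), (218, 216), (68, 143),
    (86, 39), (40, 56), (16, 74), (44, 28), (22, 5),
    (5, 19), (1, 15), (1, 10),
]

# Assumption: normalize by the totals of the listed rows only.
TOTAL_EPH4 = sum(a for a, _ in COUNTS)
TOTAL_BDD = sum(b for _, b in COUNTS)


def signed_fold(count_a, count_b, total_a, total_b):
    """Fold change of sample b over sample a on total-normalized counts.

    Ratios below 1 are returned as negative reciprocals (0.5 -> -2.0).
    Both counts must be non-zero.
    """
    ratio = (count_b / total_b) / (count_a / total_a)
    return ratio if ratio >= 1 else -1 / ratio


for eph4, bdd in COUNTS:
    fold = signed_fold(eph4, bdd, TOTAL_EPH4, TOTAL_BDD)
    print(f"{eph4:>4} {bdd:>4} {fold:7.2f}")
```

The P-value column is not reproduced here because the slide gives no indication of which statistical test was used.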

Page 6:

Molecular Weight Assessment of Proteins in Total Proteome Profiles Using 1D-PAGE and LC/MS/MS.

Proteome Sci. 3:6 (2005) Ahmad R, Nguyen DH, Wingerd MA, Church GM, Steffen MA.

Candidates for alternative splicing (AS), endoproteolytic processing (EPP), & post-translational modifications (PTMs) in Lymphoblastoid cells

Protein Name    Predicted MW    Observed MW    Difference before & after leader cleavage

Cytochrome c oxidase subunit IV isoform 1   19577    2582   205
NADH dehydrogenase                          21750    5084   334
Coproporphyrinogen oxidase                  50175   13632   357
MHC II, DQ 1                                29733   25896   404
NADH (ubiquinone) Fe-S protein 2            52545   48185   815
Mito short-chain enoyl-coA hydratase 1      31371   27499   901
Peptidylprolyl isomerase B (cyclophilin)    23742   19360   940

Page 7:

Glycogen metabolism

[Figure: pathway α-Glc-1P, ADP-Glc, α-1,4-glucosyl-glucan, glycogen, connected to Central Carbon Metabol.; genes glgC, glgA, glgB, glgX, glgP. Plot: Normalized Expression (log scale, 0.1 to 10) vs. Time (hours, 0 to 48) for glgA, glgB, glgC, glgX, glgP.]

Zinser et al. unpubl.

Light regulated Circadian metabolism

Page 8:

Viral Photosynthetic Proteins

[Figure: genome maps for Podovirus P-SSP7 (46 kb), Myovirus P-SSM4 (181 kb), and Myovirus P-SSM2 (255 kb). Gene cluster labels: HLIP, D1; HLIPs, D1, D2 (6.4 kb, 2.8 kb); PC, HLIPs, Fd, D1 (12 kb, 24 kb). Scale bar: ~500 bp.]

Lindell, Sullivan, Chisholm et al. 2004

Page 9:

Photosynthesis genes in marine viruses yield proteins during host infection.

Nature 2005 438:86-9. Lindell D, Jaffe JD, Johnson ZI, Church GM, Chisholm SW.

Page 10:

Photosynthesis genes in marine viruses yield proteins during host infection.

Nature 2005 438:86-9. Lindell D, Jaffe JD, Johnson ZI, Church GM, Chisholm SW.

[Figure labels: 15N 13C synthetic standards; host; phage]

Page 11:

Improving MS Peptide Coverage

? Ionization efficiency
X Ions outside the mass range of the analyzer
? Chromatographic behavior
? Sample preparation bias
X Instrument duty cycle
• Improve Spectra interpretation over current algorithms
– Details of fragmentation patterns
– Dipeptide P, DE/KR, V.G intensity effects
– B & Y ions unequal & co-dependent
– More intense ions in middle of peptides

MDQuest: Mike Chou, Dan Schwartz, Steve Gygi, Josh Elias http://gygi.med.harvard.edu/dpsp/

Page 12:

SEQUEST vs MDQUEST Performance: ROC Curves

[Figure: ROC curves for sequest and mdquest. x-axis: 1 - Specificity (FP rate), 0 to 1; y-axis: Sensitivity (TP rate), 0 to 1.]
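For readers unfamiliar with the axes, an ROC curve of this kind can be built by sweeping a score threshold over a set of peptide identifications whose correctness is known. The sketch below is a generic illustration, not the evaluation code behind the slide; the function names and the toy scores are assumptions.

```python
def roc_points(scores, labels):
    """Return (fp_rate, tp_rate) points for a threshold swept from high to low.

    scores: search-engine scores, higher meaning more confident.
    labels: 1 for a correct identification, 0 for an incorrect one.
    Both classes must be present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, is_correct) in enumerate(ranked):
        if is_correct:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all identifications tied at this score.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data, invented for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6]
toy_labels = [1, 0, 1, 0]
toy_curve = roc_points(toy_scores, toy_labels)
print(toy_curve, area_under_curve(toy_curve))
```

A curve that rises faster toward the top-left corner (larger area) indicates better separation of correct from incorrect identifications.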

Page 13:

MapQuant is a program designed to isolate unique organic species and quantify their relative abundances from an LC/MS experiment.

Scheme: Data from an LC/MS experiment are analyzed after being formatted into a data structure called a 2-D map, analogous to a gray-scale image.

[Figure: consecutive scans (scan number N, N+1, N+2, N+3) stacked into a 2-D peptide map; axes: time or scans vs. m/z units.]
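The 2-D map idea can be illustrated by binning each scan's peaks onto a fixed m/z grid, so that rows are m/z bins, columns are scans, and cell values are summed intensities, much like pixels in a gray-scale image. This is a minimal sketch under assumed inputs; `build_2d_map`, the bin width, and the toy scans are illustrative and do not reflect MapQuant's actual file format or internals.

```python
def build_2d_map(scans, mz_min, mz_max, mz_bin):
    """Bin LC/MS scans into a 2-D intensity grid.

    scans: list of scans in acquisition order; each scan is a list of
           (mz, intensity) peaks.
    Returns grid[row][col], where row indexes the m/z bin and col the scan.
    """
    n_bins = int(round((mz_max - mz_min) / mz_bin))
    grid = [[0.0] * len(scans) for _ in range(n_bins)]
    for col, peaks in enumerate(scans):
        for mz, intensity in peaks:
            if mz_min <= mz < mz_max:
                # Clamp guards against floating-point spill at the top edge.
                row = min(int((mz - mz_min) / mz_bin), n_bins - 1)
                grid[row][col] += intensity
    return grid


# Toy scans, invented for illustration only.
toy_scans = [
    [(500.1, 10.0), (501.6, 5.0)],
    [(500.2, 20.0)],
    [(501.7, 7.0), (503.0, 99.0)],  # 503.0 lies outside the mapped range
]
toy_map = build_2d_map(toy_scans, mz_min=500.0, mz_max=502.0, mz_bin=0.5)
for row in toy_map:
    print(row)
```

A species that elutes over several scans shows up as a contiguous blob in such a grid, which is what makes image-style peak detection applicable.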

Page 14:

MapQuant Gives a List of All Organic Species In the Sample

[Figure: 2-D map (retention time vs. m/z units) passed through MapQuant to give the list below.]

Abundance (Volume)  Retention Time  RT     m/z      MZ      Charge  Carbons
60123               27.30           0.118  828.938  0.0117  2       75
30227               42.67           0.162  772.432  0.0102  2       76
19363               48.01           0.150  913.449  0.0143  3       135
13838               34.52           0.131  736.060  0.0092  3       108
9726                28.17           0.129  797.385  0.0108  2       74
5370                34.19           0.131  762.360  0.0099  2       74
4729                52.25           0.153  906.988  0.0141  2       87
1612                47.22           0.136  786.402  0.0105  4       165
151                 24.65           0.116  883.525  0.0132  1       33
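As a worked example of how the m/z and Charge columns relate to molecular mass, the neutral monoisotopic mass of a positively charged, protonated species is M = z × (m/z − m_proton). The sketch below applies this to three rows of the list; it assumes the ions are of the form [M + zH]^z+, which the slide does not state, and `neutral_mass` is an illustrative name.

```python
PROTON_MASS = 1.007276  # Da


def neutral_mass(mz, charge):
    """Neutral mass of an [M + zH]^z+ ion from its m/z and charge state."""
    return charge * (mz - PROTON_MASS)


# (abundance, retention time, m/z, charge) copied from the list above.
species = [
    (60123, 27.30, 828.938, 2),
    (19363, 48.01, 913.449, 3),
    (151, 24.65, 883.525, 1),
]

for abundance, rt, mz, z in species:
    mass = neutral_mass(mz, z)
    print(f"{abundance:>6}  rt={rt:5.2f}  m/z={mz:8.3f}  z={z}  M={mass:9.3f} Da")
```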

Page 17:

Leptos et al. Proteomics 2006

Page 18:

MapQuant is publicly available at http://arep.med.harvard.edu/mapquant.html

Page 19:

Leptos et al. Proteomics 2006

Page 20:

Leptos et al. Proteomics 2006

Page 21:

[Figure: 2-D map (retention time in min vs. m/z units) with some species labeled by peptide sequence (EKLAVSAR, QEPERSEK, DAFLSGER) and others marked "?".]

MapQuant gives me a list of all organic species in the sample BUT WHAT ARE THEIR IDENTITIES?

Page 22:

Dealing With Many Peptides (Organic Species)

MapQuant identifies approx. 2 x 10^4 organic species per LC/MS experiment.

ONLY ~500 (3%) organic species have fragmentation (CID) spectra and hence sequence IDs

[Figure: 2-D map (retention time in min vs. m/z units); marked points = CID spectrum or MS/MS event; labeled peptides EKLAVSAR, QEPERSEK, DAFLSGER; other species marked "?".]

Page 23:

Dealing With Many Peptides (Organic Species)

[Figure: 2-D map (retention time in min vs. m/z units); labeled peptides EKLAVSAR, QEPERSEK, DAFLSGER; other species marked "?".]

Database of 11845 peptides from ALL LC/MS experiments carried out on Prochlorococcus samples

(rt, m/z) coordinates
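The slide suggests assigning identities to unsequenced species by looking up their (rt, m/z) coordinates in the database of previously identified peptides. A minimal sketch of such a lookup follows. The tolerances, the tie-breaking rule, the function name `match_species`, and the database coordinates are all assumptions made for illustration; a real implementation would likely also compare charge state and correct for retention-time drift between runs.

```python
def match_species(species, database, rt_tol=1.0, mz_tol=0.02):
    """Assign each (rt, mz) species the nearest database peptide, if any.

    species:  list of (rt, mz) coordinates from one LC/MS experiment.
    database: list of (rt, mz, sequence) entries from earlier experiments.
    Returns a list of sequences, with None where nothing falls within
    both tolerances.
    """
    assignments = []
    for rt, mz in species:
        best_sequence = None
        best_distance = None
        for db_rt, db_mz, sequence in database:
            d_rt = abs(rt - db_rt)
            d_mz = abs(mz - db_mz)
            if d_rt > rt_tol or d_mz > mz_tol:
                continue
            # Distance in tolerance units, so both axes weigh equally.
            distance = (d_rt / rt_tol) ** 2 + (d_mz / mz_tol) ** 2
            if best_distance is None or distance < best_distance:
                best_sequence, best_distance = sequence, distance
        assignments.append(best_sequence)
    return assignments


# Sequences are from the slide; the coordinates are invented placeholders.
toy_database = [
    (30.0, 500.25, "EKLAVSAR"),
    (45.0, 610.30, "QEPERSEK"),
]
toy_species = [(30.4, 500.26), (45.0, 611.00), (10.0, 500.25)]
print(match_species(toy_species, toy_database))
```

The linear scan is fine for a sketch; with roughly 2 x 10^4 species against 11845 database peptides, sorting the database by m/z and bisecting would cut the work considerably.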

Page 24:

Protein Distribution Among Experiments

TOTAL NUMBER OF ORFS: 1742

Proteins observed in diel experiment: 1314

Proteins observed in experiments prior to diel: 539

[Venn diagram: 792 in diel experiment only, 522 in both, 17 in prior experiments only.]

Page 25:

Sequence Coverage of the Protein groES

Page 26:

Summary

Proteome Sci. 3:6 (2005) Ahmad R, Nguyen DH, Wingerd MA, Church GM, Steffen MA.

• Open Personal Genome Project (PGP) including Proteomics
• Single molecule RNAs for alternative splicing (AS)
• Gel-MS methods for endoproteolytic processing
• MapQuant for MS quantitation without isotopic labeling

http://arep.med.harvard.edu
