George Church 2:30- 3:00 PM Tue 3-Oct-2006 Cancer Genomics & Emerging Technologies Thanks to: NCI/NIH HMS-CGCC Single Cell, RNA, & Chromosome Sequencing Technologies AppliedBiosystems-Agencourt, Affymetrix, Helicos, 454, Solexa, DNAdirect, CompleteGenomics, Codon Devices


George Church 2:30- 3:00 PM Tue 3-Oct-2006 Cancer Genomics & Emerging Technologies

Thanks to: NCI/NIH HMS-CGCC

Single Cell, RNA, & Chromosome Sequencing Technologies

AppliedBiosystems-Agencourt, Affymetrix, Helicos, 454, Solexa, DNAdirect, CompleteGenomics, Codon Devices

Multiplex Polony Summary

Technologies for selecting genomic regions
• Mbp scale for rearrangements
• RNA tags & spliceforms
• 1 to 200 bp scale for SNPs & exons (1%)

Low cost & high accuracy: $.07/kbp at 3E-7 errors

Paired-end-tags (PET) for rearrangements

Detection of rare mutations (e.g. drug resistance alleles)

60 million reads per run

Selective genome sequencing

Numerous (100K) Small Regions (exons & point mutations)
• PCR: 21 Mbp, >$250K. Sjoblom et al. (2006) Science
• Highly multiplexed molecular inversion probe genotyping: over 10,000 targeted SNPs genotyped in a single tube assay. Hardenbol et al. Genome Res. 2005 Feb;15(2):269-75.
• Analyzing genes using closing and replicating circles. Nilsson et al. (2006) Trends Biotechnol 24:83.

One large region
• Single molecule amplification, 1 to 4 Mbp. Zhang et al. 2006 Nature Biotech. 24:680
• Direct genomic [BAC hybridization] selection [50% pure]. Bashiardes et al. (2005) Nat Methods 2:63.

Selective genome sequencing

Shendure, et al. Science 309(5741):1728-32. Nilsson et al. (2006) Trends Biotechnol 24:83.

Red=Synthetic; Yellow=genomic

How do we optimize >100K 100mers ?

Two ways to capture alleles from genomic ss-DNA

In vitro

Paired-tag library

Gap fill

Cleave & ligate

Zhang, Chou, Shendure, Li, Leproust, Church, Dahl, Davis, Nilsson

How? 10 Mbp of oligos / $1000 chip

8K     Atactic/Xeotron/Invitrogen   Photo-generated acid
12K    Combimatrix/Codon            Electrolytic
44K    Agilent                      Ink-jet, standard reagents
380K   Nimblegen/GA                 Photolabile 5' protection

Tian et al. Nature 432:1050; Carr & Jacobson 2004 NAR; Smith & Modrich 1997 PNAS

~1000X lower oligo costs

Amplify pools of 50mers using flanking universal PCR primers & 3 paths to 10X error correction

Digital Micromirror Array

Padlock, Molecular Inversion Probes (MIPs)

CG to CA, TG: 35% of germline and 44% of colorectal cancer mutations (not restricted to single nucleotides nor common polymorphisms)

Vitkup, Sander, Church The Amino-acid Mutational Spectrum of Human Genetic Disease. Genome Biol. 4: R72. (CG to CA, TG)
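To make the CG to CA, TG class concrete, the following minimal Python sketch lists every CpG dinucleotide in a sequence together with its two transition alleles (TG when the C changes to T on the strand shown, CA when the same change happens on the opposite strand). The function name and the example sequence are illustrative assumptions, not part of the talk.

```python
def cpg_alternative_alleles(seq):
    """Return (position, 'CG', ['CA', 'TG']) for each CpG in seq (0-based)."""
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "CG":
            # C>T on this strand gives TG; C>T on the opposite strand
            # reads as G>A here, giving CA.
            sites.append((i, "CG", ["CA", "TG"]))
    return sites


EXAMPLE = "ATCGGACGTT"  # hypothetical sequence, for illustration only
for pos, ref, alts in cpg_alternative_alleles(EXAMPLE):
    print(pos, ref, "->", ", ".join(alts))
```

A probe pool targeting this mutation class would need one capture site per reported position; how the probes themselves are designed is outside this sketch.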

[Figure: padlock probe on genomic DNA with alternative alleles (CG, CA, TG), L and R arms, universal primers, and an optional multiplex tag]

Zhang, Chou, Shendure, Li, Leproust, Church, Dahl, Davis, Nilsson (10K to 1M 100-mer probes per pool -- see Kun Zhang’s poster)

Sequencing genomes from single cells via polymerase clones -- Plones

1) When we only have one cell, as in Preimplantation Genetic Diagnosis/Haplotyping (PGD/PGH) or environmental samples (poor lab growth)
2) Candidate chromosome region sequencing
3) Prioritizing or pooling (rare) species based on an initial DNA screen (metagenomics)
4) Multiple chromosomes in a cell or virus
5) RNA splicing
6) Cell-cell interactions (predator-prey, symbionts, commensals, parasites)

Phi-29 Polymerase Strand-displacement amplification

(single chromosome, cell, RNA or particle) Zhang et al. (2006) Nature Biotech. June ’06

Single molecule amplification sequencing

Multiple Displacement Amplification (MDA)

NBT (2006) 24: 657-8.

Zhang et al., Nature Biotechnology (2006) 24:680

Note!: Single human cell 1000X easier than 5 Mbp

Single-cell sequencing: 4.7 Mbp (plones)

• Ultra-clean conditions for reduction of background amplification + Real-Time monitoring

• Post-amplification chip hybridization distinguishes alleles

• Amplification variation random & easily filled by PCR

EXON PATTERN           Eph4   Eph4bDD   TOTAL   RATIO    P-VALUE
------------7-8-9-10    609       764    1373    1.17    1E-4
--------------8-9-10    320       390     710    1.13    3E-2
----------6-7-8-9-10    431       251     682   -1.85    4E-18
------4-5-6-7-8-9-10    218       216     434   -1.08    2E-1
----------------9-10     68       143     211    1.96    7E-7
--------5-6-7-8-9-10     86        39     125   -2.37    2E-6
----3-4-5-6-7-8-9-10     40        56      96    1.30    9E-2
------4-5---7-8-9-10     16        74      90    4.30    2E-9
--2-3-4-5-6-7-8-9-10     44        28      72   -1.69    1E-2
1-2-3-4-5-6-7-8-9-10     22         5      27   -4.73    3E-4
--------5---7-8-9-10      5        19      24    3.53    3E-3
----3-4-5---7-8-9-10      1        15      16   13.95    4E-4
--2-3-4-5---7-8-9-10      1        10      11    9.30    5E-3

CD44 Counts (RNA splicing forms)

Eph4 = mammary epithelial cell line

Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic)

Zhu, Shendure, Mitra, Church, Single Molecule Profiling of Alternative Pre-mRNA Splicing. Science 301:836-8.
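The ratio column in the CD44 counts is not defined on the slide. One plausible reading, assumed here, is a signed fold change between each spliceform's share of the counts in the two cell lines (positive when the share is higher in Eph4bDD, negative when it is lower). The Python sketch below computes that quantity with totals taken over only the 13 listed patterns; it reproduces the slide's ratios to within about 1%, and the small gap suggests the original totals may have included patterns that are not listed. All names are illustrative.

```python
# Counts per CD44 exon pattern as (Eph4, Eph4bDD), copied from the slide.
COUNTS = {
    "7-8-9-10": (609, 764),
    "8-9-10": (320, 390),
    "6-7-8-9-10": (431, 251),
    "4-5-6-7-8-9-10": (218, 216),
    "9-10": (68, 143),
    "5-6-7-8-9-10": (86, 39),
    "3-4-5-6-7-8-9-10": (40, 56),
    "4-5-7-8-9-10": (16, 74),
    "2-3-4-5-6-7-8-9-10": (44, 28),
    "1-2-3-4-5-6-7-8-9-10": (22, 5),
    "5-7-8-9-10": (5, 19),
    "3-4-5-7-8-9-10": (1, 15),
    "2-3-4-5-7-8-9-10": (1, 10),
}

# Totals over the listed rows only (an assumption; see the text above).
TOTAL_EPH4 = sum(a for a, _ in COUNTS.values())
TOTAL_BDD = sum(b for _, b in COUNTS.values())


def signed_fold_change(a, b, total_a, total_b):
    """Fold change of b's fraction over a's; negative when b's is lower."""
    frac_a = a / total_a
    frac_b = b / total_b
    if frac_b >= frac_a:
        return frac_b / frac_a
    return -(frac_a / frac_b)


RATIOS = {
    pattern: signed_fold_change(a, b, TOTAL_EPH4, TOTAL_BDD)
    for pattern, (a, b) in COUNTS.items()
}

for pattern, ratio in RATIOS.items():
    print(f"{pattern:>22}  {ratio:7.2f}")
```

The p-value column is not reproduced here because the slide does not say which test produced it.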

Reading Polonies: Beads or not, Ligase or Polymerase

[Figure: polony images with base calls G, A, C, T]

‘Next Generation’ Sequencing Status

Multi-molecule                       Reaction Volume
AB/APG             Ligase beads      1 fL
454/Roche          Pol beads         100,000 fL
Solexa             Pol term          1 fL
CGI                Ligase            1 fL
Affymetrix         Hybr array        100 fL

Single molecules
Helicos Biosci     Pol               <1 fL
Visigen Biotech    Pol FRET          <1 fL
Pacific Biosci     Pol               <1 fL
Agilent            Nanopores         <1 fL

fL = 1E-15 liters (femto)

(7/9 involve our lab)

Length & run-time vs. Accuracy & Cost

"Future improvements in the read lengths, demonstrated at 7 consecutive bases per tag (Shendure et al., 2005) and reductions in the run time, currently 60 hours, will make this a useful platform for resequencing." --Leamon, et al. (454) Gene Therapy and Regulation 3: 15-31 

Note that without ‘future improvements’: Affymetrix/Illumina read-lengths of 1 base per tag are useful.

60 million reads/run is 10X faster per read than 500K reads/run, & 50X lower cost per bp due to lower reagent & instrument costs.

$140K device, $500/run

Polony Sequencing Equipment
• Autosampler (96 wells)
• (HPLC-like) syringe pump
• Flow-cell
• Temperature control
• Microscope with xyz controls
• CCD camera

Integrated Polony Sequencing Pipeline (open source hardware, software, wetware)
• In vitro paired tag libraries
• Bead polonies via emulsion PCR (Dressman et al. PNAS 2003)
• Monolayer immobilization
• Enrich amplified beads
• SBE or SBL sequencing
• Epifluorescence & Flow Cell, $140K
• SOFTWARE: Images → Tag Sequences; Tag Sequences → Genome

Shendure, Porreca, Reppas, Lin, McCutcheon, Rosenbaum, Wang, Zhang, Mitra, Church (2005) Science 309:1728.

3’5’ Tag 1 ePCR bead

7 bp 6 bp 7 bp 6 bp

Tag 2

Each yields 6 to 7 bp of contiguous sequence

26 bp new sequence per 135 bp amplicon

4 positions for paired-end anchor 'primers'

L M R

ACUCAUC…(3’)…TAGAGT????????????????TGAGTAG…(5’)

Sequencing by Ligation (SBL) with fluorescent combinatorial 9-mers

Probe                      Excitation (nm)   Emission (nm)
5’-Cy5-nnnnAnnnn-3’        647               700
5’-Cy3-nnnnGnnnn-3’        555               605
5’-TR-nnnnCnnnn-3’         572               630
5’-Cy3+Cy5-nnnnTnnnn-3’    555               700

5'PO4

Shendure, Porreca, et al. (2005) Science 309:1728
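The four labeled 9-mer pools encode the query base by dye: Cy5 for A, Cy3 for G, TR for C, and Cy3+Cy5 together for T. A minimal Python sketch of turning per-cycle dye signals into a tag sequence follows. It assumes the images have already been reduced to one normalized intensity per dye for each bead; the 0.5 threshold, the function names, and the example values are all hypothetical and do not describe the published image-analysis software.

```python
# Dye label(s) -> base at the query position, as listed for the four pools.
DYE_TO_BASE = {
    frozenset({"Cy5"}): "A",
    frozenset({"Cy3"}): "G",
    frozenset({"TR"}): "C",
    frozenset({"Cy3", "Cy5"}): "T",
}


def call_base(signals, threshold=0.5):
    """Call one base from a dict of dye -> normalized intensity.

    Returns 'N' when the detected dye combination matches no pool.
    The threshold is an illustrative assumption.
    """
    detected = frozenset(d for d, v in signals.items() if v >= threshold)
    return DYE_TO_BASE.get(detected, "N")


def call_tag(cycles, threshold=0.5):
    """Concatenate the base calls from successive ligation cycles."""
    return "".join(call_base(c, threshold) for c in cycles)


# Hypothetical intensities for one bead over four cycles.
EXAMPLE_CYCLES = [
    {"Cy5": 0.9, "Cy3": 0.1, "TR": 0.0},
    {"Cy5": 0.8, "Cy3": 0.7, "TR": 0.1},
    {"Cy5": 0.0, "Cy3": 0.1, "TR": 0.9},
    {"Cy5": 0.1, "Cy3": 0.2, "TR": 0.1},
]
print(call_tag(EXAMPLE_CYCLES))
```

Returning 'N' for an unrecognized combination keeps ambiguous beads visible downstream instead of forcing a call.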

Consensus error rate                      Total errors (E.coli)   Total errors (Human)
1E-4                  Bermuda/Hapmap      500                     600,000
4E-5                  454                 200                     240,000
3E-7                  Polony-SbL @6X      0                       1800
1E-8                  Goal for 2006       0                       60

Goal of genotyping & resequencing: discovery of variants, e.g. cancer somatic mutations 4E-6 (& lab-evolved cells)

Why low error rates?

Also, effectively reduce (sub)genome target size by enrichment for exons or common SNPs to reduce cost & # false positives.
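The total-error columns follow from multiplying the consensus error rate by the genome size. The Python sketch below reproduces them; the genome sizes (5 Mbp for E. coli, 6 Gbp for human) are not stated on the slide and are inferred from the 1E-4 row, so treat them as assumptions. The slide lists 0 for E. coli where this product falls below 2.

```python
ECOLI_BP = 5e6   # assumed: inferred from 500 errors at a 1E-4 error rate
HUMAN_BP = 6e9   # assumed: inferred from 600,000 errors at 1E-4 (diploid)

ERROR_RATES = {
    "Bermuda/Hapmap": 1e-4,
    "454": 4e-5,
    "Polony-SbL @6X": 3e-7,
    "Goal for 2006": 1e-8,
}


def expected_errors(rate, genome_bp):
    """Expected number of consensus errors across a genome of genome_bp bases."""
    return rate * genome_bp


for name, rate in ERROR_RATES.items():
    ecoli = expected_errors(rate, ECOLI_BP)
    human = expected_errors(rate, HUMAN_BP)
    print(f"{name:<16} {rate:.0e}  E.coli {ecoli:8.2f}  Human {human:10.0f}")
```

The same product explains why shrinking the target by enrichment helps: fewer bases sequenced means proportionally fewer false positives at a fixed error rate.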

Microbial lab evolution

Lenski           Citrate utilization
Church           Trp/Tyr exchange
Palsson          Glycerol utilization
Edwards          Radiation resistance
Ingram           Lactate production
Stephanopoulos   Ethanol resistance
Marliere         Thermotolerance
J&J              Diarylquinoline resistance (TB)
DuPont           1,3-propanediol production

Position   Type    Gene   Location       Function                                     Mechanism
986,334    T > G   ompF   Promoter -10   Promoter of non-specific transport channel   Makes promoter more consensus-like
985,797    T > G   ompF   Glu > Ala      Non-specific transport channel               Makes pore bigger and more hydrophobic
931,960    8 bp    lrp    frameshift     General transcriptional regulator            ?

Polony-based Whole-Genome Mutation Discovery of Trp clone

Shendure, et al. (2005) Science 309:1728

ompF: non-specific transport channel
• Glu-117 → Ala (in the pore)
• Charged residue known to affect pore size and selectivity
• Can increase import & export capability simultaneously

PCR amplification and sequencing of OmpF and Lrp from multiple clones from 3 independent lines of Trp/Tyr co-cultures:

OmpF: 42 R→G, L, C; 113 D→V; 117 E→A (Arg→Gly, Leu, Cys; Asp→Val; Glu→Ala)

Hydrophilic and bulky → hydrophobic and smaller

Promoter: -12 A→C, -35 C→A. More consensus-like.

Lrp: 1 bp deletion, 9 bp deletion, 8 bp deletion, IS2 insertion, R→L in DBD. Change in global gene regulation?

Heterogeneity within each time-point reflects colony heterogeneity.

Reppas, Lin, et al (unpublished)

Evolving Population: Multiple Genotypes, Similar Themes

[Figure: growth rate (1/hr) vs. number of passages (0 to 80) for Lineage 1, Lineage 2, Lineage 3, and Wild Type; y-axis ranges 0.48 to 0.6 and 0.12 to 0.24]

[Figure: proximal vs. distal tag placement across genome positions 1,200,000 to 1,216,001, with features near 1,206k and 1,210k]

Incorrect distance. Red = same strand; Black = opposite strand.

Mixture of wild type & 2 kb inversion (pin)

Using paired ends, rearrangement & copy-number detection is >1000X easier than point mutation detection (6X vs 6000X)
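The figure legend above separates paired tags by relative strand and flags pairs mapped at an incorrect distance. A minimal Python sketch of that classification is shown below. The expected separation window (700 to 1300 bp), the choice of which orientation counts as consistent, and all names are assumptions made for illustration; they are not taken from the slide.

```python
from dataclasses import dataclass


@dataclass
class TagPair:
    pos1: int      # mapped position of tag 1
    strand1: str   # '+' or '-'
    pos2: int      # mapped position of tag 2
    strand2: str   # '+' or '-'


def classify_pair(pair, min_dist=700, max_dist=1300,
                  expected_same_strand=False):
    """Label a mapped tag pair.

    expected_same_strand says which relative orientation an unrearranged
    genome should produce; its default is an assumption, as is the
    distance window.
    """
    same_strand = pair.strand1 == pair.strand2
    if same_strand != expected_same_strand:
        return "unexpected_orientation"   # candidate inversion
    dist = abs(pair.pos2 - pair.pos1)
    if not (min_dist <= dist <= max_dist):
        return "incorrect_distance"       # candidate insertion or deletion
    return "consistent"


# Hypothetical pairs near the region shown in the figure.
PAIRS = [
    TagPair(1_206_000, "+", 1_207_000, "-"),
    TagPair(1_206_500, "+", 1_207_400, "+"),
    TagPair(1_206_000, "+", 1_210_000, "-"),
]
for p in PAIRS:
    print(p, classify_pair(p))
```

Because one discordant pair flags a rearrangement regardless of the bases inside the tags, far lower coverage suffices than for calling point mutations, which is the contrast drawn in the 6X vs 6000X comparison.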

Polonies for human inversions

>300 kbp long inverted repeats

Turner, Hurles, et al. 2006 Nat Methods 3:439-45. Sanger Inst. & HMS

Zhang et al. Nature Genet. Mar 2006

Polonies for haplotyping, recombination, LOH

Sequencing/genotyping on single human chromosomes

153Mbp

Monitoring resistance to BCR-ABL-kinase inhibitors with polonies during CML patient therapy Nardi, Raz, Chao, Wu, Stone, Cortes, Deininger, Church, Zhu, Daley (submitted)

E255K, T315I, M244V

Multiplex Polony Summary

Technologies for selecting genomic regions
• Mbp scale for rearrangements
• RNA tags & spliceforms
• 1 to 200 bp scale for SNPs & exons (1%)

Low cost & high accuracy: $.07/kbp at 3E-7 errors

Paired-end-tags (PET) for rearrangements

Detection of rare mutations (e.g. drug resistance alleles)

60 million reads per run


Polonies with & without beads or gels

Increases from 14 to 57 million polony beads per run & improves data quality. Kim, Porreca, Seidman, Church (unpublished)

Consensus error rate                      Total errors (E.coli)   Total errors (Human)
1E-4                  Bermuda/Hapmap      500                     600,000
4E-5                  454 @40X            200                     240,000
3E-7                  Polony-SbL @6X      0                       1800
1E-8                  Goal for 2006       0                       60

Goal of genotyping & resequencing: discovery of variants, e.g. cancer somatic mutations 4E-6 (& lab-evolved cells)

Why low error rates?

Also, effectively reduce (sub)genome target size by enrichment for exons or common SNPs to reduce cost & # false positives.

Cost vs consensus error rate

[Chart: $/kb (0 to 18) vs consensus error rate (1E-8 to 1E-1); series: SBL $/kb, ABI $/kb, 454 $/kb]

Platforms: AB3730 (Sep05), 454 (Sep05), Polony (Sep05), Polony (Sep 06)
$/kb @4E-5: $7, $9, $0.8, $0.07
$/3e9 @1X: $3M, $300K, $30K
Paired ends: yes, no, yes
Device $: 365K, 400K, 140K

Cancer exon sequencing

$250K per sample (13,023 genes, 21 Mbp, 135,483 primer pairs) using PCR & capillary sequencing.

$3K per sample (estimate) using single tube capture & polonies
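For scale, the two per-sample figures can be converted to cost per targeted kbp and per primer pair. The short Python sketch below does only that arithmetic with the numbers quoted above; it ignores coverage depth, so the results are per targeted kbp and are not comparable to raw per-kbp sequencing costs. Variable names are illustrative.

```python
TARGET_KBP = 21_000          # 21 Mbp of targeted exon sequence
PRIMER_PAIRS = 135_483       # PCR primer pairs in the capillary approach
PCR_COST = 250_000           # $ per sample, PCR & capillary sequencing
POLONY_COST = 3_000          # $ per sample, estimate for capture & polonies


def cost_per_kbp(sample_cost, target_kbp):
    """Cost in dollars per targeted kbp for one sample."""
    return sample_cost / target_kbp


pcr_per_kbp = cost_per_kbp(PCR_COST, TARGET_KBP)
polony_per_kbp = cost_per_kbp(POLONY_COST, TARGET_KBP)
fold_reduction = PCR_COST / POLONY_COST
pcr_per_primer_pair = PCR_COST / PRIMER_PAIRS

print(f"PCR & capillary: ${pcr_per_kbp:.2f} per targeted kbp")
print(f"Capture & polonies: ${polony_per_kbp:.2f} per targeted kbp")
print(f"Fold reduction: {fold_reduction:.0f}X")
print(f"PCR cost per primer pair: ${pcr_per_primer_pair:.2f}")
```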

Sjoblom et al. The Consensus Coding Sequences of Human Breast and Colorectal Cancers. Science. 2006 Sep;

Davies et al. Somatic mutations of the protein kinase gene family in human lung cancer. Cancer Res. 2005 65:7591-5.