George Church, 2:30-3:00 PM, Tue 3-Oct-2006: Cancer Genomics & Emerging Technologies
Thanks to: NCI/NIH HMS-CGCC
Single Cell, RNA, & Chromosome Sequencing Technologies
AppliedBiosystems-Agencourt, Affymetrix, Helicos, 454, Solexa, DNAdirect, CompleteGenomics, Codon Devices
Multiplex Polony Summary
Technologies for selecting genomic regions:
• Mbp scale for rearrangements
• RNA tags & splice forms
• 1 to 200 bp scale for SNPs & exons (1%)
Low cost & high accuracy: $0.07/kbp at a 3E-7 error rate
Paired-end tags (PET) for rearrangements
Detection of rare mutations (e.g. drug resistance alleles)
60 million reads per run
Selective genome sequencing
Numerous (100K) small regions (exons & point mutations):
• PCR: 21 Mbp, >$250K. Sjoblom et al. (2006) Science.
• Highly multiplexed molecular inversion probe genotyping: over 10,000 targeted SNPs genotyped in a single-tube assay. Hardenbol et al. (2005) Genome Res. 15(2):269-75.
• Analyzing genes using closing and replicating circles. Nilsson et al. (2006) Trends Biotechnol. 24:83.
One large region:
• Single-molecule amplification, 1 to 4 Mbp. Zhang et al. (2006) Nature Biotech. 24:680.
• Direct genomic [BAC hybridization] selection [50% pure]. Bashiardes et al. (2005) Nat Methods 2:63.
Selective genome sequencing
Shendure, et al. Science 309(5741):1728-32. Nilsson et al. (2006) Trends Biotechnol 24:83.
Red = synthetic; yellow = genomic
How do we optimize more than 100K 100-mers?
Two ways to capture alleles from genomic ss-DNA
In vitro
Paired-tag library
Gap fill
Cleave & ligate
Zhang, Chou, Shendure, Li, Leproust, Church, Dahl, Davis, Nilsson
How? 10 Mbp of oligos per $1000 chip:
• 8K: Atactic/Xeotron/Invitrogen (photo-generated acid)
• 12K: Combimatrix/Codon (electrolytic)
• 44K: Agilent (ink-jet, standard reagents)
• 380K: Nimblegen/GA (photolabile 5' protection)
Tian et al. Nature 432:1050; Carr & Jacobson (2004) NAR; Smith & Modrich (1997) PNAS
~1000X lower oligo costs
Amplify pools of 50-mers using flanking universal PCR primers, & 3 paths to 10X error correction
Digital Micromirror Array
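The chip figures above reduce to per-base arithmetic. The following is a minimal sketch with illustrative function names that do not come from the talk. It assumes the whole 10 Mbp per $1000 chip budget goes into full-length oligos. It ignores truncated synthesis products and the bases spent on flanking universal-primer sites.

```python
# Back-of-envelope oligo economics using the slide's figures.
# Assumption: the whole 10 Mbp synthesis budget is usable as full-length oligos.

CHIP_COST_USD = 1000.0        # $1000 per chip (slide figure)
BASES_PER_CHIP = 10_000_000   # 10 Mbp of oligo sequence per chip (slide figure)


def cost_per_base(chip_cost: float = CHIP_COST_USD, bases: int = BASES_PER_CHIP) -> float:
    """Dollars per synthesized base."""
    return chip_cost / bases


def oligos_per_chip(oligo_len: int, bases: int = BASES_PER_CHIP) -> int:
    """Number of oligos of length `oligo_len` that fit in the chip's base budget."""
    return bases // oligo_len


if __name__ == "__main__":
    print(f"${cost_per_base():.4f} per base")
    print(oligos_per_chip(100), "100-mers per chip")
    print(oligos_per_chip(50), "50-mers per chip")
```

Under these assumptions, one chip holds about 100,000 100-mers, the ">100K 100mers" scale mentioned earlier, at roughly $0.0001 per base.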
Padlock, Molecular Inversion Probes (MIPs)
CG to CA or TG: 35% of germline and 44% of colorectal cancer mutations (not restricted to single nucleotides or to common polymorphisms)
Vitkup, Sander, Church. The Amino-acid Mutational Spectrum of Human Genetic Disease. Genome Biol. 4:R72. (CG to CA, TG)
CGCATG
Genomic DNA
Alternative alleles
Universal primers
R
L
Optional multiplex tag
Zhang, Chou, Shendure, Li, Leproust, Church, Dahl, Davis, Nilsson (10K to 1M 100-mer probes per pool; see Kun Zhang’s poster)
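The padlock/MIP diagram shows two target-specific arms (L and R) joined by universal primers and an optional tag. The two arms anneal on either side of the alternative allele, and the probe is closed into a circle by gap fill and ligation. The following sketch models that idealized capture on one strand.

The helper names are hypothetical. The sketch ignores Tm balancing, mismatches, and synthesis limits. It labels the arms by orientation (5' and 3'), because the slide does not say which of its L/R arms is which. "NNNN" is a placeholder for the universal-primer backbone.

```python
# Idealized molecular inversion probe (MIP) / padlock design and capture.
# Hypothetical helper names; not the lab's actual design software.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_mip(genome: str, start: int, end: int, arm_len: int, backbone: str) -> str:
    """Linear probe 5'-[5' arm]-[backbone]-[3' arm]-3' targeting genome[start:end].

    The 5' arm pairs with the arm_len bases upstream of the target and the
    3' arm pairs with the arm_len bases downstream, leaving the target as a gap.
    """
    if start - arm_len < 0 or end + arm_len > len(genome):
        raise ValueError("target too close to the sequence end for this arm length")
    upstream = genome[start - arm_len:start]
    downstream = genome[end:end + arm_len]
    return revcomp(upstream) + backbone + revcomp(downstream)


def gap_fill_and_ligate(genome: str, start: int, end: int, probe: str) -> str:
    """Simulate capture: polymerase extends the 3' arm across the gap, copying
    whichever allele the template carries, then ligase joins it to the 5' arm.

    Returns the closed circle written linearly from the probe's 5' end.
    """
    filled_gap = revcomp(genome[start:end])
    return probe + filled_gap


if __name__ == "__main__":
    genome = "TTGACCGTAGCATGCAAGGCTTACGGATCCA"
    probe = design_mip(genome, start=12, end=15, arm_len=6, backbone="NNNN")
    print(probe)
    print(gap_fill_and_ligate(genome, 12, 15, probe))
```

Reading the circle from the 3' arm through the filled gap and back into the 5' arm gives the reverse complement of the captured region. That is why the circle can later be amplified with the universal primers and sequenced.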
Sequencing genomes from single cells via polymerase clones (plones)
1) When we only have one cell, as in Preimplantation Genetic Diagnosis/Haplotyping (PGD/PGH) or environmental samples (poor lab growth)
2) Candidate chromosome region sequencing
3) Prioritizing or pooling (rare) species based on an initial DNA screen (metagenomics)
4) Multiple chromosomes in a cell or virus
5) RNA splicing
6) Cell-cell interactions (predator-prey, symbionts, commensals, parasites)
Phi-29 polymerase strand-displacement amplification
(single chromosome, cell, RNA, or particle) Zhang et al. (2006) Nature Biotech., June ’06
Single molecule amplification sequencing
Multiple Displacement Amplification (MDA)
NBT (2006) 24:657-8.
Zhang et al., Nature Biotechnology (2006) 24:680
Note: a single human cell is 1000X easier than 5 Mbp
Single-cell sequencing: 4.7 Mbp (plones)
• Ultra-clean conditions to reduce background amplification, plus real-time monitoring
• Post-amplification chip hybridization distinguishes alleles
• Amplification variation is random, and gaps are easily filled by PCR
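One plausible reading of the note that a single human cell is "1000X easier than 5 Mbp" concerns starting template. A diploid human cell carries roughly a thousand times more DNA than a single 5 Mbp genome, so background amplification matters proportionally less. That interpretation is an assumption, not stated on the slide. The following sketch only checks the arithmetic, assuming about 650 g/mol per base pair and about 6E9 bp per diploid human cell.

```python
# Template mass of one genome copy, as a rough proxy for whole-genome
# amplification difficulty. Assumptions (not from the slide): ~650 g/mol per
# base pair; ~6e9 bp for a diploid human cell; 5e6 bp for the 5 Mbp genome.

AVOGADRO = 6.022e23             # per mole
GRAMS_PER_MOL_PER_BP = 650.0    # approximate average for double-stranded DNA


def genome_mass_pg(bp: float) -> float:
    """Mass in picograms of one double-stranded copy of a `bp`-long genome."""
    grams = bp * GRAMS_PER_MOL_PER_BP / AVOGADRO
    return grams * 1e12


if __name__ == "__main__":
    human = genome_mass_pg(6e9)
    small = genome_mass_pg(5e6)
    print(f"diploid human cell: {human:.2f} pg")
    print(f"5 Mbp genome: {small * 1000:.2f} fg")
    print(f"ratio: {human / small:.0f}X")
```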
Exon pattern          Eph4  Eph4bDD  Total  Ratio  P-value
------------7-8-9-10   609      764   1373   1.17  1E-4
--------------8-9-10   320      390    710   1.13  3E-2
----------6-7-8-9-10   431      251    682  -1.85  4E-18
------4-5-6-7-8-9-10   218      216    434  -1.08  2E-1
----------------9-10    68      143    211   1.96  7E-7
--------5-6-7-8-9-10    86       39    125  -2.37  2E-6
----3-4-5-6-7-8-9-10    40       56     96   1.30  9E-2
------4-5---7-8-9-10    16       74     90   4.30  2E-9
--2-3-4-5-6-7-8-9-10    44       28     72  -1.69  1E-2
1-2-3-4-5-6-7-8-9-10    22        5     27  -4.73  3E-4
--------5---7-8-9-10     5       19     24   3.53  3E-3
----3-4-5---7-8-9-10     1       15     16  13.95  4E-4
--2-3-4-5---7-8-9-10     1       10     11   9.30  5E-3
CD44 Counts (RNA splicing forms)
Eph4 = mammary epithelial cell line
Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic)
Zhu, Shendure, Mitra, Church, Single Molecule Profiling of Alternative Pre-mRNA Splicing. Science 301:836-8.
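The ratio column in the CD44 table appears to be a normalized fold change. Each isoform's share of all counted molecules in Eph4bDD is divided by its share in Eph4, and values below 1 are reported as the negative reciprocal. The following sketch recomputes it from the listed counts. The normalization rule is inferred, not stated on the slide.

Using only the 13 listed isoforms as totals reproduces the published ratios to within about 1%. The remaining differences presumably reflect totals that included isoforms not shown. The p-values are not recomputed, because the test used is not given.

```python
# Recompute the CD44 isoform ratio column from the table's counts.
# Assumption: Ratio = (share in Eph4bDD) / (share in Eph4), with ratios below 1
# reported as the negative reciprocal; totals are sums over the listed rows only.

CD44_COUNTS = [
    # (exon pattern, Eph4 count, Eph4bDD count)
    ("------------7-8-9-10", 609, 764),
    ("--------------8-9-10", 320, 390),
    ("----------6-7-8-9-10", 431, 251),
    ("------4-5-6-7-8-9-10", 218, 216),
    ("----------------9-10", 68, 143),
    ("--------5-6-7-8-9-10", 86, 39),
    ("----3-4-5-6-7-8-9-10", 40, 56),
    ("------4-5---7-8-9-10", 16, 74),
    ("--2-3-4-5-6-7-8-9-10", 44, 28),
    ("1-2-3-4-5-6-7-8-9-10", 22, 5),
    ("--------5---7-8-9-10", 5, 19),
    ("----3-4-5---7-8-9-10", 1, 15),
    ("--2-3-4-5---7-8-9-10", 1, 10),
]


def normalized_fold_ratio(count_a: int, count_b: int, total_a: int, total_b: int) -> float:
    """Fold change of an isoform's share in sample B relative to sample A."""
    ratio = (count_b / total_b) / (count_a / total_a)
    return ratio if ratio >= 1.0 else -1.0 / ratio


if __name__ == "__main__":
    total_eph4 = sum(a for _, a, _ in CD44_COUNTS)
    total_bdd = sum(b for _, _, b in CD44_COUNTS)
    for pattern, a, b in CD44_COUNTS:
        print(f"{pattern}  {normalized_fold_ratio(a, b, total_eph4, total_bdd):6.2f}")
```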
‘Next Generation’ Sequencing Status
Multi-molecule:
Platform          Chemistry                   Reaction volume
AB/APG            Ligase, beads               1 fL
454/Roche         Polymerase, beads           100,000 fL
Solexa            Polymerase, terminators     1 fL
CGI               Ligase                      1 fL
Affymetrix        Hybridization array         100 fL
Single molecules:
Helicos Biosci    Polymerase                  <1 fL
Visigen Biotech   Polymerase, FRET            <1 fL
Pacific Biosci    Polymerase                  <1 fL
Agilent           Nanopores                   <1 fL
fL = 1E-15 liters (femtoliter)
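To put the reaction volumes in the table on a physical scale, the following sketch converts femtoliters in two ways. First, to the edge of an equivalent cube (1 fL is exactly 1 µm³). Second, to the molar concentration a single template molecule would have in that volume. Which of these scales matters for a given platform is not stated on the slide, and the helper names are illustrative.

```python
# Physical scale of femtoliter reaction volumes.
# 1 fL = 1e-15 L = 1 cubic micrometre.

AVOGADRO = 6.022e23    # per mole
LITERS_PER_FL = 1e-15


def cube_edge_um(volume_fl: float) -> float:
    """Edge length (micrometres) of a cube with the given volume."""
    return volume_fl ** (1.0 / 3.0)


def one_molecule_molar(volume_fl: float) -> float:
    """Molar concentration of a single molecule confined to `volume_fl` femtoliters."""
    return 1.0 / (AVOGADRO * volume_fl * LITERS_PER_FL)


if __name__ == "__main__":
    for label, vol in [("1 fL", 1.0), ("100 fL", 100.0), ("100,000 fL", 100_000.0)]:
        print(f"{label}: cube edge {cube_edge_um(vol):.1f} um, "
              f"one molecule = {one_molecule_molar(vol) * 1e9:.3g} nM")
```

One molecule in 1 fL is already a low-nanomolar concentration, and the 100,000 fL bead reaction corresponds to a cube roughly 46 µm on a side.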
(7/9 involve our lab)
Length & run time vs. accuracy & cost
"Future improvements in the read lengths, demonstrated at 7 consecutive bases per tag (Shendure et al., 2005) and reductions in the run time, currently 60 hours, will make this a useful platform for resequencing." (Leamon et al. (454), Gene Therapy and Regulation 3:15-31)
Note that even without ‘future improvements’, Affymetrix/Illumina read lengths of 1 base per tag are useful.
60 million reads per run is 10X faster per read than 500K reads per run, with 50X lower cost per bp due to lower reagent & instrument costs.
$140K (instrument); $500/run
Polony Sequencing Equipment: autosampler (96 wells); (HPLC-like) syringe pump; flow cell; temperature control; microscope with xyz controls; CCD camera
In vitro paired-tag libraries
Bead polonies via emulsion PCR
Monolayer immobilization
Enrich amplified beads
SOFTWARE
Images → tag sequences
Tag sequences → genome
SBE or SBL sequencing
Epifluorescence & Flow Cell $140K
Shendure, Porreca, Reppas, Lin, McCutcheon, Rosenbaum, Wang, Zhang, Mitra, Church (2005) Science 309:1728.
Integrated Polony Sequencing Pipeline (open-source hardware, software, wetware)
Dressman et al. (2003) PNAS
Paired tags on an ePCR bead (3’ and 5’ ends marked): Tag 1 read as 7 bp + 6 bp, Tag 2 read as 7 bp + 6 bp
Each read yields 6 to 7 bp of contiguous sequence
26 bp of new sequence per 135 bp amplicon
4 positions for paired-end anchor 'primers' (L, M, R)
ACUCAUC…(3’)…TAGAGT????????????????TGAGTAG…(5’)
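The tag layout suggests two short reads (7 bp and 6 bp) taken from opposite ends of one tag, with unread bases between them. On that reading, a tag can be placed on a reference by searching both strands for "left read, then gap, then right read". The following sketch does this with a brute-force regular expression.

The gap length is a hypothetical parameter, because the slide does not give it. The helper names are illustrative. A production pipeline would use an index rather than scanning the whole genome.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def map_gapped_tag(reference: str, left: str, right: str, gap: int) -> list[tuple[int, str]]:
    """Find every placement of `left` + `gap` unread bases + `right`.

    Returns (start, strand) pairs in forward-strand coordinates; strand is '+'
    for a match on `reference` and '-' for a match on its reverse complement.
    """
    # Zero-width lookahead so overlapping placements are all reported.
    pattern = re.compile(f"(?=({re.escape(left)}[ACGT]{{{gap}}}{re.escape(right)}))")
    tag_len = len(left) + gap + len(right)
    n = len(reference)
    hits = [(m.start(), "+") for m in pattern.finditer(reference)]
    for m in pattern.finditer(revcomp(reference)):
        # A match at offset i on the reverse complement covers forward bases
        # [n - i - tag_len, n - i).
        hits.append((n - m.start() - tag_len, "-"))
    return sorted(hits)


if __name__ == "__main__":
    ref = "TTTTACGTACGCCCCTGCATGAAAA"
    print(map_gapped_tag(ref, "ACGTACG", "TGCATG", gap=4))   # [(4, '+')]
    print(map_gapped_tag(ref, "CATGCAG", "GTACGT", gap=4))   # [(4, '-')]
```

Because each tag contributes only about 13 read bases, exact matching alone will give multiple hits in repetitive sequence. The paired-end structure of the amplicon is what resolves most of them.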
Sequencing by Ligation (SBL) with fluorescent combinatorial 9-mers
5'PO4
Probe                      Excitation (nm)  Emission (nm)
5’-Cy5-nnnnAnnnn-3’        647              700
5’-Cy3-nnnnGnnnn-3’        555              605
5’-TR-nnnnCnnnn-3’         572              630
5’-Cy3+Cy5-nnnnTnnnn-3’    555              700
Shendure, Porreca, et al. (2005) Science 309:1728
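The four probe pools differ only in the fixed query base and its dye. Every other position is degenerate, so each pool contains 4^8 distinct 9-mers. SBL relies on ligase preferentially joining probes that are correctly paired near the ligation junction. The dye that remains after a ligation cycle therefore reports the base paired with the query position.

The following sketch enumerates one pool and decodes a dye back to a template base. The function names are hypothetical, and the default central query position simply follows the slide's nnnnXnnnn notation.

```python
from itertools import product

# Dye attached to each probe pool, keyed by the pool's fixed query base (per the slide).
QUERY_BASE_TO_DYE = {"A": "Cy5", "G": "Cy3", "C": "TR", "T": "Cy3+Cy5"}
DYE_TO_QUERY_BASE = {dye: base for base, dye in QUERY_BASE_TO_DYE.items()}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def probe_pool(query_base: str, length: int = 9, query_pos: int = 4):
    """Yield every probe (5'->3') with `query_base` fixed at the 0-based
    `query_pos` and all other positions degenerate (nnnnXnnnn by default)."""
    for combo in product("ACGT", repeat=length - 1):
        bases = list(combo)
        bases.insert(query_pos, query_base)
        yield "".join(bases)


def template_base_from_dye(dye: str) -> str:
    """The ligated probe's query base pairs with the template, so the template
    base at the interrogated position is its complement."""
    return COMPLEMENT[DYE_TO_QUERY_BASE[dye]]


if __name__ == "__main__":
    pool_a = list(probe_pool("A"))
    print(len(pool_a), "distinct 9-mers in the A pool")   # 65536
    print([template_base_from_dye(d) for d in ["Cy5", "TR", "Cy3+Cy5"]])   # ['T', 'G', 'A']
```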
Consensus error rate  Platform          Total errors (E. coli)  Total errors (Human)
1E-4                  Bermuda/HapMap    500                     600,000
4E-5                  454               200                     240,000
3E-7                  Polony-SBL @6X    0                       1,800
1E-8                  Goal for 2006     0                       60
Goal of genotyping & resequencing: discovery of variants, e.g. cancer somatic mutations (4E-6) & lab-evolved cells
Why low error rates?
Also, effectively reduce the (sub)genome target size by enriching for exons or common SNPs, to reduce cost & the number of false positives.
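The error-rate table follows from expected errors = consensus error rate × genome size. Its human column is reproduced exactly with about 6E9 bp, roughly a diploid genome. The E. coli column appears rounded: the same arithmetic gives about 460 and 184 errors for the first two rows, and about 1.4 and 0.05 for the last two, which the table shows as 0. The genome sizes in the following sketch are assumptions, not figures from the slide.

```python
# Expected number of consensus errors when resequencing a whole genome:
#   errors = per-base error rate x genome size.
# Assumptions: E. coli ~4.6 Mbp; "human" = diploid ~6e9 bp.

GENOME_BP = {"E. coli": 4.6e6, "Human (diploid)": 6e9}

ERROR_RATES = [
    ("Bermuda/HapMap", 1e-4),
    ("454", 4e-5),
    ("Polony-SBL @6X", 3e-7),
    ("Goal for 2006", 1e-8),
]


def expected_errors(error_rate: float, genome_bp: float) -> float:
    """Expected count of erroneous consensus bases over a target of genome_bp."""
    return error_rate * genome_bp


if __name__ == "__main__":
    for name, rate in ERROR_RATES:
        row = ", ".join(f"{g}: {expected_errors(rate, bp):,.0f}" for g, bp in GENOME_BP.items())
        print(f"{name} ({rate:g}): {row}")
```

Because the count scales linearly with target size, enriching for exons (about 1% of the genome, per the summary slide) cuts the expected number of false positives by the same factor.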
Microbial lab evolution:
Lenski: citrate utilization
Church: Trp/Tyr exchange
Palsson: glycerol utilization
Edwards: radiation resistance
Ingram: lactate production
Stephanopoulos: ethanol resistance
Marliere: thermotolerance
J&J: diarylquinoline resistance (TB)
DuPont: 1,3-propanediol production
Position  Type   Gene  Location       Function                                    Mechanism
986,334   T > G  ompF  Promoter -10   Promoter of non-specific transport channel  Makes promoter more consensus-like
985,797   T > G  ompF  Glu > Ala      Non-specific transport channel              Makes pore bigger and more hydrophobic
931,960   8 bp   lrp   Frameshift     General transcriptional regulator           ?
Polony-based Whole-Genome Mutation Discovery of Trp clone
Shendure, et al. (2005) Science 309:1728
ompF: non-specific transport channel
• Glu-117 → Ala (in the pore)
• Charged residue known to affect pore size and selectivity
• Can increase import & export capability simultaneously
PCR amplification and sequencing of OmpF and Lrp from multiple clones from 3 independent lines of Trp/Tyr co-cultures:
OmpF: R42G, R42L, R42C (Arg → Gly, Leu, Cys); D113V (Asp → Val); E117A (Glu → Ala)
Hydrophilic and bulky → hydrophobic and smaller
Promoter: -12 A→C, -35 C→A (more consensus-like)
Lrp: 1 bp deletion, 9 bp deletion, 8 bp deletion, IS2 insertion, R→L in the DNA-binding domain (DBD). Change in global gene regulation?
Heterogeneity within each time-point reflects colony heterogeneity.
Reppas, Lin, et al (unpublished)
Evolving Population: Multiple Genotypes, Similar Themes
[Figure: growth rate (1/hr) vs. number of passages (0 to 80) for Lineage 1, Lineage 2, Lineage 3, and Wild Type; panels span growth rates of 0.48 to 0.6 and 0.12 to 0.24 per hour]
[Figure: paired-tag mapping with proximal and distal tag placement over genome coordinates 1,200,000 to 1,216,001 (detail 1,206k to 1,210k). Pairs at an incorrect distance are flagged; red = same strand, black = opposite strand. Sample: mixture of wild type & a 2 kb inversion (pin).]
Using paired ends, rearrangement & copy-number detection is >1000X easier than point mutation detection (6X vs 6000X)
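The paired-tag figure flags pairs whose two tags map at an incorrect distance or on the same strand (red) instead of opposite strands (black). The following sketch classifies mapped pairs by that logic and tallies the classes.

The library distance bounds are hypothetical parameters. The strand convention for a normal pair (opposite strands) is taken from the figure's legend. The function names are illustrative.

```python
from collections import Counter


def classify_pair(pos1: int, strand1: str, pos2: int, strand2: str,
                  min_dist: int, max_dist: int) -> str:
    """Classify one mapped paired-end tag (positions in reference coordinates).

    Assumed convention: pairs from an unrearranged region map to opposite
    strands at a separation within [min_dist, max_dist]. Same-strand pairs
    suggest an inversion; opposite-strand pairs that are too far apart suggest
    a deletion, and pairs that are too close suggest an insertion.
    """
    if strand1 == strand2:
        return "inversion candidate"
    distance = abs(pos2 - pos1)
    if distance > max_dist:
        return "deletion candidate"
    if distance < min_dist:
        return "insertion candidate"
    return "concordant"


def summarize(pairs, min_dist: int, max_dist: int) -> Counter:
    """Count classes over (pos1, strand1, pos2, strand2) tuples."""
    return Counter(classify_pair(*p, min_dist, max_dist) for p in pairs)


if __name__ == "__main__":
    pairs = [(1000, "+", 1135, "-"), (2000, "-", 2150, "-"), (5000, "+", 7600, "-")]
    print(summarize(pairs, min_dist=100, max_dist=200))
```

A cluster of anomalous pairs over the same interval, rather than a single one, is what would be taken as evidence of a rearrangement such as the 2 kb inversion above.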
Polonies for human inversions
>300 kbp long inverted repeats
Turner, Hurles, et al. 2006 Nat Methods 3:439-45. Sanger Inst. & HMS
Zhang et al. Nature Genet. Mar 2006
Polonies for haplotyping, recombination, LOH
Sequencing/genotyping on single human chromosomes
153 Mbp
Monitoring resistance to BCR-ABL-kinase inhibitors with polonies during CML patient therapy Nardi, Raz, Chao, Wu, Stone, Cortes, Deininger, Church, Zhu, Daley (submitted)
E255K
T315I
M244V
Polonies with & without beads or gels
Increases from 14 to 57 million polony beads per run & improves data quality. Kim, Porreca, Seidman, Church (unpublished)
Consensus error rate  Platform          Total errors (E. coli)  Total errors (Human)
1E-4                  Bermuda/HapMap    500                     600,000
4E-5                  454 @40X          200                     240,000
3E-7                  Polony-SBL @6X    0                       1,800
1E-8                  Goal for 2006     0                       60
Goal of genotyping & resequencing: discovery of variants, e.g. cancer somatic mutations (4E-6) & lab-evolved cells
Why low error rates?
Also, effectively reduce the (sub)genome target size by enriching for exons or common SNPs, to reduce cost & the number of false positives.
[Figure: cost vs consensus error rate; cost in $/kb (0 to 18) for SBL, ABI, and 454 plotted against consensus error rate (1E-8 to 1E-1)]
Platforms: AB3730, 454, Polony (Sep05 and Sep06 data points)
$/kb @4E-5: $7, $9, $0.8, $0.07
$/3e9 @1X: $3M, $300K, $30K
Paired ends: yes, no, yes
Device $: $365K, $400K, $140K
Cancer exon sequencing
$250K per sample (13,023 genes, 21 Mbp, 135,483 primer pairs) using PCR & capillary sequencing.
$3K per sample (estimate) using single tube capture & polonies
Sjoblom et al. The Consensus Coding Sequences of Human Breast and Colorectal Cancers. Science. 2006 Sep.
Davies et al. Somatic mutations of the protein kinase gene family in human lung cancer. Cancer Res. 2005 65:7591-5.
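The following sketch normalizes the two per-sample cancer exon figures by the 21 Mbp target to give an effective cost per target kilobase. These per-target figures include PCR or capture and whatever coverage redundancy each method used. They are therefore not directly comparable to the raw sequencing $/kbp figures earlier in the talk. The $3K value is the slide's estimate, and the function name is illustrative.

```python
TARGET_KBP = 21_000  # 21 Mbp of coding sequence per sample (slide figure)


def cost_per_target_kbp(cost_per_sample: float, target_kbp: float = TARGET_KBP) -> float:
    """Effective dollars per kilobase of targeted sequence."""
    return cost_per_sample / target_kbp


if __name__ == "__main__":
    pcr = cost_per_target_kbp(250_000)    # PCR + capillary sequencing (reported)
    polony = cost_per_target_kbp(3_000)   # single-tube capture + polonies (estimate)
    print(f"PCR/capillary: ${pcr:.2f} per target kbp")
    print(f"capture/polony: ${polony:.3f} per target kbp")
    print(f"reduction: {pcr / polony:.0f}X")
```

At these figures, the estimated reduction is roughly 83-fold per sample.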