July 7th 2009
DNA sequencing
Sequencing instruments at MPI EVA
5 x, 2 x
454 FLX Titanium, Illumina Genome Analyzer, ABI SOLiD, HeliScope, ABI 3730/3730xl
pictures by Illumina, Roche, ABI, Helicos
Overview
• Sequencing technologies
• Sequencing strategies
• Sample preparation
Sanger sequencing - principle
= dideoxy method = chain termination method
Template (PCR product, plasmid) dNTP ddNTP
Sanger sequencing - principle
autoradiogram annotated bands
Original method
Sanger sequencing - technology
Improved over time, automated sequencing
- dye-labelled ddNTPs
- capillary electrophoresis
= ABI 3730/3730xl
Sanger sequencing – accuracy
Phred-scores = quality scores:
- peak height
- peak shape
- peak density
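Phred quality scores are conventionally defined as Q = -10 · log10(P), where P is the estimated probability that a base call is wrong. The sketch below only illustrates that conversion; the function names are illustrative, and it does not model how peak properties are turned into an error estimate.

```python
import math


def phred_to_error_probability(q):
    """Convert a Phred quality score into a base-call error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Convert a base-call error probability into a Phred quality score."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(f"Q{q}: error probability {phred_to_error_probability(q):.4f}")
```

Under this definition, Q20 corresponds to one expected error in 100 base calls and Q30 to one in 1,000.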
Sanger sequencing – throughput
Technology   Read length    Sequences per run   Bases per run   € / run   € / base
Sanger       500–1100 b     96                  50-100 kb       200 €     2 cents
Sanger sequencing – what you need
1) Sample: Clonal copies of your sequencing template
- PCR product
- plasmid
2) Sequencing primer
Sanger sequencing – strategies
A small and simple exon (1 000 bp)
DNA → PCR → PCR product → sequencing → 2 (4) sequences
A human mitochondrial genome (16 500 bp)
PCR → PCR product → sequencing → 64 sequences
Sanger sequencing – strategies
Lysozyme, short exon 1 (500 bp); many paralogues!
DNA → PCR → PCR product → subcloning → sequencing → many sequences
A bush baby mitochondrial genome (16 500 bp); divergent!
LR-PCR → LR-PCR product → sequencing by primer walking → 64 sequences
Sanger sequencing – strategies
Genome sequencing
Venter style (WGS)
DNA → chop into pieces → subcloning → a lot of sequencing → assembly
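The assembly step of a shotgun project can be illustrated with a toy greedy overlap merger: repeatedly join the two fragments with the longest suffix-prefix overlap. This is a minimal sketch under strong assumptions (error-free reads, all from the same strand, no repeats, no read contained in another), not how production assemblers work; the function names and the `min_length` parameter are illustrative.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_length)."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads, min_length=3):
    """Merge reads pairwise by longest overlap until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_length, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    length = overlap(a, b, min_length)
                    if length > best_length:
                        best_length, best_i, best_j = length, i, j
        if best_length == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_length:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Three overlapping fragments of the made-up sequence ATGCGTACGTTAGC
    print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "CGTTAGC"]))
```

With real data, repeats longer than the reads make such greedy joins ambiguous, which is one reason genome projects need a lot of sequencing and careful assembly.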
Sanger sequencing – strategies
Genome sequencing
Consortium style
(hierarchical shotgun)
Lander et al. 2001. Nature 409:860-921
454 sequencing – principle
Pyrosequencing (Nyrén / Ronaghi 1996)
Sequencing by synthesis
- Successive addition of nucleotides (dATPαS,dCTP,dGTP,dTTP)
- Nucleotide incorporation enzymatically translated into light
Flow order: dATPαS, dCTP, dGTP, dTTP, then repeated
Template: GATGTGCTGCGAGAAGGCTAGATTCAACGAGGAGCATTGCACTAGCCTTCTCGAGCATACG
Growing strand after successive flows:
TACACGACGCTCTTCCGATCT
TACACGACGCTCTTCCGATCTAA
TACACGACGCTCTTCCGATCTAAG
TACACGACGCTCTTCCGATCTAAGTT
TACACGACGCTCTTCCGATCTAAGTTG
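The flow-by-flow extension on this slide can be mimicked with a small simulation. In each flow a single nucleotide species is offered, and the whole run of matching bases is incorporated at once, so the signal is taken here to be the run length. This is a simplified sketch: the function name, the idealised integer signal and the `max_flows` cut-off are assumptions, and real flowgrams are noisy light intensities.

```python
def flowgram(new_strand, flow_order="ACGT", max_flows=100):
    """Return (nucleotide, bases incorporated) for each flow while synthesising new_strand."""
    signals = []
    position = 0
    flow = 0
    while position < len(new_strand) and flow < max_flows:
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        # A homopolymer run is extended completely within a single flow.
        while position < len(new_strand) and new_strand[position] == nucleotide:
            run += 1
            position += 1
        signals.append((nucleotide, run))
        flow += 1
    return signals


if __name__ == "__main__":
    # The six bases added to the primer on the slide: AA, G, TT, G
    for nucleotide, run in flowgram("AAGTTG"):
        print(nucleotide, run)
```

Because a run of identical bases produces one stronger signal rather than several separate ones, the length of long homopolymers has to be inferred from signal intensity.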
454 sequencing – principle
Pyrosequencing massively parallelized by 454 Life Sciences
454 sequencing is not single molecule sequencing
⇒ Parallelization of sample preparation and amplification required
454 sequencing technology
I - III: Preparation of a sequencing library
IV: Emulsion PCR (emPCR)
V: Bead enrichment, primer annealing
Sequencing
454 sequencing – accuracy
Phred Q44
Homopolymer problems
454 sequencing – throughput
Technology     Read length [bp]   Sequences per run   Bases per run   € / run   € / base
Sanger         500–1100           96                  50-100 kb       200 €     2 cents
454 Titanium   500                ~1 million          500 Mb          6000 €    0.001 cents
454 sequencing – applications
Genome sequencing: Sanger
Venter style (shotgun)
DNA → chop into pieces → subcloning → a lot of sequencing → assembly
Genome sequencing: 454
Venter style (shotgun)
DNA → chop into pieces → library preparation → less sequencing → assembly
454 sequencing – applications
Bush baby mitochondrial genome: Sanger
LR-PCR → LR-PCR product → sequencing by primer walking → 64 sequences
Bush baby mitochondrial genome: 454
LR-PCR → LR-PCR product → shotgun sequencing → 660 sequences (~ 20x oversampling)
454 sequencing – applications
PCR product sequencing (Lysozyme): Sanger
DNA → PCR → PCR product → subcloning → sequencing → many sequences
PCR product sequencing (Lysozyme): 454
DNA → PCR → PCR product → library preparation → sequencing → a LOT of sequences
454 sequencing – limitations and solutions
Large amounts of starting material (5 μg)
Quantitative PCR reduces material demands from ~ 5 μg to ~ 20 pg
Meyer et al., Nucleic Acids Research 2008
454 sequencing – limitations and solutions
Sequencing samples in parallel
- Initially limited to 16
GS FLX Titanium platform
- 1/16th lane ~ 25,000 sequences, 500 €
~ 2000 x coverage of a 6 kb plasmid
~ 700 x coverage of a mitochondrial genome
Meyer et al., Nucleic Acids Research 2007; Nature Protocols 2008
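The coverage figures for a 1/16th lane follow from a simple estimate: coverage ≈ number of reads × read length / target length. The sketch below assumes a read length of 500 bp (the 454 Titanium value from the throughput table), a mitochondrial genome of 16,500 bp, and that every read falls on the target; the function name is illustrative.

```python
def coverage(n_reads, read_length_bp, target_length_bp):
    """Average fold coverage, assuming every read maps to the target."""
    return n_reads * read_length_bp / target_length_bp


if __name__ == "__main__":
    print(f"6 kb plasmid:         {coverage(25_000, 500, 6_000):.0f} x")
    print(f"mitochondrial genome: {coverage(25_000, 500, 16_500):.0f} x")
    print(f"660 shotgun reads:    {coverage(660, 500, 16_500):.0f} x")
```

The estimates come out somewhat above the rounded ~2000 x and ~700 x quoted for the lane, and the 660 shotgun reads of the bush baby mitochondrial genome give the stated 20 x.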
454 sequencing – limitations and solutions
Using barcoding (e.g. PTS)
- 1/16th lane ~ 25,000 sequences, 500 €
~ 100 plasmids (6 kb) with 20 x coverage
~ 35 mitochondrial genomes with 20 x coverage
~ 1,250 PCR products with 20 x coverage
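The number of barcoded samples that fit into a 1/16th lane at a given coverage can be estimated by dividing the total bases by the bases needed per sample. A minimal sketch, assuming 500 bp reads, a PCR product about one read long, and perfectly even representation of all samples; the function name is illustrative.

```python
def samples_per_lane(n_reads, read_length_bp, target_length_bp, min_coverage):
    """Number of targets that can each reach min_coverage, assuming even read distribution."""
    total_bases = n_reads * read_length_bp
    return total_bases // (target_length_bp * min_coverage)


if __name__ == "__main__":
    print("plasmids (6 kb):      ", samples_per_lane(25_000, 500, 6_000, 20))
    print("mitochondrial genomes:", samples_per_lane(25_000, 500, 16_500, 20))
    print("PCR products (500 bp):", samples_per_lane(25_000, 500, 500, 20))
```

The slide's slightly lower round numbers leave some margin, which is sensible because pooled samples are rarely represented equally.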
Limitations in sequencing throughput
=> Limitations in sample preparation
Direct multiplex sequencing
Stiller et al. Genome Research (in press)
Solexa (Illumina) sequencing – principle
Reversible terminator sequencing
Modified polymerase incorporates dye-labeled, terminated nucleotides
1) Incorporation of a single nucleotide
2) Detection of label
3) Removal of terminator/label
[Animation: in each cycle the four labelled, terminated nucleotides (A, C, G, T) are offered and a single one is incorporated at the end of the primer]
Template: TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAACGTTGCAGGAGCATTGCACTAGCCTTCTCGAGCATACGGCAGAAGACGAAC
Sequencing primer: ACACTCTTTCCCTACACGACGCTCTTCCGATCT
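The three-step cycle (incorporate, detect, remove) can be sketched as a loop that extends the read by exactly one base per cycle. This is an idealised model with no errors or signal decay; the function name is illustrative, and the template is assumed to be given in the order in which the polymerase reads it.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reversible_terminator_read(template, cycles):
    """Simulate sequencing by synthesis with one terminated nucleotide per cycle."""
    read = ""
    position = 0
    for _ in range(cycles):
        if position >= len(template):
            break
        # 1) Incorporation: the terminator stops extension after a single nucleotide.
        nucleotide = COMPLEMENT[template[position]]
        # 2) Detection: the dye label identifies the incorporated nucleotide.
        read += nucleotide
        # 3) Removal of terminator and label: the next cycle can extend by one more base.
        position += 1
    return read


if __name__ == "__main__":
    print(reversible_terminator_read("TTCAAC", cycles=4))
    print(reversible_terminator_read("TTTT", cycles=4))  # homopolymer: still one base per cycle
```

Since each cycle adds a single base, a homopolymer is read base by base, and the read length is set by the number of cycles.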
Solexa (Illumina) sequencing – principle
pictures by Illumina, Inc.
Sodium hydroxide melting
flow cell
Solexa (Illumina) sequencing – accuracy
Artifact sequences (count, sequence):
1 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
1 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATATATAAATTA
1 AAAAAAAAAAAAACAAAAAACAAAAAAAAAACAAACAAAACAACAAATAA
1 AAAAAAAATATTTAATTATTTTTATTTATAATTTTTTTGTTTTTTGTTTT
1 AAACAAACCACACAAACAAAAAAACACAACAAAACAACACCACCACCCAA
1 ATTCTATTTAATACAAATAAAATATCAATTTAAAACTACACTATACATAA
1 CAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACA
1 CAAATATATTTATATTTATTTTTTTATTTAATTTTTATATTTTTATTTAT
1 CATTTATTCTTTTTTTTTTTTTTTTTATTTTTTTTTTTTTTTTTTTTTTT
1 CCCCCCCCCCCCCCCCCCCCACCCCCCCCCCACCCACCCCACCCCCCCCC
1 CCCCCCCCCCCCCCTTCCCCCCTCTTCTTCTCTCTTTTCTTTTTTTTTTT
1 CCCCCCCCCCCCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
1 CCCGCGCCCCCCCGCCGCCGCGCCCAGCCCAGGCCACCACACACGCACCC
1 CCTCCCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
Solexa (Illumina) sequencing – accuracy
1) Map all reads against a reference sequence
2) Eliminate reads with > 2 mismatches in the first 36 bp
3) Check error profiles for the remaining reads
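Steps 2 and 3 can be sketched as follows, assuming step 1 has already produced ungapped alignments as (read, reference segment) pairs of equal length. The function names and the input layout are illustrative; a real pipeline would work from the mapper's output format.

```python
def mismatches(read, reference, upto=None):
    """Count mismatching positions, optionally only within the first `upto` bases."""
    return sum(1 for r, g in zip(read[:upto], reference[:upto]) if r != g)


def error_profile(alignments, seed_length=36, max_seed_mismatches=2):
    """Per-position mismatch rate of reads with at most max_seed_mismatches in the seed."""
    kept = [(read, ref) for read, ref in alignments
            if mismatches(read, ref, seed_length) <= max_seed_mismatches]
    if not kept:
        return []
    read_length = max(len(read) for read, _ in kept)
    errors = [0] * read_length
    totals = [0] * read_length
    for read, ref in kept:
        for i, (r, g) in enumerate(zip(read, ref)):
            totals[i] += 1
            if r != g:
                errors[i] += 1
    return [e / t if t else 0.0 for e, t in zip(errors, totals)]


if __name__ == "__main__":
    # Toy example with a 3 bp seed instead of 36 bp
    toy = [("ACGT", "ACGT"), ("ACGA", "ACGT"), ("TTTT", "ACGT")]
    print(error_profile(toy, seed_length=3, max_seed_mismatches=1))
```

Filtering on the first bases of the read removes reads that were probably mapped to the wrong place, so the remaining mismatches mostly reflect sequencing error by position.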
[Plot: mismatch rate (0.00 to 0.03) by position in read (0 to 50), broken down by substitution type (A/C, A/G, A/T, C/A, C/G, C/T, G/A, G/C, G/T, T/A, T/C, T/G, N), one panel per base caller]
Bustard: average raw error 2.02%
Ibis: average raw error 1.13%
Solexa (Illumina) sequencing – throughput
Technology           Read length [bp]   Sequences per run   Bases per run   € / run    € / base
Sanger               500–1100           96                  50-100 kb       200 €      2 cents
454 Titanium         500                ~1 million          500 Mb          6,000 €    0.001 cents
Solexa (currently)   2 x 100            ~ 140 million       28 Gb           10,000 €   0.00004 cents
Ultra high-throughput sequencing
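As a quick consistency check on the throughput table, bases per run should be roughly sequences per run times read length. A minimal sketch: the function name and data layout are illustrative, and counting a 2 x 100 paired read as 200 bp per sequence is an assumption.

```python
def bases_per_run(sequences_per_run, min_read_length, max_read_length):
    """Return the (minimum, maximum) number of bases produced in one run."""
    return (sequences_per_run * min_read_length,
            sequences_per_run * max_read_length)


# (sequences per run, shortest read, longest read) taken from the table
PLATFORMS = {
    "Sanger": (96, 500, 1100),
    "454 Titanium": (1_000_000, 500, 500),
    "Solexa": (140_000_000, 200, 200),  # assumption: 2 x 100 bp counted as 200 bp
}

if __name__ == "__main__":
    for name, (n, shortest, longest) in PLATFORMS.items():
        low, high = bases_per_run(n, shortest, longest)
        print(f"{name}: {low:,} to {high:,} bases per run")
```

The products reproduce the table's 50-100 kb, 500 Mb and 28 Gb.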
Solexa (Illumina) sequencing – applications
Genome Re-Sequencing
DNA → chop into pieces → library preparation → sequencing → mapping assembly
8x coverage of human genome
Targeted Sequencing
~ 1 lane of the flowcell ~ 20 million sequences
1 million PCR products
12,500 mitochondrial genomes at 20 x coverage ?
Array capture
Glass slide
Probes
Genome-wide in situ exon capture for selective resequencing (Hodges et al., Nature Genetics, 2007)
• ~5 Mb targeted per array
• 7 arrays, whole exome
• ~98% of exons retrieved
6,000 LR-PCRs
Target enrichment methods
Combine multiplex array capture and sequencing
DNA → shearing → prepare barcoded libraries → pooling → array capture (for each project, array with different targets) → Solexa sequencing
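After pooled barcoded libraries have been sequenced, the reads have to be sorted back to their samples. The sketch below assumes that the barcode sits at the start of each read and must match exactly; the barcode sequences, sample names and function name are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign reads to samples by exact barcode prefix and trim the barcode off."""
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched: keep the read aside instead of guessing.
            by_sample["unassigned"].append(read)
    return dict(by_sample)


if __name__ == "__main__":
    barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}  # hypothetical barcodes
    reads = ["ACGTAAAC", "TGCAGGGT", "CCCCAAAA"]
    print(demultiplex(reads, barcodes))
```

Exact matching is the most conservative choice; reads with a sequencing error in the barcode end up unassigned rather than in the wrong sample.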
How long until we only sequence genomes?
Other sequencing technologies
ABI/SOLiD, Polonator, Helicos
And dozens under development:
- PacBio
- Oxford Nanopore
- ...
Be warned...
Skills required for DNA sequencing projects
1 % 99 %
Thanks!
For your attention...
MPI EVAN
Martin Kircher