Illumina’s sequencing by synthesis
Analysis of Next-Generation Sequencing Data
Friederike Dündar
Applied Bioinformatics Core
Slides at https://bit.ly/2T3sjRg (https://physiology.med.cornell.edu/faculty/skrabanek/lab/angsd/schedule_2020/)
January 21, 2020
F. Dündar (ABC, WCM) Illumina’s sequencing by synthesis January 21, 2020 1 / 38
1 DNA Sequencing Overview & Recap
2 Template preparation
3 Sequencing-by-synthesis
4 Single and paired-end reads
5 References
DNA Sequencing Overview & Recap
Three Generations of DNA Sequencing
1st: Sanger sequencing [Sanger et al., 1977]
  - Cost per Mb: USD 2,400
  - Read length: 800 bp
  - Run time: 3 hrs
2nd: Next-generation or high-throughput sequencing [Illumina]
  - Cost per Mb: (less than) USD 0.07
  - Read length: 50-150 bp
  - Run time: 10 days
3rd: Single-molecule and/or long-read sequencing [PacBio]
  - Cost per Mb: USD 0.13-0.6
  - Read length: 1.4 kb
  - Run time: 0.5-2 hrs
Ease of use and throughput have increased dramatically at the cost of (some) accuracy.
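To make the cost-per-Mb figures concrete, the sketch below multiplies them by the amount of sequence needed for one 30x human genome. The ~3.1 Gb genome size and the 30x target are assumptions chosen for illustration; the per-Mb prices are the ones listed above (using the upper end of the PacBio range), and real project costs would also include library preparation and instrument time.

```python
# Per-Mb prices (USD) as listed above; PacBio uses the upper end of its range.
COST_PER_MB = {
    "Sanger": 2400.0,
    "Illumina": 0.07,
    "PacBio": 0.6,
}

GENOME_SIZE_MB = 3100  # assumption: approximate human genome size (~3.1 Gb)
TARGET_COVERAGE = 30   # assumption: a typical whole-genome sequencing depth


def sequencing_cost(cost_per_mb: float, genome_mb: float, coverage: float) -> float:
    """Raw sequencing cost = (Mb of sequence needed) x (price per Mb)."""
    return genome_mb * coverage * cost_per_mb


if __name__ == "__main__":
    for tech, price in COST_PER_MB.items():
        cost = sequencing_cost(price, GENOME_SIZE_MB, TARGET_COVERAGE)
        print(f"{tech:>8}: USD {cost:,.0f}")
```

With these numbers, one 30x genome costs roughly USD 6,500 in raw Illumina sequencing versus more than USD 200 million with Sanger sequencing.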
Three Generations of DNA Sequencing
Table from Keith [2017]
NGS = Illumina-based sequencing
In practice, Illumina’s sequencing platform is by far the dominant one, thanks to its high throughput, constant improvements, and library preparation support (kits).
Since acquiring Solexa in 2006, Illumina has been setting the pace in terms of optimizing yield and costs (e.g., Reuter et al. [2015]).
By mid-2019, PacBio was expected to belong to Illumina, too; on Jan 2, 2020, Illumina stepped away from the deal, paying a $98M termination fee.
Main steps of typical NGS experiments
TEMPLATE PREP
  Obtaining the molecules of interest: DNA, RNA, nucleic acid-protein complexes
  ⇓
  Library preparation: fragmentation and ligation of sequencing adapters
  ⇓
  Amplification
SEQUENCING
  Sequencing by synthesis vs. sequencing by ligation
  Short reads vs. long reads
BIOINFORMATICS
  Base calling
  ⇓
  Alignment: identifying the loci of the sequenced fragments
  ⇓
  Additional processing
  ⇓
  Interpretation
Template preparation
Template preparation
1 Nucleic acid extraction
2 Library preparation ⇒ adapters for sequencing
3 Clonal amplification ⇒ making sure the signal is going to be strong enough
1. DNA/RNA extraction
Nucleic acids must be purified out of a mix of all sorts of organic and inorganic molecules.
Fig. from: https://en.wikipedia.org/wiki/Eukaryote
1. DNA/RNA extraction
Basic steps
Goal: little or no degradation, and complete profiling of the entire length of each DNA or RNA molecule.
1. DNA/RNA extraction
Lysis
Lysis = release of nucleic acids (NA) from cells/nuclei (= destruction of cells and nuclei) using
  - salt solutions, detergents, lytic enzymes, or
  - physical forces: mechanical force, heat, freezing
Different cells (bacteria, plant cells, mammalian tissues, ...) have very different optimal lysis conditions (see Thatcher [2015]!)
1. DNA/RNA extraction
Separate NA: Liquid-liquid extraction (Phenol-Chloroform)
1. DNA/RNA extraction
Separate NA: Solid-phase DNA extraction
  - Liquid-liquid extraction relies on toxic chemicals and is difficult to automate/standardize.
  - Solid-phase extraction is based on silica (e.g., within a column or as magnetic silica-based beads) that binds the nucleic acids in the presence of a chaotropic buffer (a).
  - Non-DNA components are washed away before the DNA is released from the solid adsorber.
(a) A chaotrope is an ion that disrupts hydrogen bonding, leading to higher protein solubility in water.
2. Library preparation: getting the NA molecules ready for the sequencer
2. Library preparation
Figures: TruSeq Library Prep Protocol; Nextera Library Prep Protocol
Different library preparations may yield different distributions of PCR fragment sizes; choose one that suits the question at hand.
What to consider before choosing a library preparation
1 Sample type
  - High-quality DNA? Easy to extract?
  - How much?
2 Experiment goal
  - RNA-seq, ChIP-seq, variant identification, ...?
3 Beware of excess PCR cycles!
Library preps all come with their own advantages and disadvantages! Know what to look for, and talk to other people (in your lab, at the sequencing facility, online...)!
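Why excess PCR cycles are a concern can be illustrated with a simple amplification model: each cycle multiplies the number of copies by (1 + efficiency), so extra cycles inflate the amount of DNA without adding a single new unique molecule. This is an idealized sketch that assumes a constant per-cycle efficiency (real efficiencies vary, e.g., with fragment length and GC content); the molecule count and efficiency in the usage lines are hypothetical.

```python
def copies_after_pcr(start_molecules: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected number of copies after `cycles` rounds of PCR.

    efficiency = 1.0 means perfect doubling in every cycle (idealized assumption).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_molecules * (1.0 + efficiency) ** cycles


def copies_per_unique_molecule(cycles: int, efficiency: float = 1.0) -> float:
    """Average number of copies of each original molecule (potential duplicates)."""
    return copies_after_pcr(1, cycles, efficiency)


if __name__ == "__main__":
    unique = 1_000_000  # hypothetical number of unique molecules in the library
    for n in (8, 12, 18):
        total = copies_after_pcr(unique, n, efficiency=0.9)
        print(f"{n:>2} cycles: {total:.2e} copies of the same {unique:,} unique molecules")
```

Going from 8 to 18 cycles multiplies the output about 600-fold at 90% efficiency, while the number of distinct molecules, and hence the information content of the library, stays the same.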
Loading the library onto the flowcell
Following library prep, the DNA fragments are floated over the flowcell, which is essentially a glass slide covered with oligonucleotides that are complementary to the adapters of the library; hybridization to these oligonucleotides physically attaches the DNA fragments to the flowcell.
  - 8 microfluidic channels (= "lanes"); the sequencing reaction happens within these channels
Figure from Illumina Inc. [2015]
3. Clonal amplification = cluster generation
[Figure: flowcell and clusters]
To generate strong signals during sequencing, every fragment is "cloned", yielding physically separate clusters of DNA fragments with identical sequences.
Ideally, the fragments represent the full genome.
3. Clonal amplification = cluster generation via PCR
Sequencing-by-synthesis
Decoding the DNA: DNA polymerase
  - cannot start DNA synthesis from scratch; it always needs a primer
  - relies on the presence of a template strand, which it complements
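Because the polymerase extends a primer along the template, the newly synthesized strand is the reverse complement of the template. A minimal Python sketch of that relationship; the function names and example sequences are hypothetical, only A/C/G/T are handled, and the primer must match the template exactly.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("only A, C, G, T are supported")
    return seq.translate(COMPLEMENT)[::-1]


def extend_primer(template: str, primer: str) -> str:
    """Extend `primer` (5'->3') along `template` (5'->3') to the template's 5' end.

    The primer anneals only where the template contains its reverse complement;
    without such a binding site the polymerase has nothing to start from.
    """
    site = reverse_complement(primer)
    pos = template.upper().find(site)
    if pos == -1:
        raise ValueError("primer does not anneal to the template")
    # The new strand = primer + complement of everything 5' of the binding site,
    # which is the reverse complement of the template up to the end of the site.
    return reverse_complement(template[: pos + len(site)])


if __name__ == "__main__":
    print(extend_primer("GATTACAGGC", "GCCT"))  # starts with the primer GCCT
```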
Identifying the order of the nucleotides for every fragment
Illumina’s sequencing is based on fluorophore-labelled dNTPs with reversible terminators: the nucleotides are incorporated one at a time, and each one is excited by a laser and imaged before the terminator is removed for the next cycle.
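The cycle-by-cycle logic can be sketched as a toy simulation for a single cluster: in every cycle exactly one labelled, terminated nucleotide is added, its colour is recorded, and the terminator is removed before the next cycle. This is an idealized assumption (no phasing, no misincorporation, one distinct colour per base); the colour names are hypothetical placeholders, not Illumina’s actual dye chemistry.

```python
from typing import List

# Hypothetical colour for each incorporated base (idealized 4-colour model).
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASE_FROM_DYE = {colour: base for base, colour in DYE.items()}


def run_sbs(template_3to5: str, n_cycles: int) -> List[str]:
    """Return the colour recorded in each cycle for one cluster.

    `template_3to5` is the template written in the direction the polymerase
    travels (3'->5'), starting right after the sequencing primer.
    """
    images = []
    for cycle in range(min(n_cycles, len(template_3to5))):
        incorporated = PAIR[template_3to5[cycle]]  # one terminated dNTP is added
        images.append(DYE[incorporated])           # laser excitation + imaging
        # (terminator and dye would be cleaved here before the next cycle)
    return images


def call_bases(images: List[str]) -> str:
    """Translate the recorded colours back into a read (the base calls)."""
    return "".join(BASE_FROM_DYE[colour] for colour in images)


if __name__ == "__main__":
    images = run_sbs("TACGGATC", n_cycles=5)
    print(images, call_bases(images))
```

In this model, the read length equals the number of cycles run, as long as the fragment is long enough.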
The number of cycles determines the read length
50-150 cycle repetitions = 50-150 bp read length
The actual raw data of Illumina sequencing are images, but nowadays Illumina willreturn the base calls, i.e. text files of As, Cs, Ts, Gs.
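These base calls are usually delivered as FASTQ text files, in which each read takes four lines: an identifier starting with @, the bases, a + separator, and one quality character per base. A minimal parser, assuming the common Phred+33 quality encoding and exactly four lines per record (no line-wrapped sequences); the example record is made up.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # Phred scores, one per base


def phred33(quality_string: str) -> List[int]:
    """Decode Phred+33 ASCII characters into integer quality scores."""
    return [ord(ch) - 33 for ch in quality_string]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip()
        plus = handle.readline().rstrip()
        qual = handle.readline().rstrip()
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield FastqRecord(header[1:], seq, phred33(qual))


example = io.StringIO("@read_1\nGATTACA\n+\nIIIII#!\n")
for rec in read_fastq(example):
    print(rec.name, rec.sequence, rec.qualities)
```

A quality of 40 ("I") corresponds to an estimated error probability of 1 in 10,000, whereas 2 ("#") marks an essentially unreliable call.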
The number of flowcell lanes determines the sequencing depth
Every read represents one cluster on the flowcell.
  - every cluster = one DNA fragment
  - the more clusters one sequences, the more information (= reads) one gets
Machine       Yield per lane
HiSeq 4000    400 million reads
NovaSeq       800-2,500 million reads
Application                          Recommended sequencing depth
Differential gene expression         20-50 million single reads (SR), 75 bp
Variant calling                      30-200x coverage
Whole-genome bisulfite sequencing    30x coverage
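The depth recommendations can be turned into read and lane counts with the usual coverage formula, coverage = (number of reads × read length) / genome size, counting each mate of a pair as one read. The sketch below applies it; the genome size, read length, and sample numbers are assumptions for illustration, the lane yields come from the table, and the Lander-Waterman estimate 1 − e^(−coverage) for the fraction of the genome covered at least once assumes uniformly random fragments, which real libraries only approximate.

```python
import math

# Lane yields from the table above (reads per lane); NovaSeq uses the lower bound.
YIELD_PER_LANE = {"HiSeq 4000": 400_000_000, "NovaSeq": 800_000_000}


def coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach `target_coverage` on average."""
    return math.ceil(target_coverage * genome_size / read_length)


def lanes_needed(total_reads: int, reads_per_lane: int) -> int:
    """Number of lanes required, rounded up to whole lanes."""
    return math.ceil(total_reads / reads_per_lane)


def fraction_covered(mean_coverage: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction covered at least once."""
    return 1.0 - math.exp(-mean_coverage)


if __name__ == "__main__":
    human = 3_100_000_000  # assumption: ~3.1 Gb human genome
    n = reads_needed(30, 150, human)  # 30x with 150 bp reads
    print(f"30x human WGS: {n:,} reads, "
          f"{lanes_needed(n, YIELD_PER_LANE['HiSeq 4000'])} HiSeq 4000 lanes")
    rna = 12 * 30_000_000  # hypothetical: 12 RNA-seq samples at 30 million reads each
    print(f"12 RNA-seq samples: {lanes_needed(rna, YIELD_PER_LANE['HiSeq 4000'])} HiSeq 4000 lane(s)")
    print(f"Expected fraction covered at 30x: {fraction_covered(30):.6f}")
```

Under these assumptions, 30x human WGS with 150 bp reads needs 620 million reads, i.e. two HiSeq 4000 lanes at 400 million reads each.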
Single and paired-end reads
Types of reads
Single reads are cheaper. Paired-end (PE) reads are helpful for:
  - alignment along repetitive regions
  - chromosomal rearrangement and gene fusion detection
  - de novo genome and transcriptome assembly
  - precise information about the size of the original fragment (insert size)
  - PCR duplicate identification
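Two of the paired-end uses above, insert-size estimation and duplicate identification, follow directly from the mate coordinates. A simplified sketch, assuming both mates of each pair align to the same chromosome in the usual inward-facing orientation (mate 1 leftmost on the forward strand, mate 2 on the reverse strand) with 0-based leftmost positions; the ReadPair type is hypothetical, and real duplicate-marking tools (e.g., Picard MarkDuplicates) apply more elaborate rules.

```python
from typing import Dict, List, NamedTuple, Tuple


class ReadPair(NamedTuple):
    name: str
    chrom: str
    start1: int  # 0-based leftmost position of mate 1 (forward strand)
    len1: int
    start2: int  # 0-based leftmost position of mate 2 (reverse strand)
    len2: int


def insert_size(pair: ReadPair) -> int:
    """Length of the original fragment: from mate 1's start to mate 2's end."""
    return pair.start2 + pair.len2 - pair.start1


def find_duplicates(pairs: List[ReadPair]) -> List[str]:
    """Names of pairs whose fragment ends both match an earlier pair (likely PCR copies).

    Single reads only reveal one fragment end, so two distinct fragments that
    merely start at the same position would look identical; pairs reveal both ends.
    """
    seen: Dict[Tuple[str, int, int], str] = {}
    duplicates = []
    for pair in pairs:
        key = (pair.chrom, pair.start1, pair.start2 + pair.len2)
        if key in seen:
            duplicates.append(pair.name)
        else:
            seen[key] = pair.name
    return duplicates


if __name__ == "__main__":
    pairs = [
        ReadPair("a", "chr1", 100, 100, 250, 100),
        ReadPair("b", "chr1", 100, 100, 250, 100),  # same ends as "a"
        ReadPair("c", "chr1", 100, 100, 300, 100),  # same start, different end
    ]
    print([insert_size(p) for p in pairs], find_duplicates(pairs))
```

With single reads, pairs "b" and "c" would both look like copies of "a"; the second fragment end shows that only "b" is a likely duplicate.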
Paired-end read generation
References
See the website
https://bit.ly/2T3sjRg
Figures taken from the following publications: Levy and Myers [2016]
Illumina Inc. Patterned Flow Cell Technology. In Technical Spotlight: Sequencing, pages 1–2. 2015. URL https://www.illumina.com/content/dam/illumina-marketing/documents/products/technotes/patterned-flow-cell-technology-technical-note-770-2015-010.pdf.
Jonathan M. Keith, editor. Bioinformatics: Volume I: Data, Sequence Analysis, and Evolution, volume 1525 of Methods in Molecular Biology. Humana Press, 2017.
Shawn E. Levy and Richard M. Myers. Advancements in Next-Generation Sequencing. Annual Review of Genomics and Human Genetics, 2016. doi: 10.1146/annurev-genom-083115-022413.
Jason A. Reuter, Damek V. Spacek, and Michael P. Snyder. High-Throughput Sequencing Technologies. Molecular Cell, 58(4):586–597, May 2015. doi: 10.1016/j.molcel.2015.05.004.
F. Sanger, S. Nicklen, and A. R. Coulson. DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, 74(12), 1977. doi: 10.1073/pnas.74.12.5463.
Stephanie A. Thatcher. DNA/RNA preparation for molecular detection. Clinical Chemistry, 2015. doi: 10.1373/clinchem.2014.221374.