King's Next-Generation Sequencing Meeting Michelle Lupton



Additional references

Linnarsson S. Recent advances in DNA sequencing methods - general principles of sample preparation. Exp Cell Res. 2010 May 1;316(8):1339-43. Epub 2010 Mar 6. Review. PubMed PMID: 20211618.

Buehler B, Hogrefe HH, Scott G, Ravi H, Pabón-Peña C, O'Brien S, Formosa R, Happe S. Rapid quantification of DNA libraries for next-generation sequencing. Methods. 2010 Apr;50(4):S15-8. Review. PubMed PMID: 20215015. (Covers the use of real-time PCR.)


Stages in the library preparation

Steps accompanied by numbers are those for which we suggest alternatives to the standard Illumina protocols. Numbers correspond to those given in the Supplementary Protocols.


Fragmentation

Nebulization: Uneconomical distribution of fragment sizes. Approximately half of the DNA vaporises.

Adaptive Focused Acoustics (Covaris): Acoustic energy is controllably focused into the aqueous DNA sample by a dish-shaped transducer, resulting in cavitation events within the sample. 17% of the sample is in the 200bp size range, with little DNA loss.


Fragmentation

Enzymatic digestion (Linnarsson 2010): Two commercial enzymatic fragmentation kits were recently introduced.

Fragmentase (New England Biolabs) - based on a V. vulnificus nuclease that generates random nicks, and a modified T7 endonuclease that recognises the nicks and cleaves the opposite strand.

Nextera (Epicentre) - based on random transposon insertion. It also introduces adapter sequences simultaneously with fragmentation.


A-tailing, ligation and size selection

Artefacts from standard library prep:
1. Bias in base composition
2. High frequency of chimeric sequences
3. Imperfect insert size distribution

Overcome by:
1. Paired-end oligos
2. Gel extraction: melting the slice at room temperature reduces GC bias
3. Improved efficiency of the end repair and A-tailing
4. Double size selection
5. Paired-end size selection: only excise a 2mm gel slice


Figure 3: A-tailing, ligation and size selection

GC plots before (a) and after (b) optimisation of gel extraction. The figures show the total area in which reads with a particular GC content are distributed, with the mean and standard deviation. The greater width of the shaded area in plot a) indicates a wider dispersion of coverage for all values of GC content for which sequences were obtained.

Agilent Bioanalyzer 2100 traces for two suboptimal libraries: c) 60bp insert library, with optimised PCR; d) the same 60bp library with excess DNA in PCR; e) 200bp insert library, showing a shoulder of small fragments.

Insert size distribution from sequenced human DNA using f) the standard and g) the modified paired-end library prep protocols.


PCR

Template quality: use optimised quantities of DNA template.

Use high-fidelity polymerases in an optimised reaction.

Use solid phase reversible immobilization (SPRI) technology, which removes a higher proportion of primers and adapter dimers than spin columns.

Reduce the number of PCR cycles: 3ng DNA and 14 cycles of PCR amplification for single end libraries, 25ng DNA and 12 cycles for high complexity libraries, and 10ng DNA and 18 cycles for lower complexity samples. These quantities give the optimal compromise between clean libraries and a low frequency of duplicate sequences.

It is possible to eliminate the PCR step by ligating on appropriate adaptors after A-tailing.

Direct sequencing of short amplicons.
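To illustrate why the cycle number matters, here is a rough sketch of the theoretical amplification for the three conditions quoted above. It assumes each cycle multiplies the template by (1 + efficiency), with perfect doubling at an efficiency of 1.0. Real reactions plateau, so these are upper bounds only, and the function and variable names are illustrative.

```python
def fold_amplification(cycles, efficiency=1.0):
    """Ideal fold increase after `cycles` PCR cycles.

    Assumes every cycle multiplies the template by (1 + efficiency);
    efficiency=1.0 is perfect doubling. Real reactions plateau.
    """
    if cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency within [0, 1]")
    return (1.0 + efficiency) ** cycles


# (input DNA in ng, PCR cycles) as quoted on the slide
conditions = {
    "single end": (3, 14),
    "high complexity": (25, 12),
    "lower complexity": (10, 18),
}

for name, (input_ng, cycles) in conditions.items():
    fold = fold_amplification(cycles)
    print(f"{name}: {input_ng} ng x {fold:,.0f} (theoretical maximum)")
```

Every extra cycle doubles the theoretical copy number, so going from 12 to 18 cycles is a 64-fold increase in the ideal case.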


Figure 4: PCR

a) A ~200bp fragment library was prepared, and 10ng was amplified for 18 cycles using standard Illumina conditions, and with more optimal PCR conditions.

b) After PCR we divided the library into two: half was purified following the standard Illumina protocol, through a Qiaquick PCR cleanup column, whereas the other was purified using SPRI technology. Each was then run on an agarose gel alongside a 100bp ladder to view the DNA species that remained.


PCR

PCR duplication example:

| LIBRARY | UNPAIRED READS EXAMINED | READ PAIRS EXAMINED | UNMAPPED READS | UNPAIRED READ DUPLICATES | READ PAIR DUPLICATES | READ PAIR OPTICAL DUPLICATES | PERCENT DUPLICATION | ESTIMATED LIBRARY SIZE | Pre hybridisation PCR cycles | Post hybridisation PCR cycles |
|---|---|---|---|---|---|---|---|---|---|---|
| KPOA0006 | 62519 | 5464702 | 986509 | 47544 | 2657808 | 15731 | 0.487918 | 3598454 | 5 | 16 |
| KPOC0002 | 57432 | 3448312 | 624078 | 33414 | 542256 | 16503 | 0.160759 | 10024617 | 5 | 12 |
| KPOA0005 | 45961 | 3143590 | 562785 | 21530 | 351362 | 9954 | 0.114359 | 13316441 | 5 | 16 |
| KPOC0001 | 51485 | 2884859 | 547339 | 32763 | 596494 | 10627 | 0.210567 | 6055683 | 5 | 12 |
| NA18507 | 21791 | 2187848 | 369343 | 15030 | 707778 | 7613 | 0.325319 | 2620318 | 5 | 12 |
| KPOC0005 | 54337 | 3648741 | 681663 | 47073 | 2802655 | 14976 | 0.768841 | 858549 | 6 | 12 |
| KPOA0010 | 25997 | 2449286 | 410855 | 17482 | 711382 | 8187 | 0.292461 | 3376826 | 5 | 12 |
| KPOA0008 | 48848 | 3474580 | 628848 | 33125 | 1017855 | 10306 | 0.295632 | 4734070 | 5 | 12 |
| KPOC0004 | 38812 | 2350763 | 396988 | 24710 | 528626 | 8809 | 0.228246 | 4462098 | 5 | 12 |
| KPOC0003 | 201355 | 4528357 | 1027539 | 137213 | 2028173 | 23747 | 0.452963 | 3410538 | 5 | 12 |
| KPOA0017 | 74130 | 3097782 | 600782 | 48333 | 1114520 | 15594 | 0.363235 | 3218521 | 5 | 12 |
| KPOA0014 | 52506 | 3493530 | 618672 | 37310 | 1206707 | 11793 | 0.348136 | 3829677 | 5 | 12 |
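The PERCENT DUPLICATION values listed above can be reproduced from the other columns. The column names match those reported by Picard MarkDuplicates, so the sketch below assumes that tool's definition: duplicate reads divided by examined reads, with each read pair counted as two reads. The function name is illustrative.

```python
def percent_duplication(unpaired_examined, pairs_examined,
                        unpaired_dups, pair_dups):
    """Fraction of examined reads flagged as duplicates (a pair = 2 reads)."""
    examined = unpaired_examined + 2 * pairs_examined
    if examined == 0:
        return 0.0
    return (unpaired_dups + 2 * pair_dups) / examined


# Two libraries from the duplication example: (unpaired reads examined,
# read pairs examined, unpaired read duplicates, read pair duplicates,
# reported PERCENT DUPLICATION)
rows = {
    "KPOA0006": (62519, 5464702, 47544, 2657808, 0.487918),
    "KPOC0002": (57432, 3448312, 33414, 542256, 0.160759),
}

for library, (ue, pe, ud, pd, reported) in rows.items():
    computed = percent_duplication(ue, pe, ud, pd)
    print(f"{library}: computed {computed:.6f}, reported {reported}")
```

Both computed values agree with the reported column to six decimal places, which supports the assumed definition.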


Quantification

Optimal concentration range of DNA that will yield clusters in the optimal density range.

Spectrophotometry is not accurate.

| From [bp] | To [bp] | Corr. Area | % of Total | Average Size [bp] | Size distribution in CV [%] | Conc. [pg/µl] | Molarity [pmol/l] |
|---|---|---|---|---|---|---|---|
| 200 | 1,000 | 232.6 | 79 | 375 | 24.1 | 165.15 | 714.3 |

Quantitative PCR. Quantify unknown libraries against standard libraries that have been sequenced previously for which cluster number is known.
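A minimal sketch of quantifying an unknown library against standards, assuming the usual qPCR standard-curve approach: fit Ct against log10 concentration for the standards, then invert the line for the unknown. The concentrations and Ct values are made-up placeholders, not data from the talk, and the function names are illustrative.

```python
import math


def fit_standard_curve(concentrations, ct_values):
    """Least-squares fit of Ct = slope * log10(conc) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate an unknown's concentration."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical dilution series of a previously sequenced standard library (pM)
standard_pm = [100.0, 10.0, 1.0, 0.1]
standard_ct = [10.0, 13.4, 16.8, 20.2]  # placeholder Ct values

slope, intercept = fit_standard_curve(standard_pm, standard_ct)
efficiency = 10 ** (-1.0 / slope) - 1.0  # 1.0 would mean perfect doubling
unknown_pm = concentration_from_ct(15.1, slope, intercept)
print(f"slope={slope:.2f}, efficiency={efficiency:.2f}, unknown={unknown_pm:.2f} pM")
```

Because the standards have known cluster numbers from earlier runs, the estimated concentration can be tied directly to the loading amount that gave the desired cluster density.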

Electrophoresis with Agilent Bioanalyzer
- Gives a check of size distribution.
- Can be inaccurate for a small proportion of libraries, possibly because single-stranded DNA is not easily quantified when mixed with double-stranded DNA.
- Can use the Bioanalyzer to check size distribution and fluorometry to determine the concentration more accurately (e.g. Qubit dsDNA BR Assay).
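Combining a mass concentration (for example from a fluorometric assay) with an average fragment size (for example from the Bioanalyzer) to obtain a molarity can be sketched as follows. It assumes the common approximation of 660 g/mol per base pair of double-stranded DNA; the function name is illustrative.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (approximation)


def molarity_pmol_per_l(conc_pg_per_ul, avg_size_bp):
    """Convert pg/ul and mean fragment length (bp) to pmol/l."""
    if avg_size_bp <= 0:
        raise ValueError("average size must be positive")
    grams_per_litre = conc_pg_per_ul * 1e-12 / 1e-6  # pg/ul -> g/l
    mol_per_litre = grams_per_litre / (AVG_BP_MASS * avg_size_bp)
    return mol_per_litre * 1e12  # mol/l -> pmol/l


# Concentration and average size from the example table on this slide
print(round(molarity_pmol_per_l(165.15, 375), 1))
```

For the 165.15 pg/µl, 375bp example this simple estimate comes to roughly 667 pmol/l, close to but not identical to the 714.3 pmol/l reported in the table.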


Quantification

a) Cluster throughput as a function of total clusters for 200 and 500bp inserts. The 500bp inserts underwent fewer cycles of cluster amplification (28, compared to 35 for the 200bp libraries), resulting in smaller clusters, and so a cluster density of 40-44k / tile (GA1) will produce the maximum yield from either insert size.

b) Standardisation of cluster density with qPCR quantification. Runs were grouped into 25-run bins and a boxplot plotted. After some initial problems with degradation of standards, cluster number has levelled out at ~35-40k / tile.
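The binning behind panel b) can be sketched as follows: split the chronologically ordered per-run cluster densities into consecutive bins of 25 runs and compute the quartiles a boxplot would draw. The input list is synthetic placeholder data, and the helper names are illustrative.

```python
import statistics


def bin_runs(values, bin_size=25):
    """Split a chronological list of per-run values into consecutive bins."""
    return [values[i:i + bin_size] for i in range(0, len(values), bin_size)]


def box_stats(values):
    """Quartiles and extremes for one bin (needs at least two values)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


# Synthetic cluster densities (thousands of clusters per tile) for 60 runs
densities = [30 + (i % 10) for i in range(60)]

for index, run_bin in enumerate(bin_runs(densities)):
    if len(run_bin) >= 2:
        print(index, len(run_bin), box_stats(run_bin))
```

The last bin may hold fewer than 25 runs, so it is worth reporting the bin size alongside each box.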


Denaturation

For low concentrations of double-stranded DNA, denaturation by heating can damage DNA and introduce G+C bias.

Use modified hybridisation buffers; prefer the use of 0.1M NaOH to heating.

Subnanomolar libraries require an alternative buffer.
1. Addition of Tris to the Illumina buffer prevents a rise in pH.
2. Diluting the supplied 2M NaOH and using a greater volume reduces fluctuation caused by pipetting error.


Denaturation

a) pH titration of hybridisation buffers. The concentration of NaOH in DNA templates is 0.1M NaOH. Adding more than 8μl of this denatured template to the 1ml of Hybridisation Buffer prior to loading DNA onto the flowcell increases the pH to above 10. This prevents efficient hybridisation, and thus the cluster density falls. The addition of Tris-HCl pH 7.3 to the supplied bottles of Hybridisation Buffer dramatically increases buffering capacity, making template hybridisation more robust.

b) The addition of 5mM Tris-HCl pH 7.3 to Illumina Hybridisation Buffer allows a greater volume of denatured template to be added before high pH prevents effective annealing of templates to the oligos on the flowcell surface. This increases the robustness of cluster generation, by counteracting pipetting errors in the denaturation step.
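The carry-over of NaOH into the hybridisation buffer is simple dilution arithmetic. The sketch below computes the final NaOH concentration when a given volume of template in 0.1M NaOH is added to 1ml of buffer, and compares it with the 5mM Tris-HCl of the modified buffer (ignoring the slight dilution of the Tris). It does not attempt to model pH, and the function name is illustrative.

```python
def final_naoh_mm(template_ul, naoh_molar=0.1, buffer_ul=1000.0):
    """Final NaOH concentration (mM) after adding template to buffer."""
    if template_ul < 0:
        raise ValueError("volume must be non-negative")
    total_ul = template_ul + buffer_ul
    return naoh_molar * 1000.0 * template_ul / total_ul


TRIS_MM = 5.0  # Tris-HCl added to the hybridisation buffer (from the text)

for volume in (2, 4, 8, 16, 32):
    naoh = final_naoh_mm(volume)
    print(f"{volume:>2} ul template -> {naoh:.2f} mM NaOH "
          f"({naoh / TRIS_MM:.0%} of the Tris concentration)")
```

At the 8μl threshold mentioned in panel a) the carried-over NaOH is about 0.79mM, well below the 5mM of added Tris.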


Amplification Quality control

After cluster amplification, double-stranded DNA on the flow cell can be stained with an intercalating dye and detected with a fluorescence microscope.

Use on flow cells before linearization and blocking to confirm that the cluster density is appropriate.


Additions to the method

Careful DNA quantification before fragmentation, and checking for degraded DNA.

Use of low-adsorption plasticware (Linnarsson 2010), e.g. Beckman Coulter “non stick” or equivalent. It is also advisable to add some detergent (e.g. 0.02% Tween-20) to reduce adsorption to tube walls.

The implementation of SPRI XP beads for all purification steps.

The use of the Bioanalyzer to check concentration and size distribution after fragmentation.

Cheaper alternatives to Illumina kits, e.g. NEB kits, or making your own adapters and primers.


Conclusion

The Genome Analyzer is a powerful sequencing technology. Here the authors describe a number of modifications that allow for more efficient library preparation, and which enable a stable workflow in a production environment.

At the Sanger Institute, they have several teams for every stage of sequencing. All steps in the process are recorded using custom-written lab-tracking and run-tracking database software.

Combined with improvements to the image analysis software and a faster run time, they predicted that by Christmas 2008 their output would reach 6-10 terabases of high-quality sequence per year - equivalent to 180 human genomes at 15-fold coverage, or approximately 200,000 bases per second.
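The throughput figures above can be cross-checked with simple arithmetic. The sketch assumes a haploid human genome of roughly 3 gigabases, a value not stated in the talk; the function names are illustrative.

```python
HUMAN_GENOME_BASES = 3.0e9  # assumed approximate haploid genome size
SECONDS_PER_YEAR = 365 * 24 * 3600


def bases_for_genomes(genomes, coverage, genome_size=HUMAN_GENOME_BASES):
    """Total bases needed to sequence `genomes` genomes at `coverage`-fold."""
    return genomes * coverage * genome_size


def bases_per_second(bases_per_year):
    """Average sustained rate for a given yearly output."""
    return bases_per_year / SECONDS_PER_YEAR


total = bases_for_genomes(180, 15)
print(f"180 genomes at 15x: {total / 1e12:.1f} terabases")
for yearly_tb in (6, 10):
    rate = bases_per_second(yearly_tb * 1e12)
    print(f"{yearly_tb} Tb/year: {rate:,.0f} bases per second")
```

Under this assumption, 180 genomes at 15-fold coverage is 8.1 terabases, inside the quoted 6-10 terabase range, and the low end of that range corresponds to about 190,000 bases per second.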

The improved workflow and high yield should maintain the Genome Analyzer as their next-generation sequencing platform of choice for the immediate future. But how long this remains true depends upon the performance of existing rival technologies, and those that are on the horizon. Examples are Oxford Nanopore Technologies and Pacific Biosciences’ Single Molecule Real Time technology, which promise to bring us closer to the eagerly anticipated $1,000 genome.