Greater than 10 kb Read Lengths Routine when Sequencing with Pacific Biosciences’ XL Release

Cheryl Heiner, Primo Baybayan, Susana Wang, Yan Guo, Meredith Ashby, Joan Wilson, Kevin Travers, Jason Chin, and Jason Underwood
Pacific Biosciences, 1380 Willow Road, Menlo Park, CA 94025

Pacific Biosciences, PacBio, SMRT, SMRTbell and the Pacific Biosciences logo are trademarks of Pacific Biosciences of California, Inc. All other trademarks are the property of their respective owners. © 2013 Pacific Biosciences of California, Inc. All rights reserved.

Introduction

PacBio’s SMRT® Sequencing produces the longest read lengths of any sequencing technology currently available. There have been a number of recent improvements to further extend the length of PacBio® RS reads. With an exponential read length distribution, there are many reads greater than 10 kb, and some reads at or beyond 20 kb. These improvements include library prep methods for generating >10 kb libraries, a new XL polymerase, magnetic bead loading, stage start, new XL sequencing kits, and increasing data collection time to 120 minutes per SMRT Cell. Each of these features will be described, with data illustrating the associated gains in performance.

With these developments, we are able to obtain greatly improved and, in some cases, completed assemblies for genomes that have been considered impossible to assemble in the past, because they include repeats or low complexity regions spanning many kilobases. Long read lengths are valuable in other areas as well. In a single read, we can obtain sequence covering an entire viral segment, read through multi-kilobase amplicons with expanded repeats, and identify splice variants in long, full-length cDNA sequences. Examples of these applications will be shown.
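The abstract's remark about an exponential read length distribution can be made concrete with a short calculation. The sketch below assumes an idealized exponential distribution described by a single mean; the function names and the 4,500 bp example mean are illustrative assumptions, not a fitted model of PacBio RS data.

```python
import math


def fraction_longer_than(threshold_bp: float, mean_bp: float) -> float:
    """Survival function of an exponential distribution: P(length > threshold)."""
    if mean_bp <= 0:
        raise ValueError("mean_bp must be positive")
    if threshold_bp <= 0:
        return 1.0
    return math.exp(-threshold_bp / mean_bp)


def length_at_percentile(percentile: float, mean_bp: float) -> float:
    """Read length below which `percentile` percent (0 <= p < 100) of reads fall."""
    if not 0 <= percentile < 100:
        raise ValueError("percentile must be in [0, 100)")
    if mean_bp <= 0:
        raise ValueError("mean_bp must be positive")
    return -mean_bp * math.log(1.0 - percentile / 100.0)


if __name__ == "__main__":
    mean_bp = 4500.0  # illustrative mean read length, an assumption
    for threshold in (10_000, 20_000):
        share = fraction_longer_than(threshold, mean_bp)
        print(f"P(read > {threshold} bp) = {share:.3f}")
```

Under this idealized model with a 4,500 bp mean, roughly 11% of reads would exceed 10 kb and about 1% would exceed 20 kb. Real read length distributions need not follow a pure exponential.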
Conclusion

Recent improvements in SMRT® Sequencing provide a wide range of options, including the capability to sequence over 10 kb fragments in a single read, enabling the sequencing community to answer biological questions at a level never before possible.

Very Large Insert SMRTbell™ Library Prep

Key steps in preparing very large insert libraries

Start with high quality input DNA: pulsed-field gel QC
[Figure: pulsed-field gels of an "Ideal sample" and a "Not ideal" sample, with 5 kb, 10 kb, 20 kb, and 30 kb size markers.]
Left, ideal sample, nearly all high molecular weight; right, sample has a high molecular weight band, but shorter fragments will dominate loading and sequence data.

Shearing to 10-20 kb: Covaris® g-TUBE® devices
[Figure: gel, lanes 10-14, with 10 kb, 20 kb, and 30 kb size markers.]
Samples:
10. K12 gDNA (dil. 11/1/2012)
11. K12 shear, regular g-TUBE, 5500 rpm, 50 μL @ 100 ng/μL
12. K12 shear, regular g-TUBE, 5000 rpm, 50 μL @ 100 ng/μL
13. K12 shear, regular g-TUBE, 4500 rpm, 50 μL @ 100 ng/μL
14. K12 shear, regular g-TUBE, 4000 rpm, 50 μL @ 100 ng/μL
Lane 11 = 18.8 kbp; Lane 14 = 30.5 kbp
Results from varying spin speed with g-TUBE fragmentation using the Eppendorf® MiniSpin® plus. The lower the speed, the larger the fragment size, but also the more likely the sample will remain in the upper reservoir and be lost or not sheared.

Converting to SMRTbell™ libraries: large DNA fragments are fragile
[Figure: gel, lanes 1-3.]
Samples:
1. Input E. coli K12 gDNA
2. Sheared E. coli K12 gDNA
3. E. coli K12 SMRTbell library
Shear = 22.1 kbp; Library = 16.1 kbp
Fragment size decreases after shearing due to handling during library prep; gentle handling helps but does not eliminate this issue.

Recent Developments in SMRT® Sequencing

New XL polymerase extends read lengths, while maintaining high consensus accuracy.
Extended collection times maximize read length or throughput.
Combination of new features yields long subreads, some beyond 10 kb.

Stage start for longer subread lengths
Sequencing the 9,749 bp HIV genome
[Figure: coverage plots, "Cell Prep Station Start Coverage" (left) and "Stage Start Coverage" (right).]
Left, cell prep station start excludes the first and last 1000 bases. Right, stage start increases the coverage range nearly to the ends of the genome. Along with XL polymerase and 120 minute movies, the entire genome can be covered in a single read.

Magnetic bead loading for more efficient sample utilization, removal of small fragments with large insert libraries
[Figure: "Diffusion Loading" and "MagBead Loading" panels.]
Problematic sample with many small fragments <1 kb. Fragments <1 kb are excluded with MagBead loading.

Applications of SMRT® Sequencing

Sequencing full-length cDNA transcripts
Workflow for full-length cDNA sequencing
[Figure: workflow diagram. Box labels as extracted: polyA+ RNA; PCR Optimization; SMARTer PCR cDNA; Agarose Size Selection: <1 kb, 1-2 kb, 2-3 kb, >3 kb; Large Scale PCR; SuperScript Full Length cDNA; SMRTbell™ Template Preparation; Total RNA.]
PacBio’s draft cDNA sequencing protocol is now available as a Shared Protocol on SampleNet: http://www.smrtcommunity.com/Share/Protocol/List
Detection of novel splice forms of a cyclin-dependent kinase
Chicken transcript library: full-pass subreads correspond with full-length reference sequences
Many reads span entire multi-kb transcripts

>10 kb read joins 17 contigs
Example from Gbase genome assembly project. Very long inserts can join regions of long repeats, greatly improving problematic assemblies. For more information on assembly methods, see poster P0998, Towards Finished Genome Assemblies using SMRT® Sequencing.

PacBio variants confirmed by PCR-Sanger
Sequence through 12-base homopolymer

Sequencing through >2000 bases of pure CGG repeats
Collaboration with UC Davis: Expanded CGG-Repeat Alleles of the Fragile X Gene
Loomis et al. (2012) Sequencing the unsequenceable: Expanded CGG-repeat alleles of the fragile X gene. Genome Research, accepted for publication.
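The fragile X example above concerns single reads that span more than 2000 bases of pure CGG repeats. As a minimal sketch of how one might locate the longest uninterrupted CGG tract in a read, assuming the read is available as a plain string, consider the following. The function name `longest_cgg_tract` is hypothetical, and this is not the analysis used by Loomis et al.

```python
import re

# One or more back-to-back copies of the CGG triplet.
CGG_RUN = re.compile(r"(?:CGG)+")


def longest_cgg_tract(read: str) -> tuple:
    """Return (start_index, length_in_bases) of the longest pure CGG run.

    Returns (-1, 0) when the read contains no CGG triplet.
    """
    best_start, best_len = -1, 0
    for match in CGG_RUN.finditer(read.upper()):
        run_len = match.end() - match.start()
        if run_len > best_len:
            best_start, best_len = match.start(), run_len
    return best_start, best_len


if __name__ == "__main__":
    toy_read = "ATCGGCGGCGGTACGGA"  # made-up sequence for illustration
    start, length = longest_cgg_tract(toy_read)
    print(f"longest CGG tract starts at {start}, {length} bases ({length // 3} repeats)")
```

Raw single-pass reads contain errors, so a real repeat tract would rarely appear as one perfect run. An exact-match scan like this is better suited to consensus sequence than to individual subreads.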
Regions of long, difficult sequence context are covered in single reads.

High consensus accuracy of >Q50 obtainable with PacBio sequencing
[Table: Reference, PacBio, and Sanger base calls at variant sites in four clones, BAC 1 to BAC 4.]
22 indels and 4 SNPs in human BAC confirmed by PCR-Sanger

400 Mb rice genome, CSHL, 17 kb library

| Polymerase | C2 | C2 | C2 | XL |
|---|---|---|---|---|
| Loading | Diffusion | MBS | MBS | MBS |
| Input into DNA repair | 5 μg (minimum) | 5 μg | 1 μg (minimum) | 1 μg (minimum) |
| 15% recovery | 750 ng | 750 ng | 150 ng | 150 ng |
| Primer annealing | 5 nM | 5 nM | 0.8333 nM | 0.8333 nM |
| Polymerase binding | 3 nM | 3 nM | 0.5 nM | 0.5 nM |
| Loading (on cell) | 150 pM | 150 pM | 10 pM | 5.5 pM |
| Total # SMRT Cells | 52 (with reuse) | 184 (no reuse) | 36 (no reuse) | 68 (no reuse) |

Template input reduced, number of SMRT Cells increased with MagBead loading and XL polymerase.

>10 kb library prep recommendations
XL polymerase, C2 sequencing chemistry
1 x 120 minute collection time
Stage start
MagBead loading

[Figure: read length histograms. Labels: 2 x 55 min movies; 1 x 120 min movie; 11 kb plasmidbell; 2 kb lambda library; PacBio reads; subread lengths, plant and microbial libraries.]
120 minute movies maximize the number of 10-20 kb reads.
2 x 55 minute movies maximize the total number of reads and Mb per sample.
Reported subread length statistics (two panels):
Average: 4,200 bp; 95th percentile: 9,500 bp; max: 13,000 bp
Average: 4,500 bp; 95th percentile: 12,000 bp; max: 21,000 bp

Acknowledgements

The authors would like to thank Jonathan Bingham, Kathryn Keho, Wendy Wise, Jenny Gu, and the many contributors in the PacBio community, including CSHL, UC Davis, and U Washington.

[Figure labels: Single-Pass Accuracy (XL/C2, C2/C2); # of subreads per SMRT Cell; Consensus Accuracy; Fraction of sequence from subreads >x; Read Length; 10 kb libraries.]
High consensus accuracy due to randomness of errors in individual reads.
50% of sequence from subreads >4800 bases.
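The label "50% of sequence from subreads >4800 bases" describes a length-weighted statistic: each subread counts in proportion to the bases it contributes, not as one read. A minimal sketch of computing such a threshold from a list of subread lengths follows. The function names are hypothetical and the example lengths are made up for illustration; they are not data from the poster.

```python
from typing import Iterable, List


def fraction_of_sequence_above(lengths: Iterable[int], x: int) -> float:
    """Fraction of all sequenced bases that lie in subreads longer than x."""
    lengths = list(lengths)
    total = sum(lengths)
    if total == 0:
        return 0.0
    return sum(length for length in lengths if length > x) / total


def length_weighted_threshold(lengths: Iterable[int], fraction: float = 0.5) -> int:
    """Largest length L such that subreads of length >= L hold at least
    `fraction` of all bases (the familiar N50 when fraction is 0.5)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered: List[int] = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("no sequence to summarize")
    running = 0
    for length in ordered:
        running += length
        if running >= fraction * total:
            return length
    return ordered[-1]  # not reached for valid input; kept as a safe fallback


if __name__ == "__main__":
    toy_lengths = [2000, 3000, 5000, 8000, 2000]  # made-up subread lengths
    print(length_weighted_threshold(toy_lengths))
    print(fraction_of_sequence_above(toy_lengths, 4800))
```

Because long subreads carry more bases, this threshold is typically higher than the plain average subread length, which is why the two numbers are reported separately.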

Poster PAG2013 GreaterThan10kbReads
