

For Research Use Only. Not for use in diagnostic procedures. © Copyright 2020 by Pacific Biosciences of California, Inc. All rights reserved. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. Pacific Biosciences does not sell a kit for carrying out the overall No-Amp Targeted Sequencing method. Use of the No-Amp method may require rights to third-party owned intellectual property. FEMTO Pulse and Fragment Analyzer are trademarks of Agilent Technologies Inc. All other trademarks are the sole property of their respective owners.

Resolving Highly Diverse HLA and CYP2D6 Alleles using HiFi Sequencing for Long-Range Amplicon Data with a New Clustering Algorithm

Abstract #: B14

John Harting, Lei Zhu, Ian McLaughlin, Zev Kronenberg

PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025

Introduction

The HLA and CYP2D6 loci are highly diverse genes important to pharmacogenetics and immunology. Resolving and phasing individual alleles without imputation requires long and highly accurate reads. We demonstrate and benchmark the accuracy of PacBio HiFi reads and the pbaa clustering algorithm for resolving these important loci.

Methods

HLA

- 8 Coriell samples: HG001, HG002, HG003, HG004, HG005, HG007, 06986-3, C1-218
- 6 loci: HLA-A, -B, -C, -DPB1, -DQB1, -DRB1
- GenDx NGSgo-MX6-1 kit
- Replicate samples barcoded and pooled at 96-plex
- HiFi reads analyzed by pbaa
- Typing results validated with NGSEngine

CYP2D6

- 22 Coriell samples (see Table 3)
- 3-amplicon primer design (see ASHG Poster #3540)
- Barcoded and pooled at 22-plex
- HiFi reads analyzed by pbaa and pbCYP2D6typer
- Typing results validated against the GeT RM pharmacogenetics panel

Figure 1. CYP2D6 Primer Design. A three-amplicon design captures duplication and deletion alleles in one assay.

Results

HLA

Pbaa results are highly accurate at the recommended 100-fold coverage per locus.
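As a rough planning aid, the 100-fold-per-locus recommendation can be turned into a minimum on-target read budget for a pooled run. The sketch below is illustrative only: it assumes reads are spread evenly across barcodes and targets, and the function name `min_hifi_reads` and its `safety_factor` parameter are hypothetical, not part of pbaa. Treating the CYP2D6 recommendation as 100-fold per amplicon is also an assumption.

```python
import math


def min_hifi_reads(samples: int, targets_per_sample: int,
                   coverage: int = 100, safety_factor: float = 1.0) -> int:
    """Minimum on-target HiFi reads for a pool.

    Assumes perfectly even representation of every barcode and target;
    safety_factor (hypothetical) inflates the budget to allow for imbalance.
    """
    return math.ceil(samples * targets_per_sample * coverage * safety_factor)


# HLA: 96-plex pool, 6 loci per sample, 100-fold per locus.
hla_reads = min_hifi_reads(samples=96, targets_per_sample=6)

# CYP2D6: 22-plex pool, 3 amplicons per sample (per-amplicon coverage is an assumption).
cyp2d6_reads = min_hifi_reads(samples=22, targets_per_sample=3)

print(hla_reads, cyp2d6_reads)
```

In practice barcode yields are uneven, so a `safety_factor` above 1.0 is the more realistic setting.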

Table 1. HLA Accuracy Titration. Pbaa results compared against the truth set. Shaded rows are presence/absence benchmark statistics. A true positive (TP) is a pbaa consensus sequence whose best match is an allele in the truth set. False negatives (FN) are truth-set alleles that were filtered by pbaa or are missing completely from the pbaa result set. False positives (FP) are additional clusters generated for expected truth alleles.
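The TP, FN, and FP counts defined in the caption can be summarized with standard presence/absence statistics. The sketch below assumes the usual definitions of recall and precision; the poster does not state which statistics the shaded rows report, and the counts in the example are made-up placeholders, not values from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkCounts:
    tp: int  # consensus sequences whose best match is a truth-set allele
    fn: int  # truth-set alleles filtered out or missing from the result set
    fp: int  # additional clusters generated for expected truth alleles

    @property
    def recall(self) -> float:
        total = self.tp + self.fn
        return self.tp / total if total else 0.0

    @property
    def precision(self) -> float:
        total = self.tp + self.fp
        return self.tp / total if total else 0.0


# Placeholder counts for illustration only (not taken from the poster).
example = BenchmarkCounts(tp=95, fn=5, fp=0)
print(f"recall={example.recall:.3f} precision={example.precision:.3f}")
```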

Figure 2. Pbaa Workflow and Visualization. (A) Clustering workflow. HiFi reads are assigned to guides and errors are masked within groups. Corrected reads are clustered and consensus sequences are generated. Post-processing filters separate passing from failing clusters. (B) Clustered and painted aligned HiFi reads in IGV. (C) Corrected HiFi read graph; colors match the alignments of passing clusters in panel B.
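The stages named in panel A (assign to guides, mask errors, cluster, filter) can be illustrated with a deliberately simplified toy. This is not pbaa's algorithm: it assumes equal-length reads, uses Hamming distance in place of alignment, treats identical corrected reads as one cluster whose shared sequence is the consensus, and uses hypothetical thresholds (`min_frac`, `min_reads`). All function names are invented for illustration.

```python
from collections import Counter, defaultdict


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def assign_to_guides(reads, guides):
    """Group each read under the guide it is closest to."""
    groups = defaultdict(list)
    for read in reads:
        best = min(guides, key=lambda name: hamming(read, guides[name]))
        groups[best].append(read)
    return groups


def mask_errors(reads, min_frac=0.2):
    """Replace bases seen in fewer than min_frac of the group's reads
    with the column majority, treating them as sequencing errors."""
    corrected = [list(r) for r in reads]
    for col in range(len(reads[0])):
        counts = Counter(r[col] for r in reads)
        majority = counts.most_common(1)[0][0]
        for row in corrected:
            if counts[row[col]] / len(reads) < min_frac:
                row[col] = majority
    return ["".join(row) for row in corrected]


def cluster_and_filter(corrected, min_reads=3):
    """Cluster identical corrected reads; split clusters into pass/fail by size."""
    clusters = Counter(corrected)
    passed = {seq: n for seq, n in clusters.items() if n >= min_reads}
    failed = {seq: n for seq, n in clusters.items() if n < min_reads}
    return passed, failed


def toy_workflow(reads, guides, min_frac=0.2, min_reads=3):
    results = {}
    for name, group in assign_to_guides(reads, guides).items():
        results[name] = cluster_and_filter(mask_errors(group, min_frac), min_reads)
    return results


guides = {"geneA": "ACGTACGTAC", "geneB": "TTGGCCAATT"}
reads = (
    ["ACGTACGTAC"] * 4 + ["ACGTACGTAG"]    # geneA allele 1, one read with an error
    + ["ACGAACGTAC"] * 4 + ["ACGAACCTAC"]  # geneA allele 2, one read with an error
    + ["TTGGCCAATT"] * 3 + ["TTGGCCAATA"]  # geneB, one low-support variant read
)
results = toy_workflow(reads, guides)
for name, (passed, failed) in results.items():
    print(name, "pass:", passed, "fail:", failed)
```

In the toy data, the two geneA alleles survive as separate clusters because their distinguishing base is well supported, while the single geneB variant read is frequent enough to escape masking but too small to pass the size filter.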

CYP2D6

Pbaa results are highly accurate at the recommended minimum 100-fold coverage.

Table 2. CYP2D6 Accuracy Titration. Pbaa results compared against truth set.

Table 3. HiFi CYP2D6 *-Allele Calls. Published calls compared to calls generated from long-read HiFi amplicons. Calls in red are improved with respect to published results.

Improved calls:

- NA09301: duplication resolved
- NA17217: missed variant in reference
- NA17232: phased variants improve call
- Multiple hybrid alleles (*36) identified

Discussion

- The long read lengths and high accuracy of PacBio HiFi reads allow for unprecedented precision when sequencing complex and diverse loci such as HLA and CYP2D6.
- The new pbaa algorithm for HiFi reads generates highly accurate consensus sequences as benchmarked against 6 HLA loci.
- Typing CYP2D6 samples with structural variants (SV), pseudogenes, and pooled amplicons can be problematic with current assays.
- Long, highly accurate reads map uniquely for consistent type calls.
- Pbaa can resolve CYP2D6 alleles into easily typed, fully phased consensus sequences.

Conclusion

The pbaa clustering algorithm was developed as a HiFi successor to the previous PacBio long amplicon analysis tool for clustering long targeted reads. The application of PacBio HiFi reads and pbaa to the HLA and CYP2D6 targets provides a demonstration of the utility of these tools.

- Comprehensive variant detection, including SV
- Robust read clustering with fully phased results
- Uniquely mappable to gene or pseudogene
- Highly accurate *-allele calls

References

- Pbaa: https://github.com/PacificBiosciences/pbAA
- CYP2D6 typing: https://github.com/PacificBiosciences/apps-scripts/tree/master/CYP2D6tools
- HLA Benchmark Data: https://downloads.pacbcloud.com/public/dataset/pbAmpliconAnalysis_HLA/
- GenDx kit: https://www.gendx.com/product_line/ngsgo-mx6-1/
- NGSEngine: https://www.gendx.com/product_line/ngsengine/
- GeT RM: GeT_RM_pharmacogenetics