1
Methods Abstract IrysPrep kit extraction of long DNA molecules IrysPrep reagents label DNA at specific sequence motifs IrysChip linearizes DNA in NanoChannel arrays Irys automates imaging of single molecules in NanoChannel arrays Molecules and labels detected in images by instrument software IrysView software assembles optical maps (1) Long molecules of DNA are labeled with IrysPrep ® reagents by (2) incorporation of fluorophore labeled nucleotides at a specific sequence motif throughout the genome. (3) The labeled genomic DNA is then linearized in the IrysChip ® using NanoChannel arrays and single molecules are imaged by Irys. (4) Single molecule data are collected and detected automatically. (5) Molecules are labeled with a unique signature pattern that is uniquely identifiable and useful in assembly into genome maps. (6) Maps may be used in a variety of downstream analyses using IrysView ® software. 1 2 3 4 5 6 Blood Cell Tissue Microbes Free DNA Solution DNA in a Microchannel DNA in a Nanochannel Gaussian Coil Partially Elongated Linearized Free DNA Displaced Strand Polymerase Nick Site Nickase Recognition Motif Position (kb) ©2015 BioNano Genomics. All rights reserved. In Silico and Biological Validation of BioNano Mapping Based Insertion, Deletion and Translocation Detection J Wang, T Anantharaman, X Zhang, E T Lam, L Zhang, W Andrews, A W Pang, J Lee, M Saghbini, Z. Zhu, A Hastie, H Cao BioNano Genomics, San Diego, CA Comprehensive genome structural characterization requires whole genome de novo assembly of DNA fragments that span repeat regions in the genome. Next-generation mapping (NGM) using BioNano Genomics’ Irys® System is based on the direct visualization of extremely long genomic fragments and it offers a high-throughput and genome-wide method, to interrogate genome structural variations (SVs) in the range of sub kilobase to megabase pairs, thus providing the most complete and structurally intact assembly available. Here we characterize the quality of NGM assemblies for the use of detecting structural variations in human genomes. Because there is no ground-truth human genome sequences available, as all assemblies have errors, we have produced simulated molecules that faithfully reproduce characteristics of BioNano’s single molecule datasets. These simulated molecules can then be de novo assembled and aligned to a reference human genome to find structural differences. We have introduced structural variations into our simulated molecules in order to measure the sensitivity and accuracy of the method across different variant types and sizes. We report on insertions and deletions between 500 bp and 500 kbp. This analysis shows that sensitivity ranges from 77% and 69% at 1.5 kbp up to 94% and 93% at 5 kbp for insertions and deletions, respectively. Sensitivity remains above 80% up to 150 kbp for insertions and begins to drop as the SV size approaches molecule (read) lengths. Importantly, the insertions are still detected but not correctly classified since the contiguity breaks within the inserted sequence. We also report on translocation sensitivity and accuracy. Here we evaluated translocation breakpoint detection in a heterozygous scenario and show that sensitivity is 76% for all tested translocation breakpoints with most of the remaining translocations being truncated and classified as “end” SVs. For all SV types we show biological validation that agrees well with the simulated data. This method is being adopted for cytogenetically characterizing genomes of patients with inherited and spontaneous genetic disease such as developmental disorders and cancer. The high speed, sensitivity and accuracy when compared to sequencing based methods and orders of magnitude higher resolution when compared to traditional cytogenetics make Irys an attractive choice. Translocation Sensitivity and Positive Predictive Value – Simulated Molecules Biological Validation: Insertions/ Deletions Insertion and Deletion Sensitivity and Positive Predictive Value – Simulated Molecules SV size comparison between overlapping BioNano and PacBio SV calls shows good size concordance (CHM1 hyditiform mole compared to calls from Chaisson et al.). Genomes were assembled from simulated molecules containing heterozygous insertions and deletion (no homozygous SVs ). After assembly, genome maps were aligned to hg19 to find variation. Sensitivity to simulated insertions averages 89% between 3 and 150 kbp and 93% for deletions between 3 and 500 kbp. Positive predictive value (PPV=TP/(TP+FP)) averaged >99% for all sizes of deletions and insertions. Genomes were assembled from simulated molecules containing heterozygous translocations and transpositions. Overall sensitivity was 76% but when including only Translocations that are at least 200 kbp, as is usually the case in cancer genomes, sensitivity was up to 82%. Translocation detection performed better outside of segmental duplication but still has 60% sensitivity even within segmental duplications. Sex chromosomes contained more chimeric genome maps when compared to autosomes, presumably as a result of psuedo-autosomal regions. Biological Validation: Translocations In a blind study, we analyzed three leukemia samples where translocations were found using fluorescence in situ hybridization. By using NGM we were able to correctly identify all three translocations and map the breakpoints to ~5 kbp regions. Sample Diagnosis FISH finding BNG finding BN-01 CML t(9;22) t(9;22) BN-03 CLL t(11;14) t(11;14) BN-10 CML t(9;22) t(9;22) Overall Number of Simulated Translocations 1836 Sensitivity 76% Size of Translocated Segment 0-100 kbp 100-200 kbp >200 kbp Number of Simulated Translocations 122 264 1448 Sensitivity 19% 73% 82% Performance Within Segmental Duplications Segmental Duplication Non-Segmental Duplication Number of Simulated Translocations 174 1662 Sensitivity 60% 79% All Autosomes Translocation Breakpoints Detected 1445 1276 Number of False Calls 20 10 Conclusions De novo detection of insertions, inversion and translocations is inecient and inconsistent by commercially available technologies. Next-generation mapping (NGM) using the Irys System, is ecient, cost eective, sensitive and specific for structural variation analysis. Sensitivities for Insertions and Deletions are within 80-95% across the entire genome above 1.5 kbp in simulation experiments. We further validated our insertion and deletion calls in a biological sample and those in the range between 1-20 kbp were validated by Pacific Biosciences sequencing. Translocation detection is very specific and sensitive in both in silico and biological validation experiments. Because of it’s speed and accuracy, NGM is being adopted by researchers for the characterization of clinical genomes. References 1) Lam, E.T., et al. Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly. Nature Biotechnology (2012); 10: 2303 2) Cao, H., et al., Rapid detection of structural variation in a human genome using nanochannel- based genome mapping technology. Gigascience (2014); 3(1):34 3) Chaisson et.al. (2015) Resolving the complexity of the human genome using single molecule sequencing. Nature. 517, 608-611 4). McCarroll, S.A., et al., Donor-recipient mismatch for common gene deletion polymorphisms in graft-versus-host disease. Nat Genet. (2009) 2009 Dec;41(12):1341-4. 5) Menard, V., et al., Copy-number variations (CNVs) of the human sex steroid metabolizing genes UGT2B17 and UGT2B28 and their associations with a UGT2B15 functional polymorphism. Hum. Mutat. (2009) 30: 1310-1319. **We would like to thank Jonathan Diver of Genoptix, Carlsbad, CA, collaborative efforts in experimental design and interpretation.

In Silico and Biological Validation of BioNano Mapping ... · entire genome above 1.5 kbp in simulation experiments. We further validated our insertion and deletion calls in a

Embed Size (px)

Citation preview

Methods

Abstract

IrysPrep kit extraction of long DNA molecules

IrysPrep reagents label DNA at specific sequence motifs

IrysChip linearizes DNA in NanoChannel arrays

Irys automates imaging of single molecules in

NanoChannel arrays

Molecules and labels detected in images by instrument

software

IrysView software assembles optical maps

(1) Long molecules of DNA are labeled with IrysPrep® reagents by (2) incorporation of fluorophore labeled nucleotides at a specific sequence motif throughout the genome. (3) The labeled genomic DNA is then linearized in theIrysChip® using NanoChannel arrays and single molecules are imaged by Irys. (4) Single molecule data are collected and detected automatically. (5) Molecules are labeled with a unique signature pattern that is uniquely identifiableand useful in assembly into genome maps. (6) Maps may be used in a variety of downstream analyses using IrysView® software.

1 2 3 4 5 6

Blood Cell Tissue Microbes

Free DNA Solution DNA in a Microchannel DNA in a Nanochannel

Gaussian Coil Partially Elongated Linearized

Free DNA Displaced Strand

Polymerase Nick Site Nickase Recognition

Motif

Position (kb)

©20

15 B

ioN

ano

Gen

omic

s. A

ll rig

hts

rese

rved

.

In Silico and Biological Validation of BioNano Mapping Based Insertion, Deletion and

Translocation Detection J Wang, T Anantharaman, X Zhang, E T Lam, L Zhang, W Andrews, A W Pang, J Lee, M Saghbini, Z. Zhu, A Hastie, H Cao BioNano Genomics, San Diego, CA

Comprehensive genome structural characterization requires whole genome de novo assembly of DNA fragments that span repeat regions in the genome. Next-generation mapping (NGM) using BioNano Genomics’ Irys® System is based on the direct visualization of extremely long genomic fragments and it offers a high-throughput and genome-wide method, to interrogate genome structural variations (SVs) in the range of sub kilobase to megabase pairs, thus providing the most complete and structurally intact assembly available.

Here we characterize the quality of NGM assemblies for the use of detecting structural variations in human genomes. Because there is no ground-truth human genome sequences available, as all assemblies have errors, we have produced simulated molecules that faithfully reproduce characteristics of BioNano’s single molecule datasets. These simulated molecules can then be de novo assembled and aligned to a reference human genome to find structural differences. We have introduced structural variations into our simulated molecules in order to measure the sensitivity and accuracy of the method across different variant types and sizes.

We report on insertions and deletions between 500 bp and 500 kbp. This analysis shows that sensitivity ranges from 77% and 69% at 1.5 kbp up to 94% and 93% at 5 kbp for insertions and deletions, respectively. Sensitivity remains above 80% up to 150 kbp for insertions and begins to drop as the SV size approaches molecule (read) lengths. Importantly, the insertions are still detected but not correctly classified since the contiguity breaks within the inserted sequence. We also report on translocation sensitivity and accuracy. Here we evaluated translocation breakpoint detection in a heterozygous scenario and show that sensitivity is 76% for all tested translocation breakpoints with most of the remaining translocations being truncated and classified as “end” SVs. For all SV types we show biological validation that agrees well with the simulated data.

This method is being adopted for cytogenetically characterizing genomes of patients with inherited and spontaneous genetic disease such as developmental disorders and cancer. The high speed, sensitivity and accuracy when compared to sequencing based methods and orders of magnitude higher resolution when compared to traditional cytogenetics make Irys an attractive choice.

Translocation Sensitivity and Positive Predictive Value – Simulated Molecules

Biological Validation: Insertions/Deletions Insertion and Deletion Sensitivity and Positive

Predictive Value – Simulated Molecules

SV size comparison between overlapping BioNano and PacBio SV calls shows good size concordance (CHM1 hyditiform mole compared to calls from Chaisson et al.).

Genomes were assembled from simulated molecules containing heterozygous insertions and deletion (no homozygous SVs). After assembly, genome maps were aligned to hg19 to find variation. Sensitivity to simulated insertions averages 89% between 3 and 150 kbp and 93% for deletions between 3 and 500 kbp. Positive predictive value (PPV=TP/(TP+FP)) averaged >99% for all sizes of deletions and insertions.

Genomes were assembled from simulated molecules containing heterozygous translocations and transpositions. Overall sensitivity was 76% but when including only Translocations that are at least 200 kbp, as is usually the case in cancer genomes, sensitivity was up to 82%. Translocation detection performed better outside of segmental duplication but still has 60% sensitivity even within segmental duplications. Sex chromosomes contained more chimeric genome maps when compared to autosomes, presumably as a result of psuedo-autosomal regions.

Biological Validation: Translocations

In a blind study, we analyzed three leukemia samples where translocations were found using fluorescence in situ hybridization. By using NGM we were able to correctly identify all three translocations and map the breakpoints to ~5 kbp regions.

Sample Diagnosis FISH finding BNG finding

BN-01 CML t(9;22) t(9;22)

BN-03 CLL t(11;14) t(11;14)

BN-10 CML t(9;22) t(9;22)

Overall Number of Simulated Translocations 1836

Sensitivity 76%

Size of Translocated Segment 0-100 kbp 100-200 kbp >200 kbpNumber of Simulated Translocations 122 264 1448

Sensitivity 19% 73% 82%

Performance Within Segmental Duplications

Segmental Duplication

Non-Segmental Duplication

Number of Simulated Translocations 174 1662 Sensitivity 60% 79%

All Autosomes Translocation Breakpoints Detected 1445 1276

Number of False Calls 20 10

Conclusions De novo detection of insertions, inversion and translocations is inefficient and inconsistent by commercially available technologies. Next-generation mapping (NGM) using the Irys System, is efficient, cost effective, sensitive and specific for structural variation analysis. Sensitivities for Insertions and Deletions are within 80-95% across the entire genome above 1.5 kbp in simulation experiments. We further validated our insertion and deletion calls in a biological sample and those in the range between 1-20 kbp were validated by Pacific Biosciences sequencing. Translocation detection is very specific and sensitive in both in silico and biological validation experiments. Because of it’s speed and accuracy, NGM is being adopted by researchers for the characterization of clinical genomes.

References 1) Lam, E.T., et al. Genome mapping on nanochannel arrays for structural variation analysis andsequence assembly. Nature Biotechnology (2012); 10: 23032) Cao, H., et al., Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology. Gigascience (2014); 3(1):343) Chaisson et.al. (2015) Resolving the complexity of the human genome using single moleculesequencing. Nature. 517, 608-6114). McCarroll, S.A., et al., Donor-recipient mismatch for common gene deletion polymorphisms ingraft-versus-host disease. Nat Genet. (2009) 2009 Dec;41(12):1341-4.5) Menard, V., et al., Copy-number variations (CNVs) of the human sex steroid metabolizing genesUGT2B17 and UGT2B28 and their associations with a UGT2B15 functional polymorphism. Hum.Mutat. (2009) 30: 1310-1319.**We would like to thank Jonathan Diver of Genoptix, Carlsbad, CA, collaborative efforts in experimental design and interpretation.

.