ePosters · 2014-10-16


NEXTflex™ qRNA-Seq™ Molecular Indexing for ChIP-Seq and RNA-Seq

Jan Risinger, Masoud Toloue, Ph.D.*

Bioo Scientific Corp, 7050 Burleson Road, Austin, TX 78744 USA

*[email protected]

Abstract

ERCC Spike-In RNAs Used as Proof of Concept Experiment

Library Prep Workflow and Advantage of Molecular Indexing

Most modern methods for NGS library prep require enzymatic processing, such as DNA polymerase reactions, which can introduce errors in the form of incorrect sequence and misrepresented copy number. Conventional RNA sequencing library construction involves the ligation of adapters to a population of cDNA molecules prior to amplification and sequencing. An inherent weakness of conventional RNA-Seq analysis is that cDNA fragments that amplify more efficiently during the library construction PCR step will unavoidably yield more reads than cDNAs that amplify less well. Therefore, when multiple reads mapping to the same transcript are encountered, it is not possible to determine whether they originate from the same or different cDNA molecules. With Molecular Indexed™ libraries, each molecule is tagged with a molecular index randomly chosen from ~10,000 combinations, so that any two identical molecules become distinguishable (with odds of 10,000/1) and can be independently evaluated in later data analysis. Analysis using molecular indexing information provides an absolute, digital measurement of gene expression levels, irrespective of the amplification distortions commonly observed in RNA-Seq experiments. This type of indexing requires no additional steps in the RNA-Seq workflow and increases the precision of downstream analysis. At low sequencing depths, analysis using the molecular indices gives the same results as conventional analysis, generating equivalent RPKM values in all applications. As sequencing depth increases, individual molecular resolution also increases. In quantitative RNA-Seq experiments, the molecular indices distinguish re-sampling of the same molecule from sampling of a different molecule. At high sequencing depths, each molecule can be distinguished and the entire library can be analyzed to provide absolute numbers of each molecule. Resolving individual clones of molecules is critical for increasing sequencing accuracy or when identifying mutations in complex sample types.

NEXTflex™ qRNA-Seq™ on Late Flowering Arabidopsis Mutant

Counting Individual Molecules and Detecting Clonal Duplicates

For a random stochastic labeling (STL) process, the number of label pairs chosen (k) can be calculated from the number of cDNA molecules labeled (n) using the equation $k = m\left(1 - e^{-n/m}\right)$, where m is the total number of labels. Figure: plot of k vs. n for a set of 9,216 total labels (m).
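The labeling equation above can be evaluated directly. The following is a minimal sketch, assuming m = 9,216 as stated in the text; the function name and the sample values of n are illustrative and do not come from the poster.

```python
import math

M_LABELS = 9216  # total number of labels (m) quoted in the text


def expected_unique_labels(n_molecules: float, m_labels: int = M_LABELS) -> float:
    """Expected number of distinct labels k after labeling n molecules at random."""
    return m_labels * (1.0 - math.exp(-n_molecules / m_labels))


# Illustrative values of n (assumed for demonstration, not taken from the poster).
for n in (100, 1_000, 10_000, 100_000):
    k = expected_unique_labels(n)
    print(f"n = {n:>7,} molecules -> k = {k:,.1f} distinct labels")
```

For n much smaller than m, k stays close to n, so nearly every molecule carries its own label. As n grows past m, k saturates toward m, which is the label exhaustion regime.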

NEXTflex™ qRNA-Seq™ generates libraries equivalent to standard RNA-Seq libraries with sample barcodes, but with the added feature of molecular indexing.

To evaluate how accurately RNA abundance is measured, RPKM values were determined for a set of ERCC spike-in RNAs. The ERCC set consists of 92 polyadenylated artificial transcripts spanning a $10^6$-fold concentration range, added to total RNA prior to mRNA purification.

At low sequencing depth, application of molecular labels generates RPKM values equivalent to those of regular RNA-Seq. At increased sequencing depth, molecular labels distinguish individual molecules, providing more accurate measurement of RNA abundance.

RPKM values correlate well with spike-in concentrations; however, sampling errors increase at very low RNA concentrations.
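For reference, RPKM is conventionally computed as reads per kilobase of transcript per million mapped reads. The sketch below uses that standard definition; the function name and the example numbers are hypothetical and are not taken from the ERCC experiment. Whether the count is a raw read count or a deduplicated molecule count is left to the caller, which is an assumption about how molecular indexing would be applied.

```python
import math


def rpkm(count: int, transcript_length_bp: int, total_mapped: int) -> float:
    """Reads (or molecules) per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (transcript_length_bp * total_mapped)


# Hypothetical example: 500 reads on a 1,000 bp transcript in a library
# of 10 million mapped reads.
example = rpkm(count=500, transcript_length_bp=1_000, total_mapped=10_000_000)
print(f"RPKM = {example:.1f}")
```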

- Each molecule is tagged with a molecular index (MI) randomly chosen from ~10K combinations
- At low sequencing depth, application of MIs generates RPKM values equivalent to regular RNA-Seq
- At increased sequencing depth, MIs distinguish individual molecules
- Resolving individual clones of molecules is especially useful for increasing sequencing accuracy or identifying mutations in complex mixes
- cDNA libraries prepared from small input samples requiring greater PCR amplification are more prone to duplicate reads and re-sampling biases

- Using the NEXTflex™ qRNA-Seq™ Kit, developed in conjunction with Cellular Research Inc.™, single transcripts of low-expression mRNA molecules are detectable
- Genotyping of the nf-yc triple mutant reveals absolute gene expression of each NF-YC gene in WT and nf-yc triple mutant plants
- The late flowering phenotype of the nf-yc triple mutant is evident and supported by flowering time quantification
- The master regulator of the photoperiodic flowering pathway, FLOWERING LOCUS T (FT), is down-regulated in the nf-yc triple mutant relative to FT expression in WT plants
- Down-regulation of FT in the nf-yc triple mutant delays floral initiation by limiting expression of the meristematic identity targets LEAFY (LFY) and APETALA 1 (AP1)

The nf-yc3, nf-yc4, nf-yc9 triple mutant shows knockdown of all NF-YC3, NF-YC4, and NF-YC9 isoforms except NF-YC9.1.

Figure: WT (Col-0) and nf-yc triple mutant plants. The nf-yc triple mutant is late flowering when grown under long-day conditions.

Under long-day conditions, the nf-yc triple mutant produces more leaves at the time of bolting than WT plants.

The nf-yc triple mutant shows down-regulation of photoperiodic flowering time targets downstream of CONSTANS. Kumimoto et al., 2009.

Figure: Simplified photoperiodic flowering pathway (labels: Circadian Clock, CO, FT, LFY, AP1).

| Gene | Length | nSTLReads | nUniqSTL+Start/Stop | nUniStart/Stop | nUniqSTL | nMolecules | RPKM (raw) |
|---|---|---|---|---|---|---|---|
| LHCB6 | 1,183 | 26,326 | 25,978 | 12,725 | 6,850 | 12,531 | 1,011 |
| LHCB1.1 | 1,176 | 48,937 | 48,144 | 19,545 | 7,994 | 18,620 | 1,890 |
| LHCB1.3 | 1,044 | 136,297 | 134,094 | 35,402 | 8,942 | 32,399 | 5,930 |
| LHCB1.5 | 1,045 | 91,021 | 89,526 | 27,703 | 8,699 | 26,548 | 3,956 |
| RBCS3B | 984 | 40,889 | 40,178 | 8,941 | 7,710 | 16,695 | 1,887 |
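The nMolecules column appears consistent with inverting the labeling equation k = m(1 - e^(-n/m)) to solve for n, given the observed number of unique labels (nUniqSTL) and m = 9,216. The sketch below checks this against the table values. That the column was derived this way is an inference from the numbers, not something the poster states, and the function name is illustrative.

```python
import math

M_LABELS = 9216  # total number of labels (m) quoted in the text


def estimate_molecules(n_unique_labels: int, m_labels: int = M_LABELS) -> float:
    """Invert k = m(1 - exp(-n/m)) to estimate n from the observed unique labels k."""
    if not 0 <= n_unique_labels < m_labels:
        raise ValueError("unique label count must satisfy 0 <= k < m")
    return -m_labels * math.log(1.0 - n_unique_labels / m_labels)


# gene -> (nUniqSTL, nMolecules), copied from the table above.
TABLE = {
    "LHCB6": (6850, 12531),
    "LHCB1.1": (7994, 18620),
    "LHCB1.3": (8942, 32399),
    "LHCB1.5": (8699, 26548),
    "RBCS3B": (7710, 16695),
}

for gene, (n_uniq, n_reported) in TABLE.items():
    estimate = estimate_molecules(n_uniq)
    print(f"{gene:8s} k = {n_uniq:,}  estimated n = {estimate:,.1f}  reported = {n_reported:,}")
```

The estimate becomes very sensitive to small changes in k as k approaches m, which is one way to see why label exhaustion limits counting accuracy for the most highly expressed genes.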

Counting individual molecules at increased sequencing depth and detecting clonal duplicates at specific loci

Duplication rate is defined as the number of reads divided by the number of unique STLs. The median is 1.25. The increasing duplication rate at the right end of the plot indicates exhaustion of the STL labels.

Number of reads (Y) vs. Number of unique STLs (X). Unique molecular labels begin to be exhausted at ~90,000 reads.
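The duplication rate defined above is a simple ratio and can be computed per gene. The sketch below applies it to the five genes in the table, assuming that "unique STLs" corresponds to the nUniqSTL column. That mapping is an assumption, and these five highly expressed genes are not the set behind the reported median of 1.25.

```python
from statistics import median


def duplication_rate(n_reads: int, n_unique_labels: int) -> float:
    """Number of reads divided by the number of unique STLs."""
    if n_unique_labels <= 0:
        raise ValueError("need at least one unique label")
    return n_reads / n_unique_labels


# gene -> (nSTLReads, nUniqSTL), copied from the table; the column choice is assumed.
COUNTS = {
    "LHCB6": (26326, 6850),
    "LHCB1.1": (48937, 7994),
    "LHCB1.3": (136297, 8942),
    "LHCB1.5": (91021, 8699),
    "RBCS3B": (40889, 7710),
}

rates = {gene: duplication_rate(reads, uniq) for gene, (reads, uniq) in COUNTS.items()}
for gene, rate in rates.items():
    print(f"{gene:8s} duplication rate = {rate:.2f}")
print(f"median over these genes = {median(rates.values()):.2f}")
```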

Molecular Indexing is Compatible with ChIP DNA Samples

At increased sequencing depth, molecular labels distinguish individual molecules, providing more accurate measurement of RNA abundance of highly expressed photosynthesis-related genes.

Combining start/stop sites with the added feature of molecular indexing can distinguish clonal duplicates from unique reads.
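One way to combine start/stop sites with molecular indices is to treat reads as clonal duplicates only when the gene, start, stop, and label all match. The sketch below illustrates that idea. The `Read` type, its field names, and the toy reads are hypothetical and do not represent the kit's analysis software.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Read(NamedTuple):
    gene: str
    start: int   # alignment start coordinate
    stop: int    # alignment stop coordinate
    label: str   # molecular index attached to the fragment


def count_unique_molecules(reads: Iterable[Read]) -> Counter:
    """Count distinct (gene, start, stop, label) combinations per gene."""
    seen = set()
    per_gene = Counter()
    for read in reads:
        if read not in seen:  # NamedTuple compares and hashes by value
            seen.add(read)
            per_gene[read.gene] += 1
    return per_gene


# Toy reads, made up for illustration only.
toy_reads = [
    Read("geneA", 100, 250, "L01"),
    Read("geneA", 100, 250, "L01"),  # same coordinates and label: clonal duplicate
    Read("geneA", 100, 250, "L02"),  # same coordinates, different label: distinct molecule
    Read("geneA", 120, 260, "L01"),  # same label, different coordinates: distinct molecule
    Read("geneB", 5, 90, "L07"),
]

molecule_counts = count_unique_molecules(toy_reads)
print(dict(molecule_counts))
```

Using coordinates alone would collapse the first three geneA reads into one, while using the label alone would merge the first and fourth, so the combined key retains more distinct molecules than either criterion by itself.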

Two DNA libraries were prepared using 100 pg of ChIP DNA and the corresponding input DNA as starting material.

The NEXTflex™ qRNA-Seq™ Kit manual was followed from the End-Repair step through PCR.

Using molecular labels allows a higher number of PCR cycles to be used to make libraries from as little as 100 pg of ChIP DNA.

Distinguishing PCR duplicates from unique ChIP reads is required to draw accurate, quantitative results from ChIP-Seq experiments.

(Input and ChIP DNA samples were provided by ABRF/NARG, who do not currently endorse this product.)

Conclusions & Applications