

Anjali A. Pradhan, Q. Doan, J. Sorenson, R. Fang, K. Poulter, M. Rydland, C. Forbes, S. Vijaychander, R. Nutter, S. Nidtha, P. Baybayan, Applied Biosystems, Foster City, CA 94404

Figure 7 shows the Genotype Report, designed specifically for Resequencing Applications. Resequencing coverage in this case for the complete mitochondrial genome is reported in a separate table.

Variation across specimens can be studied in the genotype table, as shown in Figure 7. Each report can be exported to a tab-delimited text file for easy integration with downstream workflows.
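A tab-delimited export of this kind can be read with standard tooling. The following is a minimal sketch in Python; the column names (Specimen, ROI, Position, Genotype) and the rows are illustrative assumptions, not the actual SeqScape report layout or real results.

```python
import csv
import io

# Hypothetical export: column names and values are made up for illustration.
EXAMPLE_EXPORT = (
    "Specimen\tROI\tPosition\tGenotype\n"
    "specimen_A\tgene_X\t10\tA\n"
    "specimen_B\tgene_X\t10\tA\n"
    "specimen_B\tgene_Y\t42\tT\n"
)


def specimens_by_variant(handle):
    """Group specimen names by (ROI, position, genotype) from a tab-delimited table."""
    reader = csv.DictReader(handle, delimiter="\t")
    grouped = {}
    for row in reader:
        key = (row["ROI"], int(row["Position"]), row["Genotype"])
        grouped.setdefault(key, []).append(row["Specimen"])
    return grouped


table = specimens_by_variant(io.StringIO(EXAMPLE_EXPORT))
for variant, specimens in table.items():
    print(variant, specimens)
```

Grouping by variant rather than by specimen makes it easy to see which specimens share a given change, which is the comparison the genotype table is meant to support.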

As shown in Figure 4, a consensus is created for each DNA Specimen. Figure 5 shows an example of the behavior of the consensus calling algorithm. This algorithm examines the quality of each trace for background noise and the orientation of the traces, then forms an accurate consensus call, which is particularly useful in heterozygote base calling. Sequencing anomalies such as PCR noise or unincorporated dye terminators that are present in one strand and cause a base calling error can be corrected at the consensus level, as shown in Figure 5. This means less manual editing, because an accurate consensus is determined for each DNA sample being investigated.
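The idea of resolving a strand disagreement at the consensus level can be illustrated with a deliberately simplified sketch. This is not the SeqScape consensus algorithm, which is not described in detail here; the quality margin of 15 and the rule of falling back to an IUPAC ambiguity code are assumptions made only for the example.

```python
# IUPAC codes for two-base ambiguities.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def consensus_base(fwd, rev, min_margin=15):
    """Combine one forward and one reverse call, each a (base, quality) tuple.

    Both calls are assumed to be already expressed in the same orientation.
    min_margin is a hypothetical threshold, not a SeqScape setting.
    """
    (fwd_base, fwd_q), (rev_base, rev_q) = fwd, rev
    if fwd_base == rev_base:
        return fwd_base
    # Strands disagree: trust the clearly better call, e.g. when noise
    # in one strand produced a low-quality miscall.
    if abs(fwd_q - rev_q) >= min_margin:
        return fwd_base if fwd_q > rev_q else rev_base
    # Similar quality on both strands: report an ambiguity code.
    return IUPAC.get(frozenset((fwd_base, rev_base)), "N")


def consensus(fwd_calls, rev_calls, min_margin=15):
    """Build a consensus string from two position-matched lists of calls."""
    return "".join(
        consensus_base(f, r, min_margin) for f, r in zip(fwd_calls, rev_calls)
    )


forward = [("A", 40), ("C", 40), ("G", 12), ("T", 30)]
reverse = [("A", 38), ("C", 35), ("A", 45), ("C", 28)]
print(consensus(forward, reverse))
```

In this toy input the third position is a low-quality forward call overridden by a high-quality reverse call, while the fourth position has two calls of similar quality and is reported as an ambiguity code.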

Mitochondrial DNA Data Analysis Using SeqScape® Software v2.1.1 / Poster # 39

ABSTRACT

Sequence variations in mitochondrial DNA have been used in human evolution, disease association studies and forensic research. We present here an integrated solution for resequencing of mitochondrial DNA using the Applied Biosystems VariantSEQr™ Resequencing System and SeqScape® Software v2.1.1. This system solution gives researchers accurate variant identification with simplified data analysis and reduces the time required for many laborious and repetitive steps. Applied Biosystems SeqScape® Software v2.1.1 was designed specifically to analyze and identify sequence variations accurately and reliably, which are critical requirements for all resequencing applications. Analysis is demonstrated on four Coriell DNA samples for the mitochondrial genome using SeqScape® Software v2.1.1. Results are displayed in comprehensive reports, which map each variant in the context of the entire mitochondrial genome and detect different types of mutations. SeqScape® Software v2.1.1 provides accurate variant identification, requires less manual data manipulation and shortens the turnaround time for the complete analysis of the mitochondrial resequencing project.

INTRODUCTION

Most cells contain between 500 and 1000 copies of the mtDNA molecule, which makes mtDNA easier to extract than nuclear DNA, especially in cases where nuclear DNA is degraded or limited in quantity. Current resequencing strategies (for example, for mtDNA) are limited in throughput, in standardization of protocols and in automation of data analysis. Throughput for mtDNA resequencing can be increased with MitoChips, which use microarray technology; however, there are concerns about sub-optimal hybridization in GC-rich regions, which results in the D-Loop region of the mitochondrial genome not being covered (Anirban Maitra et al. 2004). Homebrewed methods can be cumbersome, as multiple PCR amplifications may be necessary to ensure that the primers do not amplify the noise or contamination present in degraded DNA.

Data analysis can be done using BLAST, ClustalW, MitoAnalyzer or SeqScape® software, where the study sample is compared to the Anderson or Cambridge Reference Sequence (CRS). Both BLAST and ClustalW align the study sequence to the reference sequence and identify all nucleotide variants; however, neither reports the amino acid effect of the nucleotide variants. BLAST can process single samples, while ClustalW can process multiple samples. MitoAnalyzer is a web-based tool from the National Institute of Standards and Technology (NIST) that provides amino acid and protein effects once scientists have identified all variant nucleotides for each sample (using BLAST or ClustalW). It allows scientists to process one variant in one sample at a time. Obtaining both the nucleotide variants and the corresponding amino acid effects with these tools therefore requires several programs and many steps.

SeqScape® Software from Applied Biosystems combines this functionality in one application: it processes multiple study samples and identifies both nucleotide variants and the corresponding amino acid effects in a single analysis. When using the VariantSEQr™ System for Mitochondrial DNA, scientists do not need to spend time annotating this information and can use the prepackaged analysis template configured for SeqScape Software.
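The core of reference comparison across several samples can be sketched in a few lines of Python. The sketch assumes the sequences are already aligned to the reference and equal in length, so it does not handle insertions or deletions; the function name and the toy sequences are illustrative and unrelated to SeqScape internals or to the Coriell data.

```python
def nucleotide_variants(reference, sample):
    """Return (1-based position, reference base, sample base) for each mismatch.

    Both sequences must already be aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(reference.upper(), sample.upper())
    return [(pos, ref, alt) for pos, (ref, alt) in enumerate(pairs, start=1) if ref != alt]


# Toy data: one reference and two specimens processed in a single pass.
reference = "ATGCCGTA"
samples = {"specimen_A": "ATGCTGTA", "specimen_B": "ATGCCGTA"}

variants = {name: nucleotide_variants(reference, seq) for name, seq in samples.items()}
print(variants)
```

Predicting the amino acid effect of each variant additionally requires the reading frame and genetic code of the affected gene.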

MATERIALS AND METHODS

For the data presented here, the complete Mitochondrial Resequencing Set was used to cover the entire 16 kb mitochondrial genomic reference from the National Center for Biotechnology Information (NCBI). The VariantSEQr™ Resequencing System for the mitochondrial genome provides: (1) validated PCR primer sets (RSAs) tailed with M13 forward or reverse sequencing primers, (2) a single protocol for PCR, sequencing and data generation, and (3) an Analysis Project Template for use with SeqScape Software v2.1.1. The data presented here were sequenced using BigDye Terminator 3.1, collected on an Applied Biosystems 3730xl DNA Analyzer and analyzed with SeqScape Software v2.1.1.

CONCLUSIONS

As demonstrated here for mitochondrial genome resequencing, the VariantSEQr™ application is fully optimized, from ready-to-use resequencing primers through to data analysis with pre-configured project templates in SeqScape® Software v2.1.1. The entire resequencing process is streamlined, making it an easy and cost-effective method. With the automation capabilities in Data Collection Software v2.0 and SeqScape® Software on the 3730 DNA Analyzer, a high-throughput resequencing system is available. The software used for data analysis provides powerful features for mutation detection, such as mapping each mutation in the context of genomic annotations and the ability to easily differentiate between sequencing errors and true heterozygotes. Integrated reports such as the Mutations and Genotype Reports allow easy export of information into other downstream workflows. With these enhanced data analysis capabilities, SeqScape® Software offers a competitive edge over other sequencing analysis tools, especially for resequencing.

REFERENCES

1. MITOMAP: A Human Mitochondrial Genome Database. <http://www.mitomap.org>

2. National Institute of Standards and Technology (NIST) (<www.ts.nist.gov>). MitoAnalyzer, National Institute of Standards and Technology, Gaithersburg, MD, USA.

3. Anirban Maitra, Yoram Cohen et al., The Human MitoChip: A High-Throughput Sequencing Microarray for Mitochondrial Mutation Detection. Genome Research, 2004.

4. Jeronimo, C., Nomoto, S., Caballero, O.L. et al., Mitochondrial Mutations in Early Stage Prostate Cancer and Bodily Fluids. Oncogene 20:5195-5198.

Note: The VariantSEQr™ Mitochondrial Resequencing System will be made commercially available by the end of 2004. The resequencing sets can then be ordered online from our website. Users will have the flexibility to purchase either the complete Mitochondrial Genome resequencing set or a resequencing set covering only the Hypervariable Regions (HVRI and HVRII).

TRADEMARKS/LICENSING

Notice to Purchaser: License Disclaimer

Purchase of this software product alone does not imply any license under any process, instrument or other apparatus, system, composition, reagent or kit rights under patent claims owned or otherwise controlled by Applera Corporation, either expressly, or by estoppel.

SeqScape Software has not undergone specific developmental validation for human identification applications. Human identification laboratories which choose to use SeqScape Software for data analysis should perform their own developmental validation studies.

Applied Biosystems and SeqScape are registered trademarks and AB(Design), VariantSEQr and Applera are trademarks of Applera Corporation or its subsidiaries in the U.S. and/or certain other countries. © 2004 Applied Biosystems. All Rights Reserved.

Figure 7. Genotype Info. Report

Data can be reviewed in the Project View, as shown in Figure 4 for the four DNA specimens against the complete mitochondrial reference sequence. The top pane shows pink bars indicating the variant distribution for each specimen. Data viewing options such as Tab jumps across variants and the Character/Dots View (showing bases only for the variants) allow users to easily find variants in large sets of data.

Figure 5. Consensus Accuracy

Figure 1. Mito. Genome Map (Reference: MITOMAP 2002)

The complete Mitochondrial Resequencing Set for the VariantSEQr™ System will contain a project template with the reference sequence information. The sequence used for comparative data analysis of the mitochondrial genome is taken from the NCBI database (RefSeq file NC_001807). A graphical representation of the complete mitochondrial genome, approximately 16 kb, is shown in Figure 1. It codes for 22 transfer RNAs, 2 ribosomal RNAs and 13 peptides, with no introns. The only non-coding region of any size is the control region (CR), or D-Loop in vertebrates, which spans either side of the origin and contains the hypervariable regions HVI and HVII. Figure 2 shows the results of directly importing the RefSeq file into SeqScape® Software. A region of interest (ROI) is created for each of the features in the genome. This information will be populated in the VariantSEQr™ System project template.
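How a genome position maps onto per-feature regions of interest can be sketched as follows. The ROI names and coordinates are toy values chosen for illustration, not the annotation of NC_001807, and the data structure is an assumption rather than the SeqScape project template format.

```python
from typing import NamedTuple


class ROI(NamedTuple):
    name: str
    start: int  # 1-based genome coordinate, inclusive
    end: int    # 1-based genome coordinate, inclusive


# Toy layout with one overlap between adjacent features (hypothetical values).
ROIS = [
    ROI("region_A", 1, 100),
    ROI("gene_B", 101, 400),
    ROI("gene_C", 380, 600),
]


def rois_at(position, rois):
    """Return the names of all ROIs that contain a genome position."""
    return [roi.name for roi in rois if roi.start <= position <= roi.end]


def to_roi_coordinate(position, roi):
    """Convert a genome position to a 1-based position within one ROI."""
    if not roi.start <= position <= roi.end:
        raise ValueError(f"position {position} is outside {roi.name}")
    return position - roi.start + 1


print(rois_at(390, ROIS))
print(to_roi_coordinate(390, ROIS[2]))
```

Reporting a variant relative to its ROI, rather than only by genome coordinate, is what allows a statement such as "base 175 in the ATP6 gene".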

VARIANTSEQr™ SYSTEM DATA ANALYSIS USING SEQSCAPE® SOFTWARE v2.1.1

Step 1 in data analysis is to import the project template provided with the VariantSEQr™ System shipment into SeqScape Software. This template contains the reference information, the analysis settings and the display settings for data processing and viewing. In Step 2, a new project for the study is created using the imported project template (for the complete mitochondrial genome) and data are added from the sequencing instrument. Sequence files from each Coriell DNA are grouped together to form a Specimen in SeqScape® Software. Step 3 is a single click on the analysis button, which performs base calling, trimming, assembly, consensus alignment and reference comparison in one step.
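Grouping sequence files into specimens before import can be scripted when file names follow a convention. The sketch below assumes a hypothetical naming scheme of the form specimen_amplicon_direction.ab1; neither the scheme nor the file names come from the VariantSEQr protocol.

```python
from collections import defaultdict
from pathlib import PurePath


def group_by_specimen(filenames):
    """Group trace file names by the specimen ID before the first underscore."""
    specimens = defaultdict(list)
    for name in filenames:
        stem = PurePath(name).stem          # drop the file extension
        specimen_id = stem.split("_", 1)[0]  # text before the first underscore
        specimens[specimen_id].append(name)
    return dict(specimens)


# Hypothetical file names for two of the specimens.
files = [
    "NA00893_amp01_F.ab1",
    "NA00893_amp01_R.ab1",
    "NA10924_amp01_F.ab1",
    "NA10924_amp01_R.ab1",
]

grouped = group_by_specimen(files)
for specimen, traces in grouped.items():
    print(specimen, traces)
```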

Figure 3. Data Analysis Process

RESULTS

There are various reports in SeqScape® Software to display quality of data, nucleotide and amino acid mutations, genotypes across specimens and sequence confirmation results. Figure 6 shows the mutations for each of the DNA specimens for the ATP6 gene in the mitochondrial genome. Specimens NA00893 and NA10924 have a mutation at base 175 in the ATP6_gene. This substitution change is predicted to have a missense effect on the amino acid sequence. If the mutations are known they can be classified as known variants and this will also be reported in the table.
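The classification of a substitution as silent, missense or nonsense can be sketched with the vertebrate mitochondrial genetic code (NCBI translation table 2), in which TGA codes for tryptophan, ATA for methionine, and AGA and AGG are stop codons. The function below is an illustration, not the SeqScape implementation, and the coding sequence is a toy example rather than the ATP6 sequence. Assuming gene positions are numbered from the first base of the start codon, base 175 falls on the first position of codon 59.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial), codons in the order
# TTT, TTC, TTA, TTG, TCT, ... GGG; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_effect(cds, gene_pos, alt_base):
    """Classify a single-base substitution in a coding sequence.

    gene_pos is 1-based and counted from the first base of the start codon.
    Returns (codon number, reference amino acid, variant amino acid, effect).
    """
    index = gene_pos - 1
    offset = index % 3
    codon_start = index - offset
    ref_codon = cds[codon_start:codon_start + 3].upper()
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "silent"
    elif alt_aa == "*":
        effect = "nonsense"
    else:
        effect = "missense"
    return codon_start // 3 + 1, ref_aa, alt_aa, effect


toy_cds = "ATGGCAATA"  # toy sequence: Met, Ala, Met (ATA is Met in this code)
print(codon_effect(toy_cds, 4, "A"))
print(codon_effect(toy_cds, 9, "G"))
```

Using the standard nuclear code instead would misclassify some mitochondrial variants, for example any change involving ATA, TGA, AGA or AGG codons.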

VARIANTSEQr™ REFERENCE SET UP IN THE PROJECT TEMPLATE

Data analysis workflow (steps shown in Figure 3):

Step 1: Import SeqScape Software Project Template

Step 2: Add sequence files

Step 3: Analyze the data in a Single Step

Step 4: Review data - Use Quality Values and Analysis QC Report

Step 5: Review Results in Reports - Mutation and Genotype Reports

Figure 2. Results of Importing Ref_Seq NC_001807 for mtDNA

Figure 4. View all data in the Project View

Figure 6. Report Showing Mutations