Genomic Region Analysis of Microsatellite Loci in Arabidopsis thaliana

Abstract

Microsatellite loci occur frequently throughout a genome in various locations. The correlation between the function and the location of microsatellites is not completely understood. The completed genome of Arabidopsis thaliana allowed for exploration of the relationships between microsatellite classifications and the various regions of the genome. A total of 203 sample sequences containing microsatellites from the A. thaliana genome were analyzed and classified by canonical motif, motif length, and location in the genome. The analysis found that the most common motif length in exon regions was the trinucleotide repeat.

Janae Gonzales, Teresa Leslie, Efren Miranda, Mirna Rocha

NIH RISE Genome Discovery Workshop, New Mexico State University, Las Cruces, NM, 88003

Introduction

Microsatellite loci, or simple sequence repeats (SSRs), consist of repeated units of 1-6 base pairs and are abundantly interspersed throughout a genome. They are generally variable among individuals within a species, which makes these sequences useful for genetic analysis within a population. The great variability found in microsatellites can be attributed to their increased mutation rate. A. thaliana microsatellite loci have been shown to be highly variable (Innan et al. 1997) and abundant (Casacuberta et al. 2000). The entire genome of Arabidopsis thaliana has been sequenced and annotated, which makes genetic research on it more accessible. A. thaliana has become a model organism in plant biology due to its extensive genetic map and relatively small genome.

Methods

*The DNA information from Arabidopsis thaliana was obtained from C.D. Bailey Collection #69, Cornell University.
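The abstract states that each sequence was classified by canonical motif, motif length, and genomic location, but the poster does not describe the procedure. The following is a minimal sketch of one way such a classification could be done, not the authors' actual method. It assumes a common convention in which the canonical motif is the alphabetically smallest form among all rotations of the motif and of its reverse complement. The function names and the record layout are hypothetical.

```python
# Sketch only: the canonical-motif convention used here (smallest rotation of
# the motif or its reverse complement) is an assumption, not the poster's method.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Names for motif lengths of 1-6 bp, the SSR unit sizes given in the Introduction.
LENGTH_CLASS = {
    1: "mononucleotide",
    2: "dinucleotide",
    3: "trinucleotide",
    4: "tetranucleotide",
    5: "pentanucleotide",
    6: "hexanucleotide",
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def canonical_motif(motif: str) -> str:
    """Return the smallest rotation of the motif or of its reverse complement."""
    motif = motif.upper()
    candidates = []
    for strand in (motif, reverse_complement(motif)):
        for i in range(len(strand)):
            candidates.append(strand[i:] + strand[:i])
    return min(candidates)


def classify(motif: str, region: str) -> dict:
    """Build a hypothetical record with the three classifications."""
    return {
        "canonical_motif": canonical_motif(motif),
        "length_class": LENGTH_CLASS[len(motif)],
        "region": region,
    }


record = classify("TCA", "exon")
print(record)
```

Under this convention, TCA, CAT, and GAT all map to the same canonical motif, ATC, so repeats that differ only in reading start or strand are counted together.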

Results

• Of the 37 dinucleotide repeats found in the A. thaliana data, the majority were found in intergenic regions and 3’ untranslated regions (Figure 1). A smaller share, 14%, were found in exons.
• There were 131 trinucleotide repeats found in the A. thaliana data. The majority, 64%, were found in exon regions of the genome (Figure 2). About 13% were found in intergenic regions.
• Of the 33 tetranucleotide repeats, almost half were found in intergenic regions (Figure 3). Around 22% were found in exons.
• When only exons were considered, the most common type of repeat was the trinucleotide repeat (Figure 4). An example is an ATC repeat. On the other hand, the di-, tetra-, and pentanucleotide repeats all made up about 14% of the total nucleotide repeats in exons.
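The percentages above can be converted back into approximate repeat counts. The short calculation below is only a sketch: it uses the totals and exon percentages reported in the Results, and the rounded counts are estimates rather than values stated on the poster. Pentanucleotide repeats are left out because their total is not reported.

```python
# Totals and exon percentages are taken from the Results text; the derived
# counts are approximations, since the reported percentages are rounded.
reported = {
    "dinucleotide": {"total": 37, "exon_pct": 14},
    "trinucleotide": {"total": 131, "exon_pct": 64},
    "tetranucleotide": {"total": 33, "exon_pct": 22},
}


def approx_count(total: int, pct: float) -> int:
    """Estimate a count from a class total and a rounded percentage."""
    return round(total * pct / 100)


exon_counts = {
    name: approx_count(values["total"], values["exon_pct"])
    for name, values in reported.items()
}

# Share of trinucleotide repeats among these three classes in exons only.
tri_share = round(100 * exon_counts["trinucleotide"] / sum(exon_counts.values()), 1)

print(exon_counts)  # {'dinucleotide': 5, 'trinucleotide': 84, 'tetranucleotide': 7}
print(tri_share)    # 87.5
```

These estimates suggest that roughly 84 of the exon repeats were trinucleotides, against about 5 dinucleotides and 7 tetranucleotides.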

Acknowledgements

The authors acknowledge the National Institutes of Health for funding the Genome Discovery Workshop. Thank you to Dr. Gong Xin Yu and Alexander Tchourbanov for sharing their knowledge of genomics analysis and bioinformatics, and to Dr. Donovan Bailey for providing our data source. We would especially like to thank Dr. Brook Milligan, Nabeeh Hasan, and Erin Punke for their advice and support. We also thank our fellow interns, who helped with problem solving, insight, and constructive criticism. This program was supported by NMSU RISE to Excellence (NIH NIGMS MBRS Grant #R25GM061222), MRI: Acquisition of Genomic Sequencing Instrumentation (NSF #0821806), and CREST: Center for Research Excellence in Bioinformatics and Computational Biology (NSF 0420407).

References

• Casacuberta, E., P. Puigdomenech and A. Monfort, 2000. Distribution of microsatellites in relation to coding sequences within the Arabidopsis thaliana genome. Plant Sci. 157: 97-104.
• Eckert, K.A., A. Mowery and S.E. Hile, 2002. Misalignment-mediated DNA polymerase beta mutations: comparison of microsatellite and frame-shift error rates using a forward mutation assay. Biochemistry 41: 10490-10498.
• Gertz, E. M. "BLAST Arabidopsis Thaliana Sequences." Http://www.ncbi.nlm.nih.gov/. NCBI, 27 June 2001. Web. 30 June 2010. <http://www.ncbi.nlm.nih.gov/genome/seq/BlastGen/BlastGen.cgi?taxid=3702>.
• Innan, H., R. Terauchi and N.T. Miyashita, 1997. Microsatellite polymorphism in natural populations of the wild plant Arabidopsis thaliana. Genetics 146: 1441-1452.
• Marriage, T. N., Hudman, S., Mort, M. E., Orive, M. E., Shaw, R. G., & Kelly, J. K. (2009). Direct estimation of the mutation rate at dinucleotide microsatellite loci in Arabidopsis thaliana (Brassicaceae). Heredity, 103(4), 310-317. doi: 10.1038/hdy.2009.67
• Sutherland, G. R., and Richards, R. I. 1995. Simple tandem DNA repeats and human genetic disease. Proc. Natl. Acad. Sci. 92: 3636-3641.
• Symonds, V. V., & Lloyd, A. M. (2003). An analysis of microsatellite loci in Arabidopsis thaliana: Mutational dynamics and application. Genetics, 165(3), 1475-1488.

Figure 1. Percentages of Dinucleotide Repeats in Different Regions of the A. thaliana Genome

Figure 2. Percentages of Trinucleotide Repeats in Different Regions of the A. thaliana Genome

Figure 3. Percentages of Tetranucleotide Repeats in Different Locations of the A. thaliana Genome

Figure 4. Frequency and Percentages of Nucleotide Repeats in Exons

Discussion

It is very common for microsatellite loci to gain or lose repeat units during DNA replication because of polymerase slippage (Eckert et al. 2002). Previous data have shown that dinucleotide microsatellites mutate at a higher rate than other microsatellites (Chakraborty et al. 1997). Only 14% of dinucleotide repeats were found in exon regions. This does not mean, however, that the dinucleotides have no effect on the genome. Dinucleotide repeats in 3’ untranslated regions have been associated with rheumatoid arthritis in humans (Martin-Donaire et al., 2007). In A. thaliana, 24% of the dinucleotide repeats were found in 3’ untranslated regions, but their effect on function is unknown. Although the tetranucleotide sample was similar in size to the dinucleotide sample, tetranucleotide repeats are more stable and do not generally mutate as readily.

Trinucleotide motifs are the most common repeats in exons (Sutherland and Richards 1995). This can be beneficial to an organism: when a mutation adds or removes a repeat unit, the sequence length changes by a multiple of three bases, so the reading frame is preserved and a frame-shift mutation generally does not occur. A frame-shift mutation can be detrimental to an organism because it alters the translation product downstream of the change. A trinucleotide repeat in an exon therefore has a higher potential to contribute beneficial variation to the genome.
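The frame-shift argument can be illustrated with a small example. This is a sketch using made-up sequences, not data from the study: it adds one repeat unit to a trinucleotide repeat and to a dinucleotide repeat, then compares the codons read downstream of the repeat. The flanking sequences and the function names are hypothetical.

```python
# Illustrative sequences only; they are not taken from the A. thaliana data.

def codons(seq: str) -> list:
    """Split a sequence into complete codons, dropping any trailing bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def with_repeat(prefix: str, unit: str, copies: int, suffix: str) -> str:
    """Build a sequence with a tandem repeat between two flanking regions."""
    return prefix + unit * copies + suffix


def preserves_frame(unit: str, copies_changed: int) -> bool:
    """A length change keeps the reading frame only if it is a multiple of 3."""
    return (len(unit) * copies_changed) % 3 == 0


PREFIX, SUFFIX = "ATGGCT", "GGTTAA"  # hypothetical flanking sequence

# Trinucleotide repeat: 4 copies, then 5 copies after a slippage event.
tri_before = codons(with_repeat(PREFIX, "ATC", 4, SUFFIX))
tri_after = codons(with_repeat(PREFIX, "ATC", 5, SUFFIX))

# Dinucleotide repeat: 3 copies, then 4 copies after a slippage event.
di_before = codons(with_repeat(PREFIX, "AT", 3, SUFFIX))
di_after = codons(with_repeat(PREFIX, "AT", 4, SUFFIX))

print(tri_before[-2:], tri_after[-2:])  # downstream codons unchanged
print(di_before[-2:], di_after[-2:])    # downstream codons shifted
```

In this example the trinucleotide expansion leaves the downstream codons GGT and TAA intact, while the dinucleotide expansion changes them. A dinucleotide repeat keeps the frame only when the number of units gained or lost is itself a multiple of three, which is why the text says trinucleotide changes "generally" avoid frame shifts rather than always.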