Transposable elements of Agavoideae
Kate L Hertweck (@k8hert), The University of Texas at Tyler
Alexandros Bousios, University of Sussex
Michael McKain, Donald Danforth Plant Science Center
Why Agavoideae? (besides the obvious)
● Asparagaceae subfamily Agavoideae: 23 genera, 637 species
● agave, yucca, Joshua Tree
● Economically important: tequila, food starches, biofuels, ornamentals
● interesting morphological, ecological, life history traits
● Recent diversification correlated with ecological traits (Good-Avila, 2006)
Hertweck et al., TEs in Agavoideae
Agavoideae genomics
● Emerging genomic/transcriptomic resources
● Polyploidy, bimodality (McKain et al., 2012)
● Variation in TEs (Bousios et al., 2007) and genome size (Zonneveld, 2003)
Darlington 1963
Guadelupe et al., 2008
Transposable elements as a model system
● TEs, mobile genetic elements, or jumping genes
● Parasitic, self-replicating, move independently in the genome
● Many different types; some similar to or derived from viruses
Class I: Retrotransposons (copy and paste)
● LTR (Gypsy, Copia/Sireviruses, Caulimoviruses)
● LINE
● SINE
Class II: DNA transposons (cut and paste)
● TIR (EnSpm, hAT, MuDR, TcMar, PIF)
● MITE
● Helitron
● TE proliferation is associated with modifications across the genome, including changes to gene expression and genome size
● TE composition/abundance may interact with organismal changes, like hybridization, polyploidy, phenotype, life history
Our goals
● Mine existing genomic resources across Agavoideae to characterize repetitive elements
● Estimate abundance and diversity of transposable elements (TEs)
● Cross validate results from different methods
The big questions:
● Is transposon composition in Agavoideae genomes related to hypothesized patterns of genomic evolution?
● Do transposon proliferation and other genomic traits correlate with life history traits in Agavoideae?
Agavoideae includes substantial diversity (even by Asparagales standards)
[Figure: genome size (Mb/1C) and percentage of sequence reads from nuclear genome, split into known repeats and unknown contigs, for Aphyllanthes, Lomandra, Sansevieria, Asparagus, Ledebouria, Dichelostemma, Agapanthus, Allium, Haworthia, Hosta, Scadoxus]
Hertweck, 2013, Genome
● Genomes are difficult to assemble
● Genome size varies
Repeat characterization methods
Genome survey sequences
● most from MonAToL project (Illumina SE, 30-100 bp)
● quality control of fastq files with PRINSEQ
● assembled with MaSuRCA v2.3.2 or RepARK v1.3.0
● organellar sequences filtered with BLAST
● 0.02-0.38x coverage
● 12 taxa, only 8 with sufficient contigs to analyze
Scripts available: github.com/k8hertweck/REpipe
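The coverage figures above follow from a simple ratio: total sequenced bases divided by haploid genome size. The sketch below is illustrative only and is not taken from REpipe; the function name and the example numbers are assumptions chosen to show the arithmetic, not values from the dataset.

```python
def sequencing_coverage(n_reads: int, mean_read_length_bp: float,
                        genome_size_mb: float) -> float:
    """Return fold coverage: total sequenced bases / haploid genome size.

    genome_size_mb is the 1C genome size in megabases (Mb/1C).
    """
    if genome_size_mb <= 0:
        raise ValueError("genome size must be positive")
    total_bases = n_reads * mean_read_length_bp
    return total_bases / (genome_size_mb * 1_000_000)


# Hypothetical example (not from the dataset):
# 2 million single-end 100 bp reads from a 10,000 Mb genome.
example = sequencing_coverage(2_000_000, 100, 10_000)
print(f"{example:.2f}x")
```

At coverage this low, most single-copy sequence is sampled less than once, which is why only high-copy repeats accumulate enough reads to assemble.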
Nuclear contigs
● assembled contigs are consensus of most abundant TEs in the genome
● TEs must exist in high copy to have sufficient reads for detection (assembly)
● the older a TE insertion, the more likely it has accumulated mutations which will inhibit detection
● data presented as percentage of TE type in nuclear genome (relative abundance)
Transcriptomes
● various sources, tissues, coverage, assembly methods
● downloaded assemblies (no other filtering)
Nuclear contigs
● contigs represent actively transcribed TEs, which may or may not relate to abundance in the genome
● even relatively rare TEs may be detectable
● data presented as percentage of transcripts (relative expressed diversity)
Annotation of nuclear contigs (genome survey sequences and transcriptomes)
RepeatMasker
● Liliopsida library (mostly references from grasses)
● searches many types of TEs, including parts without genes
● some ambiguous results (same contig, multiple types of TE)
Domain searching
● rpstblastn against protein domain models (CDD) for TE-specific genes
● clustering with CD-HIT-EST
Repeat contigs
Unknown contigs
read mapping
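One way to handle the ambiguous RepeatMasker results noted above (same contig, multiple types of TE) is to assign each contig the class/family covering the most bases and flag contigs with more than one. This is a minimal sketch under assumptions, not the method used in REpipe: it expects the standard whitespace-delimited RepeatMasker `.out` table, sums hit lengths without correcting for overlapping hits, and the contig and repeat names in the example are hypothetical.

```python
from collections import defaultdict


def classify_contigs(out_text: str) -> dict:
    """Map each contig to (best_class, is_ambiguous) from RepeatMasker .out text.

    best_class is the class/family with the most annotated bases on the contig;
    is_ambiguous is True when more than one class/family hits the same contig.
    Overlapping hits are counted once per hit (a simplifying assumption).
    """
    masked = defaultdict(lambda: defaultdict(int))
    for line in out_text.splitlines():
        fields = line.split()
        # Data rows have at least 15 fields and start with an integer score;
        # this skips the header lines and blank lines.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        contig, begin, end = fields[4], int(fields[5]), int(fields[6])
        repeat_class = fields[10]  # e.g. "LTR/Gypsy"
        masked[contig][repeat_class] += end - begin + 1

    result = {}
    for contig, per_class in masked.items():
        best = max(per_class, key=per_class.get)
        result[contig] = (best, len(per_class) > 1)
    return result


# Hypothetical .out excerpt for illustration only.
example = """\
   SW   perc perc perc  query     position in query    matching  repeat        position in repeat
score   div. del. ins.  sequence  begin end   (left)   repeat    class/family  begin  end  (left)  ID

  900   10.0  1.0  0.5  contig1       1   300   (200) +  Gypsy-1   LTR/Gypsy         1   300  (4700)  1
  300   15.0  2.0  1.0  contig1     350   450    (50) C  Copia-2   LTR/Copia     (100)  400    300   2
  500   12.0  0.0  0.0  contig2      10   209     (0) +  hAT-3     DNA/hAT           1   200   (800)  3
"""

for name, (best, ambiguous) in classify_contigs(example).items():
    print(name, best, "ambiguous" if ambiguous else "unambiguous")
```

Keeping the ambiguity flag, rather than silently picking a winner, makes it possible to report how much of the annotation depends on this tie-breaking choice.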
Detectable repeats vary across species
Repeat abundance
● percentage of total reads
● repeat annotations from RepeatMasker
● most reads map to unannotated contigs (or remain unmapped)
Repeat diversity
● percentage of nuclear contigs
● annotations from RepeatMasker
● most contigs are LTRs
● transcriptomes represent broader variation in diverse TEs (because of the overall number of contigs)
[Figure panels: GSS, transcriptome]
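The two measures on this slide use different denominators: abundance is a share of total reads, diversity is a share of nuclear contigs. The sketch below shows that distinction; the function names, the input dictionaries, and all counts are hypothetical, and it assumes read counts per contig are already available from read mapping.

```python
from collections import Counter


def repeat_abundance(reads_per_contig: dict, annotation: dict,
                     total_reads: int) -> dict:
    """Percentage of total reads mapping to contigs of each repeat type.

    total_reads includes unmapped reads and reads on unannotated contigs.
    """
    reads_by_type = Counter()
    for contig, n_reads in reads_per_contig.items():
        repeat_type = annotation.get(contig)
        if repeat_type is not None:
            reads_by_type[repeat_type] += n_reads
    return {t: 100.0 * n / total_reads for t, n in reads_by_type.items()}


def repeat_diversity(annotation: dict, n_nuclear_contigs: int) -> dict:
    """Percentage of nuclear contigs annotated as each repeat type."""
    contigs_by_type = Counter(annotation.values())
    return {t: 100.0 * n / n_nuclear_contigs for t, n in contigs_by_type.items()}


# Hypothetical data: four nuclear contigs, one of them unannotated (c4).
annotation = {"c1": "LTR", "c2": "LTR", "c3": "DNA"}
reads_per_contig = {"c1": 50, "c2": 30, "c3": 20, "c4": 100}

print(repeat_abundance(reads_per_contig, annotation, total_reads=1000))
print(repeat_diversity(annotation, n_nuclear_contigs=4))
```

In this toy case LTRs are half of the contigs but only 8% of the reads, which illustrates how the two measures can diverge for the same annotation.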
Sampled taxa possess same diversity of DNA TE families, but at different abundance
GSS data
● percentage of nuclear genome
● annotations from RepeatMasker
● most taxa have a single family present in high abundance
● may reflect karyotype
Transcriptome data
● percentage of contigs
● annotations from RepeatMasker
● all families present (active?) in all taxa
● minor variation in family-level diversity for some taxa
● not incongruent with GSS data
Patterns of LTR abundance rely on annotation method
● Gypsy more abundant in most genomes, although proportions vary
● no relationship between LTR abundance and genome size
● including CDD annotations can double LTR abundance in some genomes
● Proportion of Copia:Gypsy remains the same for some taxa (Schoenolirion), but changes for others (Hosta)
● LTR diversity (numbers of contigs) shows similar patterns
[Figure annotation: tetraploid, largest (known) genome in dataset]
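The comparison on this slide can be sketched as follows: add the CDD-derived annotations to the RepeatMasker ones, then check how total LTR abundance and the Copia:Gypsy ratio change. This is an illustration under assumptions, not the analysis code: it treats CDD-annotated contigs as additional to (not overlapping with) the RepeatMasker-annotated ones, and every number is made up.

```python
def combine_annotations(repeatmasker: dict, cdd: dict) -> dict:
    """Sum per-superfamily abundance (% of nuclear genome) from both methods.

    Assumes the two sets of annotated contigs do not overlap.
    """
    families = set(repeatmasker) | set(cdd)
    return {f: repeatmasker.get(f, 0.0) + cdd.get(f, 0.0) for f in families}


def copia_gypsy_ratio(abundance: dict) -> float:
    """Return Copia abundance divided by Gypsy abundance."""
    if abundance.get("Gypsy", 0.0) == 0.0:
        raise ValueError("Gypsy abundance is zero; ratio undefined")
    return abundance.get("Copia", 0.0) / abundance["Gypsy"]


# Hypothetical abundances for one taxon (percent of nuclear genome).
rm_only = {"Copia": 2.0, "Gypsy": 4.0}
cdd_extra = {"Copia": 1.0, "Gypsy": 5.0}
combined = combine_annotations(rm_only, cdd_extra)

print("LTR total, RepeatMasker only:", sum(rm_only.values()))
print("LTR total, with CDD:", sum(combined.values()))
print("Copia:Gypsy, RepeatMasker only:", round(copia_gypsy_ratio(rm_only), 2))
print("Copia:Gypsy, with CDD:", round(copia_gypsy_ratio(combined), 2))
```

In this made-up taxon the added annotations more than double total LTR abundance and shift the ratio, the kind of method-dependent change the slide describes for some taxa.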
Conclusions
● Mine existing genomic resources across Agavoideae to characterize repetitive elements
● Methods matter; bias is not evenly distributed and patterns difficult to discern
● Low proportion of GSS data assemble for Agavoideae
  ● large numbers of ancestral (inactive) insertions, related to whole genome duplication event?
  ● low-level diversity in abundant TEs just different enough from available libraries to remain undetectable
● DNA transposon dominance may differ among clades
● Gypsy more abundant in most genomes
Future work
● Improve annotations (build custom repeat libraries) and analyze TE subtaxonomy
● improve quantification of repeats (P-clouds, RepeatExplorer)
● validate results using multiple sequencing attempts/data types
Big questions:
● Is transposon composition in Agavoideae genomes related to hypothesized patterns of genomic evolution?
● Do transposon proliferation and other genomic traits correlate with life history traits in Agavoideae?
Acknowledgements
MonAToL
Texas Advanced Computing Center (TACC)
National Evolutionary Synthesis Center (NESCent, Duke U)
Research: https://sites.google.com/site/k8hertweck
Blog: k8hert.blogspot.com
Twitter: @k8hert
Google+: [email protected]