The availability of high-throughput next-generation sequencing technologies presents an opportunity for in-silico discovery of endosymbionts. We describe a method for mining a whole-genome shotgun metagenome to identify members of the endosymbiont community, followed by reconstruction and validation of a high-quality draft microbial genome. The Asian citrus psyllid (D. citri Kuwayama, or ACP) is host to 7+ bacterial endosymbionts and is the insect vector of Ca. Liberibacter asiaticus, the causal agent of citrus greening, a disease that has cost the Florida citrus industry $3.63 billion since 2006. DNA from D. citri was sequenced to 108X coverage to produce paired-end and mate-pair Illumina libraries. The sequences were mined for Wolbachia (wACP) reads using 4 sequenced Wolbachia genomes as bait. Putative wACP reads were then assembled using the Velvet and MIRA3 assemblers. The resulting wACP contigs were annotated using RAST and compared to the closest sequenced Wolbachia from an insect genome, Wolbachia endosymbiont of Culex quinquefasciatus (wPip). MIRA3 reconstructed a majority of the wPip CDS regions and was therefore selected for scaffolding using large-insert mate-pair libraries. The wACP scaffolds were further improved using wPip as a reference genome to order and orient the contigs. To determine whether the core Wolbachia proteins are present in our wACP scaffolds, we compared the wACP proteins to core Wolbachia proteins identified by OrthoMCL. Of 1213 wACP proteins, 1164 had matches, of which 669 were to core proteins. This number compares favorably to the number of core proteins (670) found in sequenced Wolbachia genomes.
Mining Eukaryotic Meta-Genomes for Endosymbionts
Exploring the Asian citrus psyllid microbiome
Surya Saha 1, Wayne B. Hunter 2 and Magdalen Lindeberg 1
1 Department of Plant Pathology and Plant-Microbe Biology, Cornell University, Ithaca, NY, USA
2 USDA-ARS, US Horticultural Research Lab, Fort Pierce, FL, USA
GLBio 2013, Pittsburgh
You are free to:
• Copy, share, adapt, or re-mix;
• Photograph, film, or broadcast;
• Blog, live-blog, or post video of;
this presentation, provided that you attribute the work to its author and respect the rights and licenses associated with its components.
Slide Concept by Cameron Neylon: http://cameronneylon.net/blog/some-slides-for-granting-permissions-or-not-in-presentations/
Slides : http://www.slideshare.net/suryasaha/surya-saha-glbio2013
Twitter: @SahaSurya
Acknowledgements
Funding
Magdalen Lindeberg, Cornell University
• J. Kent Morgan, USDA-ARS
• Mizuri Marutani-Hert, USDA-ARS
• Hong Huang, University of South Florida
Wayne Hunter, USDA-ARS
Justin Reese, Genformatic, LLC.
Figure 1. Workflow used to identify regions of reference genomes illuminated by sequence reads from the Diaphorina citri metagenome data set.
Worldwide distribution of Citrus Greening/Huanglongbing (HLB)
First scientific report of HLB: China, 1919
HLB found in Florida: 2005
What is the impact of this disease?
Citrus – a 9.3 billion dollar industry in Florida and 1.2 billion dollars in California
• Leave agricultural practices unchanged: significant loss of production
• Begin expensive, labor-intensive controls: increased cost of production
Either way, increased cost of citrus products for consumers
[Image: healthy and infected citrus]
Between 2006 and 2011 greening had cost Florida $4.5 billion in lost economic output, and 8,257 jobs.
University of Florida report, 2013 http://research.ufl.edu/publications/exploremagazine/spring-2013/citrus-greening.html
How it gets around:
Ca. Liberibacter asiaticus is vectored by the Asian citrus psyllid
Insect transmission greatly enhances potential for rapid long distance spread
Psyllid feeds on many members of the citrus family
Once infected, each plant becomes a disease reservoir from which uninfected psyllids can pick up bacteria
Ca. Liberibacter asiaticus in citrus phloem cells
Big picture for molecular interactions
[Diagram: Liberibacter (unculturable), psyllid, citrus]
Could lead to identification of metabolically complementary bacteria for co-culturing with Liberibacter
Why endosymbionts?
Significant impact on diverse host processes
• Nutritional status
• Reproduction
• Lifespan
• Resistance to insecticides
Means to explore population dynamics and relatedness to other isolates
Known example of a highly evolved host-symbiont relationship: Buchnera and aphids
Psyllid microbiome diversity
• 10 bacteria reported in ACP (Florida)
• 5 additional bacteria reported in the closely related potato psyllid (B. cockerelli)
[Figure: Buchnera-derived essential amino acids supporting the growth of pea aphid larvae (Gündüz and Douglas 2009)]
Candidate endosymbionts
Description | Length (bp) | Coverage | Length (bp)
Klebsiella variicola At-22 chromosome, complete genome | 5,458,505 | 1.144% | 307,111
Salmonella enterica subsp. enterica serovar Typhi str. Ty2 chromosome, complete genome | 4,791,961 | 1.266% | 464,747
Staphylococcus epidermidis ATCC 12228 chromosome, complete genome | 2,499,279 | 3.077% | 60,415
Acidovorax avenae subsp. avenae ATCC 19860 chromosome, complete genome | 5,482,170 | 0.473% | 71,218
Acinetobacter sp. DR1 chromosome, complete genome | 4,152,543 | 2.763% | 102,446
Herbaspirillum seropedicae SmR1 chromosome, complete genome | 5,513,887 | 0.949% | 23,509
Wolbachia endosymbiont of Culex quinquefasciatus Pel, complete genome | 1,482,455 | 86.787% | 1,140,899
Methylibium petroleiphilum PM1 chromosome, complete genome | 4,044,195 | 0.318% | 20,107
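The slides do not say how the Coverage column was computed. One common reading is breadth of coverage, the percentage of reference positions hit by at least one mapped read. The sketch below illustrates that reading only; the interval representation, the function names and the toy alignments are assumptions for illustration, not data or code from the talk.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) position on the reference


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching half-open intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of double counting bases.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def breadth_of_coverage(intervals: Iterable[Interval], ref_length: int) -> float:
    """Percentage of reference positions covered by at least one interval."""
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return 100.0 * covered / ref_length


# Hypothetical toy alignments on a 1,000 bp reference (not data from the talk).
toy_hits = [(0, 100), (50, 150), (400, 500), (500, 520)]
toy_breadth = breadth_of_coverage(toy_hits, 1000)
print(f"{toy_breadth:.1f}% of the toy reference is covered")
```

Under this reading, a reference that is present in the sample should approach full breadth, which is how the wPip row (86.787%) stands apart from the other candidates, all of which are below 4%.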
Why Wolbachia?
Known obligate association with up to 16% of insect species
Positive correlation of titer with Ca. Liberibacter asiaticus
Associated with Cytoplasmic Incompatibility in hosts
Used as biocontrol in dengue fever and malaria
Read capture using baits
Bait genomes
• Wolbachia endosymbiont of Drosophila melanogaster
• Wolbachia endosymbiont of Culex quinquefasciatus Pel (wPip)
• Wolbachia sp. wRi
• Wolbachia endosymbiont strain TRS of Brugia malayi
Dataset | Total (bp) | 100% | 90% | 85%
Paired-end | 33,708,218,600 | 127,335,000 | 228,893,000 | 237,906,000
Mate-pair 2k | 7,020,642,800 | 11,200 | 45,200 | 63,000
Mate-pair 5k | 6,000,789,600 | 10,000 | 43,000 | 66,800
Mate-pair 10k | 7,307,423,600 | 2,895,000 | 4,427,600 | 4,909,000
Results of mining putative wDi reads from the ACP metagenome
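To put the capture table in proportion, the short sketch below computes the fraction of each library that was retained at each threshold. The numbers are copied from the table; treating the 100%, 90% and 85% columns as base pairs (like the Total column) is an assumption, and the slide does not state what the thresholds measure. The names `LIBRARIES` and `captured_fraction` are illustrative.

```python
from typing import Dict, Tuple

# Values copied from the table above. The threshold columns are assumed to be
# in bp, like the Total column; the slide does not label their unit.
LIBRARIES: Dict[str, Tuple[int, Dict[str, int]]] = {
    "Paired-end": (33_708_218_600,
                   {"100%": 127_335_000, "90%": 228_893_000, "85%": 237_906_000}),
    "Mate-pair 2k": (7_020_642_800,
                     {"100%": 11_200, "90%": 45_200, "85%": 63_000}),
    "Mate-pair 5k": (6_000_789_600,
                     {"100%": 10_000, "90%": 43_000, "85%": 66_800}),
    "Mate-pair 10k": (7_307_423_600,
                      {"100%": 2_895_000, "90%": 4_427_600, "85%": 4_909_000}),
}


def captured_fraction(total_bp: int, captured_bp: int) -> float:
    """Fraction of a library's bases retained by the bait step."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    return captured_bp / total_bp


for name, (total_bp, captured) in LIBRARIES.items():
    cells = ", ".join(
        f"{threshold}: {captured_fraction(total_bp, bp):.4%}"
        for threshold, bp in captured.items()
    )
    print(f"{name}: {cells}")
```

Under that assumption, every library retains well under 1% of its bases even at the loosest threshold, which is the point of baiting: the downstream assemblers see only a small, enriched subset of the metagenome.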
Assembler | Contigs | N50 | Contigs > 1k | Length of Contigs > 40k
VelvetOptimizer (n50, 83 mer) | 111 | 79935 | 53 | 807171
VelvetOptimizer (n50/tbp, 95 mer) | 118 | 84957 | 25 | 744357
MIRA3 | 167 | 23068 | 105 | 247976
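The N50 column uses the standard assembly statistic: the contig length at which contigs of that length or longer hold at least half of the assembled bases. A minimal sketch of that definition follows; the function name and the toy contig lengths are illustrative and are not taken from the assemblies in the table.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    half_total = sum(contig_lengths) / 2
    running = 0
    # Walk from the longest contig down until half of the bases are accounted for.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical toy assembly: total 290 bp, so the halfway mark is 145 bp.
toy_contigs = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_contigs)
print(toy_n50)  # 80 + 70 = 150 >= 145, so the N50 is 70
```

Because N50 rewards long contigs regardless of their correctness, it is best read alongside the annotation comparison that follows rather than on its own.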
Annotation
[Bar chart, axis 0 to 2000: Total CDS, RNA, Functional - shared with wPip, Functional - wPip alone, and Functional - not in wPip, shown for wPip, MIRA3, VelvetOptimizer (n50/tbp, 95 mer) and VelvetOptimizer (n50, 83 mer)]
Scaffolding
Description | Contigs | Longest | N50 | Contigs > 1k
SOPRA / wDi2 | 101 | 89608 | 41594 | 77
Minimus2 / wDi1 | 99 | 106898 | 41594 | 76
Minimus2 / wDi3 | 104 | 206691 | 52418 | 66
Workflow | wDi1 | wDi2 | wDi3
Total CDS | 1176 | 1190 | 1183
RNA | 36 | 36 | 36
Functional - shared with wPip | 608 | 610 | 608
Functional - wPip alone | 24 | 23 | 24
Functional - not in wPip | 23 | 22 | 23
Functional annotation using RAST
660 core protein clusters per Wolbachia genome
659 core protein cluster representatives found in wDi
Missing core cluster consists entirely of hypotheticals
32 lineage specific genes in wDi w.r.t. wPip
All have unknown function
11 have homologs in other endosymbionts of mosquito
OrthoMCL analysis of pan-proteome
[Bar chart, axis 0 to 1400: Total proteins, Core, Core paralogous, Shared, Shared paralogous, Lineage specific, and Lineage specific paralogous, shown for Wolbachia endosymbiont of Drosophila melanogaster, Wolbachia wPip endosymbiont of Culex quinquefasciatus Pel, Wolbachia sp. wRi, and Wolbachia endosymbiont strain TRS of Brugia malayi]
[Diagram legend: Core / Core paralogous, Shared / Shared paralogous, Lineage specific]
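The slides do not define the pan-proteome categories, so the sketch below uses a common convention as an assumption: a cluster is core if every genome contributes a member, shared if two or more (but not all) do, and lineage specific if only one does, with "paralogous" added when any genome contributes more than one protein. The genome labels and toy clusters are hypothetical shorthand, not OrthoMCL output from the talk.

```python
from collections import Counter
from typing import Dict, List

# Shorthand labels for the four bait genomes plus the new assembly (illustrative).
GENOMES = ["wMel", "wPip", "wRi", "wBm", "wDi"]


def classify_cluster(member_genomes: List[str], all_genomes: List[str]) -> str:
    """Label one ortholog cluster given the source genome of each member protein."""
    if not member_genomes:
        raise ValueError("a cluster needs at least one member")
    counts = Counter(member_genomes)
    present = set(counts)
    if present == set(all_genomes):
        category = "core"
    elif len(present) > 1:
        category = "shared"
    else:
        category = "lineage specific"
    # Any genome contributing two or more proteins marks the cluster paralogous.
    if any(n > 1 for n in counts.values()):
        category += " paralogous"
    return category


# Hypothetical toy clusters: cluster id -> genome of each member protein.
toy_clusters: Dict[str, List[str]] = {
    "c1": ["wMel", "wPip", "wRi", "wBm", "wDi"],
    "c2": ["wMel", "wMel", "wPip", "wRi", "wBm", "wDi"],
    "c3": ["wPip", "wDi"],
    "c4": ["wDi"],
    "c5": ["wDi", "wDi"],
}
toy_labels = {cid: classify_cluster(m, GENOMES) for cid, m in toy_clusters.items()}
for cid, label in toy_labels.items():
    print(cid, label)
```

Counting the clusters labeled core (with or without paralogs) that include a wDi member gives the kind of "core cluster representatives found" figure reported above, provided the same category definitions are used.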
Conclusions
Targeted genome reconstruction strategy for endosymbionts
Recommendations
• MIRA3 produced more fragmented but qualitatively better assemblies than Velvet
• SOPRA and SSPACE are effective for scaffolding when coupled with Minimus2
• Mapped regions on reference should be validated for uniqueness
• Screening out ribosomal regions prior to mapping reduces false positives
Caveats
• Missing lineage specific regions
• Low throughput method
https://github.com/chrishah/MITObim
Questions??
Slides : http://www.slideshare.net/suryasaha/surya-saha-glbio2013