16
Mining Eukaryotic Meta-Genomes for Endosymbionts Exploring the Asian citrus psyllid microbiome Surya Saha 1 , Wayne B. Hunter 2 and Magdalen Lindeberg 1 1 Department of Plant Pathology and Plant-Microbe Biology, Cornell University, Ithaca, NY, USA, 2 USDA-ARS, US Horticultural Research Lab, Fort Peirce, FL, USA GLBio 2013, Pittsburgh

Mining Eukaryotic Meta-Genomes for Endosymbionts using Next-Generation Sequencing

Embed Size (px)

DESCRIPTION

The availability of high-throughput next generation sequencing technologies presents an opportunity for in-silico discovery of endosymbionts. We describe a method for mining a whole genome shotgun metagenome to identify members of the endosymbiont community followed by reconstruction and validation of a high-quality draft microbial genome. The Asian citrus psyllid (D. citri Kuwayama or ACP) is host to 7+ bacterial endosymbionts and is the insect vector of Ca. liberibacter asiaticus, causal agent of citrus greening, a disease that has cost the Florida citrus industry $3.63 billion since 2006. DNA from D. citri was sequenced to 108X coverage to produce paired-end and mate-pair Illumina libraries. The sequences were mined for wolbachia (wACP) reads using 4 sequenced Wolbachia genomes as bait. Putative wACP reads were then assembled using Velvet and MIRA3 assemblers. The resulting wACP contigs were annotated using the RAST and compared to the closest sequenced wolbachia from an insect genome, Wolbachia endosymbiont of Culex quinquefasciatus (wPip). MIRA3 was able to reconstruct a majority of the wPip CDS regions and was therefore, selected for scaffolding using large insert mate-pair libraries. The wACP scaffolds were further improved using wPip as reference genome to orient and order the contigs. In order to determine the presence of the core Wolbachia proteins in our wACP scaffold, we compared them to core Wolbachia proteins identified by OrthoMCL. 1164/1213 wACP proteins had matches of which 669 were to core proteins. This number compares favorably to the number of core proteins (670) found in sequenced Wolbachias.

Citation preview

Page 1: Mining Eukaryotic Meta-Genomes for Endosymbionts using Next-Generation Sequencing

Mining Eukaryotic Meta-Genomes for Endosymbionts

Exploring the Asian citrus psyllid microbiome

Surya Saha 1, Wayne B. Hunter 2 and Magdalen Lindeberg 1

1 Department of Plant Pathology and Plant-Microbe Biology, Cornell

University, Ithaca, NY, USA, 2 USDA-ARS, US Horticultural Research Lab, Fort Peirce, FL, USA

GLBio 2013, Pittsburgh

Page 2: Mining Eukaryotic Meta-Genomes for Endosymbionts using Next-Generation Sequencing

You are free to: Copy, share, adapt, or re-mix;

Photograph, film, or broadcast;

Blog, live-blog, or post video of;

This presentation. Provided that: You attribute the work to its author and respect the rights and licenses associated with its components.

Slide Concept by Cameron Neylon: http://cameronneylon.net/blog/some-slides-for-granting-permissions-or-not-in-presentations/

Slides : http://www.slideshare.net/suryasaha/surya-saha-glbio2013

Twitter: @SahaSurya

GLBio 2013, Pittsburgh

Page 3: Mining Eukaryotic Meta-Genomes for Endosymbionts using Next-Generation Sequencing

Acknowledgements

Funding

Magdalen Lindeberg Cornell University

• J. Kent Morgan, USDA-ARS • Mizuri Marutani-Hert, USDA-ARS • Hong Huang, University of South Florida

Wayne Hunter USDA-ARS

Justin Reese Genformatic, LLC.

GLBio 2013, Pittsburgh

Page 4: Mining Eukaryotic Meta-Genomes for Endosymbionts using Next-Generation Sequencing

Figure 1. Workflow used to identify regions of reference genomes illuminated by

sequence reads from the Diaphorina citri metagenome data set.

Page 5: Mining Eukaryotic Meta-Genomes for Endosymbionts using Next-Generation Sequencing

HLB found in Florida 2005

Worldwide distribution of Citrus Greening/Huanglongbing

1st scientific report of HLB in

China – 1919

GLBio 2013, Pittsburgh

Page 6: Mining Eukaryotic Meta-Genomes for Endosymbionts using Next-Generation Sequencing

What is the impact of this disease?

Citrus – a 9.3 billion dollar industry in Florida and 1.2 billion dollars in California

• Leave agricultural practices unchanged - significant loss of production • Begin expensive , labor intensive controls - increase cost of production

healthy infected Increased cost of citrus products for consumers

Between 2006 and 2011 greening had cost Florida $4.5 billion in lost economic output, and 8,257 jobs.

University of Florida report, 2013 http://research.ufl.edu/publications/exploremagazine/spring-2013/citrus-greening.html

GLBio 2013, Pittsburgh

Page 7: Mining Eukaryotic Meta-Genomes for Endosymbionts using Next-Generation Sequencing

How it gets around:

Ca. Liberibacter asiaticus is vectored by

Asian Citrus Psyllid

Insect transmission greatly enhances potential for rapid long distance spread

Psyllid feeds on many members of the citrus family

Once infected, each plant becomes a disease reservoir from

which uninfected psyllids can pick up bacteria

Ca. Liberibacter asiaticus in citrus phloem cells

GLBio 2013, Pittsburgh

Page 8: Mining Eukaryotic Meta-Genomes for Endosymbionts using Next-Generation Sequencing

Liberibacter

psyllid citrus

Big picture for molecular interactions

Could lead to identification of metabolically complementary bacteria for co-culturing with Liberibacter

Unculturable

GLBio 2013, Pittsburgh

Page 9: Mining Eukaryotic Meta-Genomes for Endosymbionts using Next-Generation Sequencing

Why endosymbionts?

Significant impact on diverse host processes Nutritional status Reproduction Lifespan Resistance to insecticides

Means to explore population dynamics and relatedness to other isolates

Known examples of highly-evolved host-symbiont relationship

Buchnera and Aphids

Psyllid microbiome diversity 10 bacteria reported in ACP (Florida) 5 additional bacteria reported in

closely related potato psyllid (B. cockerelli)

Buchnera-derived essential amino acids supporting the growth of pea aphid larvae. [ Gündüz and Douglas 2009]

GLBio 2013, Pittsburgh

Page 10: Mining Eukaryotic Meta-Genomes for Endosymbionts using Next-Generation Sequencing

Candidate endosymbionts

Description Length (bp) Coverage

Length

(bp)

Klebsiella variicola At-22 chromosome, complete

genome 5,458,505 1.144% 307,111

Salmonella enterica subsp. enterica serovar Typhi str.

Ty2 chromosome, complete genome 4,791,961 1.266% 464,747

Staphylococcus epidermidis ATCC 12228

chromosome, complete genome 2,499,279 3.077% 60,415

Acidovorax avenae subsp. avenae ATCC 19860

chromosome, complete genome 5,482,170 0.473% 71,218

Acinetobacter sp. DR1 chromosome,complete

genome 4,152,543 2.763% 102,446

Herbaspirillum seropedicae SmR1 chromosome,

complete genome 5,513,887 0.949% 23,509

Wolbachia endosymbiont of Culex quinquefasciatus

Pel, complete genome 1,482,455 86.787% 1,140,899

Methylibium petroleiphilum PM1 chromosome,

complete genome 4,044,195 0.318% 20,107

Why Wolbachia?

Known obligate association with up to 16% of insects species

Positive correlation of titer with Ca. liberibacer asiaticus

Associated with Cytoplasmic Incompatibility in hosts

Used as biocontrol in dengue fever and malaria

GLBio 2013, Pittsburgh

Page 11: Mining Eukaryotic Meta-Genomes for Endosymbionts using Next-Generation Sequencing

Read capture using baits

Wolbachia endosymbiont of Drosophila melanogaster Wolbachia endosymbiont of Culex quinquefasciatus Pel (wPip) Wolbachia sp. wRi Wolbachia endosymbiont strain TRS of Brugia malayi

Dataset Total (bp) 100% 90% 85%

Paired-end 33,708,218,600 127,335,000 228,893,000 237,906,000

Mate-pair 2k 7,020,642,800 11,200 45,200 63,000

Mate-pair 5k 6,000,789,600 10,000 43,000 66,800

Mate-pair 10k 7,307,423,600 2,895,000 4,427,600 4,909,000

Results of mining putative wDi reads from the ACP metagenome

GLBio 2013, Pittsburgh

Page 12: Mining Eukaryotic Meta-Genomes for Endosymbionts using Next-Generation Sequencing

Assembler Contigs N50 Contigs >

1k

Length of Contigs >

40k

VelvetOptimizer (n50, 83 mer)

111 79935 53 807171

VelvetOptimizer (n50/tbp, 95 mer)

118 84957 25 744357

MIRA3 167 23068 105 247976

Annotation

0 1000 2000

Total CDS

RNA

Functional - shared with wPip

Functional - wPip alone

Functional - not in wPip

wPip

MIRA3

VelvetOptimizer (n50/tbp, 95 mer)

VelvetOptimizer (n50, 83 mer)

Scaffolding

Description Contigs Longest N50 Contigs >

1k

SOPRA / wDi2 101 89608 41594 77

Minimus2 / wDi1

99 106898 41594 76

Minimus2 / wDi3

104 206691 52418 66

Workflow

wDi1 wDi2 wDi3

Total CDS 1176 1190 1183

RNA 36 36 36

Functional - shared with wPip

608 610 608

Functional - wPip alone

24 23 24

Functional - not in wPip

23 22 23

Functional annotation using RAST

GLBio 2013, Pittsburgh

Page 13: Mining Eukaryotic Meta-Genomes for Endosymbionts using Next-Generation Sequencing

660 core protein clusters per Wolbachia genome

659 core protein cluster representatives found in wDi

Missing core cluster consists entirely of hypotheticals

32 lineage specific genes in wDi w.r.t. wPip

All have unknown function

11 have homologs in other endosymbionts of mosquito

Wolbachia endosymbiont of Drosophila melanogaster

Wolbachia wPip endosymbiont of Culex quinquefasciatus Pel

Wolbachia sp. wRi

Wolbachia endosymbiont strain TRS of Brugia malayi

0

200

400

600

800

1000

1200

1400

Total proteinsCore

Coreparalogous

SharedShared

paralogous Lineagespecific Lineage

specificparalogous

OrthoMCL analysis of pan-proteome

Core / Core paralogous

Shared / Shared paralogous

Lineage specific

GLBio 2013, Pittsburgh

Page 14: Mining Eukaryotic Meta-Genomes for Endosymbionts using Next-Generation Sequencing

Conclusions

Targeted genome reconstruction strategy for endosymbionts

Recommendations MIRA3 produced more fragmented but qualitatively better assemblies

than Velvet SOPRA and SSPACE are effective for scaffolding when coupled with

Minimus2 Mapped regions on reference should be validated for uniqueness Screening out ribosomal regions prior to mapping reduces false

positives

Caveats

Missing lineage specific regions Low throughput method

GLBio 2013, Pittsburgh

Page 15: Mining Eukaryotic Meta-Genomes for Endosymbionts using Next-Generation Sequencing

https://github.com/chrishah/MITObim