www.citrusgreening.org
Using Long Reads, Optical Maps and Long-Range Scaffolding to Improve the Diaphorina citri Genome: An Update
Surya Saha1, Susan J. Brown2 and Lukas Mueller1
1Boyce Thompson Institute; 2Kansas State University
[email protected] @SahaSurya
PAG XXV
Arthropod Genomics Workshop
Acknowledgements
Mueller Lab: Mirella Flores, Prashant Hosmani
Kansas State University: Sue Brown
USDA/ARS: Wayne Hunter
Citrus Greening: Huanglongbing
• Most significant disease of citrus worldwide
• More than $4.5 billion in lost citrus production and more than 8,200 lost jobs (2006/07 to 2010/11)
• Associated with the Gram-negative bacterium Candidatus Liberibacter asiaticus (CLas)
• Spread by insect vector, Diaphorina citri (Asian citrus psyllid, ACP)
Annie Kruse
The Biological Players
Host: Citrus spp.
Vector: Asian citrus psyllid (ACP), Diaphorina citri
Pathogen: Candidatus Liberibacter asiaticus (CLas)
Current Assembly: Diaci v1.1 Data
Source     Library           Count          SRA
Illumina   Paired-end        99.7 million   SRX057205
Illumina   2 kb Mate-pair    35.1 million   SRX057204
Illumina   5 kb Mate-pair    30 million     SRX058250
Illumina   10 kb Mate-pair   30 million     SRX216330
Pacbio     -                 2.7 million    SRX218985
https://citrusgreening.org/organism/Diaphorina_citri/genome
Input: Mixed sample of sexually reproducing heterogeneous individuals
Current Assembly: Diaci v1.1 Genome
Scaffold N50: 109,898 bp
Contig N50: 34,407 bp
Highly fragmented
Misassemblies
Whitefly scaffold N50: 3.23 Mbp!!
Genome         Diaci1.1
Contigs        161,988
Total Length   485,705,082 bp (485.7 Mb)
Longest        1,098,238 bp
Shortest       201 bp
Ns             19,337,167
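Contig and scaffold N50 values are the main yardstick on these slides, so a minimal sketch of how N50 is usually computed may help. It uses the common definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length). The function name and the toy lengths are hypothetical and are not taken from the Diaci assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    combined length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths were given")


# Toy contig lengths (hypothetical illustration only).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # total is 300; 80 + 70 reaches 150, so this prints 70
```

Because N50 depends only on the length distribution, a higher value indicates a more contiguous assembly but says nothing about correctness, which is why misassemblies are tracked separately.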
530 manually curated genes
MCOT transcriptome to identify genes
Many examples of misassemblies!!
Preprint: http://biorxiv.org/content/early/2017/01/09/099168
Pacbio Sequencing
41 SMRT cells
70X coverage
Counts
Number of sequences 4,998,464
Total bases 36.1 Gb
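The coverage figure can be related to the read totals with simple arithmetic. The sketch below assumes the usual definition, coverage = total sequenced bases / genome size, and back-calculates the genome size implied by the slide's numbers. The variable and function names are hypothetical; the implied genome size is a derived value, not a figure stated in the talk.

```python
def implied_genome_size(total_bases, coverage):
    """Genome size implied by coverage = total_bases / genome_size."""
    return total_bases / coverage


total_bases = 36.1e9      # "Total bases 36.1 Gb" from the slide (rounded)
num_reads = 4_998_464     # "Number of sequences" from the slide
coverage = 70             # "70X coverage" from the slide

size_bp = implied_genome_size(total_bases, coverage)
mean_read_length = total_bases / num_reads

print(f"Implied genome size: {size_bp / 1e6:.1f} Mb")
print(f"Mean read length: {mean_read_length:.0f} bp")
```

Under these assumptions the implied genome size is roughly 516 Mb and the mean read length is roughly 7.2 kb. Both are approximate because the total base count is rounded on the slide.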
DBG2OLC Hybrid Assembly
https://github.com/yechengxi/DBG2OLC
NCBI Diaci1.1 and Pacbio assembly
Very fast
Adaptive Threshold   Kmer Cov Threshold   Contigs   Total Length (bp)   Longest (bp)   Shortest (bp)   Ns   N50 (bp)
0.01                 10                   4,101     209,890,188         355,110        2,828           0    65,023
0.01                 6                    4,026     211,371,846         411,664        2,249           18   68,453
0.02                 8                    3,997     206,583,925         383,133        3,744           19   66,259
Miniasm Assembly (Raw Reads)
https://github.com/lh3/miniasm/blob/master/README.md
No error correction
Very fast
Contig N50: 83,490 bp (was 34,407 bp in Diaci v1.1)
Counts
Number of contigs   8,060
Total bases         458,143,096 bp (458 Mb)
Longest             1,188,453 bp
Shortest            5,633 bp
Average length      56,841.6 bp
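The summary rows reported for each assembly (number of contigs, total bases, longest, shortest, average length, Ns) can all be derived from the contig FASTA file. The following is a minimal sketch of that bookkeeping, not the tool used for the talk. The function name and the toy FASTA records are hypothetical.

```python
import io


def assembly_stats(fasta_lines):
    """Compute basic assembly statistics from an iterable of FASTA lines."""
    lengths = []
    n_count = 0
    current = None  # length of the record being read, None before the first header
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
            n_count += line.upper().count("N")
    if current is not None:
        lengths.append(current)
    if not lengths:
        raise ValueError("no FASTA records found")
    total = sum(lengths)
    return {
        "contigs": len(lengths),
        "total_bases": total,
        "longest": max(lengths),
        "shortest": min(lengths),
        "average_length": total / len(lengths),
        "ns": n_count,
    }


# Toy two-contig FASTA (hypothetical illustration only).
toy_fasta = io.StringIO(">contig1\nACGTNN\nACGT\n>contig2\nACG\n")
stats = assembly_stats(toy_fasta)
print(stats)
```

As a consistency check on the slide itself, 458,143,096 total bases divided by 8,060 contigs gives the reported average length of about 56,841.6 bp.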
CANU Assembly
http://canu.readthedocs.io/en/stable/
Error correction of 40% longest reads
26.5X coverage after correction
                    Error rate 0.013          Error rate 0.015
Number of contigs   7,832                     8,030
Total bases         462,838,769 (462 Mb)      493,169,880 (493.1 Mb)
Longest             1,677,652 bp              1,757,402 bp
Shortest            4,425 bp                  5,079 bp
Average length      59,095.9 bp               61,415.9 bp
Contig N50          85,832 bp                 92,630 bp
PBJelly Scaffolding of CANU Err 0.013 Assembly
                    Error rate 0.013          Error rate 0.015          Scaffolded 0.03
Number of contigs   7,832                     8,030                     8,352
Total bases         462,838,769 (462 Mb)      493,169,880 (493.1 Mb)    591,730,999 (591.7 Mb)
Longest             1,677,652 bp              1,757,402 bp              2,096,698 bp
Shortest            4,425 bp                  5,079 bp                  1,547 bp
Average length      59,095.9 bp               61,415.9 bp               70,849.0 bp
Contig N50          85,832 bp                 92,630 bp                 115,896 bp
5,290 gap extensions; 535 gaps filled; Number of Ns: 0 bp
Benchmarking
BUSCO: Benchmarking sets of Universal Single-Copy Orthologs, based on a set of 1,066 single-copy orthologs from 133 arthropod species
             Complete   Fragmented   Missing
Diaci 1.1    90%        6%           4%
Diaci 1.9    92%        1%           7%
Whitefly     98.2%      0.5%         1.3%
Paired-end RNAseq: 622 million reads
             Overall alignment rate   Concordant alignment rate
Diaci 1.1    82%                      0.62%
Diaci 1.9    88%                      79%
FALCON-Unzip Assembler
Hierarchical Genome Assembly Process (HGAP)
Outputs
• Primary contigs
• Alternate contigs
Haplotype resolution
Dovetail Scaffolding
High molecular weight (50kb+) DNA
500ng input DNA from single male psyllid
Illumina paired-end sequencing
Chicago library preparation
Haplotyping Contigs with 10X
Long-read information from short reads using 14 bp barcodes
Very low input DNA (0.625 ng for ACP)
1 ng of DNA is split across 100,000 Gel Bead-in-Emulsions (GEMs)
Chromium instrument
http://www.10xgenomics.com/products/
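To give a feel for how little DNA each partition receives, the sketch below converts the 0.625 ng ACP input into base pairs and genome copies. It rests on several assumptions that are not in the slides: an average molar mass of 650 g/mol per base pair, the Diaci1.1 total length (485,705,082 bp) as a stand-in for the haploid genome size, the 100,000 GEM figure applying to this input, and DNA being divided evenly across GEMs. All names are hypothetical.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
BP_MOLAR_MASS = 650.0      # g/mol per base pair (assumed average)


def bp_from_mass(mass_g):
    """Convert a mass of double-stranded DNA (grams) to base pairs."""
    return mass_g * AVOGADRO / BP_MOLAR_MASS


input_mass_g = 0.625e-9        # 0.625 ng input for ACP, from the slide
n_gems = 100_000               # GEM count from the slide
genome_size_bp = 485_705_082   # Diaci1.1 total length, used as a proxy (assumption)

total_bp = bp_from_mass(input_mass_g)
genome_copies = total_bp / genome_size_bp
bp_per_gem = total_bp / n_gems

print(f"Haploid genome copies in input: {genome_copies:.0f}")
print(f"DNA per GEM: {bp_per_gem / 1e6:.1f} Mb")
```

Under these assumptions each GEM holds only a small fraction of one genome, on the order of a few megabases. That makes it unlikely for two molecules from the same locus to share a barcode, which is the property linked-read haplotyping relies on.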
http://www.bionanogenomics.com/technology/why-genome-mapping/
Example: Human MHC map
• Sample prep requires very high molecular weight DNA
• Nicks at 10 sites / 100 kb
• Individual molecules are assembled into optical maps (Cmaps)
• Optical maps and sequences are merged in a hybrid assembly
ACP molecule N50: 240 kb
1 Mb DNA fragments are ideal
Optimizing enzymes for ACP
Twitter: 280 followers
Facebook: 71 followers
CitrusGreening.org
Host, Vector and Pathogen(s)
Blast Databases
BioCycs
JBrowse
FTP site
News
Publications
Social Media
P0368: Systems Biology Resources for the Citrusgreening Disease Complex
Mirella Flores
Official Gene Set v1.0
530 manually curated genes
30,562 MCOT gene models
Pathways:
• Developmental
• Physiological
• RNAi regulatory
• Immunity-related
Preprint: http://biorxiv.org/content/early/2017/01/09/099168