
www.citrusgreening.org

Using Long Reads, Optical Maps and Long-Range Scaffolding to improve the Diaphorina citri Genome: An update

Surya Saha1, Susan J. Brown2 and Lukas Mueller1

1Boyce Thompson Institute; 2Kansas State University

@SahaSurya

PAG XXV

Arthropod Genomics Workshop


Acknowledgements

Mueller Lab

Mirella Flores

Prashant Hosmani

Kansas State University

Sue Brown

USDA/ARS

Wayne Hunter


Citrus Greening: Huanglongbing

• Most significant disease of citrus worldwide

• More than $4.5 billion in lost citrus production and more than 8,200 lost jobs (2006/07 to 2010/11)

• Associated with the Gram-negative bacterium Candidatus Liberibacter asiaticus (CLas)

• Spread by insect vector, Diaphorina citri (Asian citrus psyllid, ACP)

Annie Kruse


The Biological Players

Host: Citrus spp.

Vector: Asian citrus psyllid (ACP), Diaphorina citri

Pathogen: CLas, Candidatus Liberibacter asiaticus


Current Assembly: Diaci v1.1 Data

Source     Library           Count          SRA
Illumina   Paired-end        99.7 million   SRX057205
Illumina   2 kb mate-pair    35.1 million   SRX057204
Illumina   5 kb mate-pair    30 million     SRX058250
Illumina   10 kb mate-pair   30 million     SRX216330
PacBio     -                 2.7 million    SRX218985

https://citrusgreening.org/organism/Diaphorina_citri/genome

Input: Mixed sample of sexually reproducing heterogeneous individuals


Current Assembly: Diaci v1.1 Genome

Scaffold N50: 109,898 bp; Contig N50: 34,407 bp

Highly fragmented

Misassemblies

For comparison, whitefly scaffold N50: 3.23 Mbp!!

Genome          Diaci v1.1
Contigs         161,988
Total length    485,705,082 bp (485.7 Mb)
Longest         1,098,238 bp
Shortest        201 bp
Ns              19,337,167
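N50 is the headline metric on most of the following slides. As a reminder of how it and the other rows of these tables are derived, here is a minimal Python sketch. The helper name `assembly_stats` is hypothetical, and it assumes the sequences are already loaded as plain strings; it is not the tool used to produce the numbers above.

```python
def assembly_stats(seqs):
    """Summarise a list of sequences the way the assembly tables on these slides do."""
    if not seqs:
        raise ValueError("no sequences given")
    lengths = sorted((len(s) for s in seqs), reverse=True)
    total = sum(lengths)
    # N50: walk from the longest sequence down until at least half of all bases
    # are covered; the length of the sequence that crosses that point is the N50.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "count": len(lengths),
        "total": total,
        "longest": lengths[0],
        "shortest": lengths[-1],
        "ns": sum(s.upper().count("N") for s in seqs),
        "n50": n50,
    }


if __name__ == "__main__":
    # Toy input: five short sequences, one containing a 20 bp run of Ns.
    toy = ["A" * 100, "C" * 80 + "N" * 20, "G" * 50, "T" * 30, "ACGT" * 5]
    print(assembly_stats(toy))
    # {'count': 5, 'total': 300, 'longest': 100, 'shortest': 20, 'ns': 20, 'n50': 100}
```

Reading sequences from a FASTA file is left out; any FASTA parser can feed the list of strings.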


530 manually curated genes

MCOT transcriptome to identify genes

Many examples of misassemblies!!

Preprint: http://biorxiv.org/content/early/2017/01/09/099168


PacBio Sequencing

41 SMRT cells; 70X coverage

Counts
Number of sequences   4,998,464
Total bases           36.1 Gb
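Fold coverage is simply the number of sequenced bases divided by the genome size. The short Python sketch below uses the Diaci v1.1 assembly length as a stand-in for genome size (an assumption; the slide does not say which estimate it used). With that proxy the 36.1 Gb of PacBio data works out to roughly 74X, in the same range as the 70X quoted; the difference presumably reflects a different genome-size estimate.

```python
def fold_coverage(total_bases, genome_size_bp):
    """Average sequencing depth: bases sequenced divided by genome size."""
    return total_bases / genome_size_bp


PACBIO_BASES = 36.1e9          # total PacBio bases reported on the slide
GENOME_SIZE_BP = 485_705_082   # Diaci v1.1 assembly length, used as a genome-size proxy (assumption)

if __name__ == "__main__":
    print(f"{fold_coverage(PACBIO_BASES, GENOME_SIZE_BP):.1f}X")  # 74.3X
```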

DBG2OLC Hybrid Assembly (https://github.com/yechengxi/DBG2OLC)

NCBI Diaci1.1 and PacBio assembly

Very fast

Adaptive threshold   k-mer coverage threshold   Contigs   Total length (bp)   Longest (bp)   Shortest (bp)   Ns   N50 (bp)
0.01                 10                         4,101     209,890,188         355,110        2,828           0    65,023
0.01                 6                          4,026     211,371,846         411,664        2,249           18   68,453
0.02                 8                          3,997     206,583,925         383,133        3,744           19   66,259

Miniasm Assembly of Raw Reads (https://github.com/lh3/miniasm/blob/master/README.md)

No error correction

Very fast

Contig N50: 83,490 bp (was 34,407 bp)

Counts
Number of contigs   8,060
Total bases         458,143,096 bp (458.1 Mb)
Longest             1,188,453 bp
Shortest            5,633 bp
Average length      56,841.6 bp
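Miniasm writes its unitigs as GFA, where each segment is an `S` line carrying a name and (when the reads are supplied) the sequence. A minimal Python sketch for pulling those segments out, writing them as FASTA, and computing the average length reported above. The function names are hypothetical, and the handling of a `*` placeholder sequence is a simplifying assumption.

```python
import io


def gfa_segments(handle):
    """Yield (name, sequence) for every S (segment) line of a GFA file."""
    for line in handle:
        if not line.startswith("S\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, seq = fields[1], fields[2]
        if seq == "*":
            # Sequence omitted from the GFA; nothing to measure, so skip it.
            continue
        yield name, seq


def write_fasta(records, out):
    """Write (name, sequence) pairs as unwrapped FASTA."""
    for name, seq in records:
        out.write(f">{name}\n{seq}\n")


def average_length(records):
    lengths = [len(seq) for _, seq in records]
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    toy_gfa = (
        "H\tVN:Z:1.0\n"
        "S\tutg000001l\tACGTACGT\tLN:i:8\n"
        "S\tutg000002l\tACG\tLN:i:3\n"
        "L\tutg000001l\t+\tutg000002l\t-\t2M\n"
    )
    records = list(gfa_segments(io.StringIO(toy_gfa)))
    print(average_length(records))            # 5.5
    # Cross-check of the slide's own figures: total bases / number of contigs.
    print(f"{458_143_096 / 8_060:,.1f} bp")   # 56,841.6 bp
```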

CANU Assembly (http://canu.readthedocs.io/en/stable/)

Error correction of the longest 40% of reads; 26.5X coverage after correction

                    Error rate 0.013            Error rate 0.015
Number of contigs   7,832                       8,030
Total bases         462,838,769 bp (462.8 Mb)   493,169,880 bp (493.1 Mb)
Longest             1,677,652 bp                1,757,402 bp
Shortest            4,425 bp                    5,079 bp
Average length      59,095.9 bp                 61,415.9 bp
Contig N50          85,832 bp                   92,630 bp
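The slide does not say whether "the longest 40% of reads" is counted in reads or in bases. If it means bases, 40% of roughly 70X input is about 28X, which is in the same range as the 26.5X left after correction. The Python sketch below illustrates that base-budget interpretation only. `longest_reads_by_bases` is a hypothetical helper, not CANU's internal read-selection logic.

```python
def longest_reads_by_bases(read_lengths, fraction):
    """Return the longest reads whose lengths add up to at least `fraction` of all bases.

    Hypothetical helper: it reads "longest 40% of reads" as 40% of sequenced bases.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    budget = fraction * sum(read_lengths)
    chosen = []
    used = 0
    for length in sorted(read_lengths, reverse=True):
        if used >= budget:
            break
        chosen.append(length)
        used += length
    return chosen


if __name__ == "__main__":
    reads = [12_000, 3_000, 25_000, 8_000, 18_000, 4_000]  # toy read lengths, 70 kb total
    picked = longest_reads_by_bases(reads, 0.4)
    print(picked, sum(picked))  # [25000, 18000] 43000
```

Because whole reads are kept, the selection slightly overshoots the budget (43 kb against a 28 kb target in the toy example).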

PBJelly Scaffolding of the CANU Error Rate 0.013 Assembly

                    Error rate 0.013            Error rate 0.015            Scaffolded (error rate 0.013 assembly)
Number of contigs   7,832                       8,030                       8,352
Total bases         462,838,769 bp (462.8 Mb)   493,169,880 bp (493.1 Mb)   591,730,999 bp (591.7 Mb)
Longest             1,677,652 bp                1,757,402 bp                2,096,698 bp
Shortest            4,425 bp                    5,079 bp                    1,547 bp
Average length      59,095.9 bp                 61,415.9 bp                 70,849.0 bp
Contig N50          85,832 bp                   92,630 bp                   115,896 bp

5,290 gap extensions; 535 gaps filled; Number of Ns: 0 bp
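Checking a figure such as "Number of Ns: 0 bp" comes down to scanning each scaffold for runs of N. The minimal Python sketch below reports both the number of gaps (separate N runs) and the total N bases. `gap_summary` is a hypothetical helper, and it treats every N run as a gap regardless of length, which is a simplifying assumption.

```python
import re


def gap_summary(seq):
    """Count gaps (maximal runs of N/n) and total N bases in one scaffold sequence."""
    runs = [m.end() - m.start() for m in re.finditer(r"[Nn]+", seq)]
    return {"gaps": len(runs), "n_bases": sum(runs)}


if __name__ == "__main__":
    print(gap_summary("ACGTNNNNACGTNACGT"))  # {'gaps': 2, 'n_bases': 5}
```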


Benchmarking (BUSCO)

             Complete   Fragmented   Missing
Diaci 1.1    90%        6%           4%
Diaci 1.9    92%        1%           7%
Whitefly     98.2%      0.5%         1.3%

PE RNA-seq: 622 million reads

             Overall alignment rate   Concordant alignment rate
Diaci 1.1    82%                      0.62%
Diaci 1.9    88%                      79%

BUSCO (Benchmarking Universal Single-Copy Orthologs) is based on a set of 1,066 single-copy orthologs from 133 arthropod species
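Because the BUSCO results are percentages of a fixed set of 1,066 genes, converting them back to gene counts makes the changes easier to read. For example, the drop in fragmented BUSCOs from 6% to 1% is roughly 64 to 11 genes. The Python sketch below does that conversion for the two Diaci assemblies. The slide reports whole percentages, so the counts are only approximate (to within about 5 genes). The helper name is hypothetical.

```python
BUSCO_SET_SIZE = 1066  # single-copy orthologs in the arthropod set quoted on the slide


def approx_busco_counts(percentages, n=BUSCO_SET_SIZE):
    """Turn (complete, fragmented, missing) percentages back into approximate gene counts."""
    return tuple(round(p * n / 100) for p in percentages)


if __name__ == "__main__":
    for name, pcts in {"Diaci 1.1": (90.0, 6.0, 4.0), "Diaci 1.9": (92.0, 1.0, 7.0)}.items():
        print(name, approx_busco_counts(pcts))
    # Diaci 1.1 (959, 64, 43)
    # Diaci 1.9 (981, 11, 75)
```

Rounding is also why the Diaci 1.9 counts sum to 1,067 rather than 1,066.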

Diaci v1.9 Interim Assembly

• BLAST

• FTP bulk download


FALCON-Unzip Assembler

Hierarchical Genome Assembly Process (HGAP)

Outputs:
• Primary contigs
• Alternate contigs

Haplotype resolution


Dovetail Scaffolding

High molecular weight (50 kb+) DNA

500 ng input DNA from a single male psyllid

Illumina paired-end sequencing

Chicago library preparation


Haplotyping Contigs with 10X

Long-read information from short reads using 14 bp barcodes

Very low input DNA (0.625 ng for ACP)

1 ng of DNA is split across 100,000 Gel Bead-in-Emulsions (GEMs)

Chromium instrument

http://www.10xgenomics.com/products/
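The reason such a small input works is that each GEM receives only a tiny slice of the genome, so reads sharing a barcode most likely come from the same long molecule. The back-of-the-envelope Python sketch below converts the quoted masses into base pairs. It assumes about 650 Da per double-stranded base pair and uses the Diaci v1.1 assembly length as the genome size. Both are assumptions, not figures from the slide. Under them, 1 ng over 100,000 GEMs is about 9.3 Mb of DNA per GEM (under 2% of the genome), and 0.625 ng is roughly 1,200 haploid genome copies.

```python
AVOGADRO = 6.02214076e23       # molecules per mole
DALTONS_PER_BP = 650           # approximate mass of one double-stranded base pair (assumption)
GENOME_SIZE_BP = 485_705_082   # Diaci v1.1 assembly length, used as a genome-size proxy (assumption)


def grams_to_bp(grams):
    """Convert a mass of double-stranded DNA to a number of base pairs."""
    return grams * AVOGADRO / DALTONS_PER_BP


per_gem_bp = grams_to_bp(1e-9 / 100_000)                 # 1 ng spread over 100,000 GEMs
acp_genome_copies = grams_to_bp(0.625e-9) / GENOME_SIZE_BP  # 0.625 ng ACP input

if __name__ == "__main__":
    print(f"DNA per GEM: {per_gem_bp / 1e6:.1f} Mb "
          f"({per_gem_bp / GENOME_SIZE_BP:.1%} of the genome)")
    print(f"0.625 ng is about {acp_genome_copies:,.0f} haploid genome copies")
```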


http://www.bionanogenomics.com/technology/why-genome-mapping/


Example: Human MHC map

• Sample prep requires very high molecular weight DNA
• Nicks at 10 sites per 100 kb
• Individual molecules are assembled into optical maps (Cmaps)
• Optical maps and sequences are merged in a hybrid assembly


ACP molecule N50: 240 kb

1 Mb DNA fragments are ideal

Optimizing enzymes for ACP
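Choosing a nicking enzyme for ACP largely comes down to whether its recognition motif yields a label density near the 10 sites per 100 kb quoted above. That can be estimated in silico by counting motif occurrences on both strands of the draft assembly. A minimal Python sketch follows. The function names are hypothetical. GCTCTTC (the Nt.BspQI site, commonly used with Bionano) is only an example motif, since the slide does not name the enzymes being evaluated.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def count_motif_sites(seq, motif):
    """Count motif occurrences on both strands (overlapping hits allowed)."""
    seq = seq.upper()
    hits = 0
    # A palindromic motif equals its reverse complement; the set avoids double counting.
    for pattern in {motif.upper(), reverse_complement(motif.upper())}:
        start = seq.find(pattern)
        while start != -1:
            hits += 1
            start = seq.find(pattern, start + 1)
    return hits


def sites_per_100kb(seq, motif):
    """Label density expressed per 100 kb of sequence."""
    return count_motif_sites(seq, motif) * 100_000 / len(seq)


if __name__ == "__main__":
    # Toy 200 bp sequence with one forward and one reverse-strand GCTCTTC site.
    toy = "GCTCTTC" + "A" * 93 + "GAAGAGC" + "T" * 93
    print(sites_per_100kb(toy, "GCTCTTC"))  # 1000.0
```

Run over the whole assembly, the same function gives a genome-wide density for each candidate motif. Motifs that fall far below or far above the target can then be set aside.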


Twitter: 280 followers

Facebook: 71 followers

CitrusGreening.org

Host, Vector and Pathogen(s)

BLAST Databases

BioCycs

JBrowse

FTP site

News

Publications

Social Media


P0368: Systems Biology Resources for the Citrusgreening Disease Complex

Mirella Flores


Official Gene Set v1.0

530 manually curated genes

30,562 MCOT gene models

Pathways: developmental, physiological, RNAi regulatory, immunity-related

Preprint: http://biorxiv.org/content/early/2017/01/09/099168


Thank you!!