Assembling the Glanville fritillary genome. Panu Somervuo, University of Helsinki, MRG group & DNA sequencing and genomics lab. CSC Conference 2.6.2010, Next generation sequencing data analysis.

Page 1: Assembling the Glanville fritillary genome

Assembling the Glanville fritillary genome

Panu Somervuo

University of Helsinki, MRG group & DNA sequencing and genomics lab

CSC Conference 2.6.2010 Next generation sequencing data analysis

Page 2: Assembling the Glanville fritillary genome

Next generation sequencing

• Roche 454
• Illumina Solexa
• ABI SOLiD

Page 3: Assembling the Glanville fritillary genome

Assembly pipeline

• 454
  – 10M single reads 400bp
• Illumina Solexa
  – 52M 2*101 paired-end (insert size 600bp)
  – 102M 2*76 paired-end (insert size 600bp)
  – error correction, SOAPdenovo
  – scaffolds: 2M 2*75 matepairs, span 1500 at every 25bp
• SOLiD
  – 420M 2*50 matepairs (insert size 1Kbp)
  – filtering: 96M
• EST
  – 26K

Newbler: 320Mbp, 220K contigs, N50: 1700nt
mapping: 27M unique
SOLiD: 40K scaffolds
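The N50 quoted for the Newbler contigs on this slide can be computed from a list of contig lengths. The following is a minimal sketch using the common definition of N50; the function name and the toy lengths are illustrative assumptions, not values from the talk.

```python
def n50(lengths):
    """Return the N50 of a set of contig lengths.

    Contigs are sorted from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the
    summed length of all contigs.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical toy lengths, not data from the assembly described here.
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))
```

A low N50 relative to typical gene lengths means that many genes will be split across contigs, which is why the deck returns to longer matepairs and scaffolding later.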

Page 4: Assembling the Glanville fritillary genome

Assembly validation 1: contigs vs nr

contig  BLASTX hits  top 5

contig00008  216  Bombyx mori (domestic silkworm), Bombyx mori (domestic silkworm), Aedes aegypti (Stegomyia aegypti), Nasonia vitripennis (jewel wasp), Nasonia vitripennis (jewel wasp)
contig00077  2  Acyrthosiphon pisum (pea aphid), Acyrthosiphon pisum (pea aphid)
contig00084  63  Apis mellifera (honey bee), Forficula auricularia (European earwig), Forficula auricularia (European earwig), Forficula auricularia (European earwig), Forficula auricularia (European earwig)
contig00094  2  Tribolium castaneum (red flour beetle), Apis mellifera (honey bee)
contig00198  203  Tribolium castaneum (red flour beetle), Tribolium castaneum (red flour beetle), Nasonia vitripennis (jewel wasp), Pediculus humanus corporis (human body louse), Apis mellifera (honey bee)
contig00208  68  Acyrthosiphon pisum (pea aphid), Acyrthosiphon pisum (pea aphid), Acyrthosiphon pisum (pea aphid), Tribolium castaneum (red flour beetle), Strongylocentrotus purpuratus
contig00216  163  Pediculus humanus corporis (human body louse), Culex quinquefasciatus (southern house mosquito), Aedes aegypti (Stegomyia aegypti), Culex quinquefasciatus (southern house mosquito), Tribolium castaneum (red flour beetle)
contig00229  39  Tribolium castaneum (red flour beetle), Culex quinquefasciatus (southern house mosquito), Pediculus humanus corporis (human body louse), Apis mellifera (honey bee), Drosophila pseudoobscura pseudoobscura
contig00251  76  Acyrthosiphon pisum (pea aphid), Pediculus humanus corporis (human body louse), Nematostella vectensis (starlet sea anemone), Strongylocentrotus purpuratus, Strongylocentrotus purpuratus
contig00278  90  Aedes aegypti (Stegomyia aegypti), Anopheles gambiae str. PEST, Nasonia vitripennis (jewel wasp), Drosophila willistoni, Drosophila virilis
contig00279  43  Bombyx mori (domestic silkworm), Culex quinquefasciatus (southern house mosquito), Culex quinquefasciatus (southern house mosquito), Anopheles gambiae str. PEST, Tribolium castaneum (red flour beetle)
contig00302  250  Acyrthosiphon pisum (pea aphid), Salmo salar (Atlantic salmon), Branchiostoma floridae (Florida lancelet), Ciona intestinalis, Ciona intestinalis
contig00310  26  Tribolium castaneum (red flour beetle), Acyrthosiphon pisum (pea aphid), Nasonia vitripennis (jewel wasp), Aedes aegypti (Stegomyia aegypti), Aedes aegypti (Stegomyia aegypti)
contig00321  218  Acyrthosiphon pisum (pea aphid), Aedes aegypti (Stegomyia aegypti), Aedes aegypti (Stegomyia aegypti), Tribolium castaneum (red flour beetle), Culex quinquefasciatus (southern house mosquito)
contig00471  91  Drosophila virilis, Drosophila mojavensis, Drosophila ananassae, Drosophila yakuba, Drosophila grimshawi
contig00507  3  Ostrinia nubilalis (European corn borer), Ostrinia nubilalis (European corn borer), Ostrinia nubilalis (European corn borer)
contig00525  250  Bombyx mori (domestic silkworm), Nasonia vitripennis (jewel wasp), Aedes aegypti (Stegomyia aegypti), Apis mellifera (honey bee), Apis mellifera (honey bee)
contig00533  8  Ostrinia nubilalis (European corn borer), Ostrinia nubilalis (European corn borer), Ostrinia nubilalis (European corn borer), Bombyx mori (domestic silkworm), Strongylocentrotus purpuratus
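One way to read a hit list like this is to tally which genera the top hits come from, since a butterfly assembly would be expected to hit mostly insects. The sketch below does that for three of the contigs on this slide; the function name and the choice of the first word of the species name as the genus are illustrative assumptions, not part of the original analysis.

```python
from collections import Counter

# Top BLASTX hit species for three contigs, copied from the hit list on this slide.
top_hits = {
    "contig00077": ["Acyrthosiphon pisum", "Acyrthosiphon pisum"],
    "contig00471": [
        "Drosophila virilis",
        "Drosophila mojavensis",
        "Drosophila ananassae",
        "Drosophila yakuba",
        "Drosophila grimshawi",
    ],
    "contig00507": ["Ostrinia nubilalis"] * 3,
}


def tally_genera(hits_by_contig):
    """Count how often each genus (first word of the species name) appears."""
    counts = Counter()
    for species_list in hits_by_contig.values():
        for species in species_list:
            counts[species.split()[0]] += 1
    return counts


for genus, count in tally_genera(top_hits).most_common():
    print(genus, count)
```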

Page 5: Assembling the Glanville fritillary genome

Assembly validation 2: Genomic contigs vs EST contigs

[Figure; recoverable labels: 52, 13]

Page 6: Assembling the Glanville fritillary genome

rev_contig310   1 --TTCAGAGAAACAAGTGAATTGAAATTTGATTATTTAtTTTCGTTTCAG 48
                    |||||||||||||||.|||||||||||||||||||||||||||||.||
contig402106    1 TTTTCAGAGAAACAAGTAAATTGAAATTTGATTATTTATTTtCGTTTTAG 50

rev_contig310  49 TATGAAGCAGCAGCGAGAGGTGCAGAAGCACTTGGAAACAGATATGGTAC 98
                  |||||||||||.||||||||||||||||||||||||||.|||||||||||
contig402106   51 TATGAAGCAGCCGCGAGAGGTGCAGAAGCACTTGGAAAAAGATATGGTAC 100

rev_contig310  99 AAAtTATAGAGTAGGAGtTGCCGCAGATATTCtTTGTAAGtTGTTTTTTT 148
                  ||||||||||||||||||||||||||||||||||||||||||||||||||
contig402106  101 AAATTATAGAGTAGGAGTTGCCGCAGATATTCTTTGTAAGTTGTTTTTTT 150

rev_contig310 149 AATCAGTTTAGCtTGCAGCtTTAAGACTATTATTATATATTTTTTTATCG 198
                  ||||.|||||.||||||||||||||||||||||||||| |||||||||||
contig402106  151 AATCGGTTTATCTTGCAGCTTTAAGACTATTATTATAT-TTTTTTtATCG 199

rev_contig310 199 TTGTACAGTAAGAAGCTACATAAtTTTTcCTACCGcCTA--TT-----gg 241
                  |||||||||||||||||||||||||||||||||||||||  ||     .|
contig402106  200 TTGTACAGTAAGAAGCTACATAATTTTTCCTACCGCCTATTTTGGGGGAG 249

rev_contig310 242 GGGGGGGGATTGTTGAATCAGTTAAGAATTAAAAGATGATGCTAtTTCAG 291
                  ||||||||||||||.|||||||.||||||| |||||||||||||||||||
contig402106  250 GGGGGGGgATTGTTAAATCAGTCAAGAATT-AAAGATGATGCTATTTCAG 298

rev_contig310 292 aATACtTaAACttTTTTTAAGAC--GAC---------T-A-TAA-GTTTA 327
                  ||.||||.|||||||||||||||  |||         | | ||. |||||
contig402106  299 AAAACTTCAACTTTTTTtAAGACTAGACTATTTTTAATAATTAGTGTTTA 348

rev_contig310 328 AATAACACTAATTATTaAAAACTTGGTCTATCTTGGTCTTGGtTTTAGGt 377
                  |||||||||||||||||||||||||.||||||||.||||||||.|.||||
contig402106  349 AATAACACTAATTATTAAAAACTTGATCTATCTTCGTCTTGGTCTAAGGT 398

rev_contig310 378 TTTTCCTCTAGTTAATATTACTGTTACAACTACATAAAAACAATAAAATA 427
                  ||.|||||||||||||.|||||||||||||||||||||||||||||..||
contig402106  399 TTGTCCTCTAGTTAATCTTACTGTTACAACTACATAAAAACAaTAAGGTA 448

rev_contig310 428 CTGTATCTTTGCAGATCCTATGAGCGGAACCACTTTTGACTGGGCGAAGA 477
                  |||||||||||.||||||||||||||||||||||||||||||||||||||
contig402106  449 CTGTATCTTTGTAGATCCTATGAGCGGAACCACTTTtGACTGGGCGAAGA 498

rev_contig310 478 ATACAACAAATGTCCCATTTTCTTACCTGATTGAATTAAGAGACTTGGGG 527
                  ||.|||||||||||||||||||||||||||||||||||||||||||||||
contig402106  499 ATGCAACAAATGTCCcATTTtCTTACCTGATTGAATTAAGAGACTtGGGg 548

rev_contig310 528 CAATACGGTTTCTTGTTACCAGCAGAACAGATTATTCCAACTAATTTAGA 577
                  |||||||||||||||||||||||||||||||||||.||||||||||||||
contig402106  549 CAaTACGGTTtCTTGTTACcAGCAGAACAGATTATACCAACTAATTtAGA 598

rev_contig310 578 AATAATGGATGCACTCCTGGAGATGGATAATACCGCAAGAACACTAgGG 626
                  ||||||||||||||||||||||||||||||.|||||||||||||||||.
contig402106  599 AATAaTGGATGCACTCcTGGAGATGGATAACACCGCAAGAACACTAGGA 647
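An alignment like this can be summarised by counting matches, mismatches and gap columns. The sketch below does so for the first block of the alignment (positions 1 to 48 of rev_contig310). Treating upper and lower case as the same base is an assumption, and the function name is illustrative.

```python
def alignment_stats(query, subject):
    """Count (matches, mismatches, gap columns) of two aligned rows.

    Both rows must have the same length; '-' marks a gap. Case is ignored,
    which is an assumption about how the lower-case bases should be read.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    matches = mismatches = gaps = 0
    for q, s in zip(query.upper(), subject.upper()):
        if q == "-" or s == "-":
            gaps += 1
        elif q == s:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


# First block of the alignment on this slide.
query_row = "--TTCAGAGAAACAAGTGAATTGAAATTTGATTATTTAtTTTCGTTTCAG"
subject_row = "TTTTCAGAGAAACAAGTAAATTGAAATTTGATTATTTATTTtCGTTTTAG"

matches, mismatches, gaps = alignment_stats(query_row, subject_row)
identity = matches / (matches + mismatches)
print(matches, mismatches, gaps, round(identity, 3))
```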

Page 7: Assembling the Glanville fritillary genome

What now? Still more sequencing needed...

• target enrichment: 55K 120nt probes
• 5’ SAGE
• longer matepairs → longer contigs & scaffolds

[Figure: diagram with question marks; label: annotation]

Page 8: Assembling the Glanville fritillary genome

Challenges

• no elegant solution for combining SOLiD colorspace reads with other platforms in denovo assembly

• read quality: filtering vs error correction
• difficulties generating long matepairs
• how to finish the assembly project: validation

Goal: to get contigs/scaffolds useful for gene prediction
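As background for the first challenge: SOLiD reads are reported in colorspace, where each color encodes a pair of adjacent bases rather than a single base. The sketch below shows the standard two-base encoding (a known first base followed by one color per dinucleotide) and how a single color error changes every base decoded after it. The function names are illustrative and nothing here is taken from the talk itself.

```python
# Two-bit codes per base; the color of a dinucleotide is the XOR of the two codes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def to_colorspace(seq):
    """Encode a base sequence as its first base plus one color (0-3) per adjacent pair."""
    seq = seq.upper()
    colors = "".join(str(BASE_CODE[a] ^ BASE_CODE[b]) for a, b in zip(seq, seq[1:]))
    return seq[0] + colors


def from_colorspace(cs):
    """Decode a colorspace string; every base depends on all the colors before it."""
    out = [cs[0]]
    for color in cs[1:]:
        out.append(BASES[BASE_CODE[out[-1]] ^ int(color)])
    return "".join(out)


encoded = to_colorspace("ATGGC")
print(encoded)                   # colorspace form of the toy sequence
print(from_colorspace(encoded))  # round trip back to bases
print(from_colorspace("A3203"))  # same read with one wrong color
```

Because one miscalled color shifts all downstream bases, naively translating SOLiD reads to bases before assembly is risky, which is one reason they are hard to mix with base-space reads.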

Page 9: Assembling the Glanville fritillary genome

What is the best assembler?

• soap, velvet, Newbler, CLC bio, Celera

• #contigs, contig lengths, accuracy

Page 10: Assembling the Glanville fritillary genome

Assembling Solexa data

[Figure: two plots, number of contigs and sum of contig lengths, each against contig size]

• 52M 2*101 paired-end (insert size 600bp)
• 102M 2*76 paired-end (insert size 600bp)
• error correction (SOAPdenovo)
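The two quantities plotted on this slide, number of contigs and sum of contig lengths as a function of contig size, can be tabulated from a list of contig lengths. The sketch below uses one plausible reading, counting contigs at or above each size threshold; the slide's exact binning is not recoverable, and the function name and toy lengths are illustrative assumptions.

```python
def size_profile(lengths, thresholds):
    """For each minimum size, return (threshold, contig count, summed length)
    over the contigs that are at least that long."""
    profile = []
    for threshold in thresholds:
        kept = [n for n in lengths if n >= threshold]
        profile.append((threshold, len(kept), sum(kept)))
    return profile


# Hypothetical toy lengths, not data from the talk.
toy_lengths = [100, 250, 600, 1200, 3000]
for threshold, count, total in size_profile(toy_lengths, [100, 500, 1000]):
    print(threshold, count, total)
```

Comparing such profiles across assemblers shows whether a higher contig count comes from many short fragments or from additional long contigs.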

Page 11: Assembling the Glanville fritillary genome

Assembling 454 data, 10M single reads 400bp

[Figure: two plots, number of contigs and sum of contig lengths, each against contig size]

• Newbler: all 454 data + 2M 1500nt matepairs from soap scaffolds
• CLC bio: all 454 data + all Solexa data
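The matepairs fed to Newbler here were derived from the soap scaffolds (2*75 reads spanning 1500 nt, taken every 25 bp according to the pipeline slide). A minimal sketch of how such artificial pairs could be cut from a scaffold sequence follows. The read orientation, the handling of scaffold ends, and all names are assumptions; the talk does not describe the actual procedure.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def pseudo_matepairs(scaffold, read_len=75, span=1500, step=25):
    """Yield (left, right) read pairs whose outer ends lie `span` bases apart.

    A window of length `span` slides along the scaffold in steps of `step`.
    The left read is the window start; the right read is the reverse
    complement of the window end (the orientation is an assumption).
    """
    for start in range(0, len(scaffold) - span + 1, step):
        fragment = scaffold[start:start + span]
        yield fragment[:read_len], revcomp(fragment[-read_len:])


# Toy scaffold and small parameters so the output is easy to inspect.
toy_scaffold = "ACGT" * 10
pairs = list(pseudo_matepairs(toy_scaffold, read_len=5, span=20, step=10))
print(len(pairs), pairs[0])
```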

Page 12: Assembling the Glanville fritillary genome

denovo assembler history: Part I

- read errors
- repetitive elements

Page 13: Assembling the Glanville fritillary genome

denovo assembler history: Part II

de Bruijn graph
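To make the de Bruijn graph idea concrete: reads are cut into k-mers, each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and contigs correspond to unbranched paths. The sketch below is a toy illustration under simplifying assumptions (error-free reads, one strand only, no coverage counts) and is not how any of the assemblers named in this talk is implemented.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk(graph, start):
    """Extend a contig from `start` while the current node has exactly one successor.

    Simplification: only out-degree is checked, and the walk stops if it
    would revisit a node.
    """
    contig, node = start, start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (node,) = graph[node]
        if node in seen:
            break
        seen.add(node)
        contig += node[-1]
    return contig


toy_reads = ["ACGTC", "CGTCA"]  # hypothetical overlapping reads
graph = de_bruijn(toy_reads, k=3)
print(walk(graph, "AC"))
```

Read errors add spurious k-mers and repeats make nodes with several successors, so in real data the walk stops at branches, which is where contigs end.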