Final Results Genome Assembly Team Kelley Bullard, Henry Dewhurst, Kizee Etienne, Esha Jain, VivekSagar KR, Benjamin Metcalf, Raghav Sharma, Charles Wigington, Juliette Zerick

Page 1: Final Results Genome Assembly Team

Final Results: Genome Assembly Team

Kelley Bullard, Henry Dewhurst, Kizee Etienne, Esha Jain, VivekSagar KR, Benjamin Metcalf, Raghav Sharma, Charles Wigington, Juliette Zerick

Page 2: Final Results Genome Assembly Team

Original Pipeline

[Flowchart; stages and tools as labeled on the slide]

PRE-PROCESSING
  • Inputs: 454 raw reads, Illumina raw reads
  • Pre-processing: FastQC, Prinseq, NGS QC
  • Outputs: 454 reads, Illumina reads
  • Statistical analysis: read stats

REFERENCE SELECTION
  • Published genomes from public databases: V. vulnificus YJ016, V. vulnificus CMCP6, V. vulnificus MO6-24/O
  • Align Illumina against the reference: bwa
  • Compare mapping statistics: samstats
  • Output: reference genome

DE NOVO ASSEMBLY (Illumina / 454 / hybrid)
  • 454 de novo: Newbler, CABOG, SUTTA
  • Illumina de novo: Allpaths LG, SOAPdenovo, Velvet, Taipan, SUTTA
  • Hybrid de novo: Ray, MIRA
  • Parameter optimization
  • Output: contigs * 3
  • Align Illumina reads against 454 contigs
  • Unmapped reads
  • MacVector, CLC wb
  • Evaluation: GAGE, Hawkeye; reference evaluation

REFERENCE BASED ASSEMBLY (Illumina/(454?))
  • AMOScmp
  • Outputs: contigs, unmapped reads
  • Reference evaluation: DNADiff, MUMmer

CONTIG MERGING
  • All possible combinations of the best 3
  • Minimus, MAIA

GENOME FINISHING
  • PAGIT, Mauve
  • Gap filling; nucleotide identity
  • MUMmer, GRASS, built-in
  • GAGE
  • Outputs: scaffolds, draft / finished genome

LEGEND
  • Process; 454; Illumina; Info.; Chosen Ref.
  • Assemblers: Illumina, 454, hybrid

Page 3: Final Results Genome Assembly Team

Read Visualization – spot the differences

Pipeline / Read Processing / Assembler Results / Contig Merging / Assembler Review / Pipeline / Final Results

Comparison of 454 Reads for 08-2462 (low coverage) and 2541-90 (improved coverage)

Page 4: Final Results Genome Assembly Team

Read Visualization - more is better!

Nav 08-2462 454 reads compared to Nav 08-2462 Illumina reads.

Page 5: Final Results Genome Assembly Team

Read Visualization – cousins or siblings?

Nav_2541-90 and Vul_06-2432 (454 and Illumina reads) coverage comparison.

Page 6: Final Results Genome Assembly Team

Data Quality

Effect of pre-processing data (using prinseq)
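The slides do not list the prinseq settings that were used, so the following is only a minimal sketch of one common kind of read filter (minimum length plus minimum mean Phred quality). The function names, the thresholds, and the Phred+33 encoding are assumptions for illustration, not the team's actual configuration.

```python
def mean_phred(quality_line, offset=33):
    """Mean Phred score of one FASTQ quality string (Phred+33 is an assumption)."""
    return sum(ord(ch) - offset for ch in quality_line) / len(quality_line)


def filter_fastq(records, min_mean_quality=20, min_length=50):
    """Keep (header, sequence, quality) records that pass both thresholds.

    Both thresholds are hypothetical defaults chosen for the example.
    """
    kept = []
    for header, sequence, quality in records:
        if len(sequence) >= min_length and mean_phred(quality) >= min_mean_quality:
            kept.append((header, sequence, quality))
    return kept


# Three made-up reads: good, low quality, and too short.
reads = [
    ("@read1", "ACGT" * 15, "I" * 60),  # Phred 40 at every base
    ("@read2", "ACGT" * 15, "#" * 60),  # Phred 2 at every base
    ("@read3", "ACGT" * 5, "I" * 20),   # high quality but only 20 bases
]
passed = filter_fastq(reads)
print([header for header, _, _ in passed])
```

Only the first read survives here; comparing quality reports before and after such a filter is the kind of comparison the following slides present.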

Page 7: Final Results Genome Assembly Team

V. navarrensis (454; non-preprocessed | preprocessed)

Metric | 2423-01 | 08-2462 | 2541-90 | 2756-81

Per Base Seq. Quality

Per Seq. Quality Score

Per Base Seq. Content

Per Base GC Content

Per Seq. GC Content

Per Base N Content

Seq. Length Dist.

Seq. Dup. Levels

Overrepresented Seqs.

Kmer Content

Page 8: Final Results Genome Assembly Team

V. vulnificus (454; non-preprocessed | preprocessed)

Metric | 2009V_1368 | 06-2432 | 08-2435 | 08-2439 | 07-2444

Per Base Seq. Quality

Per Seq. Quality Score

Per Base Seq. Content

Per Base GC Content

Per Seq. GC Content

Per Base N Content

Seq. Length Dist.

Seq. Dup. Levels

Overrepresented Seqs.

Kmer Content

Page 9: Final Results Genome Assembly Team

V. navarrensis (Illumina; non-preprocessed | preprocessed)

Metric | 2423-01 | 08-2462 | 2541-90 | 2756-81

Per Base Seq. Quality

Per Seq. Quality Score

Per Base Seq. Content

Per Base GC Content

Per Seq. GC Content

Per Base N Content

Seq. Length Dist.

Seq. Dup. Levels

Overrepresented Seqs.

Kmer Content

Page 10: Final Results Genome Assembly Team

V. vulnificus (Illumina; non-preprocessed | preprocessed)

Metric | 2009V_1368 | 06-2432 | 08-2435 | 08-2439 | 07-2444

Per Base Seq. Quality

Per Seq. Quality Score

Per Base Seq. Content

Per Base GC Content

Per Seq. GC Content

Per Base N Content

Seq. Length Dist.

Seq. Dup. Levels

Overrepresented Seqs.

Kmer Content

Page 11: Final Results Genome Assembly Team

Assembly

Reference-guided and de-Novo

Page 12: Final Results Genome Assembly Team

Reference guided assembly

Comparison of reference guided assembly vs de-novo assembly

Page 13: Final Results Genome Assembly Team

ARE – Assembly Score

Page 14: Final Results Genome Assembly Team

Reference-guided vs de-Novo assembly

[Bar chart. X-axis: assembler (AMOScmp, Newbler (ref), CABOG, Newbler (dn), SOAPdn, Velvet, Ray); y-axis: ARE (0 to 90); series: 454 (Vul_06-2432), 454 (Nav_2541-90), Illumina (Vul_06-2432), Illumina (Nav_2541-90).]

Page 15: Final Results Genome Assembly Team

Summary of Reference-guided assembly

  • Using V. vulnificus (CMCP6) reference strain
  • 84% coverage
  • De novo assemblers overall provided a higher assembly score than reference-based assembly

Page 16: Final Results Genome Assembly Team

De Novo Assembly

[Chart: Newbler (denovo). X-axis: K-MER SIZE (40, 50, 100); y-axis: ARE (0 to 100); series: Nav_2541-90, Vul_06-2432.]

Page 17: Final Results Genome Assembly Team

De Novo Assembly

[Chart: CABOG. X-axis: K-MER Size (15, 22, 25); y-axis: ARE (0 to 50); series: Nav_2541-90, Vul_06-2432.]

Page 18: Final Results Genome Assembly Team

De Novo Assembly

[Chart: SOAPdenovo. X-axis: K-MER Size (20, 30, 40, 50, 60, 70); y-axis: ARE (0 to 4); series: Nav_2541-90, Vul_06-2432.]

Page 19: Final Results Genome Assembly Team

De Novo Assembly

[Chart: Velvet. X-axis: K-MER Size (19, 25, 31); y-axis: ARE (0 to 7); series: Nav_2541-90, Vul_06-2432.]

Page 20: Final Results Genome Assembly Team

De-Novo Assembler Comparison (Optimal Parameters)

[Bar chart. X-axis: assembler (CABOG, Newbler (dn), Ray, Ray (hybrid), SOAPdn, Velvet); y-axis: ARE (0 to 100); series: 454 (Vul_06-2432), Illumina (Vul_06-2432), 454 (Nav_2541-90), Illumina (Nav_2541-90), Hybrid (Vul_06-2432), Hybrid (Nav_2541-90).]

Page 21: Final Results Genome Assembly Team

Final Results – V. vulnificus

[Chart axes: Assembly Score, Span Ratio. Labeled assemblers: Velvet, CABOG.]

Graph comparing assemblers on 3 criteria: Assembly Score, Span Ratio, 1/(Break Points). Higher scores on all criteria are preferable. Newbler (dn) has been removed to show variance in other tools.

Page 22: Final Results Genome Assembly Team

Final Results – V. vulnificus

Graph comparing assemblers on 3 criteria: Assembly Score, Span Ratio, 1/(Break Points). Higher scores on all criteria are preferable.

1000/(Break Points)

Page 23: Final Results Genome Assembly Team

Summary of de-Novo results

  • OLC assemblers (CABOG, Newbler) differed considerably in ARE from de Bruijn graph-based assemblers (SOAPdenovo, Velvet)
  • The hybrid assembler, Ray, did not perform as well in terms of assembly score

Page 24: Final Results Genome Assembly Team

Merging: Vul_06-2432

(Rows and columns are the assemblies being merged; "-" marks the diagonal.)

| AMOScmp | CABOG | Newbler (dn;454) | Newbler (ref;454) | Newbler (ref;Illumina) | Ray (454) | Ray (Illumina) | Ray (hybrid) | SOAPdn | Velvet
AMOScmp | - | 164.00 | 234.69 | 6.35 | 4.69 | 63.51 | 55.13 | 64.51 | 44.38 | 67.22
CABOG | 164.00 | - | 225.12 | 101.30 | 62.66 | 73.23 | 93.88 | 98.11 | 75.98 | 113.08
Newbler (dn;454) | 234.69 | 221.89 | - | 5.48 | ND | 311.98 | ND | 419.76 | 104.46 | 127.01
Newbler (ref;454) | 6.35 | 99.30 | 5.48 | - | 1.44 | 67.72 | 64.99 | 72.79 | 35.07 | 72.34
Newbler (ref;Illumina) | 4.69 | 62.66 | ND | 1.44 | - | 35.28 | ND | ND | ND | ND
Ray (454) | 63.50 | 72.56 | 311.99 | 67.72 | 35.28 | - | 33.81 | 49.94 | 22.92 | 37.68
Ray (Illumina) | 55.13 | 93.88 | ND | 64.99 | ND | 33.81 | - | ND | ND | ND
Ray (hybrid) | 64.51 | 97.17 | 419.76 | 72.79 | ND | 49.94 | ND | - | ND | ND
SOAPdn | 44.38 | 75.98 | 104.46 | 35.07 | ND | 22.92 | ND | ND | - | ND
Velvet | 67.22 | 113.08 | 127.01 | 72.34 | ND | 37.68 | ND | ND | ND | -
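Picking the best merge from a pairwise table like this one can be sketched as follows. This assumes the cells are assembly scores where higher is better (the 419.76 entry equals the assembly score reported for Vul_06-2432 under Pipeline 1 in the final results). The dictionary holds only a handful of cells copied from the table, `None` stands in for ND, and `best_pair` is a hypothetical helper name.

```python
# A few cells copied from the Vul_06-2432 merging table; None stands for ND.
scores = {
    ("Newbler (dn;454)", "Ray (hybrid)"): 419.76,
    ("Newbler (dn;454)", "Ray (454)"): 311.98,
    ("AMOScmp", "Newbler (dn;454)"): 234.69,
    ("CABOG", "Velvet"): 113.08,
    ("Newbler (dn;454)", "Ray (Illumina)"): None,
}


def best_pair(pair_scores):
    """Return (pair, score) for the highest-scoring merge, skipping ND entries."""
    measured = {pair: s for pair, s in pair_scores.items() if s is not None}
    pair = max(measured, key=measured.get)
    return pair, measured[pair]


pair, score = best_pair(scores)
print(pair, score)
```

For this subset the highest-scoring combination is Newbler (dn;454) merged with Ray (hybrid).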

Page 25: Final Results Genome Assembly Team

Merging: Nav_2541-90

(Rows and columns are the assemblies being merged; "-" marks the diagonal.)

| AMOScmp | CABOG | Newbler (dn) | Newbler (ref;454) | Newbler (ref;Illumina) | Ray (454) | Ray (Illumina) | Ray (hybrid) | SOAPdn | Velvet
AMOScmp | - | 133.95 | ND | 0.03 | 0.03 | 15.26 | 14.00 | 15.77 | 11.23 | 45.32
CABOG | 133.95 | - | ND | 107.60 | 114.60 | 82.62 | 92.44 | 92.53 | 80.73 | 123.02
Newbler (dn) | ND | ND | - | ND | ND | 54.21 | 59.81 | 60.47 | 33.17 | 94.89
Newbler (ref;454) | 0.03 | 107.60 | 59.94 | - | 0.11 | 11.6 | 11.78 | 11.86 | 10.17 | 39.2
Newbler (ref;Illumina) | 0.03 | 114.60 | ND | 0.28 | - | 12.66 | 12.15 | 12.41 | 9.6 | 39.60
Ray (454) | 15.26 | 82.62 | 54.21 | 11.60 | 12.66 | - | 59.19 | 76.36 | 13.65 | 63.75
Ray (Illumina) | 14.01 | 92.44 | 59.81 | 11.78 | 12.15 | 33.79 | - | 24.21 | 11.54 | 39.84
Ray (hybrid) | 15.77 | 92.53 | 60.47 | 11.86 | 12.41 | 40.33 | 36.79 | - | 14.06 | ND
SOAPdn | 11.22 | 80.73 | 33.17 | 10.04 | 9.54 | 13.61 | 11.40 | 13.91 | - | 8.47
Velvet | 45.32 | 123.02 | 94.89 | 39.20 | 39.84 | 64.54 | 39.84 | ND | 8.31 | -

Page 26: Final Results Genome Assembly Team

Assembler Review

[The 454, Illumina, and Hybrid columns held per-assembler marks that did not survive extraction.]

Assembler | Status | Algorithm
Allpaths LG | Paired-end only | DBG
AMOScmp | | BB
CABOG | | OLC
MIRA | | ZEBRA
Newbler | | OLC
Ray | | DBG
SOAPdenovo | | DBG
SUTTA | Unresolved errors | BB
Velvet | | DBG

BB = branch-and-bound; OLC = overlap-layout-consensus; DBG = de Bruijn graph; ZEBRA

MIRA performed as well as our merged contigs, but it is impractical (40 hr run time)
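For readers unfamiliar with the DBG entries above, the following toy sketch shows the core idea behind de Bruijn graph assemblers and why k-mer size is their main tuning parameter. It is an illustration only: real assemblers such as Velvet or SOAPdenovo also handle reverse complements, sequencing errors, and coverage, none of which appear here, and `de_bruijn_edges` is a hypothetical name.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k along the read; each k-mer is one edge.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two made-up overlapping reads.
graph = de_bruijn_edges(["ACGTAC", "GTACG"], k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

A larger k makes nodes more specific (fewer spurious branches from repeats) but requires longer exact overlaps between reads, which is the trade-off behind sweeping k-mer size per assembler.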

Page 27: Final Results Genome Assembly Team

Final Pipeline

[Flowchart; stages and tools as labeled on the slide]

PRE-PROCESSING
  • Inputs: 454 raw reads, Illumina raw reads
  • Pre-processing: FastQC, Prinseq
  • Outputs: 454 reads, Illumina reads
  • Statistical analysis: read stats

DE NOVO ASSEMBLY (Illumina / 454 / hybrid)
  • 454 de novo: Newbler, CABOG
  • Illumina de novo: Velvet
  • Hybrid de novo: Ray, MIRA
  • Align Illumina reads against 454 contigs
  • Output: contigs

CONTIG MERGING (Minimus)
  • Merge Ray-hyb / Newbler
  • Merge CABOG / Velvet
  • MIRA-hyb

Output: draft genome

LEGEND
  • Process; 454; Illumina; Info.
  • Assemblers: Illumina, 454, hybrid

Page 28: Final Results Genome Assembly Team

Splinter

Pipeline 1

Strain | NUM | AVG | N50 | Assembly Size | Assembly Score
Nav_2423-01 | 106 | 42657.2 | 156064 | 4.52 | 136.53
Nav_08-2462 | 149 | 25736.8 | 51230 | 3.83 | 19.48
Nav_2541-90 | 166 | 26172.5 | 130386 | 4.34 | 62.57
Nav_2756-81 | 107 | 42939.4 | 131591 | 4.59 | 122.31
Vul_2009v-1368 | 83 | 57787.2 | 401973 | 4.80 | 345.03
Vul_06-2432 | 57 | 85122.7 | 322525 | 4.85 | 419.76
Vul_08-2435 | 111 | 42872.9 | 230373 | 4.76 | 144.01
Vul_08-2439 | 98 | 50885.7 | 250789 | 4.99 | 210.94
Vul_07-2444 | 70 | 73255.1 | 492706 | 5.13 | 656.10

Pipeline 2

Strain | NUM | AVG | N50 | Assembly Size | Assembly Score
Nav_2423-01 | 125 | 35357.0 | 164305 | 4.42 | 111.36
Nav_08-2462 | 451 | 311.9 | 2253 | 0.14 | 0.09
Nav_2541-90 | 106 | 40547.5 | 169781 | 4.30 | 123.02
Nav_2756-81 | 111 | 41840.8 | 132119 | 4.64 | 124.55
Vul_2009v-1368 | 97 | 49705.8 | 228408 | 4.82 | 170.81
Vul_06-2432 | 167 | 28489.7 | 78353 | 4.76 | 32.53
Vul_08-2435 | 193 | 24903.7 | 204178 | 4.85 | 75.19
Vul_08-2439 | 114 | 44047.9 | 180889 | 5.02 | 134.64
Vul_07-2444 | 143 | 35905.1 | 130942 | 5.13 | 85.93
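As a reminder of what the N50 column measures, here is a minimal sketch of the standard N50 calculation on made-up contig lengths. The last line is a sanity check under the assumption (not stated on the slide) that NUM is the contig count, AVG the mean contig length in bases, and Assembly Size is in megabases; it uses the Pipeline 1 row for Vul_06-2432.

```python
def n50(contig_lengths):
    """Length of the contig at which the running total, taken from the
    longest contig downward, first reaches half of the assembly size."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


lengths = [100, 80, 60, 40, 20]  # hypothetical contig lengths
print(n50(lengths))

# Assumed relation: NUM * AVG / 1e6 ~ Assembly Size (Vul_06-2432, Pipeline 1).
print(round(57 * 85122.7 / 1e6, 2))
```

Under that reading the product reproduces the 4.85 shown in the table, which suggests the size column is simply the summed contig length.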

Page 29: Final Results Genome Assembly Team

Visualization

Merged

Newbler Ray Hybrid

Page 30: Final Results Genome Assembly Team

Demo