19
Error Correction and Assembly of Oxford Nanopore Sequencing James Gurtowski

Error Correction and Assembly of Oxford Nanopore …schatzlab.cshl.edu/presentations/2015/2015.02.28.AGBT Nanocorr... · Post Correction Identity . ONT vs Illumina Assembly Oxford

Embed Size (px)

Citation preview

Page 1: Error Correction and Assembly of Oxford Nanopore …schatzlab.cshl.edu/presentations/2015/2015.02.28.AGBT Nanocorr... · Post Correction Identity . ONT vs Illumina Assembly Oxford

Error Correction and Assembly of Oxford Nanopore Sequencing

James Gurtowski

Page 2: Error Correction and Assembly of Oxford Nanopore …schatzlab.cshl.edu/presentations/2015/2015.02.28.AGBT Nanocorr... · Post Correction Identity . ONT vs Illumina Assembly Oxford

Assembly Complexity

A" R"

B"

C"

A" R" B" R" C" R"

Page 3: Error Correction and Assembly of Oxford Nanopore …schatzlab.cshl.edu/presentations/2015/2015.02.28.AGBT Nanocorr... · Post Correction Identity . ONT vs Illumina Assembly Oxford

Assembly Complexity

A" R" B" C"

A" R" B" R" C" R"

R" R"

A" R" B" R" C" R"

The advantages of SMRT sequencing Roberts, RJ, Carneiro, MO, Schatz, MC (2013) Genome Biology. 14:405

Page 4: Error Correction and Assembly of Oxford Nanopore …schatzlab.cshl.edu/presentations/2015/2015.02.28.AGBT Nanocorr... · Post Correction Identity . ONT vs Illumina Assembly Oxford

Oxford Nanopore MinION •  Thumb drive sized sequencer

powered over USB

•  Senses DNA by measuring changes to ion flow

•  Reads both DNA Strands (2D)

Page 5: Error Correction and Assembly of Oxford Nanopore …schatzlab.cshl.edu/presentations/2015/2015.02.28.AGBT Nanocorr... · Post Correction Identity . ONT vs Illumina Assembly Oxford

Nanopore Basecalling

Basecalling currently performed at Amazon with frequent updates to algorithm

Page 6: Error Correction and Assembly of Oxford Nanopore …schatzlab.cshl.edu/presentations/2015/2015.02.28.AGBT Nanocorr... · Post Correction Identity . ONT vs Illumina Assembly Oxford

Our Data - Yeast W303 Best: 446Mb

Mean Read Length: ~6kb (10kb shear)

New Flowcells 7k Avg Read Length

0

50

100

150

200

250

300

350

400

450

500

0

2000

4000

6000

8000

10000

12000

14000

16000

Yiel

d (M

egab

ases

)

Read

Len

gth

(bp)

Date

Oxford Flowcell Yields

Flowcell YieldRead Length

R6 R7 R7.3

Page 7: Error Correction and Assembly of Oxford Nanopore …schatzlab.cshl.edu/presentations/2015/2015.02.28.AGBT Nanocorr... · Post Correction Identity . ONT vs Illumina Assembly Oxford

Nanopore Alignments

Alignment Statistics (BLASTN) Mean alignment length at ~7kbp

Shearing targeted 10kbp 255k reads align (64%)

174x coverage

7kb mean alignment length

68% mean identity

Page 8: Error Correction and Assembly of Oxford Nanopore …schatzlab.cshl.edu/presentations/2015/2015.02.28.AGBT Nanocorr... · Post Correction Identity . ONT vs Illumina Assembly Oxford

0

2000

4000

6000

8000

10000

12000

14000

40 50 60 70 80 90 100

Freq

uenc

y

Percent Identity

templatecomplement

two-directions

Nanopore Accuracy Alignment Quality (BLASTN) Of reads that align, average ~65% identity “2D base-calling” improves to ~77% identity

65% AVG ID

77% AVG ID

Page 9: Error Correction and Assembly of Oxford Nanopore …schatzlab.cshl.edu/presentations/2015/2015.02.28.AGBT Nanocorr... · Post Correction Identity . ONT vs Illumina Assembly Oxford

ONT Read Alignments

0 20 40 60 80 100 120 140 160Read Length (kb)

10

20

30

40

50

60

70

80

90

100

Perc

enta

ge o

f Rea

d Al

igne

d

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Perc

enta

ge o

f Rea

ds in

Len

gth

Bin

Full Length Alignments

Partial Alignments

Unaligned

Nanopore Alignment Summary 64% of the data map using BLASTN

Page 10: Error Correction and Assembly of Oxford Nanopore …schatzlab.cshl.edu/presentations/2015/2015.02.28.AGBT Nanocorr... · Post Correction Identity . ONT vs Illumina Assembly Oxford

Long Read Correction Algorithms PacBioToCA & ECTools

Hybrid Error Correction

Koren, Schatz, et al (2012) Nature Biotechnology. 30:693–700

HGAP & Quiver

LR-only Correction & Polishing

Chin et al (2013) Nature Methods. 10:563–569

PBJelly

Gap Filling and Assembly Upgrade

English et al (2012) PLOS One. 7(11): e47768

< 5x > 50x Long Read Coverage

Page 11: Error Correction and Assembly of Oxford Nanopore …schatzlab.cshl.edu/presentations/2015/2015.02.28.AGBT Nanocorr... · Post Correction Identity . ONT vs Illumina Assembly Oxford

NanoCorr: Nanopore-Illumina Hybrid Error Correction

1.  BLAST Miseq reads to all raw Oxford Nanopore reads

2.  Select non-repetitive alignments

○  First pass scans to remove “contained” alignments

○  Second pass uses Dynamic Programming (LIS) to select set of high-identity alignments with minimal overlaps

3.  Compute consensus of each Oxford

Nanopore read ○  Currently using Pacbio’s pbdagcon

https://github.com/jgurtowski/nanocorr

Page 12: Error Correction and Assembly of Oxford Nanopore …schatzlab.cshl.edu/presentations/2015/2015.02.28.AGBT Nanocorr... · Post Correction Identity . ONT vs Illumina Assembly Oxford

0

10000

20000

30000

40000

50000

60000

70000

80000

90000

40 50 60 70 80 90 100

Freq

uenc

y

Percent Identity

UncorrectedCorrected

Mean: 97%

Mean: 68%

Post Correction Identity

Page 13: Error Correction and Assembly of Oxford Nanopore …schatzlab.cshl.edu/presentations/2015/2015.02.28.AGBT Nanocorr... · Post Correction Identity . ONT vs Illumina Assembly Oxford

ONT vs Illumina Assembly

Oxford N50 : 585kb

Illumina N50 : 58kb

Page 14: Error Correction and Assembly of Oxford Nanopore …schatzlab.cshl.edu/presentations/2015/2015.02.28.AGBT Nanocorr... · Post Correction Identity . ONT vs Illumina Assembly Oxford

ONT Assembly Completeness

Page 15: Error Correction and Assembly of Oxford Nanopore …schatzlab.cshl.edu/presentations/2015/2015.02.28.AGBT Nanocorr... · Post Correction Identity . ONT vs Illumina Assembly Oxford

ONT Assembly Completeness

1

10

100

1000

10000

CDS (1282 bp)

gene (1344 bp)

rRNA (1393 bp)

gene cassette (2951 bp)transposable elem

ent (3201 bp)telom

ere (4396 bp)

LTR retrotransposon (5836 bp)

Freq

uenc

y

Genomic Features (Average Length in bp)

S288CNanopore

Miseq

Page 16: Error Correction and Assembly of Oxford Nanopore …schatzlab.cshl.edu/presentations/2015/2015.02.28.AGBT Nanocorr... · Post Correction Identity . ONT vs Illumina Assembly Oxford

Long Read Assembly S288C Reference sequence •  12.1Mbp; 16 chromo + mitochondria; N50: 924kbp

Oxford Nanopore 28x corrected reads > 7kb NanoCorr + Celera Assembler •  95 non-redundant contigs •  N50: 585kbp >99.78% id

Illumina MiSeq 30x, 300bp PE (Flashed) Celera Assembler •  6953 non-redundant contigs •  N50: 59kb >99.9% id

Pacific Biosciences 25x corrected reads > 10kb HGAP + Celera Assembler •  21 non-redundant contigs •  N50: 811kb >99.8% id

Page 17: Error Correction and Assembly of Oxford Nanopore …schatzlab.cshl.edu/presentations/2015/2015.02.28.AGBT Nanocorr... · Post Correction Identity . ONT vs Illumina Assembly Oxford

E. Coli K12 Single Contig Assembly with MinION

Sequencing Data From: A reference bacterial genome dataset generated on the MinION™ portable single-molecule nanopore sequencer Joshua Quick, Aaron R Quinlan and Nicholas J Loman

Single Contig Assembly 99.99% Identity (Pilon polishing)

0

5000

10000

15000

20000

25000

30000

35000

40 50 60 70 80 90 100

Num

ber o

f Rea

ds (F

requ

ency

)

Percent Identity

E. coli Error Correction with Nanocorr

UncorrectedCorrected

Nanocor Correction Results 145x Oxford Nanopore X 35x MiSeq

Page 18: Error Correction and Assembly of Oxford Nanopore …schatzlab.cshl.edu/presentations/2015/2015.02.28.AGBT Nanocorr... · Post Correction Identity . ONT vs Illumina Assembly Oxford

Future of Oxford Nanopore

Page 19: Error Correction and Assembly of Oxford Nanopore …schatzlab.cshl.edu/presentations/2015/2015.02.28.AGBT Nanocorr... · Post Correction Identity . ONT vs Illumina Assembly Oxford

Acknowledgements

Michael Schatz

Dick McCombie

Sara Goodwin

Schatz Lab

Oxford Nanopore Sequencing and de novo Assembly of a Eukaryotic Genome Sara Goodwin , James Gurtowski , Scott Ethe-Sayers , Panchajanya Deshpande , Michael Schatz , W Richard McCombie doi: http://dx.doi.org/10.1101/013490