Long read sequencing Torsten Seemann VLSCI LSCC Lab Talk - Melbourne, AU - Fri 5 June 2015 The good, the bad, and the really cool.

Long read sequencing - LSCC lab talk - fri 5 june 2015


Page 1: Long read sequencing - LSCC lab talk - fri 5 june 2015

Long read sequencing

Torsten Seemann

VLSCI LSCC Lab Talk - Melbourne, AU - Fri 5 June 2015

The good, the bad, and the really cool.

Page 2: Long read sequencing - LSCC lab talk - fri 5 june 2015

Why do we need long reads?

Page 3: Long read sequencing - LSCC lab talk - fri 5 june 2015

Repeats!

Page 4: Long read sequencing - LSCC lab talk - fri 5 june 2015

Long reads untangle graphs

Page 5: Long read sequencing - LSCC lab talk - fri 5 june 2015

Completed genomes

Page 6: Long read sequencing - LSCC lab talk - fri 5 june 2015

Phased haplotypes

Page 7: Long read sequencing - LSCC lab talk - fri 5 june 2015

Structural variation

The missing heritability - not just SNPs & indels

Page 8: Long read sequencing - LSCC lab talk - fri 5 june 2015

Long read instruments

Page 9: Long read sequencing - LSCC lab talk - fri 5 june 2015

Pacific Biosciences RSII

2015 ARC LIEF w/ Tim Stinear

Installed this week.

Passed testing!

Page 10: Long read sequencing - LSCC lab talk - fri 5 june 2015

Oxford Nanopore MinION MkI

Successor to Mk0

MinION Access Program Round 2

The up & comer!

Page 11: Long read sequencing - LSCC lab talk - fri 5 june 2015

PacBio

It’s already here and it works.

Page 12: Long read sequencing - LSCC lab talk - fri 5 june 2015

PacBio - the device

∷ It’s big!

∷ Three chunks: compute (left), robotics (top), sequencing (bottom)

∷ A cushion of N2 gas

Page 13: Long read sequencing - LSCC lab talk - fri 5 june 2015

PacBio - technology

∷ Polymerase bound to bottom of ZMW μ-well

∷ Fluorescent nucleotide incorporation measured in real time

∷ 3 hour “movies”

Page 14: Long read sequencing - LSCC lab talk - fri 5 june 2015

PacBio: read lengths

Needs careful library prep to ensure DNA is not overly fragmented!

Page 15: Long read sequencing - LSCC lab talk - fri 5 june 2015

PacBio: error rate

Single-read accuracy: 86%

30x consensus accuracy: 99.999%
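Why does stacking noisy reads help so much? A minimal sketch, assuming errors are independent between reads and that the consensus is a simple majority vote on whether each read got a base right. Both are simplifications made for illustration; real consensus callers use richer models, and the function name below is hypothetical.

```python
from math import comb


def majority_consensus_accuracy(read_accuracy: float, depth: int) -> float:
    """Probability that a strict majority of `depth` independent reads
    call a given base correctly (toy model, not the vendor's algorithm)."""
    needed = depth // 2 + 1  # strict majority; a tie counts as a wrong call
    return sum(
        comb(depth, k) * read_accuracy**k * (1 - read_accuracy) ** (depth - k)
        for k in range(needed, depth + 1)
    )


if __name__ == "__main__":
    # Compare a single 86% read with increasing coverage.
    for depth in (1, 5, 10, 30):
        print(depth, f"{majority_consensus_accuracy(0.86, depth):.6f}")
```

The point of the toy model is only that random (non-systematic) errors wash out with coverage; systematic errors would not.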

Page 16: Long read sequencing - LSCC lab talk - fri 5 june 2015

PacBio: main applications

∷ Finished microbial genomes

∷ Full length cDNA (mRNA isoforms)

∷ Extreme GC sequence

∷ HLA / MHC / KIR haplotyping

∷ Base modifications (methylation)

Page 17: Long read sequencing - LSCC lab talk - fri 5 june 2015

PacBio: bioinformatics

∷ All in GitHub

∷ SMRT Portal
  : Nice GUI
  : Cloud ready
  : Linux backend
  : Cluster ready

∷ Cmdline too!

Page 18: Long read sequencing - LSCC lab talk - fri 5 june 2015

Oxford Nanopore

The new kid on the block.

Page 19: Long read sequencing - LSCC lab talk - fri 5 june 2015

MinION - the device

Page 20: Long read sequencing - LSCC lab talk - fri 5 june 2015

PromethION - large scale

∷ 48 separate flow cells

∷ On board ASIC

∷ Runs Python

Page 21: Long read sequencing - LSCC lab talk - fri 5 june 2015

Nanopore - technology

Page 22: Long read sequencing - LSCC lab talk - fri 5 june 2015

Nanopore - types of reads

“1D reads”

∷ Template 1D: only fwd strand

∷ Complement 1D: only rev strand

“2D reads”

∷ Normal 2D: mostly fwd, some rev

∷ Full 2D: most of fwd & rev; these are high quality

Page 23: Long read sequencing - LSCC lab talk - fri 5 june 2015

Nanopore - read lengths

Read length is not limited by technology but by library preparation.

Can get >100kbp reads.

Read length

Page 24: Long read sequencing - LSCC lab talk - fri 5 june 2015

Nanopore - error rate

∷ 5-mer errors

∷ Not modelling base mods yet

∷ Basically where PacBio was a few years ago!

Percent identity (aligned)

Page 25: Long read sequencing - LSCC lab talk - fri 5 june 2015

MinION - applications

∷ Same as PacBio plus....

∷ Portable sequencing
  : in the field, e.g. Josh Quick in Guinea for Ebola
  : in hospitals - infection control
  : monitoring - water/food supply, production facilities
  : at the GP - pathogen test in 10 min from blood prick?
  : spit in a home device every morning?

Page 26: Long read sequencing - LSCC lab talk - fri 5 june 2015

MinION - bioinformatics

∷ Event space -vs- base space
  : MinION MkI - base calling in cloud (Metrichor)
  : MinION MkII - on device?
  : PromethION - can choose on-device add-on

∷ Mostly 3rd-party tools - lots of activity
  : poretools, poRe
  : minoTour, nanoPolish

Page 27: Long read sequencing - LSCC lab talk - fri 5 june 2015

Disruptive technology

Just another sequencer?

Page 28: Long read sequencing - LSCC lab talk - fri 5 june 2015

“Run until”

Dynamically adjust sequencing yield

Page 29: Long read sequencing - LSCC lab talk - fri 5 june 2015

“Read until”

∷ Can access events/bases during reading
  : remember reads are long (40 kbp)
  : examine first 100 bp, say
  : can decide to stop reading and eject molecule!

∷ This is a killer app!
  : only want pathogens? eject if human DNA
  : only want exome? eject if not exonic looking
  : controlled with Python code
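The "examine the first 100 bp, then decide" idea can be sketched in plain Python. This is not the Oxford Nanopore API: the function names, the k-mer matching rule, and the thresholds below are all hypothetical, chosen only to show the shape of the decision. A real implementation would work on the device's event/base stream and would need a much faster and more robust classifier.

```python
def kmers(seq: str, k: int) -> set:
    """Return the set of all k-mers in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def should_eject(prefix: str, unwanted_kmers: set, k: int = 4,
                 min_fraction: float = 0.5) -> bool:
    """Decide whether to eject a molecule from the start of its read.

    Ejects when at least `min_fraction` of the prefix's k-mers are found
    in `unwanted_kmers` (e.g. k-mers from a host genome we do not want).
    """
    observed = kmers(prefix, k)
    if not observed:
        return False  # prefix shorter than k: no evidence, keep reading
    hits = len(observed & unwanted_kmers)
    return hits / len(observed) >= min_fraction


# Toy usage: treat one short sequence as the "unwanted" reference.
unwanted = kmers("ACGTACGTTTGACCA", 4)
print(should_eject("GTACGTTTGA", unwanted))   # prefix drawn from the unwanted sequence
print(should_eject("CCCCCCCCCC", unwanted))   # unrelated prefix
```

The same skeleton covers both examples on the slide: swap the unwanted set for human k-mers to keep pathogens, or invert the test to keep only exon-like reads.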

Page 30: Long read sequencing - LSCC lab talk - fri 5 june 2015

VolTRAX - library prep

Page 31: Long read sequencing - LSCC lab talk - fri 5 june 2015

A new business model

∷ No capital or reagent costs
  : Instrument will be free
  : Flow cells will be free
  : Only pay for what you want to sequence
  : Min. $20 and ~$1000 for a 100x human genome

∷ But I’ll scam the system!
  : Flowcell stats sent back to base
  : Won’t send you new flow cells if they look unused

Page 32: Long read sequencing - LSCC lab talk - fri 5 june 2015

How will our job change?

Page 33: Long read sequencing - LSCC lab talk - fri 5 june 2015

Some things never change

∷ Don’t worry!
  : 50% of our job will always be converting file formats ☺

∷ But things are improving
  : PacBio: HDF5
  : MinION: HDF5 / FAST5

∷ Can convert .h5/.hd5 to .fastq easily

Page 34: Long read sequencing - LSCC lab talk - fri 5 june 2015

Read alignment

∷ PacBio
  : BLASR - Basic Local Alignment with Successive Refinement
  : BWA MEM - bwa mem -x pacbio

∷ MinION
  : MarginAlign - sum over possible alignments, HMMs
  : BWA MEM - bwa mem -x ont2d

∷ Need to modify variant caller parameters
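For scripting the BWA MEM route, a small sketch of building the command line per platform. The preset names `pacbio` and `ont2d` are the ones BWA MEM accepts for `-x`; the helper name, the platform keys, and the file names are hypothetical, and the function only builds the argument list rather than running anything.

```python
# Map a sequencing platform to the matching `bwa mem -x` preset.
BWA_PRESETS = {"pacbio": "pacbio", "minion": "ont2d"}


def bwa_mem_command(platform: str, reference: str, reads: str) -> list:
    """Return the argv list for aligning long reads with `bwa mem`."""
    preset = BWA_PRESETS[platform.lower()]  # KeyError for unknown platforms
    return ["bwa", "mem", "-x", preset, reference, reads]


print(" ".join(bwa_mem_command("minion", "ref.fa", "reads.fastq")))
```

The list can be handed to `subprocess.run` with stdout redirected to a SAM file; the reference must already have been indexed with `bwa index`.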

Page 35: Long read sequencing - LSCC lab talk - fri 5 june 2015

De novo assembly

∷ PacBio: HGAP, HGAP2, Falcon, SPAdes, Celera Assembler

∷ MinION: SPAdes, Celera Assembler, NanoPolish

∷ Lots of convergence
  : Similar error models (indels)
  : Long reads, lower coverage - back to the future!

Page 36: Long read sequencing - LSCC lab talk - fri 5 june 2015

Streaming analysis

∷ We are not going to keep all this data

∷ Extract info we need and discard

∷ Cheaper to resequence?

∷ Need to think streaming analyses

∷ Lots of new applications
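"Extract info we need and discard" might look like the following minimal sketch: read FASTQ records one at a time from any line source, keep only running totals, and never hold the reads in memory. It assumes plain 4-line FASTQ records (no wrapped sequence lines); the function names are hypothetical.

```python
import io
from typing import Iterable, Iterator, Tuple


def fastq_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each 4-line FASTQ record in turn."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            yield len(line.strip())


def running_summary(lengths: Iterable[int]) -> Tuple[int, int, int]:
    """Return (read count, total bases, longest read) in a single pass."""
    count = total = longest = 0
    for n in lengths:
        count += 1
        total += n
        longest = max(longest, n)
    return count, total, longest


# Toy usage with an in-memory stream; sys.stdin or a pipe works the same way.
stream = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n")
print(running_summary(fastq_lengths(stream)))
```

Because both functions are lazy, the same code can sit at the end of a pipe while the sequencer is still producing reads.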

Page 37: Long read sequencing - LSCC lab talk - fri 5 june 2015

Conclusion

Page 38: Long read sequencing - LSCC lab talk - fri 5 june 2015

Exciting times!

∷ Genomics is changing all the time
  : new technologies
  : changing attributes/properties of current technology

∷ Bioinformaticians need to be able to adapt
  : focus on key skills not specific apps

∷ Pipelines are often short lived
  : except maybe clinical / accredited ones

Page 39: Long read sequencing - LSCC lab talk - fri 5 june 2015

Contact

∷ tseemann.github.io

[email protected]

∷ @torstenseemann