Surya Saha
Sol Genomics Network (SGN)
Boyce Thompson Institute, Ithaca, NY
[email protected] // Twitter:@SahaSurya
BTI Plant Bioinformatics Course 2014
http://www.acgt.me/blog/2014/3/7/next-generation-sequencing-must-die

Next Generation Sequencing


Presented on Mar 11, 2014 at the 3rd BTI Bioinformatics Course, Boyce Thompson Institute, Ithaca, NY. http://btiplantbioinfocourse.wordpress.com/


Page 2: Next Generation Sequencing

Sequencing over the Ages

[Timeline figure. Years marked: 1953, 1977, 1984, 1987, 1995, 2001, 2005, 2007, 2011, 2012, 2013. Events and platforms shown: DNA structure discovery; Sanger DNA sequencing by chain-terminating inhibitors; Epstein-Barr virus (170 Kb); ABI 370 sequencer; Haemophilus influenzae (1.83 Mb); Homo sapiens (3.0 Gb); 454; Solexa; SOLiD; Illumina; Ion Torrent; PacBio; Illumina HiSeq X; Pinus taeda (24 Gb).]

Slide credit: Aureliano Bombarely

3/12/2014 BTI Plant Bioinformatics Course 2014

Page 3: Next Generation Sequencing

First generation sequencing


Page 4: Next Generation Sequencing

Sanger method

Frederick Sanger (13 Aug 1918 – 19 Nov 2013)

Won the Nobel Prize in Chemistry in 1958 and 1980.

Published the dideoxy chain termination method, or “Sanger method”, in 1977.

http://dailym.ai/1f1XeTB

Page 5: Next Generation Sequencing

Sanger method


http://bit.ly/1g6Cudq

http://bit.ly/1lcQO4J

Page 6: Next Generation Sequencing

Maxam-Gilbert method


Page 7: Next Generation Sequencing

Maxam-Gilbert method


http://bit.ly/1noY0fu http://bit.ly/1lGvJCA

Page 8: Next Generation Sequencing

First generation sequencing

• Very high quality sequences (99.999% per-base accuracy)

• Very low throughput

Capillary sequencing (ABI3730xl):

Run time: 20 min to 3 h
Read length: 400-900 bp
Reads / run: 96 or 384
Total nucleotides sequenced: 1.9-84 Kb
Cost / Mb: $2400

http://bit.ly/1clLps3 http://1.usa.gov/1cLqIRd

Page 9: Next Generation Sequencing

Next generation sequencing


Page 12: Next Generation Sequencing

454 Pyrosequencing

One purified DNA fragment, to one bead, to one read.


http://bit.ly/1ehwxWN

GS FLX Titanium

http://bit.ly/1ehAcEh

Page 13: Next Generation Sequencing

Illumina

[Instrument comparison table. The instrument names heading the four columns were not captured, so the columns are numbered here.]

                   Column 1     Column 2      Column 3                            Column 4
Output             15 Gb        120 Gb        1000 Gb                             1800 Gb
Number of reads    25 million   400 million   4 billion                           6 billion
Read length        2x300 bp     2x150 bp      2x125 bp (2x250 update mid-2014)    2x150 bp
Cost               $99K         $250K         $740K                               $10M

Source: Illumina

Page 14: Next Generation Sequencing

Illumina


$1000 human genome??

Page 15: Next Generation Sequencing

Illumina

http://1.usa.gov/1fP9ybl

Page 17: Next Generation Sequencing

Pacific Biosciences SMRT sequencing

Single Molecule Real Time sequencing


http://bit.ly/1naxgTe

Page 18: Next Generation Sequencing

Pacific Biosciences SMRT sequencing

Error correction methods

Hierarchical genome-assembly process (HGAP)

PBJelly (English et al., PLOS ONE, 2012)

Page 19: Next Generation Sequencing

Pacific Biosciences SMRT sequencing

Read Lengths

http://www.igs.umaryland.edu/labs/grc/

Mean Read Length: 8391 bp

Maximum Subread Length: 24585 bp

Page 20: Next Generation Sequencing

Others

• Ion Torrent Proton/PGM

• Oxford Nanopore

• Nabsys

• SOLiD


Page 21: Next Generation Sequencing

Comparison


Page 22: Next Generation Sequencing

Next generation sequencing

Platform             Run time  Read length  Quality                      Total nucleotides sequenced  Cost / Mb
454 Pyrosequencing   24 h      700 bp       Q20-Q30                      0.7 Gb                       $10
Illumina MiSeq       27 h      2x250 bp     >Q30                         15 Gb                        $0.15
Illumina HiSeq 2500  11 days   2x125 bp     >Q30                         1000 Gb                      $0.05
Ion Torrent          2 h       400 bp       >Q20                         50 Mb-1 Gb                   $1
Pacific Biosciences  2 h       5.5-8.5 kb   >Q30 consensus, >Q10 single  400-800 Mb / SMRT cell       $0.33-$1

http://bit.ly/1clLps3 http://1.usa.gov/1cLqIRd
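As a rough worked example of what the cost-per-megabase figures in the comparison table imply, the sketch below estimates the raw sequencing cost of one human genome. The 3.0 Gb genome size comes from the timeline slide. The 30x coverage depth and the function name are assumptions made for illustration, and the estimate ignores library preparation, instrument, and analysis costs.

```python
def sequencing_cost(genome_size_mb, coverage, cost_per_mb):
    """Raw cost: total megabases sequenced times cost per megabase."""
    return genome_size_mb * coverage * cost_per_mb


HUMAN_GENOME_MB = 3000  # 3.0 Gb, from the timeline slide
COVERAGE = 30           # assumed depth, not taken from the slides

# Cost per megabase from the comparison table (lower bound for PacBio)
cost_per_mb = {
    "454 Pyrosequencing": 10.0,
    "Illumina MiSeq": 0.15,
    "Illumina HiSeq 2500": 0.05,
    "Ion Torrent": 1.0,
    "Pacific Biosciences": 0.33,
}

for platform, cost in cost_per_mb.items():
    total = sequencing_cost(HUMAN_GENOME_MB, COVERAGE, cost)
    print(f"{platform}: ${total:,.0f}")
```

Under these assumptions the cheapest platform in the table, at $0.05 per megabase, comes to about $4,500 per genome.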

Page 23: Next Generation Sequencing

Summary

• Microbial genomes

• Eukaryotic genomes

• Resequencing genomes

• RNAseq and other XXXseq methods


http://bit.ly/1ko9Kgh

Page 24: Next Generation Sequencing

http://omicsmaps.com/

Next Generation Genomics: World Map of High-throughput Sequencers


Page 25: Next Generation Sequencing


http://bit.ly/18pfUId

Page 26: Next Generation Sequencing


http://bit.ly/18pfUId

Page 27: Next Generation Sequencing

Real cost of Sequencing!!

Sboner et al., Genome Biology, 2011

Page 28: Next Generation Sequencing

Library Types

Single end

Paired end (PE, 150-800 bp, Fwd:/1, Rev:/2)

Mate pair (MP, 2 Kb to 20 Kb)

[Diagram: forward (F) and reverse (R) read layout for each library type, with platform labels 454/Roche and Illumina.]

Slide credit: Aureliano Bombarely
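The "/1" and "/2" suffixes noted for forward and reverse reads can be used to match mates back together. The following is a minimal sketch of that idea. The function name and the read IDs are hypothetical, and real paired-end files usually keep mates in two parallel files rather than one mixed list.

```python
def pair_reads(read_ids):
    """Group read IDs that share a base name and end in /1 (forward) or /2 (reverse)."""
    mates = {}
    for read_id in read_ids:
        base, sep, mate = read_id.rpartition("/")
        if sep and mate in ("1", "2"):
            mates.setdefault(base, {})[mate] = read_id
    # Keep only fragments for which both mates are present
    return {base: (m["1"], m["2"]) for base, m in mates.items() if len(m) == 2}


ids = ["frag_a/1", "frag_b/1", "frag_a/2"]  # hypothetical read IDs
print(pair_reads(ids))
```

Here frag_b has no reverse mate, so only frag_a is reported as a pair.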

Page 29: Next Generation Sequencing

Implications of Choice of Library

[Diagram of assembly levels:]

Reads -> Consensus sequence (Contig)

Contigs + pair read information -> Scaffold (or Supercontig), with gaps shown as NNNNN

Scaffolds + genetic information (markers) -> Pseudomolecule (or ultracontig)

Slide credit: Aureliano Bombarely

Page 30: Next Generation Sequencing

Multiplexing Libraries

Use of different tags (4-6 nucleotides) to identify different samples in the same lane/sector.

[Diagram: fragments from two samples carry the tags AGTCGT and TGAGCA, are pooled, and are sequenced together.]

Slide credit: Aureliano Bombarely
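Sorting pooled reads back into samples by their tag can be sketched as below. This is a simplified illustration, not a production demultiplexer: it assumes the tag sits at the start of each read, requires an exact match, and uses made-up reads and sample names. Only the two tags AGTCGT and TGAGCA come from the slide.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_sample):
    """Assign each read to a sample by its leading tag and trim the tag off."""
    tag_len = len(next(iter(tag_to_sample)))  # assumes all tags share one length
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        if tag in tag_to_sample:
            bins[tag_to_sample[tag]].append(insert)
        else:
            bins["unassigned"].append(read)  # keep the untrimmed read
    return dict(bins)


tags = {"AGTCGT": "sample_1", "TGAGCA": "sample_2"}  # sample names are hypothetical
reads = ["AGTCGTACGTACGT", "TGAGCATTGACC", "CCCCCCGGGG"]  # made-up reads
print(demultiplex(reads, tags))
```

Real tools typically tolerate one or more mismatches in the tag, which this sketch does not attempt.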

Page 31: Next Generation Sequencing

File Formats

Fasta files:

FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes.

-Wikipedia

Slide credit: Aureliano Bombarely
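A minimal sketch of reading this format follows. It relies on the standard FASTA convention, not stated on the slide, that each record starts with a header line beginning with ">" and that the sequence may wrap over several lines. The function name and the example records are hypothetical.

```python
def parse_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


example = """>seq1 hypothetical nucleotide record
ACGTACGT
ACGT
>seq2 hypothetical peptide record
MKV
""".splitlines()

for name, sequence in parse_fasta(example):
    print(name, len(sequence))
```

For real data, an established parser such as the one in Biopython is usually a safer choice than a hand-written loop.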

Page 32: Next Generation Sequencing

File Formats

Fastq files:

FASTQ format is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores.

-Wikipedia

• Single ID line with the at symbol (“@”) in the first column.

• The sequence can span multiple lines after the ID line.

• Single line with the plus symbol (“+”) in the first column, marking the start of the quality values.

• The “+” line may repeat the ID.

• Quality values can span multiple lines after the “+” line, but their total length must be identical to the sequence length.

Slide credit: Aureliano Bombarely
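The record layout described above can be sketched as a small parser. This version assumes the common four-line layout (ID, sequence, "+", quality) and does not handle the multi-line sequences the format also permits. The function name and the example record are hypothetical.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected a multiple of four non-empty lines")
    for i in range(0, len(lines), 4):
        id_line, seq, plus, qual = lines[i:i + 4]
        if not id_line.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {id_line}")
        yield id_line[1:], seq, qual


record = ["@read1", "ACGT", "+", "!5I9"]  # made-up read
print(list(parse_fastq(record)))
```

The length check mirrors the rule that the quality string must be as long as the sequence.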

Page 33: Next Generation Sequencing

Quality control: Encoding

Fastq files:

!"#$%&'()*+,-./0123456789 Offset by 33 (Phred+33)

KLMNOPQRSTUVWXYZ[\]^_`abcdefgh Offset by 64 (Phred+64)
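Decoding a quality string amounts to subtracting the offset from each character's ASCII code. The sketch below shows this for both offsets. The function name is an assumption, and the example strings are short excerpts built from characters in the two lines above.

```python
def decode_quality(quality_string, offset=33):
    """Convert FASTQ quality characters to Phred scores by subtracting the ASCII offset."""
    scores = [ord(ch) - offset for ch in quality_string]
    if min(scores, default=0) < 0:
        raise ValueError("character below the offset; wrong encoding?")
    return scores


print(decode_quality("!+5", offset=33))  # Phred+33 characters
print(decode_quality("KTh", offset=64))  # Phred+64 characters
```

Decoding a Phred+33 string with offset 64 produces negative values, which is why the sketch raises an error in that case.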


Page 35: Next Generation Sequencing


Quality control: Encoding

http://bit.ly/N28yUd

The Phred score of a base is: Qphred = -10 log10(e)

where e is the estimated probability of the base call being wrong. For example, Q20 corresponds to e = 0.01 (1 error in 100 bases) and Q30 to e = 0.001.
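The relationship between Phred score and error probability can be checked numerically with a short sketch. The function names are chosen here for illustration.

```python
import math


def phred_from_error(e):
    """Phred score for an estimated error probability e (0 < e <= 1)."""
    return -10 * math.log10(e)


def error_from_phred(q):
    """Inverse relation: error probability for a Phred score q."""
    return 10 ** (-q / 10)


for q in (10, 20, 30, 40):
    e = error_from_phred(q)
    print(f"Q{q}: error probability {e:g}, accuracy {100 * (1 - e):.2f}%")
```

Each step of 10 in the score corresponds to a tenfold drop in the estimated error probability.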

Page 36: Next Generation Sequencing

Quality control: Error correction


Page 37: Next Generation Sequencing

Thank you!!
