Genome Characterization: DNA sequence-ULTIMATE Map, DNA sequencing-methods, Assembly/sequencing. BIO520 Bioinformatics, Jim Lund. Assigned reading: Service 2006 review paper. Assigned listening: Eric Lander genomics lecture


Page 1

Genome Characterization

DNA sequence-ULTIMATE Map

DNA sequencing-methods

Assembly/sequencing

BIO520 Bioinformatics Jim Lund

Assigned reading: Service 2006 review paper

Assigned listening: Eric Lander genomics lecture

Page 2

DNA Sequence Project Size/Type

• 500 bases: 1 EST, STS

• 2500 bases: whole cDNA/EST

• 10 kbp: Gene, virus

• 150 kbp: BAC, big virus

• 3 Mbp: Bacterial genome, YAC-size
  – simple
  – repeats

• 3 Gbp: Human, mouse

• 31 Gbp: Salamander

Page 3

Eukaryotic genome sizes

Nematode (Caenorhabditis elegans): 100 Mb

Thale cress (Arabidopsis thaliana): 160 Mb

Fruit fly (Drosophila melanogaster): 180 Mb

Puffer fish (Takifugu rubripes): 400 Mb

Rice (Oryza sativa): 490 Mb

Human (Homo sapiens): 3.5 Gb

Leopard frog (Rana pipiens): 6.5 Gb

Onion (Allium cepa): 16.4 Gb

Mountain grasshopper (Podisma pedestris): 16.5 Gb

Tiger salamander (Ambystoma tigrinum): 31 Gb

Easter lily (Lilium longiflorum): 34 Gb

Marbled lungfish (Protopterus aethiopicus): 130 Gb

Page 4

DNA Sequencing Methods

• Chain termination/Dideoxy/Sanger
  – Fluorescence paradigm, ABI
  – Main method

• Next generation sequencing
  – Polymerase addition sequencing
  – 454 Sequencing, Illumina
  – Chips: Affymetrix

Page 5

Dideoxy / Chain Terminator / Sanger

• Template

• Primer

• Extension Chemistry
  – polymerase
  – termination
  – labeling

• Separation

• Detection

Page 6

Chain Terminator Basics

[Figure: Target, Template-Primer. Extend with labeled terminators (ddA, ddG, ddC, ddT) to give the products ddA, A-ddC, AC-ddG, ACG-ddT; sequence shown: TGCA.]

dN : ddN = 100 : 1
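The 100 : 1 ratio of normal to dideoxy nucleotides is what spreads the products over many lengths. The sketch below illustrates this under a simplifying assumption that is not on the slide: the polymerase incorporates dN and ddN with equal efficiency, so each added base has a 1/101 chance of being a terminator. The function names are made up for this example.

```python
import random

# Assumption (not from the slide): dN and ddN are incorporated equally well,
# so a 100:1 mix gives a 1/101 chance of termination at every position.
DDN_FRACTION = 1 / 101


def simulate_fragments(template_len, n_molecules, ddn_fraction=DDN_FRACTION, seed=0):
    """Return one product length per molecule (0 = never terminated on the template)."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_molecules):
        terminated_at = 0
        for pos in range(1, template_len + 1):
            if rng.random() < ddn_fraction:
                terminated_at = pos  # a ddN was added here; extension stops
                break
        lengths.append(terminated_at)
    return lengths


def expected_fraction_terminated_at(pos, ddn_fraction=DDN_FRACTION):
    """Geometric distribution: pos-1 normal additions, then one terminator."""
    return (1 - ddn_fraction) ** (pos - 1) * ddn_fraction


lengths = simulate_fragments(template_len=1000, n_molecules=2000)
mean_length = sum(lengths) / len(lengths)
```

Under this assumption the mean product length is close to 101 bases, and products become rarer as length increases, which is one reason signal fades toward the end of a read.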

Page 7

Electrophoresis

Sequencing Reaction products

Polyacrylamide Gel Electrophoresis (PAGE)

Page 8

DNA sequencing trace file

Page 9

Separation

• Gel Electrophoresis

• Capillary Electrophoresis
  – suited to automation
    • rapid (2 hrs vs 12 hrs)
    • re-usable
    • simple temperature control
    • 96 well format

500-mer: 30 cm, 7198 seconds

501-mer: 29.99 cm, 7200 seconds

Page 10

Paradigm Instrument

• Applied Biosystems

• http://www.appliedbiosystems.com/
  – ABI 3730XL (2002, 96 samples, 1000 base reads, ~$350,000, higher sensitivity, lower reagent cost, ~$1/reaction)
  – 700 Kbp / 24 hours.

• 384 capillary sequencers
  – 5700 sequences / 24 hr day
  – 2.8 Mbp / 24 hours.
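To put these throughput figures in perspective, the arithmetic below estimates how long a single instrument would need for a human-sized shotgun project. The 3 Gbp genome size and the 8X coverage target come from other slides in this deck; running one machine nonstop with no failed reactions is an assumption made only for illustration.

```python
GENOME_BP = 3_000_000_000   # human-sized genome (project-size slide)
COVERAGE = 8                # low end of the 8X-10X shotgun target


def days_to_sequence(genome_bp, coverage, bp_per_day):
    """Days of nonstop running for one instrument (idealized, no failures)."""
    return genome_bp * coverage / bp_per_day


days_96 = days_to_sequence(GENOME_BP, COVERAGE, 700_000)      # about 34,286 days
days_384 = days_to_sequence(GENOME_BP, COVERAGE, 2_800_000)   # about 8,571 days

# Figures implied by the slide's own numbers:
runs_per_day_96 = 700_000 / (96 * 1000)   # about 7.3 runs of 96 x 1000-base reads
mean_read_384 = 2_800_000 / 5700          # about 491 bases per sequence
```

Even at 2.8 Mbp per day, one machine would need over two decades, which is why large projects ran many sequencers in parallel.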

Page 11

384-well capillary sequencing

Results are shown as an electropherogram showing a peak for each base. From the peak heights and widths, a Phred score is assigned to each individual base. A high Phred score indicates a high certainty as to the identity of that particular base.
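The slide does not give the formula, but the standard Phred definition is Q = -10 log10(P), where P is the estimated probability that the base call is wrong. A minimal conversion sketch (the function names are chosen for this example):

```python
import math


def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score corresponding to error probability p."""
    return -10 * math.log10(p)


q20_error = phred_to_error_prob(20)   # 0.01, i.e. 99% accuracy
q30_error = phred_to_error_prob(30)   # 0.001, i.e. 99.9% accuracy
```

On this scale, the "99% good" accuracy quoted later for shotgun reads corresponds to Q20.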

Page 12

Sample Output

1 lane

Page 13

• 1 trace = 1000 bases or less
  – ABI: 1000 bp reads
  – Illumina: 50-100 bp reads
  – 454 Sequencing: 300-400 bp reads

• How do we cover a genome?
  – DIVIDE AND CONQUER: assemble these short sequence fragments.
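As a toy illustration of assembling short fragments, the sketch below repeatedly merges the two reads with the longest suffix-prefix overlap. It is a teaching example only: it assumes error-free reads from one strand and no repeats, and it is not how production assemblers work. The function names and the example reads are invented here.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlaps of min_len remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return reads  # whatever is left are separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)


# Three overlapping reads plus one unrelated read (hypothetical data).
contigs = greedy_assemble(["ATGCGTAC", "GTACGTTA", "CGTTAGC", "TTTTTTTT"])
```

The three overlapping reads collapse into one contig, while the unrelated read stays separate.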

Page 14

Assembly/Trace Editing

• Consed – UNIX

• EBI’s Phusion

• EditView (ABI PRISM) – Mac

• Chromas (free/pay versions) – Windows

Page 15

Sequencing Strategies

• Ordered
  – Divide and Conquer

• Random Sequence
  – Brute Force

The random approach now predominates for big projects

Page 16

Random Method (details for Sanger seq)

• Shear DNA (nebulize)
  – finish ends, ligate into vector

• Produce template

• Sequence to 8X – 10X coverage
  – Sequence both ends of templates.
  – Read length (1,000 bp typical)
  – Accuracy (99% good)
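The 8X to 10X target can be motivated with a standard idealized model (Lander-Waterman statistics). This model is not on the slide and assumes reads land uniformly at random, with no repeats or cloning bias. At coverage c, the expected uncovered fraction of the genome is about e^(-c), and the expected number of contigs is roughly N * e^(-c) for N reads.

```python
import math


def reads_needed(genome_bp, read_len, coverage):
    """Number of reads N such that N * read_len = coverage * genome size."""
    return math.ceil(genome_bp * coverage / read_len)


def fraction_uncovered(coverage):
    """Idealized chance that a given base is hit by no read (Poisson model)."""
    return math.exp(-coverage)


def expected_contigs(n_reads, coverage):
    """Idealized expected contig count, ignoring any minimum overlap requirement."""
    return n_reads * math.exp(-coverage)


n_reads = reads_needed(3_000_000_000, 1000, 8)   # 24,000,000 reads
gap_fraction = fraction_uncovered(8)             # about 0.0003
contig_count = expected_contigs(n_reads, 8)      # on the order of 8,000
```

Under these assumptions a 3 Gbp genome at 8X still leaves thousands of gaps, so real projects need both deep coverage and a separate finishing step.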

Page 17

Assembly Problem

CONTIG

Page 18

Contigs, Islands

contigs

Island

Page 19

Assembling random sequences

[Figure: aligned reads with regions labeled "No coverage", "Only 1 strand", and "DISAGREEMENT" (reads call T, T, and C at the same position).]
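One simple way to handle the problem regions in such an alignment is a per-column majority vote that also flags the columns needing attention. The sketch below assumes the reads are already aligned and padded with "-" where a read has no data. It does not track strand, so it cannot detect the "only 1 strand" case. All names are invented for this example.

```python
from collections import Counter


def consensus(aligned_reads, gap="-"):
    """Majority-vote consensus; returns (sequence, list of (column, problem))."""
    length = max(len(r) for r in aligned_reads)
    seq, flagged = [], []
    for col in range(length):
        bases = [r[col] for r in aligned_reads if col < len(r) and r[col] != gap]
        if not bases:
            seq.append("N")  # no read covers this column
            flagged.append((col, "no coverage"))
            continue
        counts = Counter(bases)
        seq.append(counts.most_common(1)[0][0])
        if len(counts) > 1:
            flagged.append((col, "disagreement"))
    return "".join(seq), flagged


# Two reads call T and one calls C at column 3; column 4 has no coverage.
sequence, problems = consensus(["ACGT-", "ACGT-", "ACGC-"])
```

Here the vote resolves the T/T/C column to T but still reports it, so an editor can inspect the traces in a tool such as Consed.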

Page 20

Assembly programs

• Celera Assembler (Eugene Myers et al.)

• Arachne (Serafim Batzoglou et al.)

• PCAP (Xiaoqiu Huang, Iowa State University)

• Phusion (EBI)

Page 21

Continuing rapid improvement in sequencing technology

Page 22

• 1990's: Human genome (3 Gbp), $300 million (just sequencing)

• Current: Mammalian genome (3 Gbp): $1 million

• Goal: $100,000 genome, 10X cheaper (and faster), likely 2012!

• New goal! $1,000 genome. UK's sequencing center has one: http://www.uky.edu/Centers/AGTC/

Page 23

454 Sequencing’s Genome Sequencer FLX

• Pyrosequencing (sequencing by detection of nucleotides added during DNA synthesis).

• 350-400 million bases per run (10 hrs.).

• 400 bp sequence reads.

• 1,000,000 reads per run.

• $6,600 per run, 60 kb/$1, or $0.0000165/bp.
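The per-base cost follows directly from the per-run figures. The short calculation below uses the upper yield of 400 million bases per run; the 1X cost of a 3 Gbp genome is added only for scale and ignores the multi-fold coverage a real project would need.

```python
RUN_COST_USD = 6_600
BASES_PER_RUN = 400 * 1_000_000   # 1,000,000 reads x 400 bp


def cost_per_bp(dollars, bases):
    """Raw sequencing cost in dollars per base."""
    return dollars / bases


flx_cost_per_bp = cost_per_bp(RUN_COST_USD, BASES_PER_RUN)   # 1.65e-05 dollars
bases_per_dollar = BASES_PER_RUN / RUN_COST_USD              # about 60,600 (60 kb/$1)
cost_1x_3gbp = 3_000_000_000 * flx_cost_per_bp               # about $49,500 at 1X
```

At roughly 60 kb per dollar, the per-base cost is 0.00165 cents, that is, $0.0000165.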