SIZE SELECT SHEAR Shotgun DNA Sequencing (Technology) DNA target sample LIGATE & CLONE Vector End Reads (Mates) SEQUENCE Primer

Whole Genome Shotgun Sequencing

Shotgun DNA Sequencing (Technology)

DNA target sample
SHEAR
SIZE SELECT
LIGATE & CLONE
Vector
SEQUENCE
Primer
End Reads (Mates)

Shotgun DNA Sequencing (Computation)

Unknown “Target” DNA Sequence
Randomly Sample (“Shotgun”) Fragments
Fragments
Fragment Assembly Software
Layout
Contigs
Consensus

UNKNOWN ORIENTATION
SEQUENCING ERRORS
INCOMPLETE COVERAGE
CONSTRAINTS (MATES)
REPEATS

G = 100Kbp Target Length (e.g., BAC, P1, PAC)
F = 1600 # of Fragments
L = 500 Avg. Fragment Length
N = FL = 800Kbp Total Bases Sequenced
c = N/G = 8 Avg. Coverage
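The parameter list above reduces to two lines of arithmetic. The sketch below recomputes N and c from G, F, and L. The function names are illustrative. The uncovered-fraction estimate e^(-c) is an added assumption (the standard Lander-Waterman Poisson model, with fragments placed uniformly at random), not something stated on the slide.

```python
import math

def coverage_stats(target_len, num_fragments, avg_fragment_len):
    """Return (total bases sequenced N, average coverage c)."""
    total_bases = num_fragments * avg_fragment_len   # N = F * L
    coverage = total_bases / target_len              # c = N / G
    return total_bases, coverage

def expected_uncovered_fraction(coverage):
    """Assumed Poisson model: P(a base is covered by zero fragments) = e^(-c)."""
    return math.exp(-coverage)

# Values from the slide: G = 100 Kbp, F = 1600, L = 500.
N, c = coverage_stats(target_len=100_000, num_fragments=1600, avg_fragment_len=500)
print(N, c)
print(expected_uncovered_fraction(c))
```

With the slide's values this gives N = 800,000 bases and c = 8. Under the assumed model, roughly 0.03% of the target would still be missed, which is one way to read the INCOMPLETE COVERAGE item.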

Whole Genome Sequencing Approaches

Hierarchical HGP Approach:

Target
Physical Mapping
Minimum Tiling Set (~30,000 BACs for human)
Shotgun Sequencing

– 2 separate processes
– maps very hard to complete, libraries unstable
– must make shotgun library of each BAC
+ infrastructure is already developed
+ quality of outcome is known

Whole Genome Sequencing Approaches

Whole Genome Shotgun Sequencing:

– Collect 10x sequence in a 1-1 ratio of two types of read pairs: ~70 million reads for Human.
  Short: 2Kbp, Long: 10Kbp
– Collect 10-15x BAC inserts and end sequence: ~300K pairs for Human.
  BAC 5’, BAC 3’
– Early simulations showed that if repeats were considered black boxes, one could still cover 99.7% of the genome unambiguously.

+ single process, three library constructions
– assembly is much more difficult

Sequencing Factory

300 ABI 3700 DNA Sequencers installed
50 Production Staff
40 Support Staff (R&D, QC/QA, Service)
20,000 sq. ft. of wet lab
20,000 sq. ft. of sequencing space
800 tons of A/C (160,000 cfm)
4,000 amps electrical service

True vs. Repeat-Induced Overlaps

[Figure: an overlap between fragments A and B implies either a TRUE overlap or a REPEAT-INDUCED overlap.]
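The ambiguity on this slide can be reproduced with a toy example. The sketch below builds a made-up genome containing two copies of a 60bp repeat, then compares fragment A against a fragment that truly follows it and against a fragment drawn from the second repeat copy. The overlap test is a deliberately naive suffix/prefix comparison that counts mismatches only (no insertions or deletions); it is not the assembler's actual overlapper. The 40bp minimum and 6% mismatch tolerance are borrowed from the Overlapper description in the pipeline slide, and all names and sequences here are hypothetical.

```python
import random

def suffix_prefix_overlap(a, b, min_len=40, max_mismatch_rate=0.06):
    """Length of the longest suffix of a matching a prefix of b
    within the mismatch tolerance, or 0 if none reaches min_len."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        mismatches = sum(x != y for x, y in zip(a[-length:], b[:length]))
        if mismatches <= max_mismatch_rate * length:
            return length
    return 0

rng = random.Random(0)  # fixed seed so the toy genome is reproducible

def rand_dna(n):
    return "".join(rng.choice("ACGT") for _ in range(n))

# Toy genome: unique(100) + repeat(60) + unique(200) + repeat(60) + unique(100)
repeat = rand_dna(60)
genome = rand_dna(100) + repeat + rand_dna(200) + repeat + rand_dna(100)

frag_a = genome[50:150]        # ends 50bp into repeat copy 1
frag_true = genome[100:200]    # starts at repeat copy 1: truly adjacent to A
frag_false = genome[360:460]   # starts at repeat copy 2: 260bp further along

true_len = suffix_prefix_overlap(frag_a, frag_true)
false_len = suffix_prefix_overlap(frag_a, frag_false)
print(true_len, false_len)
```

Both calls report the same 50bp overlap, so the overlap alone cannot say whether A and B are adjacent in the target or merely share a repeat.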

Assembly Pipeline

Stage (cpu time, hours:minutes): task

Screener (8:37): Mask heterochromatin and ribo-DNA, tag known interspersed repeats.
Overlapper (86:25): Find all overlaps ≥ 40bp allowing 6% mismatch. (1000X Blast)

ASSEMBLER CORE:
Unitiger (38:29):
• Compute all consistent sub-assemblies = unitigs
• Identify those that cover unique DNA = U-unitigs
Scaffolder (4:12):
• Scaffold U-unitigs with confirmed shorts & longs
• Then with BAC ends
Repeat Rez I, II, III (5:44+4:21+19:53):
• Fill repeat gaps with:
  I. Doubly anchored mates
  II. O-path confirmed singly-anchored mates
  III. Greedy path completion using QVs

Consensus (~25): Bayesian “SNP” consensus using quality values. Occurs throughout assembler core.

Total: 167:41 cpu hrs. for Drosophila
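The per-stage figures on this slide can be checked against the quoted total. The sketch below assumes the figures are hours:minutes of CPU time and that they attach to the stages in the order listed (Screener, Overlapper, Unitiger, Scaffolder, then Repeat Rez I, II, III). That pairing is read off the slide layout rather than stated outright, and names such as `stage_times` are illustrative.

```python
def to_minutes(hhmm):
    """Convert an 'H:MM' string to a number of minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def to_hhmm(total_minutes):
    """Convert a number of minutes back to an 'H:MM' string."""
    return f"{total_minutes // 60}:{total_minutes % 60:02d}"

# Assumed stage-to-time pairing (see the note above).
stage_times = {
    "Screener": ["8:37"],
    "Overlapper": ["86:25"],
    "Unitiger": ["38:29"],
    "Scaffolder": ["4:12"],
    "Repeat Rez I, II, III": ["5:44", "4:21", "19:53"],
}

total_minutes = sum(to_minutes(t) for times in stage_times.values() for t in times)
print(to_hhmm(total_minutes))
```

The sum comes to 167:41, matching the total quoted for the Drosophila assembly, with the Overlapper accounting for roughly half of it.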

Assembly Progression (Macro View)

Proteomics Discovery (insert browser slide) 3.0

Homology based exon predictions
Consensus gene structure (both strands)
Start and stop site predictions
Splice site predictions
Computational exon predictions
Tracking information
Unique identifiers