MGI Reference Genomes Workshop, Vince Magrini, February 10th 2016

AGBT 2016 Workshop Magrini


Page 1: AGBT 2016 Workshop Magrini

MGI Reference Genomes Workshop

Vince Magrini
February 10th 2016

Page 2: AGBT 2016 Workshop Magrini

Sequencing Plan

• PacBio Large Insert Library Construction
• Linked Reads with 10X Genomics
• Physical Map contiguity using BioNano IRYS

Page 3: AGBT 2016 Workshop Magrini

Pacific Biosciences

Page 4: AGBT 2016 Workshop Magrini

The NA19240 Large Insert Library Experience

[Figure: ROI length (bp) per SMRT Cell for libraries Lib2, Lib3, Lib4, Lib5, Lib6, Lib7, Lib8, and LibF. x-axis: SMRT Cell (1 to 76); y-axis: ROI Length (bp), 10,000 to 15,000.]

Page 5: AGBT 2016 Workshop Magrini

Considerations for PacBio WGS

• High molecular weight genomic DNA
  • DNA must be of sufficient quality to allow for >30 kb shearing to produce PacBio Continuous Long Reads (CLR)
• Consistent shearing >30 kb
  • Shearing genomic DNA >30 kb is challenging and requires a consistent technology
  • Preferred method: Diagenode Megaruptor
  • Alternate method: Covaris g-Tube
• Sufficient DNA for PacBio sample prep
  • A single PacBio sample prep reaction requires 5 μg sheared DNA
  • One library is composed of 8-10 sample prep reactions
  • At least 2-4 libraries are required for 60x coverage
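The input requirement above can be worked through as simple arithmetic. The sketch below multiplies the slide's figures (5 μg per reaction, 8-10 reactions per library, 2-4 libraries for 60x) to get a rough range of sheared DNA needed; the constant and function names are illustrative, and losses during shearing and cleanup are not modeled.

```python
# Figures taken from the slide; ranges are (low, high) bounds.
UG_PER_REACTION = 5              # ug of sheared DNA per sample prep reaction
REACTIONS_PER_LIBRARY = (8, 10)  # sample prep reactions per library
LIBRARIES_FOR_60X = (2, 4)       # libraries needed for 60x coverage


def sheared_dna_range_ug():
    """Return (min, max) ug of sheared DNA implied by the slide's ranges."""
    low = UG_PER_REACTION * REACTIONS_PER_LIBRARY[0] * LIBRARIES_FOR_60X[0]
    high = UG_PER_REACTION * REACTIONS_PER_LIBRARY[1] * LIBRARIES_FOR_60X[1]
    return low, high


low, high = sheared_dna_range_ug()
print(f"Sheared DNA for 60x coverage: {low}-{high} ug")
```

Under these figures a 60x genome needs on the order of 80 to 200 μg of sheared DNA, which is why the amount of starting material is listed as a consideration.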

Page 6: AGBT 2016 Workshop Magrini

NA19240 Sheared DNA Comparison

Library  Shear Type  Shear Settings
2        g-Tube      5500 rpm
3        g-Tube      4800 rpm
4        g-Tube      4800 rpm
5        g-Tube      4500 rpm
6        MegaRuptor  Menlo Park 30 kb
7        MegaRuptor  Menlo Park 30 kb
8        MegaRuptor  MGI 30 kb

[Figure: sheared DNA size profiles. Panels: 30kb MGI, 30kb MP, G-Tube 4800 (✜), G-Tube 4500 (✪).]

Page 7: AGBT 2016 Workshop Magrini

PacBio Workflow

[Workflow diagram]

Steps, in order: DNA Shear, DNA Repair, Ligation/Exonuclease, BluePippin >18 kb Sizing, DNA Repair, Seq. Primer Anneal, P6 Polymerase Bind, MagBead Bind, Sequencing

Cleanups between steps: AMPure PB, AMPure PB, 3x AMPure PB, Rinse wells, AMPure PB, AMPure PB

Annotations: 30 minutes or 4 hours; 20 minutes to 2 hours; Denature primer prior to use; 4 to 6 hour collection time

• Adding DNA Damage Repair after BluePippin sizing increased the average Reads of Insert length by ~1 kb.

• Extending the P6 Polymerase Binding time from 30 minutes to 4 hours improved library complex loading per SMRT cell

Page 8: AGBT 2016 Workshop Magrini

Standard PacBio protocol (sample prep & complex)

Titration

• No Post-BluePippin DNA Damage Repair

• 30 min P6 polymerase bind

6 hour movies

4 hour movies

125 pM “on plate” loading concentration

G-Tube 4800 ✜

Page 9: AGBT 2016 Workshop Magrini

DNA Damage Repair & extended P6 bind

• No Post-BluePippin DNA Damage Repair

• 30 min P6 polymerase bind

• Post-BluePippin DNA Damage Repair

• 4 hour P6 polymerase bind

G-Tube 4500 ✪

Page 10: AGBT 2016 Workshop Magrini

Menlo Park 30 kb MegaRuptor

Titration

[Figure panel labels]
4 hr P6 bind, 8Pac lot #231848
30 min P6 bind, 8Pac lot #231848
4 hr P6 bind, 8Pac lot #231818
4 hr P6 bind, 8Pac lot #231848
4 hr P6 bind, 8Pac lot #231818

• Post-BluePippin DNA Damage Repair

• 4 hour or 30 minute P6 polymerase bind

30kb MP

Page 11: AGBT 2016 Workshop Magrini

MGI 30 kb MegaRuptor

Titration

125 pM “on plate” loading concentration

Clear cell-to-cell variability

Failed cell

30kb MGI

Page 12: AGBT 2016 Workshop Magrini

PacBio NA19240 Sequencing Statistics

Sample    8 Packs  Reads       Mbp (Pol)  RL (Pol)  Mbp (ROI)  RL (ROI)  ROI Mbp/Cell
NA19240   37       16,088,050  214,621    13,605    195,619    12,487    661
HG00733   30       15,858,313  209,619    13,193    190,430    11,958    793
HG00514   40       20,707,629  311,500    13,473    277,690    13,473    868
NA12878*  22       11,029,811  165,153    14,949    146,833    13,174    962

Assembly Stats will be highlighted in Tina’s presentation.
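As a rough cross-check of the statistics table, the sketch below derives the per-cell ROI yield and an approximate fold coverage from the reported totals. It assumes that one 8 Pack corresponds to 8 SMRT Cells that were all run, and that the human genome is about 3,100 Mbp; neither figure is stated on the slide. The asterisked NA12878 row is left out because the meaning of its asterisk is not given in the source.

```python
CELLS_PER_8PACK = 8      # assumption: one 8 Pack = 8 SMRT Cells, all of them run
GENOME_SIZE_MBP = 3_100  # assumption: approximate human genome size in Mbp

# (sample, 8 Packs, ROI Mbp) copied from the statistics table
RUNS = [
    ("NA19240", 37, 195_619),
    ("HG00733", 30, 190_430),
    ("HG00514", 40, 277_690),
]


def roi_mbp_per_cell(packs, roi_mbp):
    """Average ROI yield per SMRT Cell, in Mbp."""
    return roi_mbp / (packs * CELLS_PER_8PACK)


def fold_coverage(roi_mbp):
    """Approximate genome coverage from total ROI bases."""
    return roi_mbp / GENOME_SIZE_MBP


for sample, packs, roi_mbp in RUNS:
    per_cell = roi_mbp_per_cell(packs, roi_mbp)
    print(f"{sample}: {per_cell:.0f} ROI Mbp/cell, {fold_coverage(roi_mbp):.1f}x")
```

Under these assumptions the per-cell values for the three samples round to the figures reported in the last column of the table.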

Page 13: AGBT 2016 Workshop Magrini

PacBio Sequencing Observations

HG00514: 4h v 6h movie lengths

Instrument  Movie Time (min)  Avg. ROI (bp)  ROI Mb/Cell  # Cells
00116       240               13,502         803          119
42274       240               13,036         881          95
00116       360               14,324         998          56
42274       360               13,282         1,063        24
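To compare the two movie lengths across both instruments, the per-cell yields can be pooled with a cell-count weighting. The sketch below is one way to do that from the four table rows; the function name is illustrative, and the pooling itself is an added step, not something shown on the slide.

```python
# (instrument, movie time in minutes, ROI Mb/cell, number of cells) from the table
ROWS = [
    ("00116", 240, 803, 119),
    ("42274", 240, 881, 95),
    ("00116", 360, 998, 56),
    ("42274", 360, 1063, 24),
]


def weighted_yield(rows, movie_min):
    """Cell-weighted mean ROI Mb/cell for one movie length."""
    selected = [(y, n) for _, m, y, n in rows if m == movie_min]
    total_cells = sum(n for _, n in selected)
    return sum(y * n for y, n in selected) / total_cells


y4 = weighted_yield(ROWS, 240)
y6 = weighted_yield(ROWS, 360)
print(f"4 h movies: {y4:.1f} ROI Mb/cell")
print(f"6 h movies: {y6:.1f} ROI Mb/cell")
print(f"Relative gain from 6 h movies: {100 * (y6 / y4 - 1):.1f}%")
```

Weighting by cell count keeps the instrument with fewer cells from dominating the comparison.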

• DNA Input and Sizing
  • The library DNA >18 kb is fractionated using the Sage Science BluePippin
  • DNA Damage Repair enzyme mix used post BluePippin (increased read length)
• Chemistry
  • (+) DNA Damage Repair/4 hr bind: 970.2 Mbases/cell
• Instruments
  • Longer average Reads
  • Increased Loading Efficiency
• What about long term storage?

Page 14: AGBT 2016 Workshop Magrini

10X Genomics

Page 15: AGBT 2016 Workshop Magrini

Reconfigured Oligo
- Uses inline index sequence
- No P5 index: HiSeq X single index compatible

10X Genomics Overview

Page 16: AGBT 2016 Workshop Magrini

10X Chromium Workflow

Page 17: AGBT 2016 Workshop Magrini

• HiSeq 4000
• 2x150, 200 pmol loading
• 2 lanes

Chromium NA19240 Library Sequencing Statistics

Post Gem: Isothermal Amp size dist.

Library Size Distribution

The spike at 0 in that graph is due to the N's in the reference assembly.

Page 18: AGBT 2016 Workshop Magrini

Metric                     NA19240 (MGI)     NA12878 (10X)
Molecule Length (bp)       26,768 (±33,673)  94,923 (±64,103)
DNA in Molecules > 10kb    50.85%            95.0%
DNA in Molecules > 100kb   1.38%             36.4%
SNPs Phased                99.1%             97.8%
Longest Phase Block        9.6 Mbp           34.7 Mbp
N50 Phase Block            1.9 Mbp           9.5 Mbp

Chromium Molecule and Phasing Statistics
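For readers unfamiliar with the N50 Phase Block metric, the sketch below shows the usual N50 definition: the length L such that blocks of length L or longer contain at least half of the total length. The block lengths in the example are hypothetical and are not data from this workshop.

```python
def n50(lengths):
    """Return the N50 of a collection of lengths.

    Sort the lengths from longest to shortest and walk down the list until
    the running sum reaches half of the total; the length at which that
    happens is the N50.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical phase block lengths (arbitrary units), for illustration only.
example_blocks = [5, 4, 3, 2, 1]
print(n50(example_blocks))
```

The same definition applies to other length distributions, such as contigs or genome maps.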

Page 19: AGBT 2016 Workshop Magrini

BioNano

Page 20: AGBT 2016 Workshop Magrini

Harvest Cells

Dissociate Tissue

Embed Cells in Gel Plugs

Lyse Cells, Digest Protein

Melt and Digest Agarose Plugs

Sample Cleanup

Labeling Reaction

BioNano Overview

Page 21: AGBT 2016 Workshop Magrini

BioNano: Yoruban Trio Statistics

[Figure: Molecules/Bin (%) by Molecule Length Bin (10-500kb, 100-500kb, 150-500kb, 200-500kb, 250-500kb, >500kb) for 19240, 19239, and 19238. y-axis: 0% to 100%.]

Metric                            19240       19239       19238
Mapped Molecule Quantity (Mb)     189,138.79  256,281.33  226,854.88
Mapped Avg Size (Kb)              232         280         289
Avg Label Density (per 100 Kb)    9.6         8.7         8.8
Number of Consensus Genome Maps   3051        2565        2798
Consensus Genome Maps Size (Mb)   2833.045    2965.972    2933.294
Consensus Genome Maps N50 (Mb)    1.276       1.685       1.477
Avg Depth of Mol Coverage         59.1        56.1        50.6

Page 22: AGBT 2016 Workshop Magrini

PacBio Assembly Contig

BioNano Genome Map Contigs

Sequencing Plan

Add 10X Linked Read information

Add Dovetail Hi-C/chiCago Data

Page 23: AGBT 2016 Workshop Magrini

Summary

• Goal: Generate robust data sets for additional high-quality reference genomes that represent the full range of genetic diversity in humans.
• These long read (long range) sequencing/mapping applications vary in approach and will provide synergistic data sets to help accomplish our goal.
• Each system possesses unique challenges and requires optimization of protocols and running conditions specific to our needs.
• Experience and communication are key.
• Increasing applications and utility
  • Polymerase read = read of insert
  • BAC Pooling
  • Low input SNV
  • Multicolor labeling

Page 24: AGBT 2016 Workshop Magrini

Acknowledgements

The McDonnell Genome Institute at Washington University in St. Louis
Rick Wilson
Sean McGrath, Amy Ly, Ryan Demeter
Dave Larson, Karyn Meltz Steinberg, Tina Graves, Bob Fulton
Derek Albracht, Milinn Kremitzki, Susan Rock, Debbie Scheer
Wes Warren, Chad Tomlinson

10X Genomics
Cassandra Jabara, Michael Schnall-Levin, Drew Kebbel, Rob Tarbox, Deanna Church

BioNano Genomics
Andrew Anfora, Palak Sheth, Alex Hastie

Pacific Biosciences
Paul Peluso, Nick Sisneros