The ABRF Next Generation Sequencing Study: Multi-Platform and Cross-Methodological Reproducibility of RNA and DNA Profiling. Genome in a Bottle Consortium Workshop, January 2014. Don A. Baldwin, Ph.D., CSO, Pathonomics LLC

140127 abrf interlaboratory study proposal


Page 1: 140127 abrf interlaboratory study proposal

The ABRF Next Generation Sequencing Study: Multi-Platform and Cross-Methodological Reproducibility of RNA and DNA Profiling

Genome in a Bottle Consortium Workshop
January 2014

Don A. Baldwin, Ph.D.
CSO, Pathonomics LLC

Page 2: 140127 abrf interlaboratory study proposal

ABRF is an international organization of over 700 scientists from shared research resource core facilities and biotechnology laboratories.

Members represent over 250 core labs in academic and research institutions, government, and industry.

“Yellow pages” and “MarketPlace” databases of members at www.ABRF.org

Electronic discussion group facilitates sharing of technical advice and core facility networking.

The Journal of Biomolecular Techniques covers genomics, proteomics, imaging, and other biotechnologies, and core facility operational management.

Page 3: 140127 abrf interlaboratory study proposal
Page 4: 140127 abrf interlaboratory study proposal
Page 5: 140127 abrf interlaboratory study proposal

www.abrf.org

Page 6: 140127 abrf interlaboratory study proposal

The ABRF Next Generation Sequencing (NGS) Study:

• Produce reference data sets to establish baseline performance
• Promote the use of standard samples
• Provide public access to data for self-evaluation, performance monitoring and methods development

Phase I: RNA-Seq and degraded RNA-Seq (2011-2013)
Phase II: DNA-Seq and hard-to-sequence regions (2014-2016)
Phase III: Clinical genetics sequencing panels
Phase IV: Asteroid and Martian surface sequencing

Phase I
1. Cross Platform: HiSeq/MiSeq, 454, PGM, Proton, PacBio
2. Cross Protocol: ribo-depletion, stranded, degraded
3. Cross Site: 3 sites for each platform, replicates at each site

Page 7: 140127 abrf interlaboratory study proposal

SeQC, ABRF, ENCODE, others

• Provide reference data resources
• Best practices for
  – gene quantification,
  – isoform characterization,
  – dynamic range comparisons,
  – managing inter-site and intra-site variation,
  – analysis pipelines, and
  – cross-platform testing of transcriptome hypotheses
• To address some other aspects of RNA-seq, including
  – variant detection,
  – allele-specific expression,
  – RNA editing, and gene fusions.
• And more …

Page 8: 140127 abrf interlaboratory study proposal

Phase I Study Design

Page 9: 140127 abrf interlaboratory study proposal

Sequence mismatches with hg19

Q10 – Q60, most variation at read starts and ends

Higher alignment rates with platform-specific algorithms vs. STAR

Higher single-base mismatch and indel rates with platform-specific algorithms vs. STAR

Page 10: 140127 abrf interlaboratory study proposal

Gene body coverage, 5’ to 3’

Platforms: 454, ILMN, PAC, PGM, PRO
RNA: polyA, rRNA-depleted, total, degraded

Page 11: 140127 abrf interlaboratory study proposal

Inter-site CV

Inter-site R2

Variation and correlation between laboratory sites
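The two metrics named on this slide can be sketched as follows. This is an illustration under assumptions, not the study's pipeline: the expression values and function names are hypothetical, CV is taken as the sample standard deviation across sites divided by the mean, and R² as the squared Pearson correlation between two sites.

```python
import statistics

def inter_site_cv(values):
    """CV of one gene's expression across sites: sample SD divided by mean."""
    mean = statistics.mean(values)
    if mean == 0:
        return float("nan")  # CV is undefined when the mean is zero
    return statistics.stdev(values) / mean

def r_squared(x, y):
    """Squared Pearson correlation between two sites' expression vectors."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical normalized expression for three genes at three sites
site_1 = [10.0, 200.0, 35.0]
site_2 = [12.0, 190.0, 30.0]
site_3 = [11.0, 210.0, 40.0]

# One CV per gene (across the three sites), one R^2 per pair of sites
cv_per_gene = [inter_site_cv(v) for v in zip(site_1, site_2, site_3)]
r2_sites_1_2 = r_squared(site_1, site_2)
```

In practice the choice of normalization and of log versus linear scale changes both numbers, so they are only comparable between data sets processed the same way.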

Page 12: 140127 abrf interlaboratory study proposal

Transcript splice junction detection

Long reads provide efficient junction detection

Most junctions are detected by three or more platforms

Page 13: 140127 abrf interlaboratory study proposal

Detection of Differentially Expressed Genes, sample A vs. B

DEGs per method:
POLYA (11,820)
RIBO (11,294)
PRO (13,797)
PGM (12,572)
454 (7,579)
Total (18,002)

DEGs detected by three or more methods: 61.4%
DEGs detected by two methods: 16.6%
Unique DEGs:
454 1.5%
POLYA 6.2%
RIBO(-) 3.9%
PRO 6.7%
PGM 3.8%
Total unique 22.0%

[Venn diagram: sets containing more than 1000 genes are indicated in red; 100-999 in yellow.]
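The overlap categories on this slide (three or more methods, two methods, unique to one method) can be tallied from per-method gene lists. A minimal sketch, assuming each method's DEG calls are available as a set of gene identifiers; the gene names and the `overlap_summary` helper are hypothetical and are not the study's data or code.

```python
from collections import Counter

def overlap_summary(deg_sets):
    """Count genes by how many methods call them differentially expressed."""
    calls = Counter()
    for genes in deg_sets.values():
        calls.update(set(genes))  # each method counts once per gene
    unique = {
        method: sum(1 for g in set(genes) if calls[g] == 1)
        for method, genes in deg_sets.items()
    }
    return {
        "total": len(calls),
        "three_or_more": sum(1 for n in calls.values() if n >= 3),
        "two": sum(1 for n in calls.values() if n == 2),
        "unique": unique,
    }

# Hypothetical DEG calls per method (illustration only)
deg_sets = {
    "454": {"G1", "G2"},
    "POLYA": {"G1", "G2", "G3", "G4"},
    "RIBO": {"G1", "G3", "G5"},
    "PRO": {"G1", "G2", "G6"},
    "PGM": {"G1", "G6"},
}
summary = overlap_summary(deg_sets)
```

Dividing each count by `summary["total"]` gives percentages of the kind reported on the slide.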

Page 14: 140127 abrf interlaboratory study proposal

Transcript abundance measurements using polyA-enrichment or rRNA-depletion library preparation methods

Page 15: 140127 abrf interlaboratory study proposal

Correlation with RT-qPCR

PolyA vs. ribo-depletion for detection of differential gene expression

Page 16: 140127 abrf interlaboratory study proposal

Correlations of measured transcript abundances for high-quality vs. degraded total RNA

- rRNA-depleted
- Illumina HiSeq

[Figure: correlation coefficients (y-axis, 0.7 to 1.0) for each pair of samples compared, among A, dA, B and dB]

Page 17: 140127 abrf interlaboratory study proposal

Illumina

PGM

SVA: Leek JT, Storey JD. PLoS Genet. 2007

Surrogate Variable Analysis to remove cross-platform and cross-site variation

ABRF NGS Study

FDA SeQC

Page 18: 140127 abrf interlaboratory study proposal

The ABRF NGS Study, Phase I

Funded by:
Vendor donations of sample preparation and sequencing reagents
Participating laboratories
ABRF

Manuscript in review:
6 figures, 2 tables
37 supplementary figures, 7 supplementary tables

26 primary scientists
34 contributing scientists
21 research institutions

4.3 billion reads
447 billion nucleotides

Page 19: 140127 abrf interlaboratory study proposal

The ABRF NGS Study, Phase II

DNA sequencing topics were brainstormed and prioritized by the study consortium

Samples were chosen based on the August 2013 Genome in a Bottle Workshop

Page 20: 140127 abrf interlaboratory study proposal

Phase II DNA sequencing aims

Reference data sets
• Intra- and inter-lab replication to model the range of performance expected under normal service laboratory conditions

Reference samples
• Easily accessible for self-evaluation by comparison to the reference data
• Standardized, stably reproduced, suitable for methods development

Immediate utility
• Performance metrics and data applicable to methods used now or in the near future by core sequencing facilities

Page 21: 140127 abrf interlaboratory study proposal

Phase II projects
in no particular order, with project scope and sequencing coverage to be prioritized by interest and funding:

Performance using different platforms and technical protocols
• NIST GiaB designated human genomic DNA
• Measure sequencing accuracy and coverage

Performance using damaged DNA and chimeric cell populations
• DNA from formalin-fixed, paraffin embedded cell mixtures
• Measure sequencing accuracy, coverage, and limits of detection for somatic mutations

Performance on small genomes over a range of GC content
• NIST GiaB (with FDA) designated bacterial genomic DNA
• Measure sequencing accuracy and coverage

Page 22: 140127 abrf interlaboratory study proposal

Phase II samples

Sample ID | DNA source | Project
A | Ashkenazim Jew, maternal | 1
B | Ashkenazim paternal | 1
C | Ashkenazim child | 1
M | pool of mutant Horizon Dx lines #1, #3 plus Acrometrix lines #2, #4: 1-2 48% each, 3-4 2% each by cell count |
M1 | 50% C, 50% M cells in FFPE (each target’s copy number = 24% or 1%) | 2
M2 | 80% C, 20% M cells in FFPE (targets = 9.6% or 0.4%) | 2
M3 | 90% C, 10% M cells in FFPE (targets = 4.8% or 0.2%) | 2
M4 | 95% C, 5% M cells in FFPE (targets = 2.4% or 0.1%) | 2
M5 | 99% C, 1% M cells in FFPE (targets = 0.48% or 0.02%) | 2
M6 | 99.5% C, 0.5% M cells in FFPE (targets = 0.24% or 0.01%) | 2
M7 | 99.9% C, 0.1% M cells in FFPE (targets = 0.048% or 0.002%) | 2
M8 | 99.99% C, 0.01% M cells in FFPE (targets = 0.0048% or 0.0002%) | 2
Sta | Staphylococcus aureus | 3
Sae | Salmonella enterica | 3
Psa | Pseudomonas aeruginosa | 3
Cls | Clostridium sporogenes | 3
P | pooled metagenomic sample with all four bacterial genomes | 3
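The target levels listed for M1-M8 follow from multiplying the percentage of M cells in each blend by the share of each mutant line within pool M (48% for lines 1-2, 2% for lines 3-4). A short sketch of that arithmetic; the constant and function names are hypothetical, and the calculation takes the slide's numbers at face value without modeling zygosity or ploidy.

```python
# Share of each mutant line within pool M, by cell count (from the sample table)
HIGH_LINE_SHARE = 0.48  # lines 1-2
LOW_LINE_SHARE = 0.02   # lines 3-4

# Percentage of M cells blended with C cells in each FFPE sample
M_PERCENT = {
    "M1": 50, "M2": 20, "M3": 10, "M4": 5,
    "M5": 1, "M6": 0.5, "M7": 0.1, "M8": 0.01,
}

def target_levels(m_percent):
    """Expected target level (percent) for a high-share and a low-share line."""
    return m_percent * HIGH_LINE_SHARE, m_percent * LOW_LINE_SHARE

for sample, pct in M_PERCENT.items():
    high, low = target_levels(pct)
    print(f"{sample}: {high:.4g}% or {low:.4g}%")
```

The series spans roughly five orders of magnitude, from 24% down to 0.0002%, which is what makes it usable for estimating limits of detection.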

Page 23: 140127 abrf interlaboratory study proposal

Small genomes project: sizes and GC content

Species | Genome (bp) | Avg % GC | Reference strain | Distributor
Staphylococcus aureus | 2.8x10^6 | 33 | NRS77 (NCTC 8325) | NARSA #NRS77
Salmonella enterica subsp. enterica serovar Typhimurium | 4.9x10^6 | 52 | LT2 | ATCC #700720
Pseudomonas aeruginosa | 6.7x10^6 | 67 | PAO1 | ATCC #47085
Clostridium sporogenes | 4.1x10^6 | 28 | Metchnikoff | ATCC #15579
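For the pooled sample P, the genome sizes and GC values in this table give a rough expectation of how sequenced bases would be distributed. The sketch below assumes an equimolar blend (equal numbers of genome copies per species), uniform coverage, and no plasmids; the `equimolar_pool` helper is hypothetical and is offered as a back-of-envelope check, not a study result.

```python
# Genome size (bp) and average % GC, as listed in the small genomes table
GENOMES = {
    "Staphylococcus aureus": (2.8e6, 33),
    "Salmonella enterica": (4.9e6, 52),
    "Pseudomonas aeruginosa": (6.7e6, 67),
    "Clostridium sporogenes": (4.1e6, 28),
}

def equimolar_pool(genomes):
    """Expected share of bases per species and pooled % GC for an equimolar blend."""
    total_bp = sum(size for size, _ in genomes.values())
    # With equal genome copies, each species contributes bases in proportion to size
    base_share = {name: size / total_bp for name, (size, _) in genomes.items()}
    # Pooled GC is the size-weighted mean of the per-genome GC values
    pooled_gc = sum(size * gc for size, gc in genomes.values()) / total_bp
    return base_share, pooled_gc

base_share, pooled_gc = equimolar_pool(GENOMES)
for name, share in base_share.items():
    print(f"{name}: {share:.1%} of bases")
print(f"Pooled GC: {pooled_gc:.1f}%")
```

Departures from these shares in real data would point to blending error or to GC-dependent coverage bias, which is the effect the small genomes project is designed to measure.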

Page 24: 140127 abrf interlaboratory study proposal

Platforms and library methods

Platform | Project 1 Samples | Project 2 Samples | Project 3 Samples
Illumina HiSeq 2000 | A, B, C | | Sta, Sae, Psa, Cls, P
Illumina HiSeq 2500 | C | M1-M8 |
Illumina 2500 RapidTrack | C | |
Illumina MiSeq | C for long-read scaffold | | Sta, Sae, Psa, Cls, P
Illumina Moleculo | A, B, C | |
Life Technologies Proton | A, B, C | M1-M8 | Sta, Sae, Psa, Cls, P
Life Technologies PGM | | | Sta, Sae, Psa, Cls, P
Pacific Biosciences | C for long-read scaffold | | Sta, Sae, Psa, Cls, P
New platforms? (Illm X10, NextSeq 500; Qiagen GeneReader, Oxford MinION…) | ? | ? | ?

Library Protocol | Project 1 Samples | Project 2 Samples | Project 3 Samples
Nextera on HiSeq | C | M1 | Sta, Sae, Psa, Cls
NuGEN on HiSeq | C | M1 | Sta, Sae, Psa, Cls
New England Biolabs on HiSeq | C | M1 | Sta, Sae, Psa, Cls
Sigma WGA on Proton | C | M1 | Sta, Sae, Psa, Cls
NuGEN WGA on Proton | C | M1 | Sta, Sae, Psa, Cls
Qiagen WGA on Proton | C | M1 | Sta, Sae, Psa, Cls

Page 25: 140127 abrf interlaboratory study proposal

An ABRF – GiaB collaboration

NIST
• Extract high-quality genomic DNA from cultured cells for A, B, C, Sta, Sae, Psa and Cls
• Prepare equimolar blend of bacterial DNA for pool P
• Procure somatic mutation cell lines, create pools M1-M8 titrated by cell counts
• Extract genomic DNA from FFPE blocks of cell suspensions
• Distribute aliquots of DNA reference stocks to participating study labs

ABRF
• Assemble platform groups with at least 3 labs per instrument or method
• Each platform group will determine a consensus protocol for library preparation and sequencing
• Sequence one library per sample per site (intra-lab replicates encouraged)
• Collect and annotate data in a central repository
• Analyze sequencing performance

Page 26: 140127 abrf interlaboratory study proposal

The ABRF NGS Study leadership group
in alphabetical order, with level of participation and devotion to be prioritized by alcoholic intake:

Name | email | Contact regarding:
Baldwin, Don | [email protected] | study design
Grills, George | [email protected] | vendor and partner relations
Mason, Chris | [email protected] | data analysis
Nicolet, Charlie | [email protected] | sequencing methods
Tighe, Scott | [email protected] | logistics