Characterization of Bacterial Genomic Reference Materials · 2017-06-23


Genomic Reference Materials @ The National Institute of Standards and Technology (NIST)

Scott A. Jackson, Group Leader

Complex Microbial Systems, 11-20-2015

NIST: Who are we today?

“Industry’s National Laboratory”: partnering with and serving industry to help maintain US leadership in science and technology products

Department of Commerce: developing standards to support international trade and commerce

The National Metrology Institute: working toward global harmonization and traceability to the SI

The FDA has also been active in addressing other regulatory issues surrounding personalized medicine. Along with authorizing the Illumina technology for marketing, the FDA recognized the need for reference materials and methods that would permit performance assessment. As a result, the FDA collaborated with the National Institute of Standards and Technology (NIST) to develop reference materials consisting of whole human genome DNA, together with the best possible sequence interpretation of such genomes.

The FDA based its decision to grant marketing authorization for the Illumina instrument platform and reagents on their demonstrated accuracy across numerous genomic segments, spanning 19 human chromosomes. Precision and reproducibility across instruments, users, days, and reagent lots were also demonstrated.

Justin Zook, Marc Salit

NIST Microbial Genomic Reference Materials

Microbial Genomic Reference Materials at NIST

Opinions expressed in this paper are the authors' and do not necessarily reflect the policies and views of NIST or affiliated venues. Certain commercial equipment, instruments, or materials are identified in this paper only to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by NIST, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose. Official contribution of NIST; not subject to copyright in the USA.

Background Material Selection Characterization


Traceability:

How do I gain confidence that my result is correct?

Why do I care if my answer is right?

High Stakes Decisions

Presenter
Presentation Notes
FDA required NIST to make reference material to certify NGS (GIAB); also tasked to make microbial RMs.
Regulatory: FDA clearance of sequencing platforms.
Method validation: sequencing, bioinformatics.


Material Selection and Acquisition

Strain Selection

Presenter
Presentation Notes
Community feedback; decided in conjunction with the FDA. Strains selected based on relevance to foodborne illness and clinical relevance. Range of %GC to challenge sequencing platforms.
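As a rough illustration of the %GC criterion mentioned in the notes, the following Python sketch computes the GC percentage of a sequence. The function name and the toy sequences are hypothetical and are not taken from the presentation; real strain selection would use full genome sequences.

```python
def gc_percent(sequence: str) -> float:
    """Return the percentage of G and C among the A, C, G, T bases of a sequence."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # ambiguous bases such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / total


# Toy sequences (hypothetical) standing in for low-GC and high-GC genomes.
low_gc = gc_percent("ATATTAGCAT")
high_gc = gc_percent("GCGCCGATGC")
print(low_gc, high_gc)
```

Ranking candidate strains by this value is one simple way to check that a set of materials spans a wide GC range.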


Produced by local vendor

For each strain:
• pure culture
• single batch of DNA
• ~1500 vials
• 3 μg per vial

RM Production

Presenter
Presentation Notes
Include approximate number of genomes per tube. For each strain: single large batch of genomic DNA, ~1500 vials.
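The approximate number of genomes per tube can be estimated from the DNA mass and genome length. The sketch below is a back-of-the-envelope calculation, not the presenters' method: it assumes an average mass of 650 g/mol per double-stranded base pair (a common approximation), and uses the 3 μg per vial and the 4.8 Mb genome size quoted elsewhere in the slides for MG001.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
AVG_BP_MASS_G_PER_MOL = 650.0   # assumed average mass of one double-stranded base pair


def genome_copies(dna_mass_g: float, genome_length_bp: float) -> float:
    """Estimate the number of genome copies in a given mass of genomic DNA."""
    genome_molar_mass = genome_length_bp * AVG_BP_MASS_G_PER_MOL  # g/mol per genome
    return dna_mass_g * AVOGADRO / genome_molar_mass


# 3 ug of DNA from a 4.8 Mb genome
copies = genome_copies(3e-6, 4.8e6)
print(f"{copies:.2e} genome copies per vial")
```

Under these assumptions a vial holds on the order of 10^8 to 10^9 genome copies.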


RM Characterization

Presenter
Presentation Notes
Application of bioinformatic methods. Reproducible RM development.


Experimental Design

* OpGen Optical Genome Mapping

Presenter
Presentation Notes
Randomly sampled vials, then sequenced. 8 tubes sequenced at XYZ for each platform. Tube clip art source: http://www.clker.com/


• Genome Assembly
• Base Level Purity
• Genomic Contaminants
• DNA Stability

Characterized Properties

Presenter
Presentation Notes
Go through and describe each of them along with methods and results from the LT2 RM



Genome Assembly

Chin et al. 2013


Assembly Validation: OpGen Optical Mapping

"Optical mapping" by Fong Chun Chan and Kendric Wang http://commons.wikimedia.org/wiki/File:Optical_mapping.jpg#/media/File:Optical_mapping.jpg

Presenter
Presentation Notes
Genome Assembly Validation


Genome Assembly Confirmation: OpGen Optical Mapping


High Confidence Assembly

Presenter
Presentation Notes
Optical mapping is a way to validate the genome structure. Mention gently; don’t go too far off on a tangent. Use graphics: series of optical map and PacBio alignments showing disagreement; bioinformatic error that we are looking into. Extra slides with additional information about the inversion.
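Optical maps record the sizes of restriction fragments along a chromosome, so an assembly can be checked by digesting it in silico and comparing the predicted fragment sizes with the map. The sketch below shows only the in silico digest step. The function name is hypothetical, and the enzyme (BamHI, site GGATCC, cut after the first G) is chosen purely for illustration; the slides do not say which enzyme was used.

```python
def restriction_fragments(sequence: str, site: str, cut_offset: int,
                          circular: bool = True) -> list:
    """Return fragment lengths produced by cutting `sequence` at every `site`.

    `cut_offset` is the cut position relative to the start of the site.
    Sites that span the origin of a circular sequence are not detected
    in this simple sketch.
    """
    seq = sequence.upper()
    site = site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    if not cuts:
        return [len(seq)]  # uncut molecule
    fragments = [b - a for a, b in zip(cuts, cuts[1:])]
    if circular:
        # The last cut wraps around to the first cut.
        fragments.append(len(seq) - cuts[-1] + cuts[0])
    else:
        fragments = [cuts[0]] + fragments + [len(seq) - cuts[-1]]
    return fragments


toy_genome = "AAGGATCCTTTTGGATCCAA"  # hypothetical 20 bp sequence with two sites
print(restriction_fragments(toy_genome, "GGATCC", 1))
print(restriction_fragments(toy_genome, "GGATCC", 1, circular=False))
```

A large mismatch between predicted and observed fragment patterns, such as a block of fragments appearing in reverse order, is the kind of signal that would point to a candidate inversion.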


High Confidence Assembly

*in progress



• Genome Assembly
• Base Level Purity
  ○ Single Base Homogeneity
  ○ Vial-to-vial Homogeneity
• Genomic Contaminants
• DNA Stability

Characterized Properties

Presenter
Presentation Notes
Characterized properties- remove analysis


Base Level Purity: Methods

Sequencing Reads

Calculate Purity

Presenter
Presentation Notes
differences are sequencing errors or genomic diversity
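One plausible way to express base-level purity, sketched here under a stated assumption, is the fraction of read bases at a genome position that agree with the reference base. The slides only say that purity is calculated from sequencing reads, so this definition and the function name are assumptions, not necessarily the PEPR implementation.

```python
from collections import Counter


def base_purity(pileup_bases: str, reference_base: str) -> float:
    """Fraction of called bases at one position that match the reference base."""
    counts = Counter(pileup_bases.upper())
    total = sum(counts[b] for b in "ACGT")  # ignore N and other symbols
    if total == 0:
        raise ValueError("no called bases at this position")
    return counts[reference_base.upper()] / total


# Hypothetical pileup: 97 reads support the reference A, 3 reads show G.
purity = base_purity("A" * 97 + "G" * 3, "A")
print(purity)
```

A value below 1 at a position can reflect either sequencing error or genuine diversity in the material, which is why comparing platforms and runs matters.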


Base Level Purity: Results MG001

● 19 positions out of 4.8 Mb have purity values less than 0.98 for both platforms

● 5 positions with purity less than 0.95
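Finding positions that fall below a purity threshold on every platform amounts to a set intersection. The sketch below uses made-up positions and purity values; the platform labels follow the MiSeq and PGM instruments named in the presenter notes, and the function name is hypothetical.

```python
def low_purity_on_all_platforms(purity_by_platform: dict, threshold: float = 0.98) -> list:
    """Return sorted positions whose purity is below `threshold` on every platform."""
    low_sets = [
        {pos for pos, purity in positions.items() if purity < threshold}
        for positions in purity_by_platform.values()
    ]
    return sorted(set.intersection(*low_sets)) if low_sets else []


# Hypothetical per-position purity values for two platforms.
example = {
    "miseq": {101: 0.999, 2050: 0.96, 7777: 0.94, 9000: 0.975},
    "pgm": {101: 0.970, 2050: 0.97, 7777: 0.93, 9000: 0.995},
}
flagged_98 = low_purity_on_all_platforms(example, threshold=0.98)
flagged_95 = low_purity_on_all_platforms(example, threshold=0.95)
print(flagged_98, flagged_95)
```

Requiring agreement across platforms filters out positions that look impure only because of one instrument's systematic errors.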

Presenter
Presentation Notes
Different instruments, different inherent biases (systematic error).
Series: 1 - highlight single position, show facet plot for that position; 2 - highlight 5 low-purity positions; 3 - 5-panel facet showing trend holds.



Base Level Purity: Results MG001

Presenter
Presentation Notes
Five genome positions. Sequenced 8 tubes: 2 MiSeq runs, 1 PGM run. Change to disagreement.


Base Level Purity: Conclusions

● Low genomic diversity within RM lot
  ○ 19 positions out of 4.8 Mb with low purity values (95-98%) for both platforms
● Purity variability due to:
  ○ Platform specific biases
  ○ Run-to-run variability
  ○ Bioinformatic errors
  ○ NOT material vial-to-vial heterogeneity

Presenter
Presentation Notes
Change to disagreement


• Genome Assembly
• Base Level Purity
• Genomic Contaminants
  ○ Presence of contaminant DNA
• DNA Stability

Characterized Properties

* Think metagenomic analysis of data from a pure isolate

Presenter
Presentation Notes
Characterized properties- remove analysis


Genomic Contaminants: Methods

Taxonomic Read Assignment

Hong et al. 2014 Microbiome

Presenter
Presentation Notes
Find a simpler method to present. PathoScope: software out of GW and BU that uses WGS to determine the tax ID of a sample.


Genomic Contamination: Results MG001

Contaminants most likely NOT from RM

Presenter
Presentation Notes
Tax figure - bar plot ranked by abundance


Likely contaminant sources
• Sequencing reagents
• Bioinformatic errors

Fit for purpose
• 99.995% minimum genomic purity

Genomic Contaminants: Conclusions
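A minimum genomic purity figure can be related to taxonomic read assignment with a simple ratio: reads assigned to the expected organism divided by all assigned reads. The sketch below assumes that definition, which the slides do not spell out, and uses invented taxon labels and read counts.

```python
def genomic_purity(read_counts: dict, target_taxon: str) -> float:
    """Fraction of assigned reads that belong to the expected organism."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads were assigned")
    return read_counts.get(target_taxon, 0) / total


# Hypothetical read assignment counts for a pure-isolate sequencing run.
counts = {"target organism": 1_999_950, "contaminant A": 30, "contaminant B": 20}
purity = genomic_purity(counts, "target organism")
print(f"{100 * purity:.4f}% of reads assigned to the target")
print("meets 99.995% threshold:", purity >= 0.99995)
```

Because reagent contamination and misassigned reads also count against the material in this ratio, the result is best read as a lower bound on purity.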



DNA Stability

Presenter
Presentation Notes
need replicate numbers remove handling conditions


DNA Stability: Methods

[Figure: gel images for the 37℃ treatment and the 4℃ treatment. Lanes in each gel: Control, Ladder, 2 weeks, 8 weeks. Marked sizes: 194 kb, 48.5 kb, 6.5 kb.]

Presenter
Presentation Notes
Show gel without lines. Point out conclusion: did see degradation at 37℃; large molecules are susceptible to degradation over time.


Automated Gel Image Processing



Stable at -20℃ and 4℃

Don’t store your DNA at 37℃

DNA Stability: Conclusions


Pipeline for Evaluating Prokaryotic References


Computational Reproducibility

Code, Environment, Data

Presenter
Presentation Notes
Quick aside: data, code, and computing environment are all publicly available. Generic slide: what is computational reproducibility?

Code Repositories
DNA Stability Analysis: https://github.com/usnistgov/peprDnaStability
Bioinformatic Pipeline: https://github.com/usnistgov/pepr
Data Analysis: https://github.com/usnistgov/peprr

Sequence data: NIH SRA Bioproject PRJNA252728, http://www.ncbi.nlm.nih.gov/bioproject/PRJNA252728

Data Repository: https://github.com/nate-d-olson/NIST_Micro_Genomic_RM_Data
Genome Sequences, Optical Mapping Data, DNA Stability Data


Computational Reproducibility

Bioinformatic Pipeline
https://github.com/usnistgov/pepr
https://hub.docker.com/r/natedolson/pepr/
https://hub.docker.com/r/natedolson/docker-pathoscope/

Data Analysis and Reporting
https://github.com/usnistgov/peprr

Code

Sequence data NIH SRA Bioproject PRJNA252728

http://www.ncbi.nlm.nih.gov/bioproject/PRJNA252728



Conclusions

● Microbial genomic RMs characterized:
  ○ Genome Assembly
  ○ Base Level Purity
  ○ Genomic Contaminants

● RMs and associated data will help validate sequencing and bioinformatic processes.

● Reproducible and transparent characterization

Presenter
Presentation Notes
Assembly validated aside from inversion. Material fit for purpose. Next steps: uncertainty assessments, sampling, sequencing, power analysis.


Acknowledgements

Nate Olson

• Microbial Genomic Reference Materials

• Microbiologist

• Bioinformaticist - PEPR Pipeline

• UMD PhD Student


Acknowledgements

FDA:
○ Heike Sichtig
○ Marc Allard
○ Tim Muruvanda
○ Shashi Sharma
○ Nagarajan Thirunavukkarasu

NIST:
○ Nate Olson
○ Marc Salit
○ Justin Zook
○ Scott Jackson
○ Nancy Lin
○ Jenny McDaniel
○ Lindsay Vang
○ David Catoe
○ Steven Lund

This work was supported by the Department of Homeland Security (DHS) Science and Technology Directorate under the Interagency Agreement HSHQPM-12-X-00078 with NIST and by two interagency agreements with the FDA.


A Mixed Pathogen DNA Reference Material for NGS-Based Pathogen Detection


An Unbiased Approach for Pathogen Detection: Shotgun Metagenomics via Next-Gen Sequencing

Clinical Sample → Total DNA → Shotgun Library → Metagenomic Sequence Data → Bioinformatically Search for Pathogen-Specific Signatures


There is a Need for Standards for NGS-Based Pathogen Detection Assays/Devices

• Industry, Academics, and Government Regulators (FDA) have expressed a need for standards for the purpose of validating biothreat/pathogen detection devices

• Primary users/adopters of these standards would be device developers and research laboratories who wish to assess analytical sensitivity, specificity and relative performance of their DNA sequence-based pathogen detection device/assay.


Pool (Source and Abundance*): Pathogen #1, Pathogen #2, Pathogen #3, Pathogen #4, Pathogen #5, Pathogen #6, and Human DNA, with abundance levels spanning 10^1 down to 10^-6 (10^1, 10^-1, 10^-2, 10^-3, 10^-4, 10^-5, 10^-6).

Abundance* is genome copy number relative to human reference DNA
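Because abundance is defined as genome copy number relative to human DNA, the share of shotgun reads expected from each component also depends on genome size. The sketch below assumes reads are sampled in proportion to DNA mass (copies times genome length) with no bias. The component names, the 3.1 Gb human genome size, and the 5 Mb pathogen genome size are illustrative assumptions, not values from the slides.

```python
def expected_read_fractions(relative_copies: dict, genome_sizes_bp: dict) -> dict:
    """Expected fraction of reads per component, assuming unbiased sampling by DNA mass."""
    dna_amount = {
        name: relative_copies[name] * genome_sizes_bp[name]
        for name in relative_copies
    }
    total = sum(dna_amount.values())
    return {name: amount / total for name, amount in dna_amount.items()}


# Hypothetical pool: human reference plus one high- and one low-abundance pathogen.
relative_copies = {"human": 1.0, "pathogen_high": 10.0, "pathogen_low": 1e-6}
genome_sizes_bp = {"human": 3.1e9, "pathogen_high": 5e6, "pathogen_low": 5e6}

fractions = expected_read_fractions(relative_copies, genome_sizes_bp)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.3e}")
```

Under these assumptions even the most abundant pathogen contributes only a small share of reads, and the rarest contributes on the order of one read per billion, which illustrates why such a pool probes analytical sensitivity.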

[Figure: Relative Genome Abundance (log scale, 1.0E-07 to 1.0E+01) for the pool components (Pathogen #1 to #6 and Human DNA), comparing Expected Copy Number, qPCR Copy Number, and NGS Copy Number.]

Targeted (PCR) Based Detection and NGS-Based Metagenomic Detection

Quantitative and Qualitative

NIST-FDA Pathogen Detection Workshop


Questions?

External RNA Controls

• RNA Spike-Ins to provide confidence in gene expression experiments

• Serves both Microarray-based and RNA-Seq (NGS)-Based Technologies


NIST Standard Reference Material (SRM) 2374: DNA Sequence Library for External RNA Controls

Marc Salit, Sarah Munro

The ERCC Standard Reference Material has Been Widely Adopted

Sarah Munro, Scott Pine, Marc Salit


GC Bias

Presenter
Presentation Notes
Here is why GC was a driver in genome selection. Coverage (the number of sequencing reads that map to a region) varies by GC content due to the biochemical nature of the bonds between the nucleotides. Important to show that GC introduces bias. Find a different image of bias related to sequence data; can also just show E. coli.
Figure source: http://www.biomedcentral.com/content/pdf/gb-2011-12-2-r18.pdf
SEQanswers, GC bias: http://seqanswers.com/forums/showthread.php?t=22525
http://nar.oxfordjournals.org/content/40/10/e72.long
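One way to look for the coverage bias described in these notes is to bin genome windows by GC percentage and compare mean coverage across bins. The sketch below is a minimal illustration with a synthetic sequence and synthetic coverage values; the function name, window size, and bin width are arbitrary assumptions.

```python
from collections import defaultdict


def mean_coverage_by_gc(sequence: str, coverage: list,
                        window: int = 100, bin_width: int = 10) -> dict:
    """Mean per-base coverage of non-overlapping windows, grouped by GC% bin."""
    if len(sequence) != len(coverage):
        raise ValueError("sequence and coverage must have the same length")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window].upper()
        gc = 100.0 * (chunk.count("G") + chunk.count("C")) / window
        # Lower edge of the GC bin; 100% GC is folded into the top bin.
        gc_bin = min(int(gc // bin_width) * bin_width, 100 - bin_width)
        sums[gc_bin] += sum(coverage[start:start + window]) / window
        counts[gc_bin] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Synthetic example: an AT-only window at 30x and a GC-only window at 10x.
toy_sequence = "AT" * 50 + "GC" * 50
toy_coverage = [30] * 100 + [10] * 100
by_gc = mean_coverage_by_gc(toy_sequence, toy_coverage)
print(by_gc)
```

A flat profile across bins would suggest little GC bias, while a systematic drop at the extremes is the pattern that motivates choosing strains across a range of GC content.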



Short Read Data: Assembly Validation

Walker et al. 2014, http://www.broadinstitute.org/software/pilon/

Presenter
Presentation Notes
listen or read - trim figure


Results MG001

Optical Mapping: Assembly Validation





• Genome Assembly
  ○ Overall chromosome structure
• Base Level Purity
• Genomic Contaminants
• DNA Stability

Characterized Properties

Presenter
Presentation Notes
Characterized properties- remove analysis


Conclusions:
● Closed genome assembly from long read data
● Assembly confirmation with orthogonal methods
● Evaluation of candidate errors may require additional analysis

Assembly Validation


• Genome Assembly
• Base Level Purity
  ○ Strain Diversity: within lot
  ○ Homogeneity: vial-to-vial
• Genomic Contaminants
• DNA Stability

Characterized Properties

Presenter
Presentation Notes
Characterized properties- remove analysis


Base Level Purity: Conclusions

● Low genomic diversity within RM lot
  ○ 19 positions out of 4.8 Mb with low purity values for both platforms
● Purity variability due to:
  ○ Platform specific biases
  ○ Run-to-run variability
  ○ Bioinformatic errors
  ○ NOT material heterogeneity

Presenter
Presentation Notes
Change to disagreement



Two Pronged approach
1. NIST microbial genomic RMs
2. Methods allowing USERS to evaluate in-house materials

Presenter
Presentation Notes
Select strains with public health relevance; genomic characteristics that are challenging to sequence in general.


Presenter
Presentation Notes
Material objectives: underpinning confidence in the microbial space; something for the field to use in benchmarking; comparability between labs and over time; enables refinement of the technology. This is a first step!


Presenter
Presentation Notes
Talk about NBACC: has no use for the reference material. Select strains with public health relevance; genomic characteristics that are challenging to sequence in general.


Shotgun Genome Sequencing

Modified From Loman et al. 2012 Nature Reviews Microbiology 10(9)

Presenter
Presentation Notes
Missing shotgun sequencing. Prior to shotgun: map first, then sequence. Fragmentation; used orthogonal methods. Don’t get caught up on technical details: PacBio - fluorescent movies; PGM - pH changes; MiSeq - fluorescent images. Key points: overall process, fragmented DNA, orthogonal sequencing methods.
Modified from Loman, N. J., Constantinidou, C., Chan, J. Z. M., Halachev, M., Sergeant, M., Penn, C. W., … Pallen, M. J. (2012). High-throughput bacterial genome sequencing: an embarrassment of choice, a world of opportunity. Nature Reviews Microbiology, 10(9), 599–606. doi:10.1038/nrmicro2850
GC bias - impacts fragmentation; seq - systematic sequencing errors


