“GenomeTrakr: Whole-Genome Sequencing for Food Safety and A New Way Forward in the Microbiological Testing & Traceability for Foodborne Pathogens”
Eric W. Brown, Ph.D.
Director, Division of Microbiology
Center for Food Safety & Applied Nutrition
U.S. Food & Drug Administration
College Park, Maryland 20740
“Whole Genome Sequencing Is The Biggest Thing To Happen To Food Microbiology Since Pasteur Showed Us How To Culture Pathogens…”
~ Dr. Jorgen Schlundt, Executive Director and Founder, The Global Microbial Identifier
Representative* Timeline for Foodborne Illness Investigation Using Whole Genome Sequencing
[Figure: number of cases (y-axis, 0 to 40) plotted against days (x-axis, 0 to 68), annotated as follows]
• Contaminated food enters commerce
• FDA, CDC, FSIS, and States use WGS in real time and in parallel on clinical, food, and environmental samples
• Source of contamination identified early through WGS combined database queries
• Averted illnesses
*Data are for illustrative purposes and do not represent an actual outbreak.
Comparison of Nut Butter Outbreaks*
• Salmonella Tennessee, ConAgra, Peter Pan peanut butter, 2006/2007: 715 cases, 129 hospitalizations, no deaths
• Salmonella Typhimurium, Peanut Corporation of America, multiple peanut products, 2008/2009: 714 cases, 166 hospitalizations, may have contributed to 9 deaths
Post GenomeTrakr Network (Whole Genome Sequencing)
• Salmonella Braenderup, nSpired Natural Foods, multiple almond and peanut butters, 2014: 6 cases, 1 hospitalization, no deaths
*Source: CDC’s Foodborne Outbreak Online Database (FOOD Tool)
GenomeTrakr Network
• GenomeTrakr was established to accelerate the source tracking and tracing of foodborne outbreaks through the use of next-generation whole genome sequencing (WGS).
• It is a network of State and Federal laboratories with whole genome sequencing capability, established by FDA in 2012.
• The network provides high-resolution genomic sequences of food pathogens, e.g., Salmonella, Listeria, STECs, and others.
• It partners with NCBI for all storage and sharing of sequence data and metadata in the public domain.
• It partnered with CDC in 2013 to study all clinical and environmental isolates of Listeria monocytogenes.
• Today the network consists of labs at FDA, CDC, FSIS, 14 state labs, and 9 international labs.
http://www.fda.gov/Food/FoodScienceResearch/WholeGenomeSequencingProgramWGS/default.htm
WGS is BIG data! The GenomeTrakr database currently holds about 17 terabytes. 1 terabyte = 17,000 hours of normal human speech. Thus, the GenomeTrakr database is equivalent in size to about 289,000 hours of spoken words, or about 33 years of continuous speech.
OR:
One terabyte = 2,000 file cabinets' worth of paper.
You would need 34,000 standard four-drawer file cabinets to hold the GenomeTrakr data if it were printed.
*The Hubble telescope has collected 45 terabytes of data since its launch in the early 1990s. GenomeTrakr has been live since 2012 and is already more than a third of the way there in data storage.
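The size comparisons above are simple multiplications, so they can be checked directly. The short sketch below only restates the slide's own conversion factors (17,000 hours of speech and 2,000 file cabinets per terabyte, 45 terabytes for Hubble); the variable names are illustrative, and a 365-day year is assumed.

```python
# Conversion factors as stated on the slide (not independently verified).
GENOMETRAKR_TB = 17
SPEECH_HOURS_PER_TB = 17_000
FILE_CABINETS_PER_TB = 2_000
HUBBLE_TB = 45

# Assumption: a 365-day year of round-the-clock speech.
HOURS_PER_YEAR = 24 * 365

speech_hours = GENOMETRAKR_TB * SPEECH_HOURS_PER_TB
speech_years = speech_hours / HOURS_PER_YEAR
file_cabinets = GENOMETRAKR_TB * FILE_CABINETS_PER_TB
fraction_of_hubble = GENOMETRAKR_TB / HUBBLE_TB

print(f"Hours of speech:     {speech_hours:,}")
print(f"Years of speech:     {speech_years:.1f}")
print(f"File cabinets:       {file_cabinets:,}")
print(f"Fraction of Hubble:  {fraction_of_hubble:.2f}")
```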
GenomeTrakr Strategy
• Develop a distributed sequencing-based network, rather than a centralized model
• Public access to data
• Focus on collaborative efforts
• Provide sequence and minimal metadata in a publicly accessible database
  – Partner with NCBI for storage and serving data
  – Cost prohibitive for FDA to establish its own high-capacity data site
  – Industry (food, pharma, and methods development), academia, hospitals, clinical public health laboratories, and other government agencies have access to data for their individual needs
FDA GenomeTrakr website: http://www.fda.gov/Food/FoodScienceResearch/WholeGenomeSequencingProgramWGS/ucm363134.htm
Total Number of Sequences in the GenomeTrakr Database
[Figure: number of sequences (as of the last day of the quarter), by quarter, 2013 to 2016, annotated as follows]
• First sequences uploaded in Feb 2013
• Average number of sequences added per month in 2013 = 169
• Average number of sequences added per month in 2014 = 1,076
• Public Health England uploads more than 8,000 Salmonella sequences
• Average number of sequences added per month in 2015 = 2,362
How do we use the GenomeTrakr information?
Environmental sampling
Post inspection
Interpretation
SNP Distance
How close are the isolates? No single threshold applies to all species/types; these are rough, conservative guides:
1. Inclusion: <= 20 SNPs (match, virtually identical)
2. Inconclusive: 20-100 SNPs
3. Exclusion: > 100 SNPs
Bootstrapping
Do the isolates form a unique cluster with >= 95% support?
Is the cluster distinct from other isolates in the tree?
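The rough guides above can be written down as a small decision rule. This is a minimal sketch, not an official FDA tool: the function names are hypothetical, the thresholds are the slide's conservative defaults, and because the slide's first two ranges both include 20, the sketch assumes that exactly 20 SNPs counts as inclusion.

```python
def interpret_snp_distance(snps: int) -> str:
    """Map a pairwise SNP distance onto the slide's rough, conservative guides."""
    if snps < 0:
        raise ValueError("SNP distance cannot be negative")
    if snps <= 20:       # assumption: the shared boundary value 20 is treated as inclusion
        return "inclusion"
    if snps <= 100:
        return "inconclusive"
    return "exclusion"


def cluster_is_supported(bootstrap_support: float, threshold: float = 0.95) -> bool:
    """Check whether a cluster meets the >= 95% bootstrap support guide."""
    return bootstrap_support >= threshold


for distance in (3, 20, 57, 140):
    print(distance, interpret_snp_distance(distance))
print(cluster_is_supported(0.97))
```

Since no single threshold fits all species or types, the cutoffs would in practice be adjusted per organism, and the SNP distance is read together with the bootstrap support and the tree topology rather than on its own.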
Data Analysis: SNPs vs. wgMLST
SNPs
• Unit of measure: single nucleotide substitutions (other types of mutations are excluded)
• Requirements: complete or high-quality reference genome for mapping
• Pros: extremely high resolution; methods have been published and validated
• Cons: requires reference genome; computationally intense; requires local bioinformatics expertise
wgMLST
• Unit of measure: allele, a variant of a gene. Variation could arise from a number of sources, including SNPs, insertions, deletions, etc.
• Requirements: database of named alleles, which must be actively maintained
• Pros: relatively fast; not directly dependent upon reference genome
• Cons: allele database must be centralized; cannot compute novel wgMLST types locally; wgMLST schemas not publicly available at this time
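The two units of measure can be contrasted with a toy example. The sketch below is only an illustration under simplifying assumptions: the sequences are already aligned to the same coordinates, and the allele profiles are plain dictionaries of hypothetical locus names and allele numbers. It is not how the CFSAN SNP Pipeline or any wgMLST scheme is implemented; real workflows map reads to a reference or look alleles up in a curated database.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count substitution sites between two aligned sequences.

    Positions with a gap or ambiguous base in either sequence are skipped,
    mirroring the idea that only single nucleotide substitutions count.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    bases = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in bases and b in bases and a != b
    )


def allele_distance(profile_a: dict, profile_b: dict) -> int:
    """Count loci whose allele identifiers differ, among loci present in both profiles."""
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in shared_loci if profile_a[locus] != profile_b[locus])


# Toy data: one substitution (position 4) and one deletion (position 9).
isolate_1 = "ACGTACGT-A"
isolate_2 = "ACGAACGTTA"
print(snp_distance(isolate_1, isolate_2))  # the deletion is not counted

# Hypothetical allele profiles: locus name -> allele number.
profile_1 = {"locus1": 3, "locus2": 7, "locus3": 1}
profile_2 = {"locus1": 3, "locus2": 9, "locus3": 1}
print(allele_distance(profile_1, profile_2))
```

In this toy case the deletion is invisible to the SNP count, whereas under wgMLST any change within a gene, whether a substitution, insertion, or deletion, would give that locus a different allele.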
CFSAN SNP Pipeline
• Documentation: http://snp-pipeline.rtfd.org
• Source Code: https://github.com/CFSAN-Biostatistics/snp-pipeline
• PyPI Distribution: https://pypi.python.org/pypi/snp-pipeline
Pettengill JB, Luo Y, Davis S, Chen Y, Gonzalez-Escalona N, Ottesen A, Rand H, Allard MW, Strain E. (2014) An evaluation of alternative methods for constructing phylogenies from whole genome sequence data: a case study with Salmonella. PeerJ 2:e620 http://dx.doi.org/10.7717/peerj.620
Intended for use by bioinformaticists (Linux)
Data Submission
Do not need to be associated with GenomeTrakr
Public Health England
Establish Bioproject
Upload data and metadata
Link to surveillance pipeline – kmer tree
Lessons Learned
• WGS has provided accurate, informative results in every case in which we have applied it, and the distributed model for a WGS network has proved the most effective means of acquiring sequence data.
• WGS can be used to aid tracebacks and delimit the scope of food contamination events as never before. It is not just a regulatory tool: numerous offshoot applications exist (e.g., supply chain management, quality assurance, process evaluation).
• The development of international open-source databases is critical due to the global nature of the food supply.
• Genome sequences are agnostic, portable, and instantly cross-compatible. One technology approach, irrespective of organism.
• WGS, unlike PFGE, is more than simply a “Molecular Epi-Machine”. It provides information on AMR, virulence, serotype, and other critical factors in one assay, including historical reference to pathogen emergence. Significant lab cost savings with one approach!
• The need for an increased number of well-characterized environmental (food, water, facility, etc.) sequences may outweigh the need for extensive clinical isolates.
Next-generation sequencing faces several large challenges as it is deployed as a global public health tool:
How much metadata?
Will all share data?
Administration, coordination, and oversight?
Who pays?
Who owns the IP?
Quality concerns and curation?
[Figure: WGS timeline marking 2014, 2015, 2016, 2017, and 2020]
Looking ahead
Capacity building – hardware, software and people (bioinformatics)
Slow transition from PFGE to WGS
Different authorities and distinct mandates with some overlap
Bioinformatics - training IFSH workshops/CDC/IFSTL-JIFSAN-UMD
“Hands on”
Sample submission by industry
Understanding the supply chain
Facility and transportation sanitation – resident pathogens
Prevention
Spoilage organisms
One microbiology workflow for bacterial pathogens – FDA FOODS PROGRAM
Multiple Tests for Strain Characterization
• Species
• Virulence
• Serotype
• Resistance
• Subtype
• Adaptations
ONE MICROBIOLOGICAL WORKFLOW: ONE MICROBIOLOGICAL TOOL BOX, ALL AT YOUR FINGERTIPS
IN THE NOT SO DISTANT FUTURE….. APPs ON YOUR SMARTPHONE
Acknowledgements
• FDA
  – Center for Food Safety and Applied Nutrition
  – Center for Veterinary Medicine
  – Office of Regulatory Affairs
  – Office of Food & Veterinary Medicine
• National Institutes of Health
  – National Center for Biotechnology Information
• State Health and University Labs
  – Alaska
  – Arizona
  – California
  – Florida
  – Hawaii
  – Maryland
  – Minnesota
  – New Mexico
  – New York
  – South Dakota
  – Texas
  – Virginia
  – Washington
• USDA/FSIS
  – HQ and The Eastern Laboratory
• CDC
  – Enteric Diseases Laboratory
• INEI-ANLIS “Carlos Malbrán Institute,” Argentina
• Centre for Food Safety, University College Dublin, Ireland
• Food and Environment Research Agency, UK
• Public Health England, UK
• WHO
• Illumina
• Pac Bio
• CLC Bio
• MANY other independent collaborators