National Center for Emerging and Zoonotic Infectious Diseases
NCBI Pathogen Detection Pipeline Epi Work Flow
Rashida Hassan, MSPH
Foodborne Outbreak Response Team
Outbreak Response and Prevention Branch
Centers for Disease Control and Prevention
May 13, 2019
What is it?
NCBI Pathogen Detection project
Centralized system which integrates sequences for bacterial pathogens from food, the environment, and human patients
Agencies submit sequencing data to NCBI, which analyzes the sequences to identify closely related sequences
NCBI Pathogen Detection Isolates Browser: web-based portal that integrates available information with the SNP cluster information
https://www.ncbi.nlm.nih.gov/pathogens/
Pathogens
• Current focus on Campylobacter, Escherichia coli, Shigella, Listeria, and Salmonella
• >20 other pathogens added, with more expected to follow
Contributing Agencies
• Routine submissions from state PulseNet Laboratories, CDC, FDA, USDA-FSIS, Public Health England
• Additional countries and institutions may submit sequences as well
How we use it at CDC
Active Multistate Clusters
• Objective: determine relatedness between isolates included in a multistate cluster and see if there are any additional related isolates
• Step 1: Go to NCBI Pathogen Detection Isolates Browser https://www.ncbi.nlm.nih.gov/pathogens/
• Select “Find isolates now!” or explore data for your pathogen
• Step 2: Paste WGS IDs for your cluster isolates into search box, click search
Copy, paste, click
• Step 3: Select generated SNP cluster
Select
11 matched isolates were found, 7 clinical and 4 environmental
73 total isolates in NCBI’s SNP cluster
Minimal SNP difference for all isolates within this search is 0
Isolates you searched for will be selected/highlighted in red
Left click and “select” any blue isolates to add them to your selection, or check box in table above
Differences between isolates selected
Min-same = Minimum SNP difference within same isolation source type
Min-diff = Minimum SNP difference across different isolation types
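To make the two columns concrete, here is a minimal sketch of how min-same and min-diff could be computed for one isolate from pairwise SNP distances. It is not NCBI's implementation. The isolate names, the distance table, and the function name are hypothetical, and reading the values as per-isolate minimums is an assumption.

```python
def min_same_and_diff(target, source_type, snp_distance):
    """Return (min_same, min_diff) for one isolate.

    source_type:  dict mapping isolate name -> isolation type
    snp_distance: dict mapping frozenset({a, b}) -> SNP distance
    A value of None means no isolate of that kind is in the cluster.
    """
    same, diff = [], []
    for other, other_type in source_type.items():
        if other == target:
            continue
        distance = snp_distance[frozenset((target, other))]
        if other_type == source_type[target]:
            same.append(distance)
        else:
            diff.append(distance)
    return (min(same) if same else None, min(diff) if diff else None)


# Hypothetical toy cluster: two clinical isolates, one environmental isolate.
source_type = {
    "iso_A": "clinical",
    "iso_B": "clinical",
    "iso_C": "environmental/other",
}
snp_distance = {
    frozenset(("iso_A", "iso_B")): 2,
    frozenset(("iso_A", "iso_C")): 5,
    frozenset(("iso_B", "iso_C")): 0,
}

print(min_same_and_diff("iso_A", source_type, snp_distance))  # (2, 5)
print(min_same_and_diff("iso_C", source_type, snp_distance))  # (None, 0)
```

In this toy cluster, iso_C has no other isolate of its own type, so its min-same is empty, while its min-diff of 0 flags an identical clinical isolate.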
Minimum distance between different source types
Selects all isolates that fall within a designated SNP distance of your originally selected isolate(s)
Specify SNP distance and select “Add”
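The distance-based selection can be pictured with a short sketch. This is only an illustration of the idea, not the browser's actual logic. The isolate names and distances are hypothetical, and treating the threshold as inclusive and measured from the originally selected isolates only (no chaining through newly added ones) is an assumption.

```python
def isolates_within(selected, max_snps, snp_distance, all_isolates):
    """Return the selection expanded to isolates within max_snps of any selected isolate."""
    picked = set(selected)
    for candidate in all_isolates:
        if candidate in picked:
            continue
        # Compare only against the original selection, not against newly added isolates.
        if any(snp_distance[frozenset((candidate, s))] <= max_snps for s in selected):
            picked.add(candidate)
    return sorted(picked)


# Hypothetical toy data: distances from the one selected isolate to the others.
all_isolates = ["iso_A", "iso_B", "iso_C", "iso_D"]
snp_distance = {
    frozenset(("iso_A", "iso_B")): 1,
    frozenset(("iso_A", "iso_C")): 4,
    frozenset(("iso_A", "iso_D")): 12,
}

print(isolates_within(["iso_A"], 5, snp_distance, all_isolates))
# ['iso_A', 'iso_B', 'iso_C']
```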
Change info displayed; can add AMR info, PFGE, etc.
Add columns to line list here
Remove columns from line list here
Click ok
Filter by time or isolate type
Filter by source type here
Filter by time by selecting area on timeline
Click arrow to close
Export image of tree
Download line list of selected isolates
Create alert for new closely related sequences
Create name for your alert
Select SNP distance for alert
• Step 3: OR do nothing if you get this message
Sequences have not been uploaded/analyzed yet, or they have not met NCBI’s quality checks
Queries and Searches
Terms for NCBI PDP Searches & Queries
• taxgroup_name: select the organism name
– “Salmonella enterica”
– “E.coli and Shigella”
– “Campylobacter jejuni”
– “Listeria monocytogenes”
• new: 1 - specify only newly added isolates
• mindiff: [0 TO 5] - specify the range of SNP differences between a clinical isolate and a food/environmental isolate (brackets for ranges, or just the number)
• minsame: 0 - specify SNP difference between isolates of the same type
• geo_loc_name: - specify geographic location (usually country)
• AMR_genotype: - specify AMR genes
• epi_type: - “clinical” or “environmental/other”
• If you forget the search terms, hover over the name of that column in the isolates browser
Example search:
• taxgroup_name:"Salmonella enterica" AND mindiff:[0 TO 3] AND geo_loc_name:"USA" AND new:1 AND epi_type:"clinical"
• Will find any new Salmonella clinical isolates from the USA that are within 3 SNPs (fewer than 4) of a food/environmental isolate
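If you build the same kind of search often, the query string can be assembled from its parts. The helper below is a minimal sketch that only produces text to paste into the Isolates Browser search box. The function name and its parameters are hypothetical, and it does not call any NCBI service.

```python
def build_query(taxgroup_name, mindiff_range=None, geo_loc_name=None,
                new_only=False, epi_type=None):
    """Join search terms with AND, using the field names listed in the slides."""
    terms = [f'taxgroup_name:"{taxgroup_name}"']
    if mindiff_range is not None:
        low, high = mindiff_range
        terms.append(f"mindiff:[{low} TO {high}]")  # brackets denote a range
    if geo_loc_name:
        terms.append(f'geo_loc_name:"{geo_loc_name}"')
    if new_only:
        terms.append("new:1")
    if epi_type:
        terms.append(f'epi_type:"{epi_type}"')
    return " AND ".join(terms)


query = build_query(
    "Salmonella enterica",
    mindiff_range=(0, 3),
    geo_loc_name="USA",
    new_only=True,
    epi_type="clinical",
)
print(query)
# taxgroup_name:"Salmonella enterica" AND mindiff:[0 TO 3] AND geo_loc_name:"USA" AND new:1 AND epi_type:"clinical"
```

Keeping each term optional makes it easy to widen a search, for example by dropping the country or the new-isolates filter.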
Enter search terms here
Select to save your search
Select search
• “Watch” and “save” search options are only available if you create an NCBI account
Click here at top of NCBI PDP webpage
Click “sign up” to create your account
Some caveats…
NCBI Pathogen Detection Pipeline (PDP) Caveats
• Analyses from NCBI must be confirmed by PulseNet
– NCBI uses SNP analysis, whereas PulseNet uses wgMLST, cgMLST, and hqSNP
• NCBI PDP is most useful during the WGS transition, until PulseNet allele codes are available
– After the transition, NCBI PDP can supplement information in PulseNet: matches in other countries (clinical and food), non-PulseNet pathogens, etc.
• Delays: once submitted to NCBI, Salmonella sequences could take up to 1 week to be posted
• NCBI has its own guidelines for what is considered “good quality”
– Some sequences from state labs could be rejected/not posted even if they pass PulseNet’s quality checks
• Some AMR analyses, especially on older isolates, might be out of date
– A null value does not indicate a negative result
– For confirmation of any AMR result, please contact NARMS
• NCBI’s cluster range is 50 SNPs, so many isolates in its trees will not be closely related
– Double-check alerts for saved queries: isolates may fall within NCBI’s 50 SNP range but not within the smaller range for your cluster/outbreak
– Saved searches may generate many notifications!
• Vibrio cluster detection by WGS is based on NCBI PDP SNP analysis, since no schema has been developed yet
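Two of these caveats, the wide 50 SNP cluster range and the meaning of a null AMR value, can be handled when reviewing alert results. The sketch below shows one possible triage step over rows from a downloaded line list. The column names ("isolate", "min_diff", "AMR_genotype"), the sample rows, and the function name are hypothetical and would need to be matched to the actual export.

```python
def triage_alert_hits(rows, max_snps):
    """Keep rows within your own SNP threshold; never read a blank AMR field as negative."""
    kept = []
    for row in rows:
        raw = row.get("min_diff")
        if raw in ("", None):
            continue  # no distance reported, so the row cannot be assessed here
        if int(raw) <= max_snps:
            amr = row.get("AMR_genotype") or "not reported (not a negative result)"
            kept.append({"isolate": row["isolate"], "min_diff": int(raw), "amr": amr})
    return kept


# Hypothetical alert rows: all fall in the same 50 SNP cluster.
rows = [
    {"isolate": "iso_1", "min_diff": "2", "AMR_genotype": "geneX"},
    {"isolate": "iso_2", "min_diff": "30", "AMR_genotype": ""},
    {"isolate": "iso_3", "min_diff": "5", "AMR_genotype": ""},
]

for hit in triage_alert_hits(rows, max_snps=5):
    print(hit["isolate"], hit["min_diff"], hit["amr"])
```

Any isolate kept by a step like this would still need confirmation by PulseNet, and any AMR finding would still need confirmation from NARMS.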
For more information, contact CDC
1-800-CDC-INFO (232-4636)
TTY: 1-888-232-6348
www.cdc.gov
The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.
Questions?
For more information, contact:
Rashida Hassan
ykm6@cdc.gov
404-639-1727