Approaches to analysing 1000s of bacterial isolates
A/Prof Torsten Seemann
Victorian Life Sciences Computation Initiative (VLSCI)
Microbiological Diagnostic Unit Public Health Laboratory (MDU PHL)
Doherty Centre for Applied Microbial Genomics (DCAMG)
The University of Melbourne
ICEID 2015 - Atlanta, USA - Mon 24 Aug 2015
Microbiological Diagnostic Unit
∷ Oldest public health lab in Australia
: established 1897 in Melbourne
: large historical isolate collection back to 1950s
∷ National reference laboratory: Salmonella, Listeria, EHEC
∷ WHO regional reference lab: vaccine-preventable invasive bacterial pathogens
The shift to WGS
∷ New director: Prof Ben Howden - clinician, microbiologist, pathologist
∷ New building: Doherty Institute for Immunity and Infection
∷ Mandate
: modernise service delivery
: nationally lead the conversion to WGS
: enhance research output and collaboration
Does it deliver?
Yes!
Bioinformatics
Epidemiology
Technology
Microbiology
This means scientists, not just software
Domain expertise
Always changing...
Sequencing a sample
∷ Simple preparation
∷ Multiplexing
∷ Robotics
∷ Reliable instruments
Isolation is a bottleneck
→ Direct sequencing?
What data do we really have?
[Figure: isolate genome vs. sequenced reads. Labels: other isolates in sequencing run, contamination, unsequenced regions]
Compare to already assembled genomes
Reference  AGTCTGATTAGCTTAGCTTGTAGCGCTATATTATAGTCTGATTAGCTTAGAT
Reads            ATTAGCTTAGATTGTAG
                      CTTAGATTGTAGC-C
               TGATTAGCTTAGATTGTAGC-CTATAT
                   TAGCTTAGATTGTAGC-CTATATT
                        TAGATTGTAGC-CTATATTA
                        TAGATTGTAGC-CTATATTAT
                           SNP     Deletion
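The alignment above can be turned into variant calls with a simple pileup. The following is a minimal sketch, not the method used in the talk: the read offsets are inferred by matching each read against the reference, and the function name `call_variants` and the thresholds `min_depth` and `min_fraction` are assumptions made for illustration.

```python
from collections import Counter

REFERENCE = "AGTCTGATTAGCTTAGCTTGTAGCGCTATATTATAGTCTGATTAGCTTAGAT"

# (0-based offset on the reference, aligned read); "-" marks a deleted base.
# Offsets are an assumption: they were inferred by matching reads to the reference.
ALIGNED_READS = [
    (6, "ATTAGCTTAGATTGTAG"),
    (11, "CTTAGATTGTAGC-C"),
    (4, "TGATTAGCTTAGATTGTAGC-CTATAT"),
    (8, "TAGCTTAGATTGTAGC-CTATATT"),
    (13, "TAGATTGTAGC-CTATATTA"),
    (13, "TAGATTGTAGC-CTATATTAT"),
]


def call_variants(reference, aligned_reads, min_depth=2, min_fraction=0.9):
    """Return (1-based position, ref base, read base, kind) for each variant."""
    # One counter of observed bases per reference position (the "pileup").
    pileup = [Counter() for _ in reference]
    for offset, read in aligned_reads:
        for i, base in enumerate(read):
            pileup[offset + i][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        base, support = counts.most_common(1)[0]
        if base != reference[pos] and support / depth >= min_fraction:
            kind = "deletion" if base == "-" else "SNP"
            variants.append((pos + 1, reference[pos], base, kind))
    return variants


if __name__ == "__main__":
    for variant in call_variants(REFERENCE, ALIGNED_READS):
        print(variant)
```

This sketch handles only substitutions and single-base deletions that are already marked in the reads; insertions, base qualities and mapping ambiguity are ignored.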
Best practice
■ Use both approaches
□ reference-based + de novo
■ Best of both worlds
□ and worst of both worlds - interpretation is non-trivial
■ Still need
□ good epidemiology, metadata and domain knowledge!
Backward compatibility
∷ MLST
∷ NG-MAST
∷ Resistome
∷ Virulome
∷ MLVA
∷ VNTR
∷ Serotyping
∷ PFGE
∷ Phage typing
New assays
∷ Species identification
: build a “signature” from k-mer/oligos in the reads
: compare to database of known signatures
: strain level accuracy
∷ Features
: very fast screening, < 1 minute per isolate
: identify contamination, mixed samples
: discover wrongly labelled samples!
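The signature idea can be sketched in a few lines. This is an illustrative toy, not the tool used in the talk: the names `kmer_signature`, `jaccard`, `identify` and `TOY_DATABASE` are hypothetical, the sequences are made up, and `k=5` is chosen only so the example fits short strings (real tools typically use much longer k-mers and subsample them).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_signature(reads, k=5):
    """Build a signature: the set of canonical k-mers seen in the reads."""
    signature = set()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or gaps
                signature.add(canonical(kmer))
    return signature


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def identify(reads, database, k=5):
    """Score the reads against every known signature and return the best match."""
    query = kmer_signature(reads, k)
    scores = {name: jaccard(query, sig) for name, sig in database.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Made-up sequences standing in for a database of known signatures.
TOY_DATABASE = {
    "species_A": kmer_signature(["ACGTTGCAAGGCTTAACCGGTTAGC"]),
    "species_B": kmer_signature(["TTTTTGGGGGCCCCCAAAAATTTTT"]),
}

if __name__ == "__main__":
    best, scores = identify(["ACGTTGCAAGGCTT", "GGCTTAACCGGTTAGC"], TOY_DATABASE)
    print(best, scores)
```

A low best score, or two database entries scoring similarly, would be one way to flag contamination, a mixed sample or a wrongly labelled sample for manual review.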
Reference based analysis
∷ Implies you have a “close” reference: need to be careful with draft genomes
∷ Very sensitive: single mutation precision
∷ May not be complete: ignores novel DNA in your isolate
Core
∷ Common DNA
∷ Vertical evolution
∷ Genotyping
∷ Phylogenetics
Pan
∷ Novel DNA
∷ Lateral transfer
∷ Plasmids
∷ Mobile elements
∷ Partly unexploited
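The core/pan split can be expressed as set operations on per-isolate gene content. The sketch below assumes each isolate has already been reduced to a set of gene names; the function name `core_and_pan`, the isolate names and the gene lists are hypothetical and serve only as an illustration.

```python
def core_and_pan(gene_sets):
    """Split gene content into core, pan and accessory genomes.

    gene_sets maps an isolate name to the set of genes found in it.
    """
    sets = list(gene_sets.values())
    pan = set().union(*sets)  # every gene seen in any isolate
    core = set.intersection(*sets) if sets else set()  # genes in all isolates
    accessory = pan - core  # genes present in some isolates but not all
    return core, pan, accessory


# Hypothetical isolates and gene content, for illustration only.
TOY_ISOLATES = {
    "isolate_1": {"gyrB", "recA", "rpoB", "blaTEM"},
    "isolate_2": {"gyrB", "recA", "rpoB"},
    "isolate_3": {"gyrB", "recA", "rpoB", "tetA"},
}

if __name__ == "__main__":
    core, pan, accessory = core_and_pan(TOY_ISOLATES)
    print("core:", sorted(core))
    print("accessory:", sorted(accessory))
```

In this picture the core set is what genotyping and phylogenetics operate on, while the accessory set is where laterally transferred DNA such as plasmids and mobile elements would show up.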
Bioinformatics challenges
∷ Metagenomics: avoiding the isolation bottleneck
∷ Incremental update of data analyses
: both core and pan genome
: phylogenetic trees
∷ Data distribution: finding and getting appropriate comparator isolates
Open science
∷ Crowd-sourcing provably works
: EHEC outbreak 2011
: Ebola
: MERS
∷ But only if people share
: sequencing data
: metadata
: software source code for analysis