Pathogen Profiling Pipeline

DESCRIPTION

Metagenomic sample analysis pipeline. Talk at the M3 SIG at ISMB 2009 in Stockholm, Sweden.

June 27, 2009 | Pathogen Profiling Pipeline | M3 SIG – ISMB/ECCB 2009

Pathogen Profiling Pipeline

Tom Matthews
National Microbiology Laboratory
Public Health Agency of Canada

thomas_matthews@phac-aspc.gc.ca

A Metagenomics Tool for Rapid Identification of Pathogens from Clinical Specimens

Introduction

● With novel or emerging diseases, classical pathogen identification may not always produce results

● Advances in next-gen sequencing technology
● Characterize samples at genomic level

● Pathogen Profiling Pipeline
● Bioinformatics pipeline
● Analysis of host and microbial nucleic acids

Features

● Nucleotide and protein BLAST analysis
● Unbiased analysis of input reads
● Clustered execution
● Web front-end
● Custom analysis pipelines
● Easily viewed results

Filtering Overview

● BLAST analysis performed against reference sequence database
● Assigns hits according to cut-off criteria
● Calculates equivalent hits
● Clustered BLAST and filtering
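As a rough illustration of assigning hits by cut-off criteria, the sketch below parses BLAST tabular output (the standard 12-column layout, with e-value and bit score in the last two columns) and keeps rows that pass two thresholds. The names `Hit`, `parse_tabular`, and `passes_cutoffs`, and the default cut-off values, are assumptions made for this example. The talk does not say which criteria or values the pipeline uses.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    read_id: str
    subject_id: str
    evalue: float
    bitscore: float


def parse_tabular(lines):
    """Parse BLAST tabular rows (12 standard tab-separated columns) into Hit records."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        # Column 1 = query id, 2 = subject id, 11 = e-value, 12 = bit score.
        hits.append(Hit(cols[0], cols[1], float(cols[10]), float(cols[11])))
    return hits


def passes_cutoffs(hit, max_evalue=1e-5, min_bitscore=50.0):
    """Hypothetical cut-off criteria; the real pipeline's criteria are not given."""
    return hit.evalue <= max_evalue and hit.bitscore >= min_bitscore
```

Reads with no row passing the cut-offs would be treated as unassigned.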

Last Common Ancestor Estimation

● Uses equivalent hits for LCA calculation

● User specifies equivalent hit percentage cutoff

● NCBI taxonomy database for ancestor lookup

● Walks up taxonomy tree to find lowest intersection of all leaf nodes

● Unbiased approach

[Figure: taxonomy tree with leaf nodes Vaccinia, Camelpox, Taterapox, and Variola, and common ancestor Orthopoxvirus]
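The walk up the taxonomy tree can be sketched as follows, using the poxvirus example from this slide. The `PARENT` map is a small hand-written toy tree for illustration, not data loaded from the NCBI taxonomy database, and the function names are assumptions of this sketch rather than the pipeline's own code.

```python
# Toy child -> parent map (simplified, hand-written for illustration only).
PARENT = {
    "Vaccinia virus": "Orthopoxvirus",
    "Camelpox virus": "Orthopoxvirus",
    "Taterapox virus": "Orthopoxvirus",
    "Variola virus": "Orthopoxvirus",
    "Orthopoxvirus": "Poxviridae",
    "Poxviridae": "Viruses",
    "Influenza A virus": "Orthomyxoviridae",
    "Orthomyxoviridae": "Viruses",
    "Viruses": "root",
}


def lineage(taxon, parent=PARENT):
    """Return the path from a taxon up to the root, leaf first."""
    path = [taxon]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def last_common_ancestor(taxa, parent=PARENT):
    """Return the lowest node shared by the lineages of all given taxa."""
    taxa = list(taxa)
    if not taxa:
        raise ValueError("need at least one taxon")
    common = lineage(taxa[0], parent)  # ordered leaf -> root
    for taxon in taxa[1:]:
        seen = set(lineage(taxon, parent))
        common = [node for node in common if node in seen]
    # The first surviving node is the lowest shared ancestor.
    return common[0] if common else None
```

With this toy tree, the four poxviruses resolve to Orthopoxvirus, while Vaccinia and Influenza A together resolve only to Viruses.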

Filtering Outputs

● Hits – High scoring reads passing filtering values

● Equivalent Hits – BLAST hits matching to within an assigned percentage of the top hit's bitscore

● Last Common Ancestors – Calculated (estimated) LCA of all the equivalent hits

● Unassigned – Passed to the next pipeline step
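One possible reading of the equivalent-hit rule is sketched below: a hit is kept if its bit score is no more than a given percentage below the top hit's bit score for that read. The function name `equivalent_hits`, the `percent` parameter, and its default value are assumptions for illustration. The slides state only that the user specifies the percentage.

```python
def equivalent_hits(hits, percent=10.0):
    """Keep hits whose bit score is within `percent` of the top bit score.

    `hits` is a list of (subject_id, bitscore) pairs for a single read.
    """
    if not hits:
        return []
    top = max(score for _, score in hits)
    threshold = top * (1.0 - percent / 100.0)
    return [(subject, score) for subject, score in hits if score >= threshold]


# Example: with a 10% window, only the two poxvirus hits are equivalent.
example = [("Vaccinia virus", 200.0), ("Camelpox virus", 190.0), ("Escherichia coli", 80.0)]
kept = equivalent_hits(example, percent=10.0)
```

The subjects in `kept` are the ones that would feed the last common ancestor estimation.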

Example Analysis Method

● BLAST reads against host database

● Remove host reads

● BLAST unassigned against reference database

● Filter hits vs. unassigned

● Repeat...

● Post analysis

[Figure: flowchart. Sample reads pass through BLAST and filtering against the host genome; the non-host reads pass through BLAST and filtering against viral, bacterial, protozoan, and fungal genomes; results are pooled to give the unique organisms in the sample.]
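The analysis method above can be sketched as a loop that passes the unassigned reads from one step to the next. This is a minimal sketch: it assumes the reference databases are searched one after another, with the host database first, and `run_pipeline`, `fake_blast_and_filter`, and the read and database names are all hypothetical stand-ins rather than the pipeline's real interfaces.

```python
def run_pipeline(reads, databases, blast_and_filter):
    """Run BLAST-and-filter steps in order, passing unassigned reads onward.

    `databases` is an ordered list of database names with the host first.
    `blast_and_filter(reads, db)` must return (assigned, unassigned), where
    `assigned` maps read id -> taxon and `unassigned` is a list of read ids.
    """
    pooled = {}
    remaining = list(reads)
    for index, db in enumerate(databases):
        assigned, remaining = blast_and_filter(remaining, db)
        if index > 0:  # host reads (first step) are removed, not reported
            pooled.update(assigned)
    organisms = sorted(set(pooled.values()))
    return organisms, remaining


def fake_blast_and_filter(reads, db):
    """Stand-in for a real BLAST step: looks reads up in a fixed table."""
    table = {
        "host": {"r1": "Homo sapiens"},
        "viral": {"r2": "Vaccinia virus", "r4": "Vaccinia virus"},
        "bacterial": {"r3": "Streptococcus"},
    }
    known = table.get(db, {})
    assigned = {r: known[r] for r in reads if r in known}
    unassigned = [r for r in reads if r not in known]
    return assigned, unassigned


organisms, leftover = run_pipeline(
    ["r1", "r2", "r3", "r4", "r5"],
    ["host", "viral", "bacterial"],
    fake_blast_and_filter,
)
```

Here `leftover` holds the reads that no step could assign, which a post-analysis step might examine further.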

Pipeline Construction

Pipeline Execution

● Custom execution manager
● Computes dependencies and monitors running jobs
● Distributes jobs across Linux cluster
● Facilitates unattended clustered executions
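To illustrate the dependency computation, the sketch below groups jobs into batches whose prerequisites are already complete, using Python's standard `graphlib.TopologicalSorter` (Python 3.9+). It stands in for the custom execution manager, whose design the slides do not describe. The job names and the `execution_batches` helper are hypothetical, and actual dispatch to cluster nodes is left out.

```python
from graphlib import TopologicalSorter


def execution_batches(dependencies):
    """Group jobs into batches; jobs within a batch have no unmet dependencies.

    `dependencies` maps each job name to the set of jobs it must wait for.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs that could run in parallel
        batches.append(ready)
        sorter.done(*ready)  # a real manager would mark jobs done as they finish
    return batches


# Hypothetical job graph for a small pipeline.
jobs = {
    "host_blast": set(),
    "host_filter": {"host_blast"},
    "viral_blast": {"host_filter"},
    "bacterial_blast": {"host_filter"},
    "pool_results": {"viral_blast", "bacterial_blast"},
}
batches = execution_batches(jobs)
```

Each inner list in `batches` is a set of jobs that could be distributed across cluster nodes at the same time.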

Reports

Drill Down Reports

Abundance View

● Displays abundance of taxonomic hits
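An abundance table of this kind could be computed by counting reads per assigned taxon, as in the sketch below. The input shape (a mapping from read id to taxon) and the `abundance` function are assumptions for illustration. The slides do not say how the view derives its numbers.

```python
from collections import Counter


def abundance(assignments):
    """Return (taxon, read_count, fraction) rows, most abundant first.

    `assignments` maps read id -> assigned taxon.
    """
    counts = Counter(assignments.values())
    total = sum(counts.values())
    return [(taxon, n, n / total) for taxon, n in counts.most_common()]


rows = abundance({
    "r1": "Vaccinia virus",
    "r2": "Vaccinia virus",
    "r3": "Vaccinia virus",
    "r4": "Influenza A virus",
})
```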

Example Run

● Mouth swab input samples
● Two pools:
  ● Samples spiked with Vaccinia and Influenza A
  ● Background reference sample

Wrap-up

● Unbiased analysis of input reads
● Custom analysis pipelines
● Last common ancestor calculation
● Clustered execution
● Multiple report views
● Exportable results

Acknowledgements

● Gary Van Domselaar
● Morag Graham
● Shaun Tyler
● Heather Kent
● Kim Melnychuk
● Christine Bonner
● Geoff Peters
● Philip Mabon
