Presentation at a workshop conducted by the UC Davis Bioinformatics Core Facility: Using the Linux Command Line for Analysis of High Throughput Sequence Data, September 15-19, 2014
“Scientists often have a naïve faith that if only they could discover enough facts about a problem, these facts would somehow arrange themselves in a compelling and true solution.”
Theodosius Dobzhansky
Jenna Morgan Lang
postdoc, Jonathan Eisen’s Lab
UC Davis
email: [email protected]
Twitter: @jennomics
websites: jennomics.com seagrassmicrobiome.org phylogenomics.wordpress.com
Microbiome studies using 16S ribosomal DNA PCR: some cautionary tales.
16S ribosomal RNA PCR surveys
Metagenomics
Typical laboratory workflow
• Extract DNA with MoBio PowerSoil Kit
• Amplify 16S rDNA with barcoded primers
• Pool samples and sequence on the MiSeq
  – 15 million reads, 250bp PE
  – 50-200(?) samples
  – Sample drop out
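A quick calculation puts the pooling numbers in perspective. The sketch below takes the 15 million reads and the 50-200 sample range from the slide and assumes, unrealistically, that reads split evenly across samples; the function name is made up for illustration.

```python
# Back-of-the-envelope read depth per sample for one MiSeq run.
# 15 million reads and 50-200 samples come from the slide;
# the even split across samples is an idealizing assumption.
TOTAL_READS = 15_000_000


def expected_reads_per_sample(n_samples: int, total_reads: int = TOTAL_READS) -> float:
    """Mean number of reads per sample if reads were divided evenly."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return total_reads / n_samples


for n in (50, 200):
    print(f"{n} samples: {expected_reads_per_sample(n):,.0f} reads per sample on average")
```

Real runs are far from even, which is where sample drop out comes from: some samples end up with too few reads to analyze.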
Typical bioinformatic workflow
• Demultiplex and QC sequence data
• Process using QIIME
• Stare at graphs and wait for a revelation
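To make the demultiplexing step concrete, here is a minimal sketch that assigns reads to samples by exact match of a leading barcode. It is a toy illustration, not what QIIME does internally: the barcodes, sample names, and the `demultiplex` function are hypothetical, and real tools also handle quality scores and barcode errors.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Assign each read to a sample by exact match of its leading barcode.

    Returns (reads grouped by sample, number of unassigned reads).
    The barcode is trimmed from every assigned read.
    """
    per_sample = defaultdict(list)
    unassigned = 0
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned += 1  # unknown or error-containing barcode
        else:
            per_sample[sample].append(insert)
    return dict(per_sample), unassigned


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = ["ACGTGGGATTAGATACCC", "TGCAGGGATTAGATACCC", "NNNNGGGATTAGATACCC"]
by_sample, n_unassigned = demultiplex(reads, barcodes, barcode_len=4)
print(by_sample, n_unassigned)
```

Counting the unassigned reads is a cheap first QC check: a large unassigned fraction usually points to a problem with the barcodes or the mapping file.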
[Figure: QIIME workflow diagram. Stages: inputs, pre-processing, under the hood, analysis. Inputs: sequence data, metadata. Steps: sequence pre-processing, cluster sequences, build OTU table, build phylogenetic tree, assign taxonomy, alpha diversity, beta diversity, hypothesis testing, data visualization.]
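Two of the analysis steps in the QIIME workflow, alpha and beta diversity, are easy to illustrate on a toy OTU table. The sketch below uses the Shannon index and Bray-Curtis dissimilarity as example metrics; picking those two is an assumption (QIIME offers many metrics), and this is plain Python rather than QIIME's own code.

```python
import math


def shannon(counts):
    """Shannon index (natural log) of one sample's OTU counts: alpha diversity."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples' OTU counts: beta diversity."""
    if len(a) != len(b):
        raise ValueError("samples must list the same OTUs in the same order")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Toy OTU counts (made up): rows are samples, columns are OTUs.
sample_1 = [10, 10, 0]
sample_2 = [0, 10, 10]
print("Shannon, sample 1:", shannon(sample_1))
print("Bray-Curtis, 1 vs 2:", bray_curtis(sample_1, sample_2))
```

Alpha diversity describes a single sample; beta diversity compares pairs of samples, which is what ordination plots are built from.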
You can do lots of things with a .biom table produced by QIIME
• METAGENassist
  – interactive web tool that will do lots of stats and make pretty pictures
• PICRUSt (google: picrust metagenomes)
  – infers functional potential based on your 16S data
• STAMP (google: stamp bioinformatics)
  – flexible python tool (with a GUI) that will do statistical analysis of taxonomic and functional profiles on the fly
• R (phyloseq package)
  – If you are familiar with R, this will bridge the gap between QIIME and Rstats
• Phinch
  – Interactive web-based visualization tool
METAGENassist
• Input is .biom table and “mapping file”
• can input matrix of taxonomy or functional assignments
• many options for statistical analysis
• easily generate nice plots
Some examples of METAGENassist output:
PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States)
• .biom table input from QIIME
• normalize by copy number
• predict metagenome
• .biom table output (with functional categories)
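The "normalize by copy number" step exists because bacterial genomes carry different numbers of 16S rRNA gene copies, so raw read counts overstate organisms with many copies. A minimal sketch of the idea follows; it is not PICRUSt's code, and the OTU names, counts, and copy numbers are invented for illustration.

```python
def normalize_by_copy_number(otu_counts, copy_numbers):
    """Divide each OTU's read count by its 16S copy number.

    otu_counts:   {otu_id: read count in one sample}
    copy_numbers: {otu_id: 16S rRNA gene copies per genome (known or predicted)}
    """
    normalized = {}
    for otu, count in otu_counts.items():
        copies = copy_numbers[otu]
        if copies <= 0:
            raise ValueError(f"copy number for {otu} must be positive")
        normalized[otu] = count / copies
    return normalized


# Hypothetical values: OTU_1 has 7 copies of the 16S gene, OTU_2 has 1.
counts = {"OTU_1": 70, "OTU_2": 10}
copies = {"OTU_1": 7, "OTU_2": 1}
print(normalize_by_copy_number(counts, copies))
```

In this toy case OTU_1 looks seven times more abundant in the raw counts, yet both OTUs come out equal once copy number is accounted for.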
Zaneveld, J.R., Lozupone, C., Gordon, J.I. & Knight, R. Ribosomal RNA diversity predicts genome diversity in gut bacteria and their relatives. Nucleic Acids Res. 38, 3869–3879 (2010)
Martiny, A.C., Treseder, K. & Pusch, G. Phylogenetic conservatism of functional traits in microorganisms. ISME J. 7, 830–838 (2013)
PICRUSt accuracy across various environmental microbiomes
PICRUSt can produce results that make sense!
Tributary contaminated by old sulfur mine
Sulfur Metabolism
STAMP
• Input is .biom table and “mapping file”
• Can input matrix of taxonomy or functional assignments
• powerful statistical options
• Can subsample data on the fly
• Generates OK plots
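Subsampling matters because samples on the same run get very different numbers of reads. The sketch below shows one common approach, drawing a fixed number of reads per sample without replacement (rarefaction). It is an assumption that this resembles what STAMP does; the `rarefy` function, the counts, and the depth are hypothetical.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample one sample's OTU counts to `depth` reads without replacement."""
    # Expand the counts into one entry per read, then draw `depth` of them.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("sample has fewer reads than the requested depth")
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    subsampled = {otu: 0 for otu in counts}
    for otu in rng.sample(pool, depth):
        subsampled[otu] += 1
    return subsampled


# Made-up counts for a single sample.
sample = {"OTU_1": 500, "OTU_2": 300, "OTU_3": 5}
rarefied = rarefy(sample, depth=200)
print(rarefied, sum(rarefied.values()))
```

Samples with fewer reads than the chosen depth have to be dropped, so the depth is a trade-off between keeping samples and keeping reads.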
Using STAMP to identify SEED subsystems which are differentially abundant between Candidatus Accumulibacter phosphatis sequences obtained from a pair of enhanced biological phosphorus removal (EBPR) sludge metagenomes (data originally described in Parks and Beiko, 2010).
phyloseq R package
• Create a phyloseq object
  – .biom table
  – “mapping file”
  – phylogenetic tree
• google: phyloseq demo
• do stats and make plots that you can prettify with ggplot2
phinch.org
• Add metadata to biom table
• Upload to phinch
Phinch allows you to manipulate and explore your data
Lots of data cannot compensate for a poorly designed experiment
Bioinformatics cannot save a poorly designed experiment
Design your experiment.
replication
controls
biases
233 sequences
Read number distribution for 60 samples on one MiSeq run
Read number distribution for 95 samples on one MiSeq run
318 sequences
Figure 3. Predicted and observed frequencies of sequence reads from each organism.
Morgan JL, Darling AE, Eisen JA (2010) Metagenomic Sequencing of an In Vitro-Simulated Microbial Community. PLoS ONE 5(4): e10209. doi:10.1371/journal.pone.0010209
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0010209
Standardize collection, storage, and laboratory procedures
Beware the chimera
The How:
The Why:
• too many cycles
• extension time too short
• close relatives in the mix
• less abundant taxa
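A chimera is a single amplicon stitched together from two different templates, typically when an incompletely extended product primes on another template in a later cycle. The toy sketch below builds one from two made-up parent sequences and then recovers a breakpoint by brute force. It assumes equal-length, pre-aligned, error-free sequences, so it illustrates the concept only and is not a real chimera detector.

```python
def make_chimera(parent_a, parent_b, breakpoint):
    """Join the start of parent_a to the end of parent_b at `breakpoint`."""
    return parent_a[:breakpoint] + parent_b[breakpoint:]


def find_breakpoint(query, parent_a, parent_b):
    """Return a position i where query == parent_a[:i] + parent_b[i:], else None.

    A query identical to either parent is not reported as a chimera.
    """
    if not (len(query) == len(parent_a) == len(parent_b)):
        return None
    if query in (parent_a, parent_b):
        return None
    for i in range(1, len(query)):
        if query[:i] == parent_a[:i] and query[i:] == parent_b[i:]:
            return i
    return None


# Two made-up, closely related "16S" fragments.
parent_a = "ACGTACGTAC"
parent_b = "ACGTTTGTCC"
chimera = make_chimera(parent_a, parent_b, 7)
print(chimera, find_breakpoint(chimera, parent_a, parent_b))
```

Because close relatives share long identical stretches, the exact breakpoint is often ambiguous, and the chimera can look like a plausible new taxon.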
Include kit / negative controls
16S rRNA gene sequencing of a pure Salmonella bongori culture
Child nasopharyngeal samples from Thailand, appears to show age-related clustering
Child nasopharyngeal samples from Thailand, extraction kit lot # explains the pattern better
Child nasopharyngeal samples from Thailand, loss of clustering after excluding contaminant OTUs
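Excluding contaminant OTUs requires a rule for deciding what counts as a contaminant. The sketch below uses the simplest possible rule, flagging any OTU seen in a negative (kit) control, which is an assumption and not necessarily how the Thailand data were filtered. The table layout, sample names, and `remove_contaminants` function are hypothetical.

```python
def remove_contaminants(otu_table, negative_controls, min_control_reads=1):
    """Drop OTUs that appear in negative controls.

    otu_table:         {otu_id: {sample_id: read count}}
    negative_controls: set of sample_ids that are kit / negative controls
    Returns (cleaned table without control samples, set of flagged OTUs).
    """
    contaminants = {
        otu
        for otu, per_sample in otu_table.items()
        if sum(per_sample.get(s, 0) for s in negative_controls) >= min_control_reads
    }
    cleaned = {
        otu: {s: n for s, n in per_sample.items() if s not in negative_controls}
        for otu, per_sample in otu_table.items()
        if otu not in contaminants
    }
    return cleaned, contaminants


# Made-up counts: OTU_2 shows up strongly in the kit blank.
otu_table = {
    "OTU_1": {"child_1": 120, "child_2": 95, "kit_blank": 0},
    "OTU_2": {"child_1": 40, "child_2": 3, "kit_blank": 55},
}
cleaned, contaminants = remove_contaminants(otu_table, {"kit_blank"})
print(cleaned, contaminants)
```

None of this is possible without sequencing the negative controls in the first place, which is the point of the preceding slides.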
Schloss: reducing artifacts
Last Bit of Ugly Data
mock community consisting of 21 taxa
3 different regions amplified
4 different sequencing centers
Fecal sample
“Perfection is the enemy of progress”
WORDS OF WISDOM
Consult an expert.
Include replicates and controls.
Design your experiment!
Have a specific question.
Seek to answer THAT question.
(no pilots!)
Do microbes differ between your treatments?
Yes.
Know the answer to the question:
So now what?
(follow-up experiments)
Avoid metagenomics.