Metagenomics: Potentials and pitfalls
Mads Albertsen, MEWE 2013
CENTER FOR MICROBIAL COMMUNITIES
Agenda
Introduction
Pitfalls
Potentials
Recommendations
CENTER FOR MICROBIAL COMMUNITIES | AALBORG UNIVERSITY
Introduction
Genome = Parts list of a single organism
Introduction
Metagenome = Parts list of the community
Photo: D. Kunkel; color, E. Latypova
Introduction
“...functional analysis of the collective genomes of soil microflora, which we term the metagenome of the soil.”
- J. Handelsman et al., 1998
Introduction
PubMed: metagenom*[Title/Abstract]
Introduction
Sequencing costs
http://www.genome.gov/sequencingcosts/
Introduction
Metagenomics ≠ Amplicon sequencing
Sequencing and assembly
≈3,000,000 bp per genome
Contigs of ≈1,000 bp or longer
150 bp reads
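A back-of-the-envelope sketch of what these numbers imply for sequencing effort. The function name `reads_needed` and the 30x coverage target are illustrative assumptions, not figures from the talk.

```python
import math


def reads_needed(genome_bp: int, read_bp: int, coverage: float) -> int:
    """Reads required so that total read length equals coverage x genome size."""
    return math.ceil(coverage * genome_bp / read_bp)


# One 3 Mbp genome sequenced with 150 bp reads.
print(reads_needed(3_000_000, 150, 1))    # 20000 reads for 1x coverage
print(reads_needed(3_000_000, 150, 30))   # 600000 reads for 30x coverage
```

This counts reads from the target genome only. In a metagenome the target is a fraction of the community, so the total number of reads that must be sequenced is correspondingly larger.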
Assigning information
Contigs
Function
Taxonomy
Databases
Binning
What has metagenomics been used for?
Exploration
Rusch et al., 2007 PLoS Biology
• 6.3 Gbp of sequence (2 x human genomes, 2,000 x bacterial genomes)
• Most sequences were novel compared to the databases
Qin et al., 2010 Nature
• 127 human gut metagenomes
• 600 Gbp of sequence (200 x human genomes)
• 3.3 million genes identified
• Minimal gut metagenome defined
What has metagenomics been used for?
Comparative
Dinsdale et al., 2008 Nature
• A characteristic microbial fingerprint for each of the nine different ecosystem types
Specific functions
Hess et al., 2011 Science
• Identified 27,755 putative carbohydrate-active genes from a cow rumen metagenome
• Expressed 90 candidates, of which 57% had enzymatic activity against cellulosic substrates
What has metagenomics been used for?
Extracting genomes
Garcia Martin et al., 2006 Nat. Biotechnol.
• Genome extraction from a low-complexity metagenome
• Candidatus Accumulibacter phosphatis
• The first genome of a polyphosphate-accumulating organism (PAO) with a major role in enhanced biological phosphorus removal
Albertsen et al., 2013 Nat. Biotechnol.
• Genome extraction of low-abundance species (< 0.1%) from metagenomes
• First complete TM7 genome
• Access to genomes of the “uncultured majority”
Pitfalls
Metagenomics made easy
Great resources – but use with care
MG-RAST example
Contigs
Dataset overview
Function, Taxonomy
Taxonomy and Function overview
Compare with other samples
Samples Functional categories
Pitfalls
You always get billions of data!
Pitfalls
Is your DNA extraction OK? ... and that of the samples you want to compare with?
Did you sequence enough?
Did you know the GC bias of your protocol?
Did you normalize for sequencing depth?
Did you use the same sequencing platform?
Assembly = data not quantitative!
Are you comparing assembled data with reads?
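A minimal sketch of one item on this checklist, normalizing for sequencing depth. Counts per million is used here as one common choice, not necessarily the method behind the talk; the function name and the toy counts are hypothetical.

```python
def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts so that every sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {name: n * 1_000_000 / total for name, n in counts.items()}


# Two hypothetical samples with the same composition but a 10x depth difference.
sample_a = {"function_A": 200, "function_B": 800}        # 1,000 reads
sample_b = {"function_A": 2_000, "function_B": 8_000}    # 10,000 reads

# Raw counts differ 10-fold; normalized values are identical.
print(counts_per_million(sample_a))
print(counts_per_million(sample_b))
```

Depth normalization only addresses one question on the list. Differences in DNA extraction, GC bias or sequencing platform are not corrected by rescaling counts.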
Databases
Contigs
Databases
...you only see what is in the database
Annotated metagenome
What is in the databases?
Finished genomes in IMG vs. Greengenes 16S rRNA database
          Genomes   16S
Phyla     29        90
Class     46        249
Order     100       405
Species   1268      99322*
Note: only including 1 strain per species
* 97% clustering
MG-RAST example
Contigs
650,000 EBPR proteins with taxonomy assigned
How similar are they to the genomes in the database?
Sludge microbes vs. Database genomes
650,000 EBPR proteins
Note: not abundance weighted
Sludge microbes vs. Database genomes
650,000 EBPR proteins
1,260,000 Human gut
Qin et al., 2010 Nature
RAST ID: 4448044.3
Note: not abundance weighted
Sludge microbes vs. Database genomes
The 7 genera with most EBPR proteins assigned
Effect of missing genomes
What is the effect of not having closely related genomes in the database?
1. Remove a genome from the database
2. Search the removed genome against the database
Effect of missing genomes
Best hit
Bacteria 1268
Proteobacteria 564
Betaproteobacteria 84
Rhodocyclales 5
Rhodocyclaceae 5
Accumulibacter phosphatis
blastp
Related genomes
4326 proteins
Effect of missing genomes
Best hit
Accumulibacter phosphatis
blastp
Related genomes
4326 proteins
Azoarcus
Bacteria 1268
Proteobacteria 564
Betaproteobacteria 84
Rhodocyclales 5
Rhodocyclaceae 5
Effect of missing genomes
MEGAN LCA
Accumulibacter phosphatis
blastp
Lowest common ancestor (LCA) approach:
Hit 1: Betaproteobacteria 80% ID
Hit 2: Gammaproteobacteria 79% ID
Hit 3: Actinobacteria 59% ID
Assigned to Proteobacteria
Related genomes
4326 proteins
Bacteria 1268
Proteobacteria 564
Betaproteobacteria 84
Rhodocyclales 5
Rhodocyclaceae 5
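The LCA example on this slide can be sketched as follows. This is a simplified illustration, not MEGAN's implementation: the identity window used to discard weak hits stands in for MEGAN's score-based filters, and the lineage lists, function names and the 10-point window are assumptions made for the example.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages (root-to-tip lists)."""
    shared = None
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared = ranks[0]
    return shared


def lca_assign(hits, window=10.0):
    """hits: list of (lineage, percent_identity).

    Keep hits within `window` percentage points of the best hit
    (an assumed stand-in for a top-percent score filter), then take the LCA.
    """
    if not hits:
        return "No hits"
    best = max(identity for _, identity in hits)
    kept = [lineage for lineage, identity in hits if identity >= best - window]
    return lowest_common_ancestor(kept) or "Unassigned"


hits = [
    (["Bacteria", "Proteobacteria", "Betaproteobacteria"], 80.0),
    (["Bacteria", "Proteobacteria", "Gammaproteobacteria"], 79.0),
    (["Bacteria", "Actinobacteria"], 59.0),
]
print(lca_assign(hits))  # Proteobacteria
```

With the 59% hit filtered out, the two remaining hits disagree at class level, so the protein is placed at phylum level. Without a close relative in the database, many proteins end up at such high ranks.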
Effect of missing genomes
MEGAN LCA
Accumulibacter phosphatis
blastp
Genus
No hits 261
Bacteria 325
Proteobacteria 860
Beta- 853
Rhodocyclaceae 1149
4326 proteins:
• 27% correctly classified on genus level
• 54% not assigned the correct class
• 101 genera identified
Related genomes
Bacteria 1268
Proteobacteria 564
Betaproteobacteria 84
Rhodocyclales 5
Rhodocyclaceae 5
Effect of missing genomes
MEGAN LCA
Nitrospira defluvii
Bacteria 1268
Nitrospirae 3
blastp
Related genomes
4268 proteins:
• 1% correctly classified on phylum level
Phylum
Effect of missing genomes
MEGAN LCA + KEGG
Nitrospira defluvii
blastp
Related genomes
Bacteria 1268
Nitrospirae 3
What about function?
Implication of missing genomes
Function A
Function B
Function C
Function D
Pitfalls
You always get billions of data!
Potentials
1. Hunting novel antibiotic resistance genes
2. Extracting genomes from metagenomes
Hunting novel antibiotic resistance genes
What if you want to find something that is not in the database?
Hunting novel antibiotic resistance genes
Functional metagenomics
M. Sommer, DTU, Denmark (in prep)
Hunting novel antibiotic resistance genes
89 different antibiotic resistance genes
19 novel
M. Sommer, DTU, Denmark (in prep)
Hunting novel antibiotic resistance genes
How abundant are the antibiotic resistance genes in the environment?
Hunting novel antibiotic resistance genes
The number of metagenome reads reflects the abundance of the bacteria.
Bacteria Reads
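A small sketch of how read counts can be turned into an abundance estimate for a gene. It assumes reads have already been mapped to the genes; the function names and the toy numbers are hypothetical, and length-normalized coverage is one common choice rather than the method used in the work cited here.

```python
def mean_coverage(mapped_reads: int, read_bp: int, gene_bp: int) -> float:
    """Average number of times each base of the gene is covered by reads."""
    return mapped_reads * read_bp / gene_bp


def relative_abundance(coverages: dict[str, float]) -> dict[str, float]:
    """Express each coverage as a fraction of the summed coverage."""
    total = sum(coverages.values())
    return {name: cov / total for name, cov in coverages.items()}


# Two hypothetical genes that attract the same number of 150 bp reads.
coverages = {
    "gene_short": mean_coverage(200, 150, 1_000),   # 30.0
    "gene_long": mean_coverage(200, 150, 3_000),    # 10.0
}
print(relative_abundance(coverages))
```

The same read count gives a threefold difference in coverage because the longer gene offers more sequence for reads to land on, so raw read counts should be corrected for gene length before genes are compared.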
Hunting novel antibiotic resistance genes
Metagenomes
Antibiotic genes
89 different antibiotic resistance genes
M. Sommer, DTU, Denmark (in prep)
Extracting genomes
≈3,000,000 bp per genome
Contigs of ≈1,000 bp or longer
150 bp reads
Why not full genomes?
1. Micro-diversity
2. Separation of genomes (Binning)
Extracting genomes
Not 1 strain
Many closely related strains
AAAAAAAAAAAAAA
AAAAAAAAATAAAA
AAAAAAAAACAAAA
Assembly
What you get
AAAAAAAAA
TAAAA
CAAAA
AAAAA
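A toy illustration of why closely related strains fragment an assembly. It assumes a de Bruijn graph style assembler, which the slide does not specify; the sequences, `k = 4` and the function names are made up for the example.

```python
from collections import defaultdict


def de_bruijn_edges(sequences, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in any sequence."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branch_points(graph):
    """Nodes with more than one successor, where an unambiguous path ends."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Three hypothetical strains that differ at a single position.
strains = [
    "ATGGCGTACTTAGC",
    "ATGGCGTTCTTAGC",
    "ATGGCGTCCTTAGC",
]

print(branch_points(de_bruijn_edges(strains[:1], 4)))  # []
print(branch_points(de_bruijn_edges(strains, 4)))      # ['CGT']
```

A single strain gives one unbranched path. Adding the two variants creates a branch just before the variable position, so the shared flanks and the three variant stretches come out as separate short pieces.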
Extracting genomes
Metagenome assembly is not quantitative!
Reduce micro-diversity
Low micro-diversity
High micro-diversity
Short term enrichment
Extracting genomes
Binning
Genomic signatures:
- GC / Codon usage
- Tetranucleotide frequency + statistical method
Complex sample
PhD student
“Binning”
Problems:
- Short pieces of sequence (1-10 kbp)
- Local sequence divergence
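A minimal sketch of the genomic signatures named above, GC content and tetranucleotide frequency, computed for one contig. The function names are hypothetical, and the statistical method applied to the resulting vectors is left out. Binning tools often also merge reverse-complement tetramers, which this sketch does not do.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def tetranucleotide_frequencies(seq: str) -> list[float]:
    """Relative frequency of each tetramer, using overlapping windows."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in TETRAMERS)  # windows containing N are ignored
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[t] / total for t in TETRAMERS]


freqs = tetranucleotide_frequencies("ACGTACGT")
print(len(freqs), gc_content("ACGTACGT"))  # 256 0.5
```

A 1 kbp contig yields fewer than 1,000 windows spread over 256 tetramers, which is one way to see why short pieces of sequence give noisy signatures.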
Sequence composition-independent binning
Figure: abundance in Sample 1 and abundance in Sample 2
Binning
Sequence composition-independent binning
Figure: abundance in Sample 1 plotted against abundance in Sample 2
Binning
1. Reduce micro-diversity
2. Use multiple related samples
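A minimal sketch of binning by abundance in two related samples. It assumes each contig already has a coverage value in both samples and that a bin is picked by choosing a coverage window in each; the contig names, coverage values, windows and the function name are all hypothetical.

```python
def select_bin(contigs, sample1_range, sample2_range):
    """Return contigs whose coverage falls inside both windows.

    contigs: dict mapping contig name -> (coverage_sample1, coverage_sample2)
    """
    (lo1, hi1), (lo2, hi2) = sample1_range, sample2_range
    return sorted(
        name
        for name, (cov1, cov2) in contigs.items()
        if lo1 <= cov1 <= hi1 and lo2 <= cov2 <= hi2
    )


contigs = {
    "contig_1": (100.0, 10.0),  # abundant in sample 1, rare in sample 2
    "contig_2": (95.0, 12.0),
    "contig_3": (20.0, 80.0),
    "contig_4": (22.0, 75.0),
    "contig_5": (98.0, 78.0),   # looks like contig_1 in sample 1 only
}

print(select_bin(contigs, (80, 120), (0, 1000)))  # one sample: 1, 2 and 5 together
print(select_bin(contigs, (80, 120), (5, 20)))    # two samples: 1 and 2 only
```

Contigs from the same genome should have similar coverage in every sample, so a second sample separates genomes that happen to be equally abundant in the first.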
Binning
Simple reactors
H. Daims & C. Dorninger, DOME, University of Vienna
• Nitrospira enrichment running for years
• 3 dominant species
• No micro-diversity
Short term enrichment
Full-scale EBPR plant
SBR reactor
Days
1. Reduction of (micro)-diversity
Competibacter
Albertsen et al., 2013 Nat. Biotech.
Short term enrichment
Full-scale EBPR plant
SBR reactor
2. Two different DNA extraction methods
Albertsen et al., 2013 Nat. Biotech.
Colored using a set of 100 phylogenetic marker genes
TM7-1 (1.6%)
TM7-2 (0.7%)
TM7-3 (0.2%)
TM7-4 (0.06%)
Albertsen et al., 2013 Nat. Biotech.
Zoom on target
PCA on genomic signatures (PC1 vs. PC2): TM7-2
TM7-2 (0.7%)
Colored using a set of 100 phylogenetic marker genes
Albertsen et al., 2013 Nat. Biotech.
Colored using a set of 100 phylogenetic marker genes
TM7-1 (1.6%)
Candidate phylum TM7
Saccharibacteria
Candidatus Saccharimonas aalborgensis
Albertsen et al., 2013 Nat. Biotech.
Candidatus Competibacter denitrificans (10.6%)
Albertsen et al., 2013 Nat. Biotech.
Poster by S. McIlroy
Genome assembly validation
Albertsen et al., 2013 Nat. Biotech.
Phyla
Genes (HMM model)
Essential single copy genes
Assembly inspection
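A minimal sketch of how essential single copy genes can be used to check an extracted genome: genes that are missing suggest an incomplete bin, and genes found more than once suggest contamination or a mixed bin. The four-gene set below is a tiny illustrative stand-in for a real marker set, and the function name and input format are assumptions.

```python
from collections import Counter


def assess_bin(found_genes, essential_genes):
    """Summarize presence and duplication of essential single copy genes."""
    counts = Counter(g for g in found_genes if g in essential_genes)
    n = len(essential_genes)
    return {
        "completeness": len(counts) / n,
        "duplicated_fraction": sum(1 for c in counts.values() if c > 1) / n,
        "missing": sorted(set(essential_genes) - set(counts)),
    }


# Illustrative marker set; real sets contain on the order of 100 HMM models.
essential = {"rpoB", "recA", "gyrB", "rpsC"}
# Marker genes detected in a hypothetical genome bin (recA found twice).
found = ["rpoB", "recA", "recA", "gyrB"]

print(assess_bin(found, essential))
```

For this toy bin, three of four markers are present and one is duplicated, which would prompt a closer look at the assembly before trusting the genome.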
Multi-metagenome
Albertsen et al., 2013 Nat. Biotech.
http://madsalbertsen.github.io/multi-metagenome/
Short: goo.gl/0ctA3
• Guides
• Workflow scripts
• Example data
• All the code
• Recommendations
Multi-metagenome
Highly complex environments...
...add more samples!
Talk by SM. Karst
Potentials
Metabolites
Proteins
mRNA
DNA
Metabolomics
Metaproteomics
Metatranscriptomics
Metagenomics
Data integration
In Situ methods
Community structure Microbial functions
Extraction
P-Removal:
N-Removal:
-Removal:
Foaming:
Ethanol production:
Microbial needs
Ecology
Recommendations
• Do you really need metagenomics?
• Are the databases useful in your environment?
  • Unless human-related, they are not...
• Metagenomics is just the parts list
  ... of the DNA that could be extracted
  ... and the functions that could be annotated
• Validation, validation, validation!
  • Bioinformatic
  • In situ
• Genome extraction from simple reactors is possible
  • Enables comprehensive transcriptomics
Metagenomics is pretty...
...but not always informative