Microbiome Analysis: from sample to data
MGL Users Group, June 18, 2014
Slide 2
Assaying Microbial Content
One of the most common approaches is to sequence 16S ribosomal RNA
amplicons. Another option is shotgun sequencing of the community,
assembling the sequences, and assigning the identified genes to
metabolic pathways. If a finer level of detail is required, 16S is
most often sequenced first, followed by general (shotgun) sequencing
for finer species resolution (or validation).
Slide 3
16S rRNA Sequencing
Databases of all known 16S sequences have been compiled (Silva,
GreenGenes, others). Sequencing targets either amplicons of the
variable regions or the whole 16S gene. Isolate gDNA, then PCR
amplify using universal 16S primers.
[Figure: universal 16S primers; primer pair 1, primer pair 2, primer pair 3]
Slide 4
16S rRNA Sequencing
Shear the amplicon using Covaris focused acoustics.
Slide 5
The Microbiome Library
Libraries have adaptor sequences at both ends, used for PCR and
sequencing priming. P1 is the universal forward primer sequence. P2
has an embedded barcode sequence. Between the two adaptor ends is
the DNA insert. Sequencing is primed forward from P1 and from the
barcode region (green arrows). Note: these adaptor sequences DIFFER
from Illumina's, which matters if other library preparations are to
be adapted to this platform.
Slide 6
Bead Preparation from Libraries
The pool of libraries is subjected to emulsion PCR to populate
beads. Oil micro-reactors are titrated such that each bead is
populated by a single template. Unpopulated beads are removed in
subsequent cleanup.
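The titration can be reasoned about with a simple Poisson loading model. This is a common textbook assumption for emulsion PCR, not something stated on the slide, and the loading ratios below are hypothetical. The sketch suggests why dilute loading keeps most populated beads single-template, at the cost of many empty beads that the cleanup step then removes.

```python
import math

def bead_fractions(lam):
    """Poisson fractions of beads carrying 0, exactly 1, and 2+ templates.

    `lam` is the assumed mean number of templates per bead-containing reactor.
    """
    p0 = math.exp(-lam)
    p1 = lam * math.exp(-lam)
    return p0, p1, 1.0 - p0 - p1

def single_template_share(lam):
    """Share of populated beads (1+ templates) that carry exactly one template."""
    _, single, multi = bead_fractions(lam)
    return single / (single + multi)

for lam in (0.1, 0.5, 1.0):  # hypothetical loading ratios, for illustration only
    empty, single, multi = bead_fractions(lam)
    print(f"lambda={lam}: empty={empty:.3f} single={single:.3f} "
          f"multi={multi:.3f} single among populated={single_template_share(lam):.3f}")
```

Lower ratios give cleaner beads but leave more of them unpopulated, which is the trade-off the titration balances.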
Slide 7
16S rRNA Sequencing: Bead Preparation from Libraries
[Figure: workflow steps: nick translate, amplify, quantitate]
Slide 8
Slide Deposition of enriched beads
Beads are flowed into, and then adhered to, the FlowChip lanes.
Optimum density is 160 million beads per lane.
Slide 9
ABI SOLiD 5500xl 16S rRNA Sequencing
Slide 10
The resulting library is sequenced. We sequence 75 bp from one end
using Exact Call Chemistry (16S sequencing is most commonly done on
a long-read platform such as 454 or MiSeq). We generate millions of
reads (most studies generate thousands). Reads are aligned to the
database of 16S sequences to the finest possible level of
resolution. We keep only uniquely aligned reads.
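The "uniquely aligned reads only" rule can be sketched as follows. This is a minimal illustration, assuming the aligner output has already been reduced to (read ID, reference ID) pairs. The function name `keep_unique` and the toy hits are hypothetical and not part of the pipeline described here.

```python
from collections import defaultdict

def keep_unique(alignments):
    """Return {read_id: reference_id} for reads that hit exactly one reference.

    `alignments` is an iterable of (read_id, reference_id) pairs, one per hit.
    """
    hits = defaultdict(set)
    for read_id, ref_id in alignments:
        hits[read_id].add(ref_id)
    # Keep a read only when all of its hits point to a single reference.
    return {read: next(iter(refs)) for read, refs in hits.items() if len(refs) == 1}

# Toy input: r2 hits two different 16S references, so it is discarded.
toy_hits = [("r1", "367523"), ("r2", "187144"), ("r2", "310669"), ("r3", "3918")]
unique = keep_unique(toy_hits)
print(unique)
```

Reads from conserved stretches of the 16S gene tend to match many references, so a filter of this kind would discard them and retain reads that overlap discriminating regions.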
Slide 11
Data Analysis - OTUs
Sequences are often reported as OTUs (Operational Taxonomic Units).
Because related 16S sequences share high levels of identity, an
identity threshold is typically applied and similar sequences are
collapsed into OTU sequences (commonly at 97% identity). As a
result, the level of taxonomic resolution for individual OTU
sequences can vary, even at the same identity threshold.
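To make the identity threshold concrete, here is a minimal sketch of greedy OTU clustering. It is an illustration only: it assumes equal-length, already aligned sequences and scores identity position by position, whereas real OTU-picking tools compute identity from alignments. The names `identity`, `greedy_otus`, and `mutate` are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions; assumes equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(seqs, threshold=0.97):
    """Assign each sequence to the first OTU whose representative it matches."""
    centroids = []  # one representative sequence per OTU
    members = []    # sequences collapsed into each OTU
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                members[i].append(s)
                break
        else:
            # No existing OTU is similar enough: start a new one.
            centroids.append(s)
            members.append([s])
    return centroids, members

def mutate(seq, positions):
    """Toy helper: substitute the base at each listed position."""
    bases = list(seq)
    for p in positions:
        bases[p] = "C" if bases[p] == "A" else "A"
    return "".join(bases)

base = "ACGT" * 25                     # 100-base toy sequence
near = mutate(base, [0, 1])            # 98% identical to base
far = mutate(base, [0, 1, 2, 3, 4])    # 95% identical to base
centroids, members = greedy_otus([base, near, far])
print(len(centroids), [len(m) for m in members])
```

In this toy run the 98% sequence collapses into the first OTU while the 95% sequence starts its own, which is the behaviour the 97% threshold is meant to produce.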
Slide 12
OTU examples
367523   k__Bacteria; p__Bacteroidetes; c__Flavobacteriia; o__Flavobacteriales; f__Flavobacteriaceae; g__Flavobacterium; s__
187144   k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__
836974   k__Bacteria; p__Cyanobacteria; c__Chloroplast; o__Cercozoa; f__; g__; s__
310669   k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__
823916   k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pseudomonadales; f__Moraxellaceae; g__Enhydrobacter; s__
878161   k__Bacteria; p__Acidobacteria; c__Acidobacteriia; o__Acidobacteriales; f__Acidobacteriaceae; g__Terriglobus; s__
3064251  k__Bacteria; p__Verrucomicrobia; c__Opitutae; o__Puniceicoccales; f__Puniceicoccaceae; g__Puniceicoccus; s__
1138555  k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Caldicoprobacteraceae; g__Caldicoprobacter; s__
3918     k__Bacteria; p__Spirochaetes; c__Spirochaetes; o__Spirochaetales; f__Spirochaetaceae; g__Treponema; s__
339472   k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rhodospirillales; f__Rhodospirillaceae; g__; s__
4457583  k__Bacteria; p__; c__; o__; f__; g__; s__
(k=kingdom; p=phylum; c=class; o=order; f=family; g=genus;
s=species)
OTUs typically lack species-level resolution (as seen in the
example subset), but many resolve to order, family, or even genus.
A few are essentially unidentifiable.
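The rank prefixes make it straightforward to find how deep each OTU resolves. The sketch below is a minimal illustration that assumes the semicolon-separated lineage format shown in the examples. The function name `deepest_rank` is hypothetical.

```python
RANKS = {"k": "kingdom", "p": "phylum", "c": "class", "o": "order",
         "f": "family", "g": "genus", "s": "species"}

def deepest_rank(lineage):
    """Return (rank, taxon) for the last non-empty level, or (None, None)."""
    deepest = (None, None)
    for field in lineage.split(";"):
        prefix, _, name = field.strip().partition("__")
        if name:  # an empty name (e.g. "g__") means the level is unresolved
            deepest = (RANKS[prefix], name)
    return deepest

examples = {
    "367523": "k__Bacteria; p__Bacteroidetes; c__Flavobacteriia; "
              "o__Flavobacteriales; f__Flavobacteriaceae; g__Flavobacterium; s__",
    "187144": "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; "
              "f__; g__; s__",
    "4457583": "k__Bacteria; p__; c__; o__; f__; g__; s__",
}
for otu_id, lineage in examples.items():
    print(otu_id, deepest_rank(lineage))
```

Applied to three of the example OTUs, this reports genus, order, and kingdom as the deepest resolved levels, matching the spread of resolution described above.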
Slide 13
Gut microbiota
Primarily composed of Firmicutes and Bacteroidetes. The balance
between these populations has been linked to obesity in mice.
Example using a public dataset, treated to simulate short 75 bp
random reads: 3 simulations on the same dataset.
[Figure: simulation results; panel labels TestLib1, TestLib2, TestLib3]
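One way such a simulation could be set up is sketched below. The slides do not describe the actual procedure, so this is an assumption: fixed-length reads are drawn from uniformly random start positions along a longer sequence. The function name `simulate_reads`, the seed, and the toy amplicon are hypothetical.

```python
import random

def simulate_reads(sequence, n_reads, read_len=75, seed=0):
    """Draw fixed-length reads from uniformly random start positions."""
    if len(sequence) < read_len:
        raise ValueError("sequence is shorter than the read length")
    rng = random.Random(seed)  # seeded so a simulation can be repeated
    last_start = len(sequence) - read_len
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, last_start)  # randint includes both endpoints
        reads.append(sequence[start:start + read_len])
    return reads

# Toy 1,500-base stand-in for a full-length 16S sequence (random bases).
toy_rng = random.Random(42)
amplicon = "".join(toy_rng.choice("ACGT") for _ in range(1500))
reads = simulate_reads(amplicon, n_reads=5)
print(len(reads), len(reads[0]))
```

Running the same input with different seeds would give independent read sets, comparable in spirit to running several simulations on one dataset.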
Slide 14
Initial trial run
Slide 15
... Hundreds of lines
Slide 16
Initial run results
4 mice run: 2 WT; 2 KO. Phylum resolution, N=7.5 million.
[Figure: panel labels E1, E2, E3, E4]
Slide 17
Initial run results
4 mice run: 2 WT; 2 KO. Class resolution, N=7.5 million.
Slide 18
Initial run results
4 mice run: 2 WT; 2 KO. Order resolution, N=7.4 million.
Slide 19
Initial run results
4 mice run: 2 WT; 2 KO. Family resolution, N=2.2 million.
Slide 20
Initial run results
4 mice run: 2 WT; 2 KO. Genus resolution, N=1.1 million.
(Species N=340K; more complex)
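The drop-off across ranks is easier to see as fractions. The short calculation below uses the rounded N values from the initial-run slides and assumes that N is the number of reads assigned at each rank. The phylum-level count is used as the baseline because the total read count is not given.

```python
# N values as reported on the slides (rounded), assumed to be assigned reads.
assigned = {
    "phylum": 7_500_000,
    "class": 7_500_000,
    "order": 7_400_000,
    "family": 2_200_000,
    "genus": 1_100_000,
    "species": 340_000,
}

def fraction_retained(counts, baseline_rank="phylum"):
    """Express each rank's count as a fraction of the baseline rank's count."""
    baseline = counts[baseline_rank]
    return {rank: n / baseline for rank, n in counts.items()}

retained = fraction_retained(assigned)
for rank, frac in retained.items():
    print(f"{rank:<8} {assigned[rank]:>10,d} {frac:7.1%}")
```

On these rounded figures, nearly all reads still resolve at order level, while roughly 29% remain at family level, about 15% at genus level, and under 5% at species level.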
Slide 21
For more information:
Website: mgl.nichd.nih.gov
Listserv: MGL-USERS-L
Email: [email protected]
Phone: 301-402-4563
Walk-in: Bldg 10/Rm 9D41
Slide 22
Slide 23
Typical Approach
Use long reads from a whole, intact amplicon (a few thousand reads
are typically used). Perform trimming, remove chimeric sequences,
join overlapping paired ends, etc. Compare sequences to the database
through BLAST or by comparison to a prepared multiple sequence
alignment. Compare and clean the data. Assign and resolve taxonomy;
describe the distribution. Compare populations across conditions,
etc. (statistical digging).
Slide 24
Alternative approach using short reads
Amplify the 16S gene or amplicon as normal. Randomly shear to
construct a typical short-read library comprising random starts and
ends. Generate millions of reads. Using short-read aligners, assign
to OTUs only those reads that map unambiguously. Analyze OTU
populations as normal. This approach is more wasteful, but in
practice it performs just as well. It uses higher-throughput
instruments rather than lower-capacity long-read platforms.
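Once reads are assigned to OTUs, the downstream summary works the same way as for long reads. As a minimal sketch, per-OTU read counts can be collapsed to phylum-level relative abundance. The read counts below are made up for illustration, the lineages are taken from the OTU examples, and the function names `phylum_of` and `phylum_abundance` are hypothetical.

```python
from collections import Counter

def phylum_of(lineage):
    """Extract the phylum name from a lineage string, or 'unassigned'."""
    for field in lineage.split(";"):
        field = field.strip()
        if field.startswith("p__"):
            return field[3:] or "unassigned"
    return "unassigned"

def phylum_abundance(otu_counts, otu_taxonomy):
    """Collapse per-OTU read counts into phylum-level relative abundance."""
    totals = Counter()
    for otu_id, n_reads in otu_counts.items():
        totals[phylum_of(otu_taxonomy[otu_id])] += n_reads
    total = sum(totals.values())
    return {phylum: n / total for phylum, n in totals.items()}

otu_taxonomy = {
    "367523": "k__Bacteria; p__Bacteroidetes; c__Flavobacteriia; "
              "o__Flavobacteriales; f__Flavobacteriaceae; g__Flavobacterium; s__",
    "187144": "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; "
              "f__; g__; s__",
    "310669": "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; "
              "f__; g__; s__",
    "4457583": "k__Bacteria; p__; c__; o__; f__; g__; s__",
}
# Hypothetical counts of unambiguously mapped reads per OTU.
otu_counts = {"367523": 300, "187144": 400, "310669": 200, "4457583": 100}

abundance = phylum_abundance(otu_counts, otu_taxonomy)
print(abundance)
```

The same collapse can be repeated at class, order, or family level by changing the prefix, which mirrors the rank-by-rank views in the initial run results.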