Guide to microbiome analysis

Microbiome analysis: from experimental design to integration with other omics data

Yuan-Ming Yeh, Ph.D.

Genomic Medicine Core Laboratory,

Chang Gung Memorial Hospital, Linkou

2019.05.17 @ CGU

Microbiota, Metagenome, and Microbiome

a. Microbiota
b. Metagenome
c. Microbiome

First Peek at Microbes

Bacterial morphological diversity. From: http://ag.arizona.edu/plp/courses/plp329/micdivintro.ppt

http://microorganismsltur0629.weebly.com/introduction.html

Types of Microbes

https://www.slideshare.net/MohammedInzamamuddin/microbes-of-extreme-environment

There are 5 main types of microbes

Microbes are Small

http://www.mansfield.ohio-state.edu/~sabedon/lectures/index.html

Bacterial Anatomy

https://slideplayer.com/slide/13122023/

What is a Taxonomy?

https://www.kullabs.com/classes/subjects/units/lessons/notes/note-detail/1469

https://byjus.com/cbse/taxonomy-nomenclature/

Taxonomy Nomenclature

Microbes Run the World


• The Earth is a “microbial planet”
  • Microorganisms predate other life forms (they have evolved for some 3.8 billion years)
  • They are the most abundant organisms, both in numbers and in distribution
  • Microbial activities have a profound influence on the integrity and functioning of global ecosystems

• “...The diversity and range of their environmental adaptations indicate that microbes long ago ‘solved’ many problems for which scientists are still actively seeking solutions.” (Microbial Genome Program, U.S. Department of Energy; http://microbialgenomics.energy.gov/)

Microbes and Ecology




Microbes and You

• Microbes colonize every part of your body that normally comes in contact with the outside world (the deep lungs and stomach are exceptions)

• You are “what you eat” – Human gut microbes

• “Good” and “bad” microorganisms

Microbes and Industry


• Industry: Fermentation products (ethanol, acetone, etc.)

• Food: Wine, cheese, yogurt, bread, half-sour pickles, etc.

• Biotech: Recombinant products (e.g., human insulin, vaccines)

• Environment: Bioremediation

• Bugs+Plus: to digest oil and other petroleum derivatives.

Potential Microbial Applications

• Cleanup of toxic-waste sites worldwide.
• Production of novel therapeutic and preventive agents and pathways.
• Energy generation and development of renewable energy sources (e.g., methane and hydrogen).
• Production of chemical catalysts, reagents, and enzymes to improve efficiency of industrial processes.
• Management of environmental carbon dioxide, which is related to climate change.
• Detection of disease-causing organisms and monitoring of the safety of food and water supplies.
• Use of genetically altered bacteria as living sensors (biosensors) to detect harmful chemicals in soil, air, or water.
• Understanding of specialized systems used by microbial cells to live in natural environments with other cells.

(http://microbialgenomics.energy.gov/)

Metagenomics history

Craig Venter

Celera Genomics
The Institute for Genomic Research
The J. Craig Venter Institute

Global Ocean Sampling Expedition (GOS)

The pilot project, conducted in the Sargasso Sea, found DNA from nearly 2000 different species, including 148 types of bacteria never before seen.

shotgun sequencing

The big picture

Explore the relationship between microbes and their habitat

To accomplish this, we use a series of experimental and computational techniques to make inferences about the community:
• Marker genes
• Metagenomes
• Metatranscriptomes
• Metaproteomes
• Metametabolomes
• “Culturomes”

Bioinformatics.ca

Human gut microbiome: 2-3 million genes

Typically > 160 “species” at any given sampling time

Host: ~25,000 genes

Qin et al., Nature (2010)

The Human Microbiome



HMP

Darryl Leja, NHGRI

There are both tremendous similarities and differences among the bacterial species that predominate at different sites on the human body. Colors represent different phyla and families of bacteria.

From Molecular Biology to Microbiome

Garrett 2015; Claesson et al. 2017

Hypothesis:
• changes in the microbiome (longitudinal analysis)
• whether microbiome differences correlate with clinical phenotypes (cross-sectional or cohort analysis)

How many samples per group?

Statistics:
• suggested 5x samples, minimum 3x samples

Biology:
• plant, soil, water => 10x samples
• human gut => 20x samples

Factors shown: oxygen, diet, pH, moisture, exercise, light, supplement

Experimental design considerations

Knight et al. 2018

Sample types / collection

Claesson et al. 2017

Sample collection (Human Microbiome Project)

Claesson et al. 2017

NGS and metagenomics

Not all microbes can be cultured

• Accelerated by NGS: at first predominantly 454 sequencing because of its longer read length, now more often Illumina-based chemistry
• Organisms no longer need to be cultivated and cloned (culture-independent insight)
• Direct sequencing from the environment as a “community”
• Multiple samples can be pooled together

Sequencing Run Capacity

ER Mardis. Nature 470, 198-203 (2011) doi:10.1038/nature09796

Sequencing platforms for Microbiome studies

Contreras et al. 2016

Sequencing platforms

Contreras et al. 2016

Metagenomic reads vs 16S rRNA for microbial diversity identification

[Figure: two routes from a metagenome to microbial diversity. Shotgun route: DNA isolation, fragmentation of DNA, metagenomic reads. Amplicon route: DNA isolation, amplification of 16S rRNA, 16S rRNA from multiple species.]

Bioinformatic analysis

Contreras et al. 2016

16S rRNA – a “gold standard” for microbial molecular identification

• Universal
• Highly conserved
• Long enough (~1500 bp) to provide significant discrimination between many species
• Structural information can guide alignment and phylogenetic reconstruction
• Many species now represented in the database

16S rRNA gene sequencing

Earlier: by sequencing the whole gene

Now: by sequencing short variable regions

Limitations:

• Insufficient and underestimated diversity

16S ribosomal RNA
• highly conserved between different species of bacteria and archaea
• whereas the rest of the genetic content varies greatly across species
• 16S rRNA can be used for taxonomic classification

16S gene coverage (RDP release 11.1)

From Wang et al., AEM, 2007

Classifier Accuracy on 200 bp Regions

Analysis approach

Taxonomy independent analysis
• Reads are grouped into operational taxonomic units (OTUs) based on a specified sequence variation.

Taxonomy dependent analysis
• Assignment at the level of domain, phylum, class, order, family, genus, and species
• Requires a reference database

Taxonomy independent analysis
• Group reads into OTUs based on an imposed similarity threshold
  • In studies of bacteria, 97% is a good starting point
  • Species dependent, gene dependent; the threshold may vary
  • 1 OTU = 1 organism
• Extract an OTU representative sequence
  • Most common sequence
  • Sequence that has the minimum difference to all other sequences in the same OTU
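The threshold-based grouping above can be sketched in a few lines of Python. This is only a toy illustration, not the algorithm used by mothur, QIIME or USEARCH. It assumes equal-length, already aligned sequences, so identity is simply the fraction of matching positions, and it takes the most common sequence as each OTU's representative. The names `identity` and `greedy_otus` are hypothetical.

```python
from collections import Counter


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy example assumes equal-length, aligned sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=0.97):
    """Group reads into OTUs. Returns {representative: [member sequences]}.

    Unique sequences are visited from most to least abundant, so the
    representative of each OTU is its most common sequence.
    """
    counts = Counter(reads)
    otus = {}
    # Most abundant first; ties are broken alphabetically to stay deterministic.
    for seq, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for rep in otus:
            if identity(seq, rep) >= threshold:
                otus[rep].append(seq)
                break
        else:
            otus[seq] = [seq]  # no representative is close enough: new OTU
    return otus


# Toy data (assumed): 100 bp sequences that differ at 1 or 5 positions.
base = "A" * 100
one_mismatch = "C" + "A" * 99        # 99% identical to base
five_mismatch = "CCCCC" + "A" * 95   # 95% identical to base
reads = [base] * 5 + [one_mismatch] * 2 + [five_mismatch] * 3

otus = greedy_otus(reads)
for rep, members in otus.items():
    print(len(members), "unique sequence(s) in the OTU seeded by", rep[:10] + "...")
```

With a 97% threshold the 99%-identical variant joins the OTU of the abundant sequence, while the 95%-identical one seeds its own OTU. Changing `threshold` shows how strongly the OTU count depends on this imposed cutoff.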

Taxonomy dependent analysis
• Classify sequences
• BLAST
  • Simply BLAST what you have (phylotyping)
  • MEGAN
• Online RDP classifier (Ribosomal Database Project)
  • RDP 10.26 (Release 11, Update 5 consists of 3,356,809 aligned and annotated 16S rRNA sequences)
  • Limited by the number of reads you can submit
• Online Greengenes classifier based on NAST alignment
  • Requires a pre-aligned dataset
  • Limited by the number of reads you can submit

Greengenes

http://greengenes.lbl.gov/

SILVA

http://www.arb-silva.de/

RDP

http://rdp.cme.msu.edu/

One common phylotyping workflow

• Run BLAST with the reads against the NCBI bacterial database (can be very time-consuming)

• Use MEGAN (Metagenome Analyzer) to process the results

http://ab.inf.uni-tuebingen.de/software/megan/

The mothur package

• Primarily OTU based but it has phylotyping functionality built in

http://www.mothur.org/


The QIIME package

• Takes users from their raw data through OTU picking, taxonomic assignment, and construction of phylogenetic trees from representative sequences of OTUs, and on through downstream statistical analysis, visualization, and production of publication-quality graphics.

http://qiime.org/


NCBI taxonomy

EBI Metagenomics

Analysis tools for microbiome

Major concerns in metagenomic analysis

Data Quality
• Sequencing errors
  • Introduced in workup
  • Error rates, error type (PacBio: 10% random; Illumina: 0.1% substitution)
• Chimeras
  • Amplification artifacts, cloning of restriction fragments

Metadata Acquisition and Availability
• Studying a complex ecosystem is multifactorial
• Public datasets: metadata are often embedded in publications or simply not available

Comparability / Reproducibility
• 16S: different V regions give different results
• Different sequencing platforms / sampling conditions ALSO give different results
• Eisen paper about different recoveries under different conditions

Workflow complexity / plethora of tools
• Difficult to evaluate tools for microbiome analysis
• Ground truth hard to establish for microbiome samples
• Use of mock communities or simulated data

Linkage and resolution
• Strain-level diversity in metagenomes will often be missed due to difficulties in differentiating minor variants from sequencing errors
• Should you assemble metagenomic reads?
  • Longer sequences have more information
  • By assembling the reads, you could create chimeric contigs consisting of DNA fragments from different (non-clonal) organisms

Taxonomy and OTUs

RDP taxonomic predictions + taxonomy in general

OTUs – arbitrary, quasi-phylogenetic

Seed sequences

???

De novo

97%


Analysis workflow

Knight et al. 2018

high-level workflows

Analysis Pros and cons

Knight et al. 2018

16S/18S, ITS

WMGS

Meta-transcriptome

Metagenomics Exercise

Chimeras

https://doi.org/10.1101/074252

Chimeras are sequences formed from two or more biological sequences joined together. Amplicons with chimeric sequences can form during PCR. Chimeras are rare with shotgun sequencing, but are common in amplicon sequencing when closely related sequences are amplified. Although chimeras can be formed by a number of mechanisms, the majority of chimeras are believed to arise from incomplete extension. During subsequent cycles of PCR, a partially extended strand can bind to a template derived from a different but similar sequence.
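The incomplete-extension mechanism described above can be illustrated with a toy Python sketch. It is not the method used by UCHIME, UPARSE or UNOISE. It assumes equal-length parent sequences and only recognizes exact two-parent chimeras, whereas real tools work with alignments and abundances. The names `make_chimera` and `find_chimera_parents` are hypothetical.

```python
def make_chimera(parent_a, parent_b, breakpoint):
    """Simulate incomplete extension: the strand starts as a copy of parent_a
    and is completed on a parent_b template from `breakpoint` onward."""
    return parent_a[:breakpoint] + parent_b[breakpoint:]


def find_chimera_parents(query, parents):
    """Return (left_parent, right_parent, breakpoint) if `query` is an exact
    two-parent chimera of equal-length parents, otherwise None."""
    if query in parents:
        return None  # identical to a known sequence, so not a chimera
    for left in parents:
        for right in parents:
            if left == right:
                continue
            if len(left) != len(query) or len(right) != len(query):
                continue
            for bp in range(1, len(query)):
                if query[:bp] == left[:bp] and query[bp:] == right[bp:]:
                    return left, right, bp
    return None


# Toy parents (assumed): two similar 20 bp sequences.
parent_a = "AAAAAAAAAACCCCCCCCCC"
parent_b = "AAAAAGGGGGCCCCCTTTTT"
chimera = make_chimera(parent_a, parent_b, 10)

print(chimera)
print(find_chimera_parents(chimera, [parent_a, parent_b]))
print(find_chimera_parents(parent_a, [parent_a, parent_b]))
```

The closer the parents are to each other, the fewer positions distinguish a chimera from a genuine sequence, which is one reason chimeras are hard to separate from real biological variants in amplicon data.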

http://en.wikipedia.org/wiki/Polymerase_chain_reaction

UNOISE

https://doi.org/10.1101/081257
http://dx.doi.org/10.1038/nmeth.2604

Schematic of the UNOISE2 denoising strategy

UPARSE

SINTAX: a simple non-Bayesian taxonomy classifier

MiSeq 2x250 16S V4 Exercise

• Description

This example shows a typical analysis pipeline for MiSeq paired reads. There are four samples: Human, Mouse, Soil and Mock with ~4k reads each. Human and Mouse are fecal samples. Data are from Kozich et al. 2013.

Prepare material

• “Usearch” program upload

• Reads:
  wget https://drive5.com/downloads/ex_miseq_reads.tar.gz

• Sintax reference database:
  wget https://drive5.com/sintax/rdp_16s_v16.fa.gz

https://doi.org/10.1128/AEM.01043-13

STEP1

• # Merge paired reads
• # Add sample name to read label (-relabel option)
• # Pool samples together in all.merged.fq (Linux cat command)

for Sample in Mock Soil Human Mouse
do
    $usearch11 -fastq_mergepairs ../data/${Sample}*_R1.fq \
        -fastqout $Sample.merged.fq -relabel $Sample.
    cat $Sample.merged.fq >> all.merged.fq
done
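To make the idea behind merging paired reads concrete, here is a minimal Python sketch. It is not how `-fastq_mergepairs` is implemented: it assumes error-free reads and accepts only an exact overlap, whereas real mergers generally tolerate mismatches and use quality scores. The names `reverse_complement` and `merge_pair` and the `min_overlap` parameter are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd, rev, min_overlap=10):
    """Merge a forward read with its reverse read.

    The reverse read is reverse-complemented, then the longest exact overlap
    between the end of `fwd` and the start of that sequence is used to join
    them. Returns the merged sequence, or None if no overlap is found.
    """
    rc = reverse_complement(rev)
    for n in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        if fwd[-n:] == rc[:n]:
            return fwd + rc[n:]
    return None


# Toy fragment (assumed): 30 bp, sequenced as two 20 bp reads overlapping by 10 bp.
fragment = "ACGTTGCAAGGCTTACCGATGGATCCTAGA"
fwd = fragment[:20]
rev = reverse_complement(fragment[10:])

merged = merge_pair(fwd, rev)
print(merged == fragment)
```

For 2x250 reads of the V4 region the two reads overlap over most of their length, which is why merging works well for this amplicon.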

STEP2

• # Strip primers (V4F is 19, V4R is 20)

$usearch11 -fastx_truncate all.merged.fq -stripleft 19 -stripright 20 \
    -fastqout stripped.fq

• # Quality filter

$usearch11 -fastq_filter stripped.fq -fastq_maxee 1.0 \
    -fastaout filtered.fa -relabel Filt
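The quality filter above keeps reads by their expected number of errors, which is the sum of the per-base error probabilities implied by the Phred scores, P = 10^(-Q/10). The Python sketch below shows that calculation under the assumption of Phred+33 encoded quality strings. The names `expected_errors` and `passes_maxee` are hypothetical, and this is an illustration of the idea rather than the usearch code.

```python
def expected_errors(quality, offset=33):
    """Sum of per-base error probabilities for a FASTQ quality string.

    Assumes Phred+`offset` encoding: Q = ord(char) - offset, P = 10 ** (-Q / 10).
    """
    return sum(10 ** (-(ord(c) - offset) / 10) for c in quality)


def passes_maxee(quality, max_ee=1.0):
    """True if the read has at most `max_ee` expected errors."""
    return expected_errors(quality) <= max_ee


# Toy quality strings (assumed): 100 bases at Q40 ('I') and 100 bases at Q10 ('+').
high_quality = "I" * 100   # 100 * 0.0001 = 0.01 expected errors
low_quality = "+" * 100    # 100 * 0.1    = 10 expected errors

for name, qual in [("high", high_quality), ("low", low_quality)]:
    print(name, round(expected_errors(qual), 4), passes_maxee(qual))
```

Because the expected error count is a sum over the whole read, a long read with a few poor bases can fail the filter even when its average quality looks acceptable.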

STEP3

• # Find unique read sequences and abundances

$usearch11 -fastx_uniques filtered.fa -sizeout -relabel Uniq -fastaout uniques.fa

• # Make 97% OTUs and filter chimeras

$usearch11 -cluster_otus uniques.fa -otus otus.fa -relabel Otu

• # Denoise: predict biological sequences and filter chimeras

$usearch11 -unoise3 uniques.fa -zotus zotus.fa
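The first command on this slide (finding unique sequences and their abundances) is a dereplication step, and both the OTU clustering and the denoising commands take its abundance-annotated output as input. A minimal Python sketch of dereplication follows. The function name `dereplicate` is hypothetical, and the `Uniq1;size=3;` label layout is written here as an assumption modeled on the `-sizeout` and `-relabel Uniq` options.

```python
from collections import Counter


def dereplicate(seqs, prefix="Uniq"):
    """Collapse identical sequences and record their abundance.

    Returns a list of (label, sequence) tuples sorted by decreasing
    abundance; ties are broken alphabetically to stay deterministic.
    """
    counts = Counter(seqs)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        (f"{prefix}{i};size={n};", seq)
        for i, (seq, n) in enumerate(ordered, start=1)
    ]


# Toy reads (assumed).
toy_reads = ["ACGT", "ACGT", "TTTT", "ACGT", "GGGG", "TTTT"]
uniques = dereplicate(toy_reads)

for label, seq in uniques:
    print(f">{label}\n{seq}")
```

Sorting by abundance matters because abundant sequences are more likely to be correct biological sequences, while singletons are enriched for sequencing errors and chimeras.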

##################################################
# Downstream analysis of OTU sequences & OTU table
# Can do this for both OTUs and ZOTUs, here do
# just OTUs to keep it simple.
##################################################

• # Make OTU table
$usearch11 -otutab all.merged.fq -otus otus.fa -otutabout otutab_raw.txt

• # Random subsampling to 0.5k reads / sample
$usearch11 -otutab_rare otutab_raw.txt -sample_size 500 -output otutab.txt

• # Alpha diversity
$usearch11 -alpha_div otutab.txt -output alpha.txt

• # Make OTU tree
$usearch11 -calc_distmx otus.fa -tabbedout mx.txt -maxdist 0.2 -termdist 0.3
$usearch11 -cluster_aggd mx.txt -treeout otus.tree
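The subsampling and alpha diversity steps above can be illustrated with a short Python sketch. It makes several assumptions: one sample is a dictionary of OTU counts, subsampling is done without replacement, and the Shannon index uses the natural logarithm. The names `rarefy`, `richness` and `shannon` are hypothetical, and the metrics reported by `-alpha_div` are not reproduced here.

```python
import math
import random


def rarefy(counts, depth, seed=0):
    """Randomly subsample one sample to `depth` reads without replacement.

    `counts` maps OTU name to read count. Returns a new count dictionary,
    or None if the sample has fewer than `depth` reads.
    """
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    picked = random.Random(seed).sample(pool, depth)
    subsampled = {}
    for otu in picked:
        subsampled[otu] = subsampled.get(otu, 0) + 1
    return subsampled


def richness(counts):
    """Number of OTUs observed at least once."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over observed OTUs."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


# Toy sample (assumed): 100 reads spread over three OTUs.
sample = {"Otu1": 50, "Otu2": 30, "Otu3": 20}
sub = rarefy(sample, 40)

print("reads after subsampling:", sum(sub.values()))
print("richness:", richness(sub))
print("Shannon:", round(shannon(sub), 3))
```

Subsampling every sample to the same depth before computing diversity keeps deeply sequenced samples from looking more diverse simply because more reads were drawn from them.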

• # Beta diversity
mkdir beta/
$usearch11 -beta_div otutab.txt -tree otus.tree -filename_prefix beta/

• # Rarefaction
$usearch11 -alpha_div_rare otutab.txt -output rare.txt

• # Predict taxonomy
$usearch11 -sintax otus.fa -db rdp_16s_v16.fa -strand both \
    -tabbedout sintax.txt -sintax_cutoff 0.8

• # Taxonomy summary reports
$usearch11 -sintax_summary sintax.txt -otutabin otutab.txt -rank g -output genus_summary.txt
$usearch11 -sintax_summary sintax.txt -otutabin otutab.txt -rank p -output phylum_summary.txt

• # Find OTUs that match mock sequences
$usearch11 -uparse_ref otus.fa -db ../data/mock.merged.fq -strand plus \
    -uparseout uparse_ref.txt -threads 1
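Beta diversity, computed earlier on this slide, compares samples with each other rather than describing a single sample. As a hedged illustration, the Python sketch below computes two widely used dissimilarities, Bray-Curtis (abundance based) and Jaccard (presence/absence based), for a pair of samples. The sample dictionaries and the function names `bray_curtis` and `jaccard` are assumptions for this example, and tree-based metrics such as UniFrac are not covered.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples given as {otu: count}."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    shared = sum(min(a.get(otu, 0), b.get(otu, 0)) for otu in set(a) | set(b))
    return 1 - 2 * shared / total


def jaccard(a, b):
    """Jaccard distance based on OTU presence/absence."""
    present_a = {otu for otu, n in a.items() if n > 0}
    present_b = {otu for otu, n in b.items() if n > 0}
    union = present_a | present_b
    if not union:
        raise ValueError("both samples are empty")
    return 1 - len(present_a & present_b) / len(union)


# Toy samples (assumed), already subsampled to the same depth of 10 reads.
human = {"Otu1": 6, "Otu2": 4}
soil = {"Otu2": 4, "Otu3": 6}

print("Bray-Curtis:", round(bray_curtis(human, soil), 3))
print("Jaccard:", round(jaccard(human, soil), 3))
```

Both values range from 0 (identical samples) to 1 (no OTUs in common). Computing them for every pair of samples gives the distance matrix that ordination plots such as PCoA are built on.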
