Computational metagenomics and the human microbiome
Curtis Huttenhower
01-21-11
Harvard School of Public Health, Department of Biostatistics
Slide 2
What to do with your metagenome?
(x10^10)
Diagnostic or prognostic biomarker for host disease
Public health tool monitoring population health and interactions
Comprehensive snapshot of microbial ecology and evolution
Reservoir of gene and protein functional information
Who's there?
What are they doing?
What do functional genomic data tell us about microbiomes?
What can our microbiomes tell us about us?*
* Using terabases of sequence and thousands of experimental results
HMP Organisms: Everyone and everywhere is different
[Figure: organisms (taxa) by body sites + individuals. Site labels: ear, gut, nose, mouth, vagina, arm, mucosa, palate, gingiva, tonsils, saliva, sub. plaq., sup. plaq., throat, tongue]
Every microbiome is surprisingly different
Most organisms are rare in most places
Even common organisms vary tremendously in abundance among individuals
Aerobicity, interaction with the immune system, and extracellular medium appear to be major determinants
There are few organismal biotypes in health
HUMAnN: Community metabolic and functional reconstruction
[Figure panels: Pathway coverage; Pathway abundance]
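The two panel titles above name two per-pathway summaries. As a rough illustration only, the sketch below assumes that coverage means the fraction of a pathway's genes that are detected, and that abundance means the mean abundance over the pathway's genes. These definitions are simplifying assumptions, not HUMAnN's actual method, and the function names, gene names, and pathway names are all hypothetical.

```python
from statistics import mean


def pathway_coverage(pathway_genes, gene_abundance, min_abundance=0.0):
    """Fraction of a pathway's genes detected above min_abundance (assumed definition)."""
    if not pathway_genes:
        return 0.0
    present = sum(
        1 for gene in pathway_genes
        if gene_abundance.get(gene, 0.0) > min_abundance
    )
    return present / len(pathway_genes)


def pathway_abundance(pathway_genes, gene_abundance):
    """Mean abundance over the pathway's genes; undetected genes count as 0 (assumed definition)."""
    if not pathway_genes:
        return 0.0
    return mean(gene_abundance.get(gene, 0.0) for gene in pathway_genes)


# Hypothetical toy data: per-gene abundances and pathway membership.
gene_abundance = {"gene_a": 4.0, "gene_b": 2.0, "gene_d": 6.0}
pathways = {
    "pathway_1": ["gene_a", "gene_b", "gene_c"],  # gene_c is not detected
    "pathway_2": ["gene_d"],
}

coverage = {
    name: pathway_coverage(genes, gene_abundance)
    for name, genes in pathways.items()
}
abundance = {
    name: pathway_abundance(genes, gene_abundance)
    for name, genes in pathways.items()
}
```

Keeping the two summaries separate lets a pathway that is only partly present (low coverage) be distinguished from one that is fully present but rare (low abundance).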
Slide 7
HUMAnN: Validating gene and pathway abundances on synthetic data
Validated on individual genes, module coverage + abundance
False negatives: short genes (
Metagenomic biomarker discovery
Gene expression; SNP genotypes
Healthy/IBD; BMI; Diet
Taxa & pathways
Batch effects? Population structure? Niches & phylogeny
Test for correlates
Multiple hypothesis correction
Feature selection (p >> n)
Confounds / stratification / environment
Cross-validate
Biological story?
Independent sample
Intervention / perturbation
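The slide lists multiple hypothesis correction as a step but does not say which procedure is used. As one common choice, the sketch below applies the Benjamini-Hochberg false discovery rate adjustment to a list of per-feature p-values. The choice of this procedure, the function name, the example p-values, and the 0.05 threshold are all assumptions made for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical p-values, one per tested taxon or pathway.
p_values = [0.001, 0.02, 0.03, 0.5]
q_values = benjamini_hochberg(p_values)

# Features passing an assumed false discovery rate threshold of 0.05.
significant = [q <= 0.05 for q in q_values]
```

With many more features than samples (p >> n), controlling the false discovery rate is usually less conservative than a Bonferroni-style correction, which is why it is a frequent choice in this setting.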
Slide 12
LEfSe: Metagenomic class comparison and explanation
LEfSe: LDA + Effect Size
http://huttenhower.sph.harvard.edu/lefse
Nicola Segata
Slide 13
LEfSe: Evaluation on synthetic data
Slide 14
Microbes characteristic of the oral and gut microbiota
Slide 15
Aerobic, microaerobic and anaerobic communities
High oxygen: skin, nasal
Mid oxygen: vaginal, oral
Low oxygen: gut
Slide 16
LEfSe: The TRUC murine colitis microbiota
With Wendy Garrett
Slide 17
MetaHIT: The gut microbiome and IBD
WGS reads
Pathways/modules
124 subjects: 99 healthy, 21 UC + 4 CD
ReBLASTed against KEGG since published data obfuscates read counts
Taxa: Phymm (Brady 2009)
Genes (KOs)
Pathways (KEGGs)
Qin 2010
With Ramnik Xavier, Joshua Korzenik
Slide 18
MetaHIT: Taxonomic CD biomarkers
[Figure labels: Firmicutes; Enterobacteriaceae; Up in CD; Down in CD; UC]
Slide 19
MetaHIT: Functional CD biomarkers
[Figure labels: Motility; Transporters; Sugar metabolism; Growth/replication; Down in CD; Up in CD]
[Figure panels: Subset of enriched modules in CD patients; Subset of enriched pathways in CD patients]
Slide 20
Sleipnir: Software for scalable functional genomics
Massive datasets require efficient algorithms and implementations.
Sleipnir C++ library for computational functional genomics
Data types for biological entities: microarray data, interaction data, genes and gene sets, functional catalogs, etc.
Network communication, parallelization
Efficient machine learning algorithms: generative (Bayesian) and discriminative (SVM)
And it's fully documented!
It's also speedy: microbial data integration computation takes
HMP: Metabolism, host-microbiome interactions, and microbial taxa
>3200 gene families differential in the mucosa
>1500 upregulated outside the mucosa and not in any Actinobacterial genome
[Figure labels: 16S; WGS]