Computational metagenomics and the human microbiome Curtis Huttenhower 01-21-11 Harvard School of...
- Slide 1
- Computational metagenomics and the human microbiome
Curtis Huttenhower, 01-21-11
Harvard School of Public Health, Department of Biostatistics
- Slide 2
- What to do with your metagenome? (×10^10)
  - Diagnostic or prognostic biomarker for host disease
  - Public health tool monitoring population health and interactions
  - Comprehensive snapshot of microbial ecology and evolution
  - Reservoir of gene and protein functional information
  - Who's there? What are they doing?
  - What do functional genomic data tell us about microbiomes?
  - What can our microbiomes tell us about us?*
  - *Using terabases of sequence and thousands of experimental results
- Slide 3
- The Human Microbiome Project (2007, ongoing)
  - 300 normal adults, ages 18-40
  - 16S rDNA + WGS sequencing
  - 5 sites / 18 samples + blood
    - Oral cavity: saliva, tongue, palate, buccal mucosa, gingiva, tonsils, throat, teeth
    - Skin: ears, inner elbows
    - Nasal cavity
    - Gut: stool
    - Vagina: introitus, mid, fornix
  - Reference genomes (~200 + 800)
  - All healthy subjects; follow-up projects in psoriasis, Crohn's disease, colitis, obesity, acne, cancer, and antibiotic-resistant infection
  - Hamady, 2009; Kolenbrander, 2010
- Slide 4
- HMP Organisms: Everyone and everywhere is different
  [Figure: organism (taxa) abundances across body sites and individuals. Body sites: ear, gut, nose, mouth, vagina, arm, mucosa, palate, gingiva, tonsils, saliva, subgingival plaque, supragingival plaque, throat, tongue]
  - Every microbiome is surprisingly different
  - Most organisms are rare in most places
  - Even common organisms vary tremendously in abundance among individuals
  - Aerobicity, interaction with the immune system, and extracellular medium appear to be major determinants
  - There are few organismal biotypes in health
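The claims that most organisms are rare in most places and that common organisms vary widely in abundance can be made concrete with two per-taxon summaries: prevalence (the fraction of samples in which a taxon is detected) and the coefficient of variation of its relative abundance. The following is a minimal sketch on hypothetical toy counts; the function names and the detection threshold are assumptions for illustration, not the HMP analysis itself.

```python
from statistics import mean, pstdev


def relative_abundance(counts):
    """Normalize each sample's raw taxon counts so they sum to 1."""
    out = []
    for sample in counts:
        total = sum(sample.values())
        out.append({taxon: c / total for taxon, c in sample.items()})
    return out


def prevalence(rel, taxon, threshold=1e-3):
    """Fraction of samples where `taxon` exceeds a (hypothetical) detection threshold."""
    return sum(1 for s in rel if s.get(taxon, 0.0) > threshold) / len(rel)


def coefficient_of_variation(rel, taxon):
    """Std/mean of the taxon's relative abundance across samples (0 if never seen)."""
    values = [s.get(taxon, 0.0) for s in rel]
    m = mean(values)
    return pstdev(values) / m if m > 0 else 0.0


# Hypothetical toy data: three samples, taxon -> read count
samples = [
    {"Bacteroides": 800, "Prevotella": 10, "Streptococcus": 190},
    {"Bacteroides": 100, "Prevotella": 850, "Streptococcus": 50},
    {"Bacteroides": 500, "Streptococcus": 500},
]
rel = relative_abundance(samples)
for taxon in ("Bacteroides", "Prevotella", "Streptococcus"):
    print(taxon, prevalence(rel, taxon), round(coefficient_of_variation(rel, taxon), 2))
```

A taxon can be present in every sample and still have a high coefficient of variation, which is the "common but variable" pattern described on the slide.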
- Slide 5
- HUMAnN (HMP Unified Metabolic Analysis Network): Community metabolic and functional reconstruction
  - Data: 300 subjects, 1-3 visits/subject, ~6 body sites/visit, 10-200M reads/sample, 100 bp reads
  - Flow: WGS reads → (BLAST) → genes (KOs) → pathways (KEGGs) → pathways/modules
  - Functional sequence databases: KEGG + MetaCyc; CAZy, TCDB, VFDB, MEROPS
  - Genes to pathways: MinPath (Ye 2009)
  - Smoothing: Witten-Bell
  - Gap filling: c(g) = max(c(g), median)
  - Taxonomic limitation: remove pathways in taxa < average
  - Xipe: distinguish zero from low abundance (Rodriguez-Mueller, in review)
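Two of the pipeline steps above are simple enough to sketch: Witten-Bell smoothing of gene counts and gap filling with c(g) = max(c(g), median). The sketch below is hedged in several ways: it uses the standard interpolated Witten-Bell form with a uniform lower-order distribution over one pathway's genes, it assumes the median is taken over the genes of a single pathway, and the function names are hypothetical. It is not HUMAnN's code.

```python
from statistics import median


def witten_bell(counts):
    """Interpolated Witten-Bell estimate over a fixed gene set.

    p(g) = (c(g) + T/V) / (N + T), where N is the total count, T the number
    of distinct genes observed, and V the number of genes in the set. Unseen
    genes receive a small nonzero probability; probabilities sum to 1.
    """
    n = sum(counts.values())
    t = sum(1 for c in counts.values() if c > 0)
    v = len(counts)
    if n == 0:
        # Nothing observed: fall back to a uniform distribution.
        return {g: 1.0 / v for g in counts}
    return {g: (c + t / v) / (n + t) for g, c in counts.items()}


def gap_fill(abundances):
    """Raise each gene to at least the median of its pathway: c(g) = max(c(g), median)."""
    m = median(abundances.values())
    return {g: max(a, m) for g, a in abundances.items()}


# Hypothetical counts for the four KOs of one pathway; K3 was never observed.
pathway_counts = {"K1": 6, "K2": 3, "K3": 0, "K4": 1}
smoothed = witten_bell(pathway_counts)
filled = gap_fill(smoothed)
print(smoothed)
print(filled)
```

Gap filling keeps a single missing or under-sampled gene from zeroing out an otherwise well-covered pathway, while smoothing handles the same problem probabilistically.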
- Slide 6
- HUMAnN: Community metabolic and functional reconstruction
  [Figure panels: Pathway coverage; Pathway abundance]
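HUMAnN reports two per-pathway quantities: coverage (is the pathway present?) and abundance (how much of it is there?). A minimal sketch with simple stand-in definitions follows. Here coverage is the fraction of a pathway's genes above a threshold, and abundance is the mean of the more abundant half of its genes. Both definitions and the function names are assumptions for illustration, not necessarily HUMAnN's exact rules.

```python
def pathway_coverage(gene_abund, pathway, threshold=0.0):
    """Fraction of the pathway's genes whose abundance exceeds `threshold`."""
    present = sum(1 for g in pathway if gene_abund.get(g, 0.0) > threshold)
    return present / len(pathway)


def pathway_abundance(gene_abund, pathway):
    """Mean abundance of the upper half (rounded up) of the pathway's genes.

    Using only the upper half is an assumed robustness choice: it keeps a few
    missing or poorly detected genes from dragging the estimate toward zero.
    """
    values = sorted((gene_abund.get(g, 0.0) for g in pathway), reverse=True)
    top = values[: max(1, (len(values) + 1) // 2)]
    return sum(top) / len(top)


# Hypothetical gene abundances and a hypothetical four-gene pathway
genes = {"K1": 4.0, "K2": 2.0, "K3": 0.0, "K4": 1.0}
pathway = ["K1", "K2", "K3", "K4"]
print(pathway_coverage(genes, pathway), pathway_abundance(genes, pathway))
```

Coverage and abundance are kept separate because a pathway can be fully covered at low abundance, or partially covered because of a few highly abundant shared genes.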
- Slide 7
- HUMAnN: Validating gene and pathway abundances on synthetic data
  - Validated on individual genes and on module coverage and abundance
  - False negatives: short genes (
- Slide 11
- Metagenomic biomarker discovery (compared with gene expression and SNP genotype studies)
  - Phenotypes: healthy/IBD, BMI, diet
  - Features: taxa and pathways
  - Confounders: batch effects? Population structure? Niches and phylogeny
  - Steps: test for correlates; multiple hypothesis correction; feature selection (p >> n); confounds/stratification/environment; cross-validate; biological story?; independent sample; intervention/perturbation
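The "multiple hypothesis correction" step matters because thousands of taxa and pathways are tested at once. The slide does not name a method; one common choice is the Benjamini-Hochberg false discovery rate procedure, sketched below as an illustration, not as the method used in this work.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order.

    Sort p-values ascending, scale each by m / rank, then enforce monotonicity
    by taking a running minimum from the largest rank downward (capped at 1).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four per-feature association tests
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Features are then typically kept if their adjusted value falls below a chosen FDR level, for example 0.05 or 0.25, before moving on to feature selection and cross-validation.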
- Slide 12
- LEfSe: Metagenomic class comparison and explanation
  - LEfSe (LDA Effect Size): http://huttenhower.sph.harvard.edu/lefse
  - Nicola Segata
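LEfSe first tests each feature for differential abundance between classes with the non-parametric Kruskal-Wallis rank-sum test and then estimates an effect size with LDA. The following is a minimal two-class sketch of that structure. The rank test is implemented directly (without tie correction, so it is slightly conservative when values tie), and a deliberately simplified effect size, the signed log10(1 + |difference in class means|), stands in for LDA. The function names are hypothetical; this is not the LEfSe implementation.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_two_class(a, b):
    """Kruskal-Wallis H (no tie correction) and its chi-square p-value, df = 1."""
    values = list(a) + list(b)
    n = len(values)
    ranks = average_ranks(values)
    ra, rb = ranks[: len(a)], ranks[len(a):]
    h = 12 / (n * (n + 1)) * (sum(ra) ** 2 / len(a) + sum(rb) ** 2 / len(b)) - 3 * (n + 1)
    # Survival function of chi-square with 1 df: P(X > h) = erfc(sqrt(h / 2))
    p = math.erfc(math.sqrt(max(h, 0.0) / 2))
    return h, p


def lefse_like(features, labels, alpha=0.05):
    """Keep features passing the rank test; score with a simplified signed effect size.

    features: {name: [abundance per sample]}; labels: 0/1 class per sample.
    """
    results = []
    for name, values in features.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        _, p = kruskal_wallis_two_class(a, b)
        if p < alpha:
            diff = sum(b) / len(b) - sum(a) / len(a)
            results.append((name, math.copysign(math.log10(1 + abs(diff)), diff), p))
    return sorted(results, key=lambda r: -abs(r[1]))


# Hypothetical data: 5 samples per class
labels = [0] * 5 + [1] * 5
features = {"up": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], "flat": [5] * 10}
print(lefse_like(features, labels))
```

The two-step design separates "is there a difference?" (the rank test) from "how large is it?" (the effect size), which is what lets LEfSe rank biomarkers rather than only list significant ones.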
- Slide 13
- LEfSe: Evaluation on synthetic data
- Slide 14
- Microbes characteristic of the oral and gut microbiota
- Slide 15
- Aerobic, microaerobic, and anaerobic communities
  - High oxygen: skin, nasal
  - Mid oxygen: vaginal, oral
  - Low oxygen: gut
- Slide 16
- LEfSe: The TRUC murine colitis microbiota (with Wendy Garrett)
- Slide 17
- MetaHIT: The gut microbiome and IBD (with Ramnik Xavier, Joshua Korzenik)
  - 124 subjects: 99 healthy, 21 UC + 4 CD (Qin 2010)
  - WGS reads were re-BLASTed against KEGG, since the published data obscure read counts
  - Flow: WGS reads → taxa (Phymm; Brady 2009) and genes (KOs) → pathways (KEGGs) → pathways/modules
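Re-BLASTing the reads against KEGG recovers per-read evidence that can be turned back into gene (KO) counts. A minimal sketch of that counting step follows, assuming BLAST+ tabular output (`-outfmt 6`, whose 12th column is the bitscore) and a hypothetical `gene_to_ko` mapping from KEGG gene identifiers to KOs. Assigning each read only to its single best-scoring hit is an assumption for illustration, not necessarily what was done in this analysis.

```python
import csv
import io
from collections import Counter


def best_hits(tabular_text):
    """Best-scoring subject per read from BLAST -outfmt 6 text (bitscore = column 12)."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        read, subject, bitscore = row[0], row[1], float(row[11])
        if read not in best or bitscore > best[read][1]:
            best[read] = (subject, bitscore)
    return {read: subject for read, (subject, _) in best.items()}


def ko_counts(tabular_text, gene_to_ko):
    """Count reads per KO; reads whose best hit has no KO annotation are dropped."""
    counts = Counter()
    for subject in best_hits(tabular_text).values():
        ko = gene_to_ko.get(subject)
        if ko is not None:
            counts[ko] += 1
    return counts


# Hypothetical BLAST hits: r1 hits two genes, r3 hits an unannotated gene
hits = (
    "r1\tgeneA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-40\t180.0\n"
    "r1\tgeneB\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t150.0\n"
    "r2\tgeneB\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-45\t190.0\n"
    "r3\tgeneC\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-35\t160.0\n"
)
gene_to_ko = {"geneA": "K00001", "geneB": "K00002"}  # hypothetical mapping
print(ko_counts(hits, gene_to_ko))
```

The resulting KO counts are the input that the gene-to-pathway steps (for example, MinPath and smoothing in HUMAnN) expect.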
- Slide 18
- MetaHIT: Taxonomic CD biomarkers
  [Figure: taxa up and down in CD. Labels: Firmicutes, Enterobacteriaceae; Up in CD, Down in CD, UC]
- Slide 19
- MetaHIT: Functional CD biomarkers
  [Figure panels: Subset of enriched modules in CD patients; Subset of enriched pathways in CD patients. Labels: Motility, Transporters, Sugar metabolism, Growth/replication; Down in CD, Up in CD]
- Slide 20
- Sleipnir: Software for scalable functional genomics
  - Massive datasets require efficient algorithms and implementations.
  - Sleipnir C++ library for computational functional genomics
  - Data types for biological entities: microarray data, interaction data, genes and gene sets, functional catalogs, etc.
  - Network communication, parallelization
  - Efficient machine learning algorithms: generative (Bayesian) and discriminative (SVM)
  - And it's fully documented!
  - It's also speedy: microbial data integration computation takes
- Slide 24
- HMP: Metabolism, host-microbiome interactions, and microbial taxa
  - >3200 gene families differential in the mucosa
  - >1500 upregulated outside the mucosa and not in any Actinobacterial genome
  - [Panels: 16S, WGS]