Computational metagenomics and the human microbiome. Curtis Huttenhower, 01-21-11. Harvard School of Public Health, Department of Biostatistics

  • Slide 2
  • What to do with your metagenome? (x10^10) Diagnostic or prognostic biomarker for host disease. Public health tool monitoring population health and interactions. Comprehensive snapshot of microbial ecology and evolution. Reservoir of gene and protein functional information. Who's there? What are they doing? What do functional genomic data tell us about microbiomes?* What can our microbiomes tell us about us?* (* Using terabases of sequence and thousands of experimental results.)
  • Slide 3
  • The Human Microbiome Project: 2007 - ongoing. 300 normal adults, aged 18-40. 16S rDNA + WGS. 5 sites/18 samples + blood. Oral cavity: saliva, tongue, palate, buccal mucosa, gingiva, tonsils, throat, teeth. Skin: ears, inner elbows. Nasal cavity. Gut: stool. Vagina: introitus, mid, fornix. Reference genomes (~200+800). All healthy subjects; follow-up projects in psoriasis, Crohn's, colitis, obesity, acne, cancer, antibiotic-resistant infection. (Hamady, 2009; Kolenbrander, 2010)
  • Slide 4
  • HMP Organisms: Everyone and everywhere is different. [Figure: organisms (taxa) by body sites + individuals; site labels: ear, gut, nose, mouth, vagina, arm, mucosa, palate, gingiva, tonsils, saliva, sub. plaq., sup. plaq., throat, tongue.] Every microbiome is surprisingly different. Most organisms are rare in most places. Even common organisms vary tremendously in abundance among individuals. Aerobicity, interaction with the immune system, and extracellular medium appear to be major determinants. There are few organismal biotypes in health.
  • Slide 5
  • HUMAnN (HMP Unified Metabolic Analysis Network): Community metabolic and functional reconstruction. Input: 300 subjects; 1-3 visits/subject; ~6 body sites/visit; 10-200M reads/sample; 100bp reads. WGS reads are BLASTed against functional sequence databases (KEGG + MetaCyc; CAZy, TCDB, VFDB, MEROPS) to give genes (KOs), then pathways/modules (KEGG). Genes to pathways: MinPath (Ye 2009). Taxonomic limitation: rem. paths in taxa < ave. Smoothing: Witten-Bell. Gap filling: c(g) = max(c(g), median). Xipe: distinguish zero/low (Rodriguez-Mueller, in review). BLAST?
  • Slide 6
  • HUMAnN: Community metabolic and functional reconstruction. [Figure panels: Pathway coverage; Pathway abundance]
  • Slide 7
  • HUMAnN: Validating gene and pathway abundances on synthetic data. Validated on individual genes, module coverage + abundance. False negatives: short genes (
  • Slide 11
  • Metagenomic biomarker discovery. Inputs: gene expression; SNP genotypes; healthy/IBD; BMI; diet; taxa & pathways. Batch effects? Population structure? Niches & phylogeny. Test for correlates. Multiple hypothesis correction. Feature selection (p >> n). Confounds/stratification/environment. Cross-validate. Biological story? Independent sample. Intervention/perturbation.
  • Slide 12
  • LEfSe: Metagenomic class comparison and explanation. LEfSe (LDA + Effect Size): http://huttenhower.sph.harvard.edu/lefse (Nicola Segata)
  • Slide 13
  • LEfSe: Evaluation on synthetic data
  • Slide 14
  • Microbes characteristic of the oral and gut microbiota
  • Slide 15
  • Aerobic, microaerobic and anaerobic communities. High oxygen: skin, nasal. Mid oxygen: vaginal, oral. Low oxygen: gut.
  • Slide 16
  • LEfSe: The TRUC murine colitis microbiota. With Wendy Garrett.
  • Slide 17
  • MetaHIT: The gut microbiome and IBD. 124 subjects: 99 healthy, 21 UC + 4 CD. WGS reads re-BLASTed against KEGG, since the published data obfuscate read counts. Taxa: Phymm (Brady 2009). Genes (KOs); pathways/modules (KEGG). Qin 2010. With Ramnik Xavier, Joshua Korzenik.
  • Slide 18
  • MetaHIT: Taxonomic CD biomarkers. [Figure labels: Firmicutes; Enterobacteriaceae; Up in CD; Down in CD; UC]
  • Slide 19
  • MetaHIT: Functional CD biomarkers. [Figure panels: Subset of enriched modules in CD patients; Subset of enriched pathways in CD patients. Labels: Motility; Transporters; Sugar metabolism; Growth/replication; Down in CD; Up in CD]
  • Slide 20
  • Sleipnir: Software for scalable functional genomics. Massive datasets require efficient algorithms and implementations. Sleipnir is a C++ library for computational functional genomics. Data types for biological entities: microarray data, interaction data, genes and gene sets, functional catalogs, etc. Network communication, parallelization. Efficient machine learning algorithms: generative (Bayesian) and discriminative (SVM). And it's fully documented! It's also speedy: microbial data integration computation takes
  • Slide 24
  • HMP: Metabolism, host-microbiome interactions, and microbial taxa. >3200 gene families differential in the mucosa. >1500 upregulated outside the mucosa and not in any Actinobacterial genome. [Figure panels: 16S; WGS]
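
The gap-filling rule on the HUMAnN slide, c(g) = max(c(g), median), can be illustrated with a minimal Python sketch. It assumes the median is taken over the gene abundances within a single pathway, which the slide does not state. The function name gap_fill and the KO identifiers and abundance values are hypothetical, and this is not HUMAnN's actual code.

```python
from statistics import median


def gap_fill(counts):
    """Raise each gene abundance to at least the median of the given abundances.

    `counts` maps gene identifiers to abundances for one pathway
    (assumption: the median in c(g) = max(c(g), median) is per pathway).
    """
    if not counts:
        return {}
    med = median(counts.values())
    # c(g) = max(c(g), median): low or missing genes are lifted to the median,
    # genes already at or above the median are left unchanged.
    return {gene: max(c, med) for gene, c in counts.items()}


# Hypothetical KO abundances for one pathway; K00001 was not observed.
pathway = {"K00001": 0.0, "K00002": 4.0, "K00003": 6.0}
filled = gap_fill(pathway)
print(filled)
```

In this toy pathway the unobserved gene is raised to the median abundance while the other two genes are unchanged, so a single missing gene does not by itself pull the pathway's apparent abundance toward zero.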