Large scale genomic data mining

Curtis Huttenhower

10-23-09
Harvard School of Public Health
Department of Biostatistics

Mining Biological Data

~100 GB

More than 100GB

How can we ask and answer specific biomedical questions using thousands of genome-scale datasets?

Outline

1. Methodology: Algorithms for mining genome-scale datasets

2. Applications: Human molecular data and clinical cancer cohorts

3. Next steps: Methods for microbial communities and functional metagenomics

A Definition of Functional Genomics

Genomic data, Prior knowledge

Data ↓ Function
Function ↓ Function
Gene ↓ Function

MEFIT: A Framework for Functional Genomics

Related Gene Pairs
BRCA1 BRCA2 0.9
BRCA1 RAD51 0.8
RAD51 TP53 0.85
…

Unrelated Gene Pairs
BRCA2 SOX2 0.1
RAD51 FOXP2 0.2
ACTR1 H6PD 0.15
…

[Figure: frequency of high vs. low correlation among related and unrelated gene pairs, for each dataset: Golub 1999, Butte 2000, Whitfield 2002, Hansen 1998]

Functional Relationship

Biological Context: functional area, tissue, disease, …
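The idea of learning from related and unrelated gene pairs can be sketched as a small naive Bayes integration. This is only an illustration under stated assumptions, not the MEFIT implementation: the function names, the 0.5 correlation cutoff, the add-one smoothing, and the prior of 0.1 are all hypothetical choices.

```python
from collections import Counter


def discretize(corr, threshold=0.5):
    """Bin a correlation as 'high' or 'low' (the cutoff is an assumed value)."""
    return "high" if corr >= threshold else "low"


def learn_likelihoods(gold_pairs):
    """Estimate P(bin | related) and P(bin | unrelated) for one dataset.

    gold_pairs: iterable of (correlation, is_related) for gold-standard pairs.
    Add-one smoothing keeps unseen bins from receiving probability zero.
    """
    counts = {True: Counter(), False: Counter()}
    for corr, related in gold_pairs:
        counts[related][discretize(corr)] += 1
    table = {}
    for related, c in counts.items():
        total = c["high"] + c["low"] + 2
        table[related] = {b: (c[b] + 1) / total for b in ("high", "low")}
    return table


def posterior_related(prior, tables, correlations):
    """Combine datasets, assuming they are independent given the relationship."""
    p_rel, p_unrel = prior, 1.0 - prior
    for table, corr in zip(tables, correlations):
        if corr is None:  # the gene pair is missing from this dataset
            continue
        b = discretize(corr)
        p_rel *= table[True][b]
        p_unrel *= table[False][b]
    return p_rel / (p_rel + p_unrel)


# Toy gold standard using the example pairs from the slide.
gold = [(0.9, True), (0.8, True), (0.85, True),
        (0.1, False), (0.2, False), (0.15, False)]
table = learn_likelihoods(gold)
# Two datasets that both show high correlation for a new gene pair.
p = posterior_related(0.1, [table, table], [0.9, 0.7])
```

Each dataset contributes one likelihood ratio, so a dataset in which related and unrelated pairs look alike has little influence on the posterior.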

Functional Interaction Networks

Global interaction network

Autophagy network, Vacuolar transport network, Translation network

Currently have data from 30,000 human experimental results (15,000 expression conditions + 15,000 diverse others), analyzed for 200 biological functions and 150 diseases

Predicting Gene Function

[Figure: predicted relationships between genes, shaded from low confidence to high confidence, with cell cycle genes highlighted]

These edges provide a measure of how likely a gene is to specifically participate in the process of interest.

Comprehensive Validation of Computational Predictions

Genomic data and prior knowledge

• Computational predictions of gene function: MEFIT, SPELL (Hibbs et al 2007), bioPIXIE (Myers et al 2005)
• Genes predicted to function in mitochondrion organization and biogenesis
• Laboratory experiments: petite frequency, growth curves, confocal microscopy
• New known functions for correctly predicted genes
• Retraining

With David Hess, Amy Caudy

Evaluating the Performance of Computational Predictions

Genes involved in mitochondrion organization and biogenesis

• 106 original GO annotations
• 135 under-annotations
• 82 novel confirmations, first iteration
• 17 novel confirmations, second iteration

340 total: >3x previously known genes in ~5 person-months

Evaluating the Performance of Computational Predictions

Genes involved in mitochondrion organization and biogenesis

• 106 original GO annotations
• 95 under-annotations
• 40 confirmed under-annotations
• 80 novel confirmations, first iteration
• 17 novel confirmations, second iteration

340 total: >3x previously known genes in ~5 person-months

Computational predictions from large collections of genomic data can be accurate despite incomplete or misleading gold standards, and they continue to improve as additional data are incorporated.

Functional Associations Between Contexts

[Figure: cell cycle genes in the network, edges shaded from low confidence to high confidence]

The average strength of these relationships indicates how cohesive a process is.

[Figure: cell cycle genes and DNA replication genes in the network, edges shaded from low confidence to high confidence]

The average strength of these relationships indicates how associated two processes are.

Functional mapping: Scoring functional associations

How can we formalize these relationships?

Any sets of genes G1 and G2 in a network can be compared using four measures:

• Edges between their genes
• Edges within each set
• The background edges incident to each set
• The baseline of all edges in the network

$$FA_{G_1,G_2} = \frac{\mathrm{between}(G_1,G_2)}{\mathrm{background}(G_1,G_2)} \cdot \frac{\mathrm{baseline}}{\mathrm{within}(G_1,G_2)}$$

Stronger connections between the sets increase association.

Stronger within self-connections or nonspecific background connections decrease association.
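A minimal sketch of how the four measures might be combined into one score for a small weighted network. The slide does not spell out how each measure is averaged, so the definitions below (mean edge weight over each group of pairs) are assumptions, and `mean_weight` and `fa_score` are hypothetical names.

```python
from itertools import combinations, product


def mean_weight(weights, pairs):
    """Average edge weight over the given gene pairs."""
    vals = [weights[frozenset(p)] for p in pairs]
    return sum(vals) / len(vals)


def fa_score(weights, genes, g1, g2):
    """Functional association between two disjoint gene sets of size >= 2.

    weights maps frozenset({a, b}) to an edge weight for every gene pair.
    Assumed definitions: each measure is a plain mean of edge weights.
    """
    between = mean_weight(weights, product(g1, g2))
    within = mean_weight(
        weights, list(combinations(g1, 2)) + list(combinations(g2, 2)))
    union = set(g1) | set(g2)
    background = mean_weight(
        weights, [(a, b) for a in union for b in genes if a != b])
    baseline = mean_weight(weights, combinations(genes, 2))
    return (between / background) * (baseline / within)


# Toy network: six genes, weak edges everywhere except where noted.
genes = ["a", "b", "c", "d", "e", "f"]
weights = {frozenset(p): 0.1 for p in combinations(genes, 2)}
for p in [("a", "b"), ("c", "d"), ("e", "f")]:
    weights[frozenset(p)] = 0.5  # moderately cohesive sets
for p in product("ab", "cd"):
    weights[frozenset(p)] = 0.8  # strong links between {a, b} and {c, d}

fa_linked = fa_score(weights, genes, {"a", "b"}, {"c", "d"})
fa_unlinked = fa_score(weights, genes, {"a", "b"}, {"e", "f"})
```

In this toy network the strongly linked pair of sets scores above 1 and the unlinked pair scores below 1, matching the intuition that between-set edges raise the score while within-set and background edges lower it.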

Functional mapping: Bootstrap p-values

• Scoring functional associations is great…
  …how do you interpret an association score?
  – For gene sets of arbitrary sizes?
  – In arbitrary graphs?
  – Each with its own bizarre distribution of edges?

Empirically!

[Figure: histograms of FAs for random sets of 1, 5, 10, and 50 genes]

For any graph, compute FA scores for many randomly chosen gene sets of different sizes. Null distribution is approximately normal with mean 1.

Standard deviation is asymptotic in the sizes of both gene sets.

[Figure: null distribution σs for one graph]

Maps FA scores to p-values for any gene sets and underlying graph.

$$\hat{\mu}_{FA}(G_i, G_j) = 1$$

[Equation: \hat{\sigma}_{FA}(G_i, G_j) as a function of |G_i| and |G_j| with fitted constants A and B]

$$P(FA_{G_1,G_2} > x) = 1 - \Phi_{\hat{\mu}(G_1,G_2),\,\hat{\sigma}(G_1,G_2)}(x)$$
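The empirical approach can be sketched as follows: score many random gene sets of the sizes of interest, fit a normal distribution to those scores, and read a p-value off its upper tail. This is an illustrative sketch only. `null_parameters` and `fa_p_value` are hypothetical names, the `score` argument stands in for whatever FA scoring function is used, and the sketch estimates the mean and standard deviation directly rather than using the fitted formula for σ from the slide.

```python
import random
from statistics import NormalDist, mean, stdev


def null_parameters(score, genes, size1, size2, trials=200, seed=0):
    """Estimate the null mean and standard deviation of an association score.

    Draws `trials` pairs of disjoint random gene sets with the given sizes
    and scores each pair with the supplied `score(set1, set2)` function.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        drawn = rng.sample(genes, size1 + size2)
        scores.append(score(drawn[:size1], drawn[size1:]))
    return mean(scores), stdev(scores)


def fa_p_value(x, mu, sigma):
    """Upper-tail p-value of score x under a normal null distribution."""
    return 1.0 - NormalDist(mu, sigma).cdf(x)


# Stand-in score with a null mean near 1; any FA function could be used.
toy_genes = list(range(100))
toy_score = lambda g1, g2: 1.0 + 0.01 * (sum(g1) - sum(g2)) / len(g1)
mu, sigma = null_parameters(toy_score, toy_genes, 5, 5)
p_extreme = fa_p_value(mu + 3 * sigma, mu, sigma)
```

Because the null is estimated per graph and per pair of set sizes, the same observed score can map to different p-values in different networks.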

Functional Associations Between Processes

[Figure: functional map of biological processes. Processes shown: Hydrogen Transport, Electron Transport, Cellular Respiration, Protein Processing, Peptide Metabolism, Cell Redox Homeostasis, Aldehyde Metabolism, Energy Reserve Metabolism, Vacuolar Protein Catabolism, Negative Regulation of Protein Metabolism, Organelle Fusion, Protein Depolymerization, Organelle Inheritance]

Edges: associations between processes (moderately strong to very strong)

Borders: data coverage of processes (sparsely covered to well covered)

Nodes: cohesiveness of processes (below baseline, baseline (genomic background), very cohesive)

Example gene sets: AHP1, DOT5, GRX1, GRX2, …; APE3, LAP4, PAI3, PEP4, …

Functional Maps: Focused Data Summarization

ACGGTGAACGTACAGTACAGATTACTAGGACATTAGGCCGTATCCGATACCCGATA

Data integration summarizes an impossibly huge amount of experimental data into an impossibly huge number of predictions; what next?

How can a biologist take advantage of all this data to study his/her favorite gene/pathway/disease without losing information?

Functional mapping
• Very large collections of genomic data
• Specific predicted molecular interactions
• Pathway, process, or disease associations
• Underlying experimental results and functional activities in data

Outline

HEFalMp: Predicting human gene function

HEFalMp: Predicting human genetic interactions

HEFalMp: Analyzing human genomic data

HEFalMp: Understanding human disease

Validating Human Predictions

Autophagy

[Figure: cells not starved vs. starved (autophagic) for Luciferase (negative control), ATG5 (positive control), and the predicted novel autophagy proteins LAMP2 and RAB11A]

5½ of 7 predictions currently confirmed

With Erin Haley, Hilary Coller

Current Work: Molecular Mechanisms in a Colon Cancer Cohort

With Shuji Ogino, Charlie Fuchs

Health Professionals Follow-Up Study and Nurses’ Health Study

• ~3,100 gastrointestinal subjects
• ~3,800 tissue samples
• ~1,450 colon cancer samples
• ~1,150 CpG island methylation
• ~1,200 LINE-1 methylation
• ~700 TMA immunohistochemistry
• ~2,100 cancer mutation tests
• ~775 gene expression

LINE-1 Methylation
• Repetitive element making up ~20% of mammalian genomes
• Very easy to assay methylation level (%)
• Good proxy for whole-genome methylation

DASL Gene Expression
• Gene expression analysis from paraffin blocks
• Thanks to Todd Golub, Yujin Hoshida

Colon Cancer: LINE-1 methylation levels

[Figure: LINE-1 Methylation in Multiple Tumors from the Same Subject; axes: Methylation %, Tumor #1 vs. second tumor, range 30 to 80; ρ = 0.718, p < 0.01; Ogino et al, 2008]

Lower LINE-1 methylation associates with poor colon cancer prognosis.

LINE-1 methylation varies remarkably between individuals…

…but it is highly correlated within individuals.

This suggests a genetic effect.

This suggests a copy number variation.

This suggests linkage to a cancer-related pathway.

Is anything different about these outliers?

What does it all mean? What is the biological mechanism linking LINE-1 methylation to colon cancer?

With Shuji Ogino, Charlie Fuchs

Preliminary Data
• Six genes differentially expressed even using naïve methods
• One uncharacterized, one oncogene, three malignancy, one histone
• 1/3 are from a family with known variable GI expression, prognostic value
• 2/3 fall in same cytogenetic band, which is also a known CNV hotspot
• HEFalMp links to a set of transmembrane receptors/channels
• Better analysis pulls out mostly one-carbon metabolism and a few more signaling pathways (neurotransmitters??)

Check back in a couple of months!

Outline

Next Steps: Microbial Communities

• Data integration is off to a great start in humans
  – Complex communities of distinct cell types
  – Very sparse prior knowledge
    • Concentrated in a few specific areas
  – Variation across populations
  – Critical to understand mechanisms of disease

• What about microbial communities?
  – Complex communities of distinct species/strains
  – Very sparse prior knowledge
    • Concentrated in a few specific species/strains
  – Variation across populations
  – Critical to understand mechanisms of disease

Next Steps: Functional Metagenomics

• Metagenomics: data analysis from environmental samples
  – Microflora: environment includes us!
• Another data integration problem
  – Must include datasets from multiple organisms
• Another context-specificity problem
  – Now “context” can also mean “species”
• What questions can we answer?
  – How do human microflora interact with diabetes, obesity, oral health, antibiotics, aging, …
  – What’s shared within community X? What’s different? What’s unique?
  – What’s perturbed in disease state Y? One organism, or many? Host interactions?
  – Current methods annotate ~50% of synthetic data, <5% of environmental data

[Figure: cross-species network of genes PKH2, LPD1, W04B5.5, R04B3.2, LLC1.3, T21F4.1, ARG1, DLD]

~120 available expression datasets

~70 species

Weskamp et al 2004
Flannick et al 2006
Kanehisa et al 2008
Tatusov et al 1997

• Data integration works just as well in microbes as it does in humans
• We know an awful lot about some microorganisms and almost nothing about others
• Purely sequence-based and purely network-based tools for function transfer both fall short
• We need data integration to take advantage of both and mine out useful biology!

Functional Maps for Functional Metagenomics

YG16, YG15

KO1: YG1, YG2, YG3
KO2: YG4
KO3: YG6
…

ECG1, ECG2
PAG1
ECG3, PAG2
…

Validating Orthology-Based Functional Mapping

Does unweighted data integration predict functional relationships?

What is the effect of “projecting” through an orthologous space?

[Figure: recall curves comparing unsupervised integration with individual datasets]

Holdout set: uncharacterized “genome”

Random subsets: characterized “genomes”

Can subsets of the yeast genome predict a held-out subset’s functional maps?

Can subsets of the yeast genome predict a held-out subset’s interactome?

[Figure: KEGG panels; values shown: 0.68, 0.48; 0.39, 0.25; 0.30, 0.37; 0.27, 0.39]

What have we learned?
• Yeast is incredibly well-curated
• KEGG tends to be more specific than GO
• Predicting interactomes by projecting through functional maps works decently in the absolute best case

Now, what happens if you do this for characterized microbes?
• ~20 (somewhat) well-characterized species
• 1-35 datasets each
• Integrate within species
• Evaluate using KEGG
• Then cross-validate by holding out species

[Figure: recall of unsupervised integrations]

Next Steps: Missing Methodology, Mining

• Most machine learning algorithms are optimized for one of two cases:
  – Small, dense data
  – Large, sparse data
• HEFalMp integrates ~300M records using ~1K features, relatively few of which are missing, in ~200 contexts

Feature selection
Regularization
Dimension reduction
Simple models, efficient algorithms
Slightly less

Next Steps: Missing Methodology, Models

[Figure: a separate model integrating Dataset #1, Dataset #2, Dataset #3, … for each biological context]

Biological Context
Cellular Processes
Tissue/Cell Lineage
Disease State
Developmental Stage
Cross-Species Orthology

This is clearly not a sustainable system; novel large-scale hierarchical modeling is needed to capture the complex biology of metazoan and metagenomic interaction networks.

Types of Interactions

Regulation

Efficient Computation For Biological Discovery

Massive datasets and genomes require efficient algorithms and implementations.

• Sleipnir C++ library for computational functional genomics
  – Data types for biological entities: microarray data, interaction data, genes and gene sets, functional catalogs, etc.
  – Network communication, parallelization
  – Efficient machine learning algorithms: generative (Bayesian) and discriminative (SVM)
  – And it’s fully documented!

It’s also speedy: improves on Bayes Net Toolbox by ~22x in memory usage and up to >100x in runtime.

[Figure: original processing time vs. current processing time; values shown: 8 hours, 1 minute, 30 years, 2 months, 18 hours, 2-3 hours]

Outline

• Bayesian system for genomic data integration
• Sleipnir software for efficient large scale data mining
• Functional mapping to statistically summarize large data collections
• HEFalMp system for human data analysis and integration
• Six confirmed predictions in autophagy
• Ongoing analysis of LINE-1 methylation in colon cancer
• Data integration applied to microbial communities and functional metagenomics
• Efficient machine learning for large, dense feature spaces

Thanks!

NIGMS

http://function.princeton.edu/sleipnir

http://function.princeton.edu/hefalmp

Interested? We’re looking for students and postdocs!
Biostatistics Department
http://huttenhower.sph.harvard.edu

Hilary Coller, Erin Haley, Tsheko Mutungu

Olga Troyanskaya, Matt Hibbs, Chad Myers, David Hess, Edo Airoldi, Florian Markowetz

Shuji Ogino, Charlie Fuchs

Colon Cancer: Immunohistochemistry

Quantities (rows) by conditions (columns):

        Tumor #1   Tumor #2   …   Tumor #700
AKT1    0          11         …   55
AURKA   0          5          …   0
CCND1   25         0          …   30
…       …

The world’s smallest, cheapest microarray!

What does the IHC data tell us about LINE-1 hypomethylation?

[Figure: IHC heat map of ~700 tumor samples, LINE-1 hypomethylated outliers vs. LINE-1 methylation “normal”; proteins shown: STAT3, VDR, CDKN1B, PPARG, CDK8, CTSB, PTEN, CCND1; LINE-1 methylation: low vs. normal]

Can existing microarrays amplify the LINE-1 hypomethylation signal?

Colon Cancer: Mining Microarrays

[Figure: signature proteins STAT3, VDR, CDKN1B, PPARG, CDK8, CTSB, PTEN, CCND1 compared against ~650 datasets, ~15,000 expression conditions, ~24,000 genes; conditions ranked from most like our 26-gene LINE-1 differential methylation signature to least like the signature]

26 genes in signature

Identify microarray datasets with conditions enriched for LINE-1 hypomethylation.

Colon Cancer: Mining Microarrays

“The goal of GSEA is to determine whether members of a gene set S tend to occur toward the top (or bottom) of the list L.”

Subramanian et al, 2005
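The quoted goal can be illustrated with a simplified running-sum enrichment score. This sketch uses the unweighted variant, in which every hit counts equally, rather than the correlation-weighted statistic of the full GSEA method, and it omits the permutation test. `enrichment_score` is a hypothetical name.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for gene_set in a ranked list.

    Walk down the list, stepping up at each member of the set and down at
    each non-member; the score is the largest deviation from zero.
    Positive values mean the set sits toward the top of the list.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("the list must contain both members and non-members")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
top_score = enrichment_score(ranked, {"g1", "g2"})     # set at the top
bottom_score = enrichment_score(ranked, {"g5", "g6"})  # set at the bottom
```

A set concentrated at the top of the list scores near +1 and a set concentrated at the bottom scores near -1; a set scattered evenly through the list stays close to 0.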

[Figure: datasets ranked from most like our 26-gene LINE-1 differential methylation signature to least like the signature]

• Bleomycin effect on mutagen-sensitive lymphoblastoid cells
• Folic acid deficiency effect on colon cancer cells
• Bladder tumor stage classification
• Normal tissue of diverse types
• Muscle function and aging
• Non-diseased lung tissue

What CNV-linked genes are differentially expressed in these datasets?

[Figure: conditions within each dataset (Dataset 1: Conditions X, Y, Z; Dataset 2: Conditions A, B, C, D, E) and the corresponding genes (Gene X, Y, Z; Gene A, B, C, D, E), ranked from most upregulated to most downregulated in significantly enriched datasets]

• PSGs (11 genes on 19q13.3)
• PCDHs (~50 genes on 5q31.3)
• Misc. ~12 genes on 16p13.3

Iafrate et al, 2005

Pregnancy specific β glycoproteins

“PSG9 is not found in the non-pregnant adult except in association with cancer, and it appears to be an early molecular event associated with colorectal cancer.”

Salahshor et al, 2005: Differential gene expression profile reveals deregulation of pregnancy specific β1 glycoprotein 9 early during colorectal carcinogenesis

Colon Cancer: Generating a Hypothesis

Colon Cancer: Using All the Data

GI cancers and chemotherapy

Yes (caveat investigator)

Get back to me in a couple of months…

What’s the state of the data?

• Extremely hypomethylated colon cancer carries a significantly poor prognosis

• In our cohort, these ~20 tumors are weakly enriched for a protein activity signature based on IHC

• The expression datasets most enriched for the same signature represent mainly GI cancer and chemotherapy conditions

• The PSG gene family is upregulated in these datasets and is linked to a known CNV

• HEFalMp associates the PSGs with cancer based on correlation with known colorectal cancer genes in a variety of expression datasets

Nothing definite, yet.

Human Regulatory Networks

Quiescence: reversible exit from the cell cycle

[Figure: expression heat map of 6,829 genes across serum starved and serum re-stimulated time courses; time points shown (hrs): 1, 2, 4, 8, 24, 96 and 1, 2, 4, 8, 24, 48]

FIRE: Elemento et al. 2007

• Of only five regulators found, four have generic cell cycle/proliferation targets
• Just five basic regulators for ~7,000 genes?
• These motifs only appear upstream of ~half of the genes

Regulatory Modules: Expression Biclusters + Sequence Motifs

Bicluster: Coregulated subset of genes and conditions

…any gene or condition can participate in multiple biclusters…

…any dataset can contain many overlapping biclusters…

…do all that, and simultaneously find (under)enriched sequence motifs!

COALESCE: Combinatorial Algorithm for Expression and Sequence-based Cluster Extraction

Inputs: Gene Expression; DNA Sequence (5’ UTR, 3’ UTR, upstream flank, downstream flank); Evolutionary Conservation; Nucleosome Positions

• Identify conditions where genes coexpress
• Identify motifs enriched in genes’ sequences
• Create a new module
• Select genes based on conditions and motifs
• Subtract mean from all data

Regulatory modules
• Coregulated genes
• Conditions where they’re coregulated
• Putative regulating motifs

Feature selection: Tests for differential expression/frequency

Bayesian integration

COALESCE: Selecting Coexpressed Conditions

• For each gene expression condition…
  – Compare distributions of values for
    • Genes in the module versus
    • Genes not in the module
  – If significantly different, include the condition

Preserving data structure:
• If multiple conditions derive from the same dataset, can be included/excluded as a unit
• For example, time course vs. deletion collection
• Test using multivariate z-test
• Precalculate covariance matrix; still very efficient
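The per-condition comparison can be sketched with a simple two-sample z-test. This is an illustrative simplification, not the COALESCE implementation: it tests each condition separately rather than using the multivariate test for conditions from the same dataset, and the name `select_conditions` and the cutoff of 3.0 are assumptions.

```python
from math import sqrt
from statistics import mean, variance


def select_conditions(expression, module, z_cutoff=3.0):
    """Return indices of conditions where module genes differ from the rest.

    expression: dict mapping gene name to a list of values, one per condition.
    module: set of gene names currently in the module.
    Each group needs at least two genes with non-constant values, otherwise
    the variance or the standard error is undefined.
    """
    n_conditions = len(next(iter(expression.values())))
    selected = []
    for c in range(n_conditions):
        inside = [v[c] for g, v in expression.items() if g in module]
        outside = [v[c] for g, v in expression.items() if g not in module]
        se = sqrt(variance(inside) / len(inside)
                  + variance(outside) / len(outside))
        z = (mean(inside) - mean(outside)) / se
        if abs(z) > z_cutoff:
            selected.append(c)
    return selected


# Toy data: the module genes g1-g3 are elevated in condition 0 only.
expression = {
    "g1": [2.0, 0.1], "g2": [2.2, -0.1], "g3": [1.8, 0.0],
    "g4": [0.0, 0.0], "g5": [0.1, 0.1], "g6": [-0.1, -0.1], "g7": [0.2, 0.0],
}
chosen = select_conditions(expression, {"g1", "g2", "g3"})
```

Only the condition in which the module genes stand apart from the rest of the genome is kept, which is what lets a module span a subset of conditions rather than the whole compendium.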

COALESCE: Selecting Significant Motifs

• Coalesce looks for three kinds of motifs:
  – K-mers (e.g. ACGACGT)
  – Reverse complement pairs (e.g. ACGACAT | ATGTCGT)
  – Probabilistic Suffix Trees (PSTs)
• For every possible motif…
  – Compare distributions of values for
    • Genes in the module versus
    • Genes not in the module
  – If significantly different, include the motif

• This can distinguish flanks from UTRs
• Fast!
• Efficient enough to search coding sequence (e.g. exons/introns)
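The first two motif types can be made concrete with a short k-mer counter that optionally merges each k-mer with its reverse complement. This is a sketch of the feature extraction step only; the function names and the choice to represent a pair by its alphabetically smaller member are assumptions, and probabilistic suffix trees are not covered.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence over the alphabet ACGT."""
    return seq.translate(COMPLEMENT)[::-1]


def count_kmers(seq, k, collapse_strands=True):
    """Count k-mers in seq.

    With collapse_strands=True, a k-mer and its reverse complement are
    counted together under whichever of the two sorts first.
    """
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if collapse_strands:
            kmer = min(kmer, reverse_complement(kmer))
        counts[kmer] += 1
    return counts


pair_partner = reverse_complement("ACGACAT")
forward = count_kmers("ACGACAT", 7)
reverse = count_kmers("ATGTCGT", 7)
```

Per-gene counts like these are the values whose distributions would be compared between genes inside and outside the module.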

COALESCE: Selecting Probable Genes

• For each gene in the genome…
  For each significant condition…
  For each significant motif…

What’s the probability the gene came from the module’s distribution?

What’s the probability that it came from outside the module?

The probability of a gene being in the module given some data…

$$P(g \in M \mid D) = \frac{P(D \mid g \in M)\,P(g \in M)}{P(D \mid g \in M)\,P(g \in M) + P(D \mid g \notin M)\,P(g \notin M)}$$

Distributions of each feature in and out of the developing module are observed from the data.

Prior is used to stabilize module convergence; genes already in the module are more likely to stay there next iteration.
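A minimal sketch of this posterior calculation for a single gene. It assumes, purely for illustration, that each selected feature is normally distributed inside and outside the module and that features are independent given membership; the slides say only that the distributions are observed from the data. `posterior_in_module` is a hypothetical name.

```python
from statistics import NormalDist


def posterior_in_module(prior, observations):
    """Posterior probability that a gene belongs to the module.

    prior: P(gene in module) before seeing the data.
    observations: list of (value, inside_dist, outside_dist), one entry per
    selected condition or motif, with each distribution a NormalDist.
    """
    p_in, p_out = prior, 1.0 - prior
    for value, inside, outside in observations:
        p_in *= inside.pdf(value)    # likelihood if the gene is in the module
        p_out *= outside.pdf(value)  # likelihood if it is not
    return p_in / (p_in + p_out)


inside = NormalDist(2.0, 0.5)   # feature values for genes in the module
outside = NormalDist(0.0, 0.5)  # feature values for all other genes
likely_member = posterior_in_module(0.5, [(2.0, inside, outside)])
uninformative = posterior_in_module(0.3, [(1.0, inside, inside)])
```

Raising the prior for genes that were in the module on the previous iteration is one way to get the stabilizing effect described above.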

COALESCE: Integrating Additional Data Types

Nucleosome placement, Evolutionary conservation

• Can be included as additional datasets and feature selected just like expression conditions/motifs.

• Or can be used as a prior or weight on the values of individual motifs.

G1 2.5 0.0

G2 0.6 0.5

G3 1.2 0.9

… … …

TCCGGTAGAACTACTGGTATTGTTTTGGATTCCGGTGATG

COALESCE Results: S. cerevisiae Modules

The haystack: ~2,200 conditions, ~6,000 genes

A needle: 100 genes, 80 conditions

COALESCE Results: Yeast TF/Target Accuracy

[Figure: TF/target accuracy of COALESCE, cMonkey, and Weeder for Bas1p, Hap4p, Met32p, Cup2p, Met31p, Zap1p, Upc2p, Mbp1p, Hsf1p, Gln3p, Hap3p, Gcn4p, Uga3p, Gis1p, Hap5p]

COALESCE Results: Yeast Clustering Accuracy

• ~2,200 yeast conditions
  – Recapitulation of known biology from Gene Ontology

ASCL1 in 5’ flank, unch. sequences underenriched in 3’ UTR

M. musculus: Up in callosal and motor neurons

C. elegans: Up in larvae, down in adults

GATA in 5’ flank, miR-788 seed in 3’ UTR

AAGGGGC (zf?) and enriched in 5’ flank

H. sapiens: Up in normal muscle, down in diabetic

COALESCE: Coregulated Quiescence Modules

• Down during quiescence entry, up during quiescence exit, down with adenoviral infection
  – Specific predicted uncharacterized reverse complement motif
• Up during quiescence entry, down during quiescence exit
  – Many known related (proliferation) motifs: Pax4, Staf, NFKB1, Gfi, ESR1, Runx1, Su(H)
• Down during quiescence entry, enriched for transport/trafficking
  – miR-297 motif predicted in 3’ UTR (CACATAC)
• Down with let-7 exposure
  – let-7 motifs predicted in 3’ UTR (UACCUC)

Summary

• COALESCE algorithm for regulatory module prediction

– Biclustering + putative de novo motifs

– Optimized for complex organisms (fast!)
  • Large genomes, large data collections

– High accuracy, low false positives

– Leverage prior knowledge, multiple data types
