Phylogeny-Driven Approaches to Genomics and Metagenomics Jonathan A. Eisen University of California, Davis @phylogenomics Talk at University of Washington October 23, 2013 Wednesday, October 23, 13

"Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on


DESCRIPTION

Talk by Jonathan Eisen at U. Washington 10/23/13


Page 1: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Phylogeny-Driven Approaches to Genomics and Metagenomics

Jonathan A. Eisen

University of California, Davis

@phylogenomics

Talk at University of Washington

October 23, 2013

Wednesday, October 23, 13

Page 2:

My Obsessions

Page 3:

Open Science

Page 4:

Open Science

X

Page 5:

Social Media & Science

Page 6:

Social Media & Science

X

Page 7:

• Red Sox

Page 8:

• Red Sox

X

Page 9:

Microbial Evolution

Page 10:

Sequencing

Page 11:

Sequencing, Phylogeny, Microbes

Page 12:

Four Eras of Sequencing & Microbes

Page 13:

Era I: The Tree of Life

Page 14:

Tree from Woese. 1987. Microbiological Reviews 51:221

Lost in Graduate School?

Colias

Page 15:

Tree from Woese. 1987. Microbiological Reviews 51:221

X

Lost in Graduate School?

Colias Phil Hanawalt

Page 16:

Tree from Woese. 1987. Microbiological Reviews 51:221

X

Lost in Graduate School?

Colias Phil Hanawalt Adaptive Mutation

Page 17:

Tree from Woese. 1987. Microbiological Reviews 51:221

X X

Lost in Graduate School?

Colias Phil Hanawalt Adaptive Mutation

@RELenski

Page 18:

Tree from Woese. 1987. Microbiological Reviews 51:221

Lost in Graduate School?

Get A Map

Page 19:

Tree from Woese. 1987. Microbiological Reviews 51:221

Woese - Three Domains 1977

Page 20:

Tree from Woese. 1987. Microbiological Reviews 51:221

Map for Graduate School

Page 21:

Limited Sampling of RRR Studies

Tree from Woese. 1987. Microbiological Reviews 51:221

Page 22:

My Study Organisms

Tree from Woese. 1987. Microbiological Reviews 51:221

Page 23:

E. coli vs. H. volcanii UV survival

[Figure: UV Survival, E. coli vs. H. volcanii. Y-axis: relative survival (log scale, 1E-07 to 1). X-axis: UV dose, 0 to 400 J/m2. Series: H. volcanii WFD11; E. coli NR10125 mfd+; E. coli NR10121 mfd-.]

Page 24:

H. volcanii Excision Repair

[Figure: H. volcanii UV Repair (Label 7, 45 J/m2). X-axis: Avg. Mol. Wt. (base pairs), 0 to 18000. Y-axis ticks: 0 to 0.6. Series: 45 J/m2 Dark 24 Hours; 45 J/m2 Photoreac.; 45 J/m2 t0; 0 J/m2 t0.]

Page 27:

Whatever the History: Try to Incorporate It

from Lake et al. doi: 10.1098/rstb.2009.0035

Page 28:

adapted from Baldauf, et al., in Assembling the Tree of Life, 2004

Tree Updated

Page 29:

Era II: rRNA in the Environment

Page 30:

PCR and phylogenetic analysis of rRNA genes

[Diagram: DNA extraction, then PCR (makes lots of copies of the rRNA genes in sample), then sequencing of the rRNA genes, then sequence alignment (= data matrix), then phylogenetic tree. Taxa in the alignment and tree: rRNA1, Yeast, E. coli, Humans.]

rRNA1 5’...TACAGTATAGGTGGAGCTAGCGATCGATCGA... 3’

Page 31:

Chemosynthetic Symbionts

Eisen et al. 1992. J. Bact. 174: 3416

Page 32:

PCR and phylogenetic analysis of rRNA genes

[Diagram: DNA extraction, then PCR (makes lots of copies of the rRNA genes in sample), then sequencing of the rRNA genes, then sequence alignment (= data matrix), then phylogenetic tree. Taxa in the alignment and tree: E. coli, Humans, Yeast, rRNA1, rRNA2.]

rRNA1 5’...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’

rRNA2 5’...TACAGTATAGGTGGAGCTAGCGATCGATCGA... 3’

Data matrix row shown: Yeast T A C A G T

Page 33:

PCR and phylogenetic analysis of rRNA genes

[Diagram: DNA extraction, then PCR (makes lots of copies of the rRNA genes in sample), then sequencing of the rRNA genes, then sequence alignment (= data matrix), then phylogenetic tree. Taxa in the alignment and tree: E. coli, Humans, Yeast, rRNA1, rRNA2, rRNA3, rRNA4.]

rRNA1 5’...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’

rRNA2 5’..TACAGTATAGGTGGAGCTAGCGACGATCGA... 3’

rRNA3 5’...ACGGCAAAATAGGTGGATTCTAGCGATATAGA... 3’

rRNA4 5’...ACGGCCCGATAGGTGGATTCTAGCGCCATAGA... 3’

Data matrix rows shown:

rRNA3 C A C T G T

rRNA4 C A C A G T

Yeast T A C A G T

Page 34:

PCR and phylogenetic analysis of rRNA genes

[Diagram: same workflow, rRNA1 to rRNA4 sequences, and data matrix rows (rRNA3 C A C T G T; rRNA4 C A C A G T; Yeast T A C A G T) as on Page 33, with the tree now labeled "Phylogeny".]
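As a rough illustration of the "sequence alignment = data matrix" step, the sketch below computes pairwise distances between the three six-column rows shown on the slide (Yeast, rRNA3, rRNA4). The slide does not say which tree-building method is used; the uncorrected proportion of differing columns (p-distance) is an assumption chosen here only because it is the simplest input to a distance-based tree method, and the function names are hypothetical.

```python
from itertools import combinations

# Data matrix rows as printed on the slide (six aligned columns each).
alignment = {
    "Yeast": "TACAGT",
    "rRNA3": "CACTGT",
    "rRNA4": "CACAGT",
}

def p_distance(a: str, b: str) -> float:
    """Proportion of aligned columns at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def distance_matrix(aln: dict) -> dict:
    """Pairwise p-distances keyed by (name1, name2), names in sorted order."""
    return {
        (s, t): p_distance(aln[s], aln[t])
        for s, t in combinations(sorted(aln), 2)
    }

if __name__ == "__main__":
    for pair, d in distance_matrix(alignment).items():
        print(pair, round(d, 3))
```

A matrix like this could then be handed to a clustering method such as neighbor joining; real analyses would use full-length alignments and a substitution model rather than six columns.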

Page 35:

Uses of rRNA Phylogeny

• OTUs
• Taxonomic lists
• Relative abundance of taxa
• Ecological metrics (alpha / beta diversity)

• Phylogenetic metrics
• Binning
• Identification of novel groups
• Clades
• Rates of change
• LGT
• Convergence
• PD
• Phylogenetic ecology (e.g., UniFrac)
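To make the "relative abundance" and "alpha / beta diversity" items concrete, here is a minimal sketch using two common choices: the Shannon index for alpha diversity and Bray-Curtis dissimilarity for beta diversity. The slide does not name specific metrics, so both are assumptions, and the OTU counts are invented for illustration.

```python
import math

def relative_abundance(counts: dict) -> dict:
    """Convert raw OTU counts to proportions, dropping empty OTUs."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {otu: n / total for otu, n in counts.items() if n > 0}

def shannon(counts: dict) -> float:
    """Shannon index (natural log), an alpha-diversity measure."""
    return -sum(p * math.log(p) for p in relative_abundance(counts).values())

def bray_curtis(a: dict, b: dict) -> float:
    """Bray-Curtis dissimilarity between two samples, a beta-diversity measure."""
    otus = set(a) | set(b)
    shared = sum(min(a.get(o, 0), b.get(o, 0)) for o in otus)
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))

if __name__ == "__main__":
    # Hypothetical OTU count tables for two samples.
    sample_a = {"OTU1": 50, "OTU2": 50}
    sample_b = {"OTU1": 50, "OTU3": 50}
    print(shannon(sample_a), bray_curtis(sample_a, sample_b))
```

Neither metric uses the tree itself; phylogenetic measures such as PD or UniFrac additionally need branch lengths.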

Page 36:

Sequencing Has Gone Crazy

Approaching to NGS

• 1953: Discovery of DNA structure (Cold Spring Harb. Symp. Quant. Biol. 1953;18:123-31)
• 1977: Sanger sequencing method by F. Sanger (PNAS, 1977, 74: 560-564)
• 1983: PCR by K. Mullis (Cold Spring Harb Symp Quant Biol. 1986;51 Pt 1:263-73)
• 1993: Development of pyrosequencing (Anal. Biochem., 1993, 208: 171-175; Science, 1998, 281: 363-365)
• 1998: Single molecule emulsion PCR
• 1998: Founded Solexa
• 2000: Founded 454 Life Science
• 2001: Human Genome Project (Nature, 2001, 409: 860–92; Science, 2001, 291: 1304–1351)
• 2005: 454 GS20 sequencer (first NGS sequencer)
• 2006: Solexa Genome Analyzer (first short-read NGS sequencer)
• 2006: Illumina acquires Solexa (Illumina enters the NGS business)
• 2007: ABI SOLiD (short-read sequencer based upon ligation)
• 2007: Roche acquires 454 Life Sciences (Roche enters the NGS business)
• 2008: GS FLX sequencer (NGS with 400-500 bp read length)
• 2008: NGS Human Genome sequencing (first Human Genome sequencing based upon NGS technology)
• 2010: Hi-Seq2000 (200 Gbp per flow cell)

Added to the timeline: Miseq, Roche Jr, Ion Torrent, PacBio, Oxford

From Slideshare presentation of Cosentino Cristian: http://www.slideshare.net/cosentia/high-throughput-equencing

Page 37:

rRNA PCR Revolution

• More PCR products

• Deeper sequencing
  • The rare biosphere
  • Relative abundance estimates

• More samples (with barcoding)
  • Time series
  • Spatially diverse sampling
  • Fine scale sampling
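The "more samples (with barcoding)" point relies on tagging each sample's PCR products with a short known sequence so that pooled reads can be sorted afterwards. The sketch below shows that sorting step under simplifying assumptions that are not from the talk: barcodes sit at the 5’ end of each read, all have the same length, and must match exactly. The barcode sequences and sample names are hypothetical.

```python
from collections import defaultdict

def demultiplex(reads, barcode_to_sample):
    """Group reads by their leading barcode and strip the barcode off.

    Reads whose prefix matches no known barcode go to "unassigned".
    """
    lengths = {len(b) for b in barcode_to_sample}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes equal-length barcodes")
    k = lengths.pop()
    by_sample = defaultdict(list)
    for read in reads:
        sample = barcode_to_sample.get(read[:k], "unassigned")
        by_sample[sample].append(read[k:])
    return dict(by_sample)

if __name__ == "__main__":
    # Hypothetical barcodes and pooled reads.
    barcodes = {"ACGT": "marsh_A", "TGCA": "marsh_B"}
    pooled = ["ACGTTACAGT", "TGCACACTGT", "GGGGCACAGT"]
    print(demultiplex(pooled, barcodes))
```

Production pipelines usually tolerate a small number of barcode mismatches and check quality scores, which this sketch omits.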

Page 38:

Beta-Diversity

a broader range of Proteobacteria, but yielded similar results (Fig. S1 and Tables S2 and S3).

Across all samples, we identified 4,931 quality Nitrosomonadales sequences, which grouped into 176 OTUs (operational taxonomic units) using an arbitrary 99% sequence similarity cutoff. This cutoff retained a high amount of sequence diversity, but minimized the chance of including diversity because of sequencing or PCR errors. Most (95%) of the sequences appear closely related either to the marine Nitrosospira-like clade, known to be abundant in estuarine sediments (e.g., ref. 19) or to marine bacterium C-17, classified as Nitrosomonas (20) (Fig. S2). Pairwise community similarity between the samples was calculated based on the presence or absence of each OTU using a rarefied Sørensen’s index (4). Community similarity using this incidence index was highly correlated with the abundance-based Sørensen index (Mantel test: ρ = 0.9239; P = 0.0001) (21).

A plot of community similarity versus geographic distance for each pairwise set of samples revealed that the Nitrosomonadales display a significant, negative distance-decay curve (slope = −0.08, P < 0.0001) (Fig. 2). Furthermore, the slope of this curve varied significantly among the three spatial scales. The distance-decay slope within marshes was significantly shallower than the overall slope (slope = −0.04; P < 0.0334) and steeper across marshes within a region than the overall slope (slope = −0.27, P < 0.0007) (Fig. 2). In contrast, at the continental scale, the distance-decay curve did not differ from zero (P = 0.0953). Thus, there is no evidence that sampling across continents contributed Nitrosomonadales OTU diversity in addition to what was already observed at the marsh and regional scales. Furthermore, additional analyses suggest that these results are not driven by a few outlier samples (Fig. S3).

Over all spatial scales, both the environment and dispersal limitation appear to influence Nitrosomonadales β-diversity. Ranked partial Mantel tests revealed that the similarity in Nitrosomonadales community composition between samples was highly correlated with environmental distance (ρ = −0.5339; P = 0.0001) and geographic distance (ρ = −0.2803; P = 0.0001), but not plant community similarity (P = 0.72) (Table S2).

To further identify the relative importance of factors contributing to these correlations, we used a multiple regression on matrices (MRM). The partial regression coefficients of an MRM model give a measure of the rate of change in community similarity per standardized unit of similarity for the variable of interest; all other explanatory variables are held constant (22). Over all scales, the MRM model explained a large and significant proportion (R² = 46%; P < 0.0001) of the variability in Nitrosomonadales community similarity. Geographic distance contributed the largest partial regression coefficient (b = 0.40, P < 0.0001), with sediment moisture, nitrate concentration, plant cover, salinity, and air and water temperature contributing to smaller, but significant, partial regression coefficients (b = 0.09–0.17, P < 0.05) (Table 1). Because salt marsh bacteria may be dispersing through ocean currents, we also used a global ocean circulation model (23), as applied previously (24), to estimate relative dispersal times of hypothetical microbial cells between each sampling location. Dispersal times between sampling points did not explain more variability in bacterial community similarity (ln dispersal time: b = 0.06, P = 0.0799; with dispersal R² = 0.47 vs. without 0.46). Therefore, in the remaining analyses we use geographic distance rather than dispersal time.

As hypothesized, the relative importance of environmental factors versus geographic distance to Nitrosomonadales community similarity differed across the three spatial scales. Contrary to our expectations, however, geographic distance had a strong effect on community similarity within salt marshes (partial regression coefficient b = 0.47) but no effect at larger scales (Table 1). Furthermore, the relative importance of different environmental variables varied by scale. Sediment moisture, which is likely related to unmeasured variables, such as oxygen availability, was the most important variable explaining community similarity within marshes (b = 0.63). In contrast, water temperature (b = 0.45) and nitrate concentrations (b = 0.17) were more important at the regional and continental scales, respectively.

The varying importance of the environmental parameters at different spatial scales likely reflects differences in their underlying variability at these scales. For example, the MRM model did exceptionally well in explaining variation in Nitrosomonadales community similarity at the regional scale (R² = 0.61) (Table 1). Notably, this spatial scale captures a latitudinal gradient on the east and west coasts of North America, which results in high variability in water temperature. Previous studies in the field and laboratory support the idea that AOB composition is particularly sensitive to temperature (e.g., refs. 25 and 26). Within marshes,

Fig. 1. The 13 marshes sampled (see Table S1 for details). Marshes compared with one another within regions are circled. (Inset) The arrangement of sampling points within marshes. Six points were sampled along a 100-m transect, and a seventh point was sampled ~1 km away. Two marshes in the Northeast United States (outlined stars) were sampled more intensively, along four 100-m transects in a grid pattern.

Fig. 2. Distance-decay curves for the Nitrosomonadales communities. The dashed, blue line denotes the least-squares linear regression across all spatial scales. The solid lines denote separate regressions within each of the three spatial scales: within marshes, regional (across marshes within regions circled in Fig. 1), and continental (across regions). The slopes of all lines (except the solid light blue line) are significantly less than zero. The slopes of the solid red lines are significantly different from the slope of the all scale (blue dashed) line.
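The distance-decay analysis described in the excerpt can be sketched in a few lines: compute an incidence-based Sørensen similarity for every pair of samples, then fit a least-squares line of similarity against geographic distance. This is a simplified illustration, not the paper's procedure: it skips rarefaction, uses raw distances along a single axis, and the sample positions and OTU sets are invented.

```python
from itertools import combinations

def sorensen(a: set, b: set) -> float:
    """Incidence-based Sørensen similarity: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        raise ValueError("both samples are empty")
    return 2 * len(a & b) / (len(a) + len(b))

def ols_slope(x, y) -> float:
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def distance_decay_slope(samples) -> float:
    """samples: list of (position_km, otu_set) pairs along one transect."""
    dists, sims = [], []
    for (pa, sa), (pb, sb) in combinations(samples, 2):
        dists.append(abs(pa - pb))
        sims.append(sorensen(sa, sb))
    return ols_slope(dists, sims)

if __name__ == "__main__":
    # Hypothetical samples: position in km and the OTUs present.
    samples = [
        (0.0, {"A", "B", "C", "D"}),
        (1.0, {"A", "B", "C", "E"}),
        (2.0, {"A", "F", "G", "H"}),
    ]
    print(distance_decay_slope(samples))
```

A negative slope means similarity falls with distance. Because pairwise comparisons are not independent, significance is normally assessed with permutation approaches such as the Mantel tests mentioned in the excerpt, not with ordinary regression statistics.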

Martiny et al. PNAS | May 10, 2011 | vol. 108 | no. 19 | 7851


Drivers of bacterial β-diversity depend on spatial scale

Jennifer B. H. Martiny [a,1], Jonathan A. Eisen [b], Kevin Penn [c], Steven D. Allison [a,d], and M. Claire Horner-Devine [e]

[a] Department of Ecology and Evolutionary Biology, and [d] Department of Earth System Science, University of California, Irvine, CA 92697; [b] Department of Evolution and Ecology, University of California Davis Genome Center, Davis, CA 95616; [c] Center for Marine Biotechnology and Biomedicine, The Scripps Institution of Oceanography, University of California at San Diego, La Jolla, CA 92093; and [e] School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98195

Edited by Edward F. DeLong, Massachusetts Institute of Technology, Cambridge, MA, and approved March 31, 2011 (received for review November 1, 2010)

The factors driving β-diversity (variation in community composition) yield insights into the maintenance of biodiversity on the planet. Here we tested whether the mechanisms that underlie bacterial β-diversity vary over centimeters to continental spatial scales by comparing the composition of ammonia-oxidizing bacteria communities in salt marsh sediments. As observed in studies of macroorganisms, the drivers of salt marsh bacterial β-diversity depend on spatial scale. In contrast to macroorganism studies, however, we found no evidence of evolutionary diversification of ammonia-oxidizing bacteria taxa at the continental scale, despite an overall relationship between geographic distance and community similarity. Our data are consistent with the idea that dispersal limitation at local scales can contribute to β-diversity, even though the 16S rRNA genes of the relatively common taxa are globally distributed. These results highlight the importance of considering multiple spatial scales for understanding microbial biogeography.

microbial composition | distance-decay | Nitrosomonadales | ecological drift

Biodiversity supports the ecosystem processes upon which society depends (1). Understanding the mechanisms that generate and maintain biodiversity is thus key to predicting ecosystem responses to future environmental changes. The decrease in community similarity with geographic distance is a universal biogeographic pattern observed in communities from all domains of life (as in refs. 2–4). Pinpointing the underlying causes of this “distance-decay” pattern continues to be an area of intense research (5–9), as such studies of β-diversity (variation in community composition) yield insights into the maintenance of biodiversity. These studies are still relatively rare for microorganisms, however, and thus our understanding of the mechanisms underlying microbial diversity (most of the tree of life) remains limited.

β-Diversity, and therefore distance-decay patterns, could be driven solely by differences in environmental conditions across space, a hypothesis summed up by microbiologists as, “everything is everywhere: the environmental selects” (10). Under this model, a distance-decay curve is observed because environmental variables tend to be spatially autocorrelated, and organisms with differing niche preferences are selected from the available pool of taxa as the environment changes with distance.

Dispersal limitation can also give rise to β-diversity, as it permits historical contingencies to influence present-day biogeographic patterns. For example, neutral niche models, in which an organism’s abundance is not influenced by its environmental preferences, predict a distance-decay curve (8, 11). On relatively short time scales, stochastic births and deaths contribute to a heterogeneous distribution of taxa (ecological drift). On longer time scales, stochastic genetic processes allow for taxon diversification across the landscape (evolutionary drift). If dispersal is limiting, then current environmental or biotic conditions will not fully explain the distance-decay curve, and thus geographic distance will be correlated with community similarity even after controlling for other factors (2).

For macroorganisms, the relative contribution of environmental factors or dispersal limitation to β-diversity depends on spatial scale (12). Fifty years ago, Preston (13) noted that the turnover rate (rate of change) of bird species composition across space within a continent is lower than that across continents. He attributed the high turnover rate across continents to evolutionary diversification (i.e., speciation) between faunas as a result of dispersal limitation and the lower turnover rates of bird species within continents as a result of environmental variation.

Here we investigate whether the mechanisms underlying β-diversity in bacteria also vary by spatial scale. We chose to focus on the ammonia-oxidizing bacteria (AOB), which along with the ammonia-oxidizing archaea (14), perform the rate-limiting step of nitrification and thus play a key role in nitrogen dynamics. We compared AOB community composition in 106 sediment samples from 12 salt marshes on three continents. A partially nested sampling design achieved a relatively balanced distribution of pairwise distance classes over nine orders of magnitude, from 3 cm to 12,500 km (Fig. 1 and Table S1). We limited our sampling to a monophyletic group of bacteria, the AOB within the β-Proteobacteria, and one habitat, salt marshes primarily dominated by cordgrass (Spartina spp.). This approach constrained the pool of total diversity (richness) and kept the environmental and plant variation relatively constant, increasing our ability to identify if dispersal limitation influences AOB composition.

We then asked two questions: (i) Does bacterial β-diversity (specifically, the slope of the distance-decay curve) vary over local (within marsh), regional (across marshes within a coast), and continental scales? (ii) Do the underlying factors (environmental variation or dispersal limitation) explaining this diversity vary by spatial scale? Because most bacteria are small, abundant, and hardy, we predicted that dispersal limitation would occur primarily across continents, resulting in genetically divergent microbial “provinces” (15). At the same time, we predicted that environmental factors would contribute equally to distance-decay at all scales, resulting in the steepest slope at the continental scale as reported in plant and animal communities (12, 13, 16).

Results and Discussion

We characterized AOB community composition by cloning and Sanger sequencing of 16S rRNA gene regions targeted by two primer sets. Here we focus on the results from a subset of those sequences from the order Nitrosomonadales, generated using primers specific for AOB within the β-Proteobacteria class (17). The second primer set (18) generated longer sequences from

Author contributions: J.B.H.M. and M.C.H.-D. designed research; J.B.H.M., J.A.E., K.P., and M.C.H.-D. performed research; J.B.H.M., S.D.A., and M.C.H.-D. analyzed data; and J.B.H.M. and M.C.H.-D. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Freely available online through the PNAS open access option.

Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. HQ271472–HQ276885 and HQ276886–HQ283075).

1 To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1016308108/-/DCSupplemental.

7850–7854 | PNAS | May 10, 2011 | vol. 108 | no. 19 www.pnas.org/cgi/doi/10.1073/pnas.1016308108

Drivers of bacterial !-diversity depend on spatial scaleJennifer B. H. Martinya,1, Jonathan A. Eisenb, Kevin Pennc, Steven D. Allisona,d, and M. Claire Horner-Devinee

aDepartment of Ecology and Evolutionary Biology, and dDepartment of Earth System Science, University of California, Irvine, CA 92697; bDepartment ofEvolution and Ecology, University of California Davis Genome Center, Davis, CA 95616; cCenter for Marine Biotechnology and Biomedicine, The ScrippsInstitution of Oceanography, University of California at San Diego, La Jolla, CA 92093; and eSchool of Aquatic and Fishery Sciences, University of Washington,Seattle, WA 98195

Edited by Edward F. DeLong, Massachusetts Institute of Technology, Cambridge, MA, and approved March 31, 2011 (received for review November 1, 2010)

The factors driving !-diversity (variation in community composi-tion) yield insights into the maintenance of biodiversity on theplanet. Here we tested whether the mechanisms that underliebacterial !-diversity vary over centimeters to continental spatialscales by comparing the composition of ammonia-oxidizing bacte-ria communities in salt marsh sediments. As observed in studiesof macroorganisms, the drivers of salt marsh bacterial !-diversitydepend on spatial scale. In contrast to macroorganism studies,however, we found no evidence of evolutionary diversificationof ammonia-oxidizing bacteria taxa at the continental scale, de-spite an overall relationship between geographic distance andcommunity similarity. Our data are consistent with the idea thatdispersal limitation at local scales can contribute to !-diversity,even though the 16S rRNA genes of the relatively common taxaare globally distributed. These results highlight the importanceof considering multiple spatial scales for understanding microbialbiogeography.

microbial composition | distance-decay | Nitrosomonadales | ecological drift

Biodiversity supports the ecosystem processes upon which society depends (1). Understanding the mechanisms that generate and maintain biodiversity is thus key to predicting ecosystem responses to future environmental changes. The decrease in community similarity with geographic distance is a universal biogeographic pattern observed in communities from all domains of life (as in refs. 2–4). Pinpointing the underlying causes of this “distance-decay” pattern continues to be an area of intense research (5–9), as such studies of β-diversity (variation in community composition) yield insights into the maintenance of biodiversity. These studies are still relatively rare for microorganisms, however, and thus our understanding of the mechanisms underlying microbial diversity (most of the tree of life) remains limited.

β-Diversity, and therefore distance-decay patterns, could be driven solely by differences in environmental conditions across space, a hypothesis summed up by microbiologists as, “everything is everywhere, the environment selects” (10). Under this model, a distance-decay curve is observed because environmental variables tend to be spatially autocorrelated, and organisms with differing niche preferences are selected from the available pool of taxa as the environment changes with distance.

Dispersal limitation can also give rise to β-diversity, as it permits historical contingencies to influence present-day biogeographic patterns. For example, neutral niche models, in which an organism’s abundance is not influenced by its environmental preferences, predict a distance-decay curve (8, 11). On relatively short time scales, stochastic births and deaths contribute to a heterogeneous distribution of taxa (ecological drift). On longer time scales, stochastic genetic processes allow for taxon diversification across the landscape (evolutionary drift). If dispersal is limiting, then current environmental or biotic conditions will not fully explain the distance-decay curve, and thus geographic distance will be correlated with community similarity even after controlling for other factors (2).

For macroorganisms, the relative contribution of environmental factors or dispersal limitation to β-diversity depends on spatial scale (12). Fifty years ago, Preston (13) noted that the turnover rate (rate of change) of bird species composition across space within a continent is lower than that across continents. He attributed the high turnover rate across continents to evolutionary diversification (i.e., speciation) between faunas as a result of dispersal limitation and the lower turnover rates of bird species within continents as a result of environmental variation.

Here we investigate whether the mechanisms underlying β-diversity in bacteria also vary by spatial scale. We chose to focus on the ammonia-oxidizing bacteria (AOB), which along with the ammonia-oxidizing archaea (14), perform the rate-limiting step of nitrification and thus play a key role in nitrogen dynamics. We compared AOB community composition in 106 sediment samples from 12 salt marshes on three continents. A partially nested sampling design achieved a relatively balanced distribution of pairwise distance classes over nine orders of magnitude, from 3 cm to 12,500 km (Fig. 1 and Table S1). We limited our sampling to a monophyletic group of bacteria, the AOB within the β-Proteobacteria, and one habitat, salt marshes primarily dominated by cordgrass (Spartina spp.). This approach constrained the pool of total diversity (richness) and kept the environmental and plant variation relatively constant, increasing our ability to identify if dispersal limitation influences AOB composition.

We then asked two questions: (i) Does bacterial β-diversity (specifically, the slope of the distance-decay curve) vary over local (within marsh), regional (across marshes within a coast), and continental scales? (ii) Do the underlying factors (environmental variation or dispersal limitation) explaining this diversity vary by spatial scale? Because most bacteria are small, abundant, and hardy, we predicted that dispersal limitation would occur primarily across continents, resulting in genetically divergent microbial “provinces” (15). At the same time, we predicted that environmental factors would contribute equally to distance-decay at all scales, resulting in the steepest slope at the continental scale as reported in plant and animal communities (12, 13, 16).
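The "slope of the distance-decay curve" in question (i) can be illustrated with a small sketch. It assumes, for illustration only, that the slope is taken as the least-squares slope of log10(community similarity) against log10(geographic distance) over all sample pairs; the similarity index and regression actually used in the paper may differ, and the function name and toy numbers below are hypothetical.

```python
import math

def distance_decay_slope(distances_km, similarities):
    """Least-squares slope of log10(similarity) on log10(distance).

    Both inputs are pairwise values (one entry per pair of samples) and
    must be strictly positive so the logarithms are defined.
    """
    xs = [math.log10(d) for d in distances_km]
    ys = [math.log10(s) for s in similarities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical toy data: similarity falls off as distance ** -0.1.
toy_distances = [0.001, 0.01, 0.1, 1.0, 10.0]
toy_similarities = [0.5 * d ** -0.1 for d in toy_distances]
slope = distance_decay_slope(toy_distances, toy_similarities)
print(round(slope, 3))
```

A more negative slope means faster turnover in composition with distance. Because pairwise comparisons share samples, the points are not independent, so significance is usually assessed with a permutation approach rather than ordinary regression statistics.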

Results and Discussion

We characterized AOB community composition by cloning and Sanger sequencing of 16S rRNA gene regions targeted by two primer sets. Here we focus on the results from a subset of those sequences from the order Nitrosomonadales, generated using primers specific for AOB within the β-Proteobacteria class (17). The second primer set (18) generated longer sequences from

Author contributions: J.B.H.M. and M.C.H.-D. designed research; J.B.H.M., J.A.E., K.P., and M.C.H.-D. performed research; J.B.H.M., S.D.A., and M.C.H.-D. analyzed data; and J.B.H.M. and M.C.H.-D. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Freely available online through the PNAS open access option.

Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. HQ271472–HQ276885 and HQ276886–HQ283075).

(1) To whom correspondence should be addressed. E-mail: [email protected].


Our data are consistent with the idea that dispersal limitation at local scales can contribute to β-diversity, even though the 16S rRNA genes of the relatively common taxa are globally distributed.


Page 39: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Drosophila microbiome

Both natural surveys and laboratory experiments indicate that host diet plays a major role in shaping the Drosophila bacterial microbiome.

Laboratory strains provide only a limited model of natural host–microbe interactions


Page 40: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

The Built Environment

ORIGINAL ARTICLE

Architectural design influences the diversity and structure of the built environment microbiome

Steven W Kembel(1), Evan Jones(1), Jeff Kline(1,2), Dale Northcutt(1,2), Jason Stenson(1,2), Ann M Womack(1), Brendan JM Bohannan(1), G Z Brown(1,2) and Jessica L Green(1,3)

(1) Biology and the Built Environment Center, Institute of Ecology and Evolution, Department of Biology, University of Oregon, Eugene, OR, USA; (2) Energy Studies in Buildings Laboratory, Department of Architecture, University of Oregon, Eugene, OR, USA and (3) Santa Fe Institute, Santa Fe, NM, USA

Buildings are complex ecosystems that house trillions of microorganisms interacting with each other, with humans and with their environment. Understanding the ecological and evolutionary processes that determine the diversity and composition of the built environment microbiome (the community of microorganisms that live indoors) is important for understanding the relationship between building design, biodiversity and human health. In this study, we used high-throughput sequencing of the bacterial 16S rRNA gene to quantify relationships between building attributes and airborne bacterial communities at a health-care facility. We quantified airborne bacterial community structure and environmental conditions in patient rooms exposed to mechanical or window ventilation and in outdoor air. The phylogenetic diversity of airborne bacterial communities was lower indoors than outdoors, and mechanically ventilated rooms contained less diverse microbial communities than did window-ventilated rooms. Bacterial communities in indoor environments contained many taxa that are absent or rare outdoors, including taxa closely related to potential human pathogens. Building attributes, specifically the source of ventilation air, airflow rates, relative humidity and temperature, were correlated with the diversity and composition of indoor bacterial communities. The relative abundance of bacteria closely related to human pathogens was higher indoors than outdoors, and higher in rooms with lower airflow rates and lower relative humidity. The observed relationship between building design and airborne bacterial diversity suggests that we can manage indoor environments, altering through building design and operation the community of microbial species that potentially colonize the human microbiome during our time indoors.

The ISME Journal advance online publication, 26 January 2012; doi:10.1038/ismej.2011.211
Subject Category: microbial population and community ecology
Keywords: aeromicrobiology; bacteria; built environment microbiome; community ecology; dispersal; environmental filtering

Introduction

Humans spend up to 90% of their lives indoors (Klepeis et al., 2001). Consequently, the way we design and operate the indoor environment has a profound impact on our health (Guenther and Vittori, 2008). One step toward better understanding of how building design impacts human health is to study buildings as ecosystems. Built environments are complex ecosystems that contain numerous organisms including trillions of microorganisms (Rintala et al., 2008; Tringe et al., 2008; Amend et al., 2010). The collection of microbial life that exists indoors (the built environment microbiome) includes human pathogens and commensals interacting with each other and with their environment (Eames et al., 2009). There have been few attempts to comprehensively survey the built environment microbiome (Rintala et al., 2008; Tringe et al., 2008; Amend et al., 2010), with most studies focused on measures of total bioaerosol concentrations or the abundance of culturable or pathogenic strains (Berglund et al., 1992; Toivola et al., 2002; Mentese et al., 2009), rather than a more comprehensive measure of microbial diversity in indoor spaces. For this reason, the factors that determine the diversity and composition of the built environment microbiome are poorly understood. However, the situation is changing. The development of culture-independent, high-throughput molecular sequencing approaches has transformed the study of microbial diversity in a variety of environments, as demonstrated by the recent explosion of research on the microbial ecology of aquatic and terrestrial ecosystems (Nemergut et al., 2011)

Received 23 October 2011; revised 13 December 2011; accepted 13 December 2011

Correspondence: SW Kembel, Biology and the Built Environment Center, Institute of Ecology and Evolution, Department of Biology, University of Oregon, Eugene, OR 97405, USA. E-mail: [email protected]

The ISME Journal (2012), 1–11. © 2012 International Society for Microbial Ecology. All rights reserved 1751-7362/12

www.nature.com/ismej

Microbial Biogeography of Public Restroom Surfaces

Gilberto E. Flores(1), Scott T. Bates(1), Dan Knights(2), Christian L. Lauber(1), Jesse Stombaugh(3), Rob Knight(3,4), Noah Fierer(1,5)*

(1) Cooperative Institute for Research in Environmental Science, University of Colorado, Boulder, Colorado, United States of America, (2) Department of Computer Science, University of Colorado, Boulder, Colorado, United States of America, (3) Department of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado, United States of America, (4) Howard Hughes Medical Institute, University of Colorado, Boulder, Colorado, United States of America, (5) Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, Colorado, United States of America

Abstract

We spend the majority of our lives indoors where we are constantly exposed to bacteria residing on surfaces. However, the diversity of these surface-associated communities is largely unknown. We explored the biogeographical patterns exhibited by bacteria across ten surfaces within each of twelve public restrooms. Using high-throughput barcoded pyrosequencing of the 16S rRNA gene, we identified 19 bacterial phyla across all surfaces. Most sequences belonged to four phyla: Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria. The communities clustered into three general categories: those found on surfaces associated with toilets, those on the restroom floor, and those found on surfaces routinely touched with hands. On toilet surfaces, gut-associated taxa were more prevalent, suggesting fecal contamination of these surfaces. Floor surfaces were the most diverse of all communities and contained several taxa commonly found in soils. Skin-associated bacteria, especially the Propionibacteriaceae, dominated surfaces routinely touched with our hands. Certain taxa were more common in female than in male restrooms as vagina-associated Lactobacillaceae were widely distributed in female restrooms, likely from urine contamination. Use of the SourceTracker algorithm confirmed many of our taxonomic observations as human skin was the primary source of bacteria on restroom surfaces. Overall, these results demonstrate that restroom surfaces host relatively diverse microbial communities dominated by human-associated bacteria with clear linkages between communities on or in different body sites and those communities found on restroom surfaces. More generally, this work is relevant to the public health field as we show that human-associated microbes are commonly found on restroom surfaces suggesting that bacterial pathogens could readily be transmitted between individuals by the touching of surfaces. Furthermore, we demonstrate that we can use high-throughput analyses of bacterial communities to determine sources of bacteria on indoor surfaces, an approach which could be used to track pathogen transmission and test the efficacy of hygiene practices.

Citation: Flores GE, Bates ST, Knights D, Lauber CL, Stombaugh J, et al. (2011) Microbial Biogeography of Public Restroom Surfaces. PLoS ONE 6(11): e28132. doi:10.1371/journal.pone.0028132

Editor: Mark R. Liles, Auburn University, United States of America

Received September 12, 2011; Accepted November 1, 2011; Published November 23, 2011

Copyright: © 2011 Flores et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported with funding from the Alfred P. Sloan Foundation and their Indoor Environment program, and in part by the National Institutes of Health and the Howard Hughes Medical Institute. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected]

Introduction

More than ever, individuals across the globe spend a large portion of their lives indoors, yet relatively little is known about the microbial diversity of indoor environments. Of the studies that have examined microorganisms associated with indoor environments, most have relied upon cultivation-based techniques to detect organisms residing on a variety of household surfaces [1–5]. Not surprisingly, these studies have identified surfaces in kitchens and restrooms as being hot spots of bacterial contamination. Because several pathogenic bacteria are known to survive on surfaces for extended periods of time [6–8], these studies are of obvious importance in preventing the spread of human disease. However, it is now widely recognized that the majority of microorganisms cannot be readily cultivated [9] and thus, the overall diversity of microorganisms associated with indoor environments remains largely unknown. Recent use of cultivation-independent techniques based on cloning and sequencing of the 16S rRNA gene have helped to better describe these communities and revealed a greater diversity of bacteria on indoor surfaces than captured using cultivation-based techniques [10–13]. Most of the organisms identified in these studies are related to human commensals suggesting that the organisms are not actively growing on the surfaces but rather were deposited directly (i.e. touching) or indirectly (e.g. shedding of skin cells) by humans. Despite these efforts, we still have an incomplete understanding of bacterial communities associated with indoor environments because limitations of traditional 16S rRNA gene cloning and sequencing techniques have made replicate sampling and in-depth characterizations of the communities prohibitive. With the advent of high-throughput sequencing techniques, we can now investigate indoor microbial communities at an unprecedented depth and begin to understand the relationship between humans, microbes and the built environment.

In order to begin to comprehensively describe the microbial diversity of indoor environments, we characterized the bacterial communities found on ten surfaces in twelve public restrooms (six male and six female) in Colorado, USA using barcoded

PLoS ONE | www.plosone.org 1 November 2011 | Volume 6 | Issue 11 | e28132

the stall in), they were likely dispersed manually after women used the toilet. Coupling these observations with those of the distribution of gut-associated bacteria indicates that routine use of toilets results in the dispersal of urine- and fecal-associated bacteria throughout the restroom. While these results are not unexpected, they do highlight the importance of hand-hygiene when using public restrooms since these surfaces could also be potential vehicles for the transmission of human pathogens. Unfortunately, previous studies have documented that college students (who are likely the most frequent users of the studied restrooms) are not always the most diligent of hand-washers [42,43].

Results of SourceTracker analysis support the taxonomic patterns highlighted above, indicating that human skin was the primary source of bacteria on all public restroom surfaces examined, while the human gut was an important source on or around the toilet, and urine was an important source in women’s restrooms (Figure 4, Table S4). Contrary to expectations (see above), soil was not identified by the SourceTracker algorithm as being a major source of bacteria on any of the surfaces, including floors (Figure 4). Although the floor samples contained family-level taxa that are common in soil, the SourceTracker algorithm probably underestimates the relative importance of sources, like

Figure 3. Cartoon illustrations of the relative abundance of discriminating taxa on public restroom surfaces. Light blue indicates low abundance while dark blue indicates high abundance of taxa. (A) Although skin-associated taxa (Propionibacteriaceae, Corynebacteriaceae, Staphylococcaceae and Streptococcaceae) were abundant on all surfaces, they were relatively more abundant on surfaces routinely touched with hands. (B) Gut-associated taxa (Clostridiales, Clostridiales group XI, Ruminococcaceae, Lachnospiraceae, Prevotellaceae and Bacteroidaceae) were most abundant on toilet surfaces. (C) Although soil-associated taxa (Rhodobacteraceae, Rhizobiales, Microbacteriaceae and Nocardioidaceae) were in low abundance on all restroom surfaces, they were relatively more abundant on the floor of the restrooms we surveyed. Figure not drawn to scale. doi:10.1371/journal.pone.0028132.g003

Figure 4. Results of SourceTracker analysis showing the average contributions of different sources to the surface-associated bacterial communities in twelve public restrooms. The “unknown” source is not shown but would bring the total of each sample up to 100%. doi:10.1371/journal.pone.0028132.g004

Bacteria of Public Restrooms


high diversity of floor communities is likely due to the frequency of contact with the bottom of shoes, which would track in a diversity of microorganisms from a variety of sources including soil, which is known to be a highly-diverse microbial habitat [27,39]. Indeed, bacteria commonly associated with soil (e.g. Rhodobacteraceae, Rhizobiales, Microbacteriaceae and Nocardioidaceae) were, on average, more abundant on floor surfaces (Figure 3C, Table S2). Interestingly, some of the toilet flush handles harbored bacterial communities similar to those found on the floor (Figure 2, Figure 3C), suggesting that some users of these toilets may operate the handle with a foot (a practice well known to germaphobes and those who have had the misfortune of using restrooms that are less than sanitary).

While the overall community level comparisons between the communities found on the surfaces in male and female restrooms were not statistically significant (Table S3), there were gender-related differences in the relative abundances of specific taxa on some surfaces (Figure 1B, Table S2). Most notably, Lactobacillaceae were clearly more abundant on certain surfaces within female restrooms than male restrooms (Figure 1B). Some species of this family are the most common, and often most abundant, bacteria found in the vagina of healthy reproductive age women [40,41] and are relatively less abundant in male urine [28,29]. Our analysis of female urine samples collected as part of a previous study [26] (Figure 1A), found that Lactobacillaceae were dominant in urine, therefore implying that surfaces in the restrooms where Lactobacillaceae were observed were contaminated with urine. Other studies have demonstrated a similar phenomenon, with vagina-associated bacteria having also been observed in airplane restrooms [11] and a child day care facility [10]. As we found that Lactobacillaceae were most abundant on toilet surfaces and those touched by hands after using the toilet (with the exception of

Figure 2. Relationship between bacterial communities associated with ten public restroom surfaces. Communities were clustered using PCoA of the unweighted UniFrac distance matrix. Each point represents a single sample. Note that the floor (triangles) and toilet (asterisks) surfaces form clusters distinct from surfaces touched with hands. doi:10.1371/journal.pone.0028132.g002

Table 1. Results of pairwise comparisons for unweighted UniFrac distances of bacterial communities associated with various surfaces of public restrooms on the University of Colorado campus using the ANOSIM test in Primer v6.

Surface | Door in | Door out | Stall in | Stall out | Faucet handle | Soap dispenser | Toilet flush handle | Toilet seat | Toilet floor
Door in | | | | | | | | |
Door out | −0.139 | | | | | | | |
Stall in | 0.149 | −0.053 | | | | | | |
Stall out | −0.074 | −0.083 | −0.037 | | | | | |
Faucet handle | −0.062 | −0.011 | −0.092 | −0.040 | | | | |
Soap dispenser | −0.020 | 0.014 | −0.060 | −0.001 | 0.070 | | | |
Toilet flush handle | 0.376* | 0.405* | 0.221 | 0.350* | 0.172* | 0.470* | | |
Toilet seat | 0.742* | 0.672* | 0.457* | 0.586* | 0.401* | 0.653* | 0.187* | |
Toilet floor | 0.995* | 0.988* | 0.993* | 0.961* | 0.758* | 0.998* | 0.577* | 0.950* |
Sink floor | 1.000* | 0.995* | 1.000* | 0.974* | 0.770* | 1.000* | 0.655* | 0.982* | −0.033

The R-statistic is shown for each comparison with asterisks denoting comparisons that were statistically significant at P ≤ 0.01. doi:10.1371/journal.pone.0028132.t001
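For readers unfamiliar with the R-statistic in Table 1, here is a minimal sketch of how an ANOSIM R value is commonly defined: rank all pairwise dissimilarities, then compare the mean rank of between-group pairs with the mean rank of within-group pairs, scaled by M/2 where M = n(n−1)/2 is the number of pairs. This is not the Primer v6 implementation; the function name and the toy distance matrix are hypothetical, and the permutation test used for significance is omitted.

```python
from itertools import combinations

def anosim_r(dist, groups):
    """ANOSIM R for a symmetric dissimilarity matrix and group labels.

    dist[i][j] is the dissimilarity between samples i and j;
    groups[i] is the group label of sample i.
    """
    n = len(groups)
    pairs = list(combinations(range(n), 2))
    values = [dist[i][j] for i, j in pairs]

    # Rank the dissimilarities (1 = smallest), averaging ranks over ties.
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = average_rank
        pos = end + 1

    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2)

# Hypothetical toy data: two well-separated groups of two samples each.
toy_dist = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.85],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.85, 0.2, 0.0],
]
toy_groups = ["floor", "floor", "door", "door"]
r_value = anosim_r(toy_dist, toy_groups)
print(r_value)
```

Values near 1 indicate that samples within a group are more similar to each other than to samples from the other group; values near 0 (or slightly negative, as for several hand-touched surfaces in Table 1) indicate no separation.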


10 FEBRUARY 2012 VOL 335 SCIENCE www.sciencemag.org 650

NEWSFOCUS

CREDITS (TOP TO BOTTOM): (PHOTO) COURTESY GILBERTO FLORES; (CHART) G. E. FLORES ET AL., PLOS ONE 6, 11 (2011); PHOTO BY SISIRA GORTHALA

In just that short time, the microbes had begun to take on a “signature” of outside air (more types from plants and soil), and 2 hours after the windows were shut again, the proportion of microbes from the human body increased back to previous levels.

The study, which appeared online 26 January in The ISME Journal, found that mechanically ventilated rooms had lower microbial diversity than ones with open windows. The availability of fresh air translated into lower proportions of microbes associated with the human body, and consequently, fewer potential pathogens. Although this result suggests that having natural airflow may be healthier, Green says answering that question requires clinical data; she’s hoping to convince a hospital to participate in a study to see if the incidence of hospital-acquired infections is associated with a room’s microbial community.

For his part, Peccia, who is also a Sloan grantee, is merging microbiology and the physics of aerosols to look more closely at how the movement of air affects microbes. Peccia says his group is building on work by air-quality engineers and scientists, but “we want to add biology to the equation.”

Bacteria in air behave like other particles; their size dictates how they disperse or settle. Humans in a room not only shed microbes from their skin and mouths, but they also drum up microbial material from the floor as they move around. But to quantify those contributions, Peccia’s team has had to develop new methods to collect airborne bacteria and extract their DNA, as the microbes are much less abundant in air than on surfaces.

In one recent study, they used air filters to sample airborne particles and microbes in a classroom over 4 days when students were present and 4 days when the room was vacant. They measured the abundance and type of fungal and bacterial genomes present and estimated the microbes’ concentrations in the entire room. By accounting for bacteria entering and leaving the room through ventilation, they calculated that people shed or resuspended about 35 million bacterial cells per person per hour. That number is much higher than the several-hundred-thousand maximum previously estimated to be present in indoor air, Peccia reported last fall at the American Association for Aerosol Research Conference in Orlando, Florida.

His group’s data also suggest that rooms have “memories” of past human inhabitants. By kicking into the air settled microbes from the floor, occupants expose themselves not just to the microbes of a person coughing next to them, but also possibly to those from a person who coughed in the room a few hours or even days ago.

Peccia hopes to come up with ways to describe the distribution of bacteria indoors that can be used in conjunction with existing knowledge about particulate matter and chemicals in designing healthier buildings. “My hope is that we can bring this enough to the forefront that people who do aerosol science will find it as important to know biology as to know physics and chemistry,” he says.

Still, even though he’s a willing participant in indoor microbial ecology research, Peccia thinks that the field has yet to gel. And the Sloan Foundation’s Olsiewski shares some of his concern. “Everybody’s generating vast amounts of data,” she says, but looking across data sets can be difficult because groups choose different analytical tools. With Sloan support, though, a data archive and integrated analytical tools are in the works.

To foster collaborations between microbiologists, architects, and building scientists, the foundation also sponsored a symposium on the microbiome of the built environment at the 2011 Indoor Air conference in Austin, Texas, and launched a Web site, MicroBE.net, that’s a clearinghouse of information on the field. Although Olsiewski won’t say how long the foundation will fund its indoor microbial ecology program, she says Sloan is committed to supporting all of the current projects for the next few years. The program’s ultimate goal, she says, is to create a new field of scientific inquiry that eventually will be funded by traditional government funding agencies focused on basic biology and environmental policy.

Matthew Kane, a microbial ecologist and program director at the U.S. National Science Foundation (NSF), says that although there was interest in these questions prior to the Sloan program, the Sloan Foundation has taken a directed approach to funding the research, and “I have no doubt that their investment is going to reap great returns.” So far, though, NSF has funded only one study on indoor microbes: a study of Pseudomonas bacteria in human households.

As studies like Green’s building ecology analysis progress, they should shed light on how indoor environments differ from those traditionally studied by microbial ecologists. “It’s important to have a quantitative understanding of how building design impacts microbial communities indoors, and how these communities impact human health,” Green says. But it remains to be seen whether we’ll someday design and maintain our buildings with microbes in mind.

–COURTNEY HUMPHRIES

Courtney Humphries is a freelance writer in Boston and author of Superdove.

[Chart: average contribution (%) of each source (soil, water, mouth, urine, gut, skin) to the bacterial communities on each surface: door in, door out, stall in, stall out, faucet handles, soap dispenser, toilet seat, toilet flush handle, toilet floor, sink floor.]

Outside influence. Students prepare to sample air outside a classroom in China as part of an indoor ecology study.

Bathroom biogeography. By swabbing different surfaces in public restrooms, researchers determined that microbes vary in where they come from depending on the surface (chart).

Published by AAAS


Page 41: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Citizen Science - Project MERCCURI


Page 42: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Phone Microbiome

Georgia Barguil

Jack Gilbert


Page 43: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Era III: Genomics


Page 44: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

1st Genome Sequence

Fleischmann et al. 1995


Page 45: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

My Study Organisms

Tree from Woese. 1987. Microbiological Reviews 51:221


Page 46: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

TIGR Genome Projects

Tree from Woese. 1987. Microbiological Reviews 51:221


Page 47: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

TIGR Genome Projects

Tree from Woese. 1987. Microbiological Reviews 51:221


Page 48: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

If you can’t beat them, critique them ...

Fleischmann et al. 1995


Page 49: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Helicobacter pylori genome 1997


Page 50: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

PHYLOGENETIC PREDICTION OF GENE FUNCTION

[Figure: flowchart of the method with two worked examples (EXAMPLE A, EXAMPLE B); gene labels 1–6 and 1A–3B, Species 1–3. Steps: CHOOSE GENE(S) OF INTEREST; IDENTIFY HOMOLOGS; ALIGN SEQUENCES; CALCULATE GENE TREE; OVERLAY KNOWN FUNCTIONS ONTO TREE; INFER LIKELY FUNCTION OF GENE(S) OF INTEREST. Other panel labels: ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN); METHOD; Duplication?; Ambiguous.]

Based on Eisen, 1998 Genome Res 8: 163-167.

Phylogenomics
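
The last two steps of the method (overlay known functions onto the tree, then infer the likely function of the gene of interest) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the procedure from Eisen 1998: the tree is a nested tuple, the function labels ("repair", "recombination") are hypothetical, and the prediction is simply whatever functions are found in the smallest clade around the query that contains any annotated gene.

```python
def leaves(node):
    """Return the leaf names under a node (a leaf is a string, a clade is a tuple)."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]


def predict_function(tree, known, query):
    """Predict a function for `query` from its annotated relatives in the gene tree.

    Walks from the root toward the query and keeps the annotations found in the
    smallest enclosing clade that has any annotated gene other than the query.
    """
    if query not in leaves(tree):
        raise ValueError(f"{query!r} is not in the tree")
    node, candidates = tree, set()
    while not isinstance(node, str):
        annotated = {known[name] for name in leaves(node)
                     if name in known and name != query}
        if annotated:
            candidates = annotated
        # Descend into the child clade that contains the query.
        node = next(child for child in node if query in leaves(child))
    if not candidates:
        return "Unknown"
    if len(candidates) == 1:
        return next(iter(candidates))
    return "Ambiguous"


# Gene names follow Example B on the slide; the functions are made up.
gene_tree = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
known_functions = {"1A": "repair", "3A": "repair",
                   "1B": "recombination", "3B": "recombination"}

print(predict_function(gene_tree, known_functions, "2A"))
print(predict_function(gene_tree, known_functions, "2B"))
```

A query whose nearest annotated relatives disagree comes back as "Ambiguous", which mirrors the label on the slide.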

Wednesday, October 23, 13

Page 52: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Phylogenetic Prediction of Function

• Many powerful and automated similarity based methods for assigning genes to protein families
   • COGs
   • PFAM HMM searches

• Some limitations of similarity based methods can be overcome by phylogenetic approaches

• Automated methods now available
   • Sean Eddy
   • Steven Brenner
   • Kimmen Sjölander

• But …

Wednesday, October 23, 13

Page 53: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Carboxydothermus hydrogenoformans

• Isolated from a Russian hot spring
• Thermophile (grows at 80°C)
• Anaerobic
• Grows very efficiently on CO (Carbon Monoxide)
• Produces hydrogen gas
• Low GC Gram positive (Firmicute)
• Genome Determined (Wu et al. 2005 PLoS Genetics 1: e65.)

Wednesday, October 23, 13

Page 56: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Non-Homology Predictions: Phylogenetic Profiling

• Step 1: Search all genes in organisms of interest against all other genomes

• Step 2: Ask, yes or no, is each gene found in each other species?

• Step 3: Cluster genes by distribution patterns (profiles)
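
As a rough sketch of the steps above, the Python below turns search results into yes/no profiles and groups genes that share a profile. The gene and genome names are hypothetical, and grouping only identical profiles is the simplest possible choice; a real analysis would more likely cluster on a distance between profiles.

```python
from collections import defaultdict


def build_profiles(hits, genomes):
    """Turn search results into presence/absence profiles.

    `hits` maps each gene to the set of genomes in which a homolog was found;
    each profile is a tuple of 1/0 values in the order given by `genomes`.
    """
    return {gene: tuple(int(genome in found) for genome in genomes)
            for gene, found in hits.items()}


def cluster_by_profile(profiles):
    """Group genes that have exactly the same distribution pattern."""
    clusters = defaultdict(list)
    for gene, profile in profiles.items():
        clusters[profile].append(gene)
    return dict(clusters)


# Hypothetical genomes and search results, for illustration only.
genomes = ["genome1", "genome2", "genome3", "genome4"]
hits = {
    "geneA": {"genome1", "genome3"},
    "geneB": {"genome1", "genome3"},
    "geneC": {"genome2"},
    "geneD": {"genome1", "genome2", "genome3", "genome4"},
}

profiles = build_profiles(hits, genomes)
for profile, genes in cluster_by_profile(profiles).items():
    print(profile, genes)
```

Genes that land in the same group (here geneA and geneB) are candidates for a shared role, which is the idea behind using profiles for non-homology predictions.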

Wednesday, October 23, 13

Page 58: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

B. subtilis new sporulation genes

Wednesday, October 23, 13

Page 59: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

From http://genomesonline.org

Wednesday, October 23, 13

Page 60: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

PG Profiling Independent Contrasts

Wednesday, October 23, 13

Page 61: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Whole Genome Trees

AMPHORA

Wednesday, October 23, 13

Page 62: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Era IV: Genomes in the Environment

Wednesday, October 23, 13

Page 63: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

PCR and phylogenetic analysis of rRNA genes

[Figure: workflow from sample to tree]

• DNA extraction
• PCR: makes lots of copies of the rRNA genes in sample
• Sequence rRNA genes
• Sequence alignment = Data matrix
• Phylogenetic tree

Sequences shown:

rRNA1 5’...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’
rRNA2 5’..TACAGTATAGGTGGAGCTAGCGACGATCGA... 3’
rRNA3 5’...ACGGCAAAATAGGTGGATTCTAGCGATATAGA... 3’
rRNA4 5’...ACGGCCCGATAGGTGGATTCTAGCGCCATAGA... 3’

Data matrix rows (rows for rRNA1, rRNA2, E. coli, and Humans also appear on the slide):

rRNA3 C A C T G T
rRNA4 C A C A G T
Yeast T A C A G T

Tree tips: rRNA1, rRNA2, rRNA3, rRNA4, E. coli, Humans, Yeast

Phylotyping
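
To make the "sequence alignment = data matrix" step concrete, here is a minimal Python sketch that takes the three alignment rows legible on the slide (rRNA3, rRNA4, Yeast) and computes the fraction of differing columns for each pair. Using this simple p-distance is an assumption for illustration; the slide does not say which distance or tree-building method was used.

```python
from itertools import combinations

# Alignment rows as they appear on the slide (six columns each).
alignment = {
    "rRNA3": "CACTGT",
    "rRNA4": "CACAGT",
    "Yeast": "TACAGT",
}


def p_distance(seq_a, seq_b):
    """Fraction of aligned columns at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def distance_matrix(rows):
    """Pairwise p-distances for every pair of rows in the data matrix."""
    return {(name_a, name_b): p_distance(rows[name_a], rows[name_b])
            for name_a, name_b in combinations(rows, 2)}


for pair, distance in distance_matrix(alignment).items():
    print(pair, round(distance, 3))
```

A distance matrix like this is one possible input to a tree-building method; in these six columns rRNA4 differs from each of the other two rows at one position, while rRNA3 and Yeast differ at two.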

Wednesday, October 23, 13

Page 64: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

DNA extraction

PCR

Sequence all genes

Shotgun

Shotgun metagenomics

Wednesday, October 23, 13

Page 66: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

DNA extraction

PCR

Sequence all genes

Phylogenetic tree

Shotgun

rRNA1

E. coli Humans

rRNA2

Yeast

rRNA3 rRNA4

Phylotyping

Phylogeny has many uses in shotgun metagenomics

Wednesday, October 23, 13

Page 67: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Uses of Phylogeny in Metagenomics

• Taxonomic assessment
   • Phylogenetic OTUs
   • Phylogenetic taxonomy assignment
   • Phylogenetic binning

• Sample comparisons and hypothesis testing
   • Alpha diversity (i.e., PD)
   • Beta diversity
   • Trait evolution
   • Dispersal
   • Functional predictions
   • Rates of evolution
   • Convergence
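
As one example of a phylogenetic alpha diversity measure, PD (phylogenetic diversity) can be computed as the total length of the branches that lead to at least one taxon in the sample. The sketch below is a minimal illustration under assumptions: the tree is given as a hypothetical list of (branch length, descendant leaves) pairs, and branches on the path to the root are counted.

```python
def phylogenetic_diversity(branches, sample):
    """Sum the lengths of all branches with at least one descendant in the sample."""
    present = set(sample)
    return sum(length for length, descendants in branches
               if present & set(descendants))


# Hypothetical tree ((a:1,b:1):1,(c:1,d:1):1) written as (length, leaves below).
branches = [
    (1.0, {"a"}), (1.0, {"b"}), (1.0, {"a", "b"}),
    (1.0, {"c"}), (1.0, {"d"}), (1.0, {"c", "d"}),
]

print(phylogenetic_diversity(branches, ["a", "b"]))   # two close relatives
print(phylogenetic_diversity(branches, ["a", "c"]))   # two distant relatives
```

Both samples contain two taxa, but the sample that spans both halves of the tree covers more branch length, which is what a count of taxa alone would miss.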

Wednesday, October 23, 13

Page 70: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Phylotyping - Sargasso Metagenome

[Figure: bar chart titled "Sargasso Phylotypes". y-axis: Weighted % of Clones (0 to 0.500). x-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Series: EFG, EFTu, HSP70, RecA, RpoB, rRNA.]

Venter et al., Science 304: 66. 2004

Wednesday, October 23, 13

Page 71: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

AMPHORA Phylotyping

AMPHORA

Excerpt from Wu and Eisen, Genome Biology 2008, 9:R151 (http://genomebiology.com/2008/9/10/R151), page R151.7:

... sequences are not conserved at the nucleotide level [29]. As a result, the nr database does not actually contain many more protein marker sequences that can be used as references than those available from complete genome sequences.

Comparison of phylogeny-based and similarity-based phylotyping

Although our phylogeny-based phylotyping is fully automated, it still requires many more steps than, and is slower than, similarity based phylotyping methods such as MEGAN [30]. Is it worth the trouble? Similarity based phylotyping works by searching a query sequence against a reference database such as NCBI nr and deriving taxonomic information from the best matches or 'hits'. When species that are closely related to the query sequence exist in the reference database, similarity-based phylotyping can work well. However, if the reference database is a biased sample or if it contains no closely related species to the query, then the top hits returned could be misleading [31]. Furthermore, similarity-based methods require an arbitrary similarity cut-off value to define the top hits. Because individual bacterial genomes and proteins can evolve at very different rates, a universal cut-off that works under all conditions does not exist. As a result, the final results can be very subjective.

In contrast, our tree-based bracketing algorithm places the query sequence within the context of a phylogenetic tree and only assigns it to a taxonomic level if that level has adequate sampling (see Materials and methods [below] for details of the algorithm). With the well sampled species Prochlorococcus marinus, for example, our method can distinguish closely related organisms and make taxonomic identifications at the species level. Our reanalysis of the Sargasso Sea data placed 672 sequences (3.6% of the total) within a P. marinus clade. On the other hand, for sparsely sampled clades such as Aquifex, assignments will be made only at the phylum level. Thus, our phylogeny-based analysis is less susceptible to data sampling bias than a similarity based approach, and it makes ...

Figure 3. Major phylotypes identified in Sargasso Sea metagenomic data. The metagenomic data previously obtained from the Sargasso Sea was reanalyzed using AMPHORA and the 31 protein phylogenetic markers. The microbial diversity profiles obtained from individual markers are remarkably consistent. The breakdown of the phylotyping assignments by markers and major taxonomic groups is listed in Additional data file 5.

[Figure: bar chart. y-axis: Relative abundance (0 to 0.8). x-axis: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified proteobacteria, Bacteroidetes, Chlamydiae, Cyanobacteria, Acidobacteria, Thermotogae, Fusobacteria, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified bacteria. Series (one per marker): dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf.]
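
The idea that a tree-based assignment only goes as deep as the reference sampling allows can be illustrated with a small Python sketch. This is not AMPHORA's bracketing algorithm (the excerpt defers that to the paper's Materials and methods); it is a simplified stand-in that assumes the reference sequences bracketing a query in the tree are already known, and assigns the query to the deepest taxonomic rank those neighbors share. The lineages are abbreviated for illustration.

```python
def shared_lineage(lineages):
    """Return the ranks, from the top down, on which all lineages agree."""
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared


# Well sampled clade: the query sits between two references of the same species.
well_sampled = [
    ["Bacteria", "Cyanobacteria", "Prochlorococcus", "Prochlorococcus marinus"],
    ["Bacteria", "Cyanobacteria", "Prochlorococcus", "Prochlorococcus marinus"],
]

# Sparsely sampled clade: the nearest references agree only down to the phylum.
sparsely_sampled = [
    ["Bacteria", "Aquificae", "Aquifex"],
    ["Bacteria", "Aquificae", "Hydrogenobacter"],
]

print(shared_lineage(well_sampled)[-1])
print(shared_lineage(sparsely_sampled)[-1])
```

With dense sampling the query is assigned at the species level; with sparse sampling the assignment stops at the phylum, which matches the behavior described in the excerpt.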

Wednesday, October 23, 13

Page 72: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

GOS 1

GOS 2

GOS 3

GOS 4

GOS 5

Phylogenetic ID of Novel Lineages

Wu et al PLoS One 2011

Wednesday, October 23, 13

Page 74: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Wu et al. 2006 PLoS Biology 4: e188.

Baumannia makes vitamins and cofactors

Sulcia makes amino acids

Phylogenetic Binning

Wednesday, October 23, 13

Page 75: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Improving Phylogenomics I

Wednesday, October 23, 13

Page 76: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Updated Tree of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree

Wednesday, October 23, 13

Page 77: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Genomes Poorly Sampled

Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree

Wednesday, October 23, 13

Page 78: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

TIGR Tree of Life Project

Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree

Wednesday, October 23, 13

Page 79: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Genomic Encyclopedia of Bacteria & Archaea

Wu et al. 2009 Nature 462, 1056-1060

Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree

Wednesday, October 23, 13

Page 81: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Family Diversity vs. PD

Wu et al. 2009 Nature 462, 1056-1060

Wednesday, October 23, 13

Page 82: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

The Dark Matter of Biology

From Wu et al. 2009 Nature 462, 1056-1060

Wednesday, October 23, 13

Page 83: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Number of SAGs from Candidate Phyla

Site                                   OD1   OP11   OP3   SAR406
Site A: Hydrothermal vent              4     1      -     -
Site B: Gold Mine                      6     13     2     -
Site C: Tropical gyres (Mesopelagic)   -     -      -     2
Site D: Tropical gyres (Photic zone)   1     -      -     -

Sample collections at 4 additional sites are underway.

Phil Hugenholtz

GEBA Uncultured

Wednesday, October 23, 13

Page 84: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

JGI Dark Matter Project

[Figure: single-cell genomics workflow, phylogenetic tree of the sampled lineages, and panels of notable genomic features]

Workflow:

• environmental samples (n=9)
• isolation of single cells (n=9,600)
• whole genome amplification (n=3,300)
• SSU rRNA gene based identification (n=2,000)
• genome sequencing, assembly and QC (n=201)
• draft genomes (n=201)

Sampling sites: SAK, HSM, ETL, TG, HOT, GOM, GBS, EPR, TA

Habitat types: seawater, brackish/freshwater, hydrothermal, sediment, bioreactor

Tree: BACTERIA and ARCHAEA. Candidate phyla shown with proposed names include WS3 (Latescibacteria), OP3 (Omnitrophica), NKB19 (Hydrogenedentes), EM19 (Calescamantes), CD12 (Aerophobetes), OP8 (Aminicenantes), OP9 (Atribacteria), WWE1 (Cloacamonetes), OP1 (Acetothermia), GN02 (Gracilibacteria), OD1 (Parcubacteria), OP11 (Microgenomates), DSEG (Aenigmarchaea), and pMC2A384 (Diapherotrites).

Feature panels:

• archaeal toxins (Nanoarchaea)
• lytic murein transglycosylase; murein (peptido-glycan)
• stringent response (Diapherotrites, Nanoarchaea)
• sigma factor (Diapherotrites, Nanoarchaea)
• oxidoreductase
• HGT from Eukaryotes (Nanoarchaea)
• archaeal type purine synthesis (Microgenomates)
• UGA recoded for Gly (Gracilibacteria)

Woyke et al. Nature 2013.

Wednesday, October 23, 13

Page 85: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

A Genomic Encyclopedia of Microbes (GEM)

Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree

Wednesday, October 23, 13

Page 86: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Improving Phylogenomics II

• Better Methods

Wednesday, October 23, 13

Page 87: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

iSEEM

Wednesday, October 23, 13

Page 88: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Zorro - Automated Masking

[Figure: Distance to True Tree (y-axis, 0.0 to 9.0) versus Sequence Length (x-axis: 200, 400, 800, 1600, 3200) for three treatments: no masking, zorro, gblocks]

Wu M, Chatterji S, Eisen JA (2012) Accounting For Alignment Uncertainty in Phylogenomics. PLoS ONE 7(1): e30288. doi:10.1371/journal.pone.0030288

Wednesday, October 23, 13

Page 89: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Kembel Combiner

cally defined by a sequence similarity threshold) in the sample as equally related. Newer β diversity measures that incorporate phylogenetic information are more powerful because they account for the degree of divergence between sequences (13, 18, 29, 30). Phylogenetic β diversity measures can also be either quantitative or qualitative depending on whether abundance is taken into account. The original, unweighted UniFrac measure (13) is a qualitative measure. Unweighted UniFrac measures the distance between two communities by calculating the fraction of the branch length in a phylogenetic tree that leads to descendants in either, but not both, of the two communities (Fig. 1A). The fixation index (F_ST), which measures the distance between two communities by comparing the genetic diversity within each community to the total genetic diversity of the communities combined (18), is a quantitative measure that accounts for different levels of divergence between sequences. The phylogenetic test (P test), which measures the significance of the association between environment and phylogeny (18), is typically used as a qualitative measure because duplicate sequences are usually removed from the tree. However, the P test may be used in a semiquantitative manner if all clones, even those with identical or near-identical sequences, are included in the tree (13).

Here we describe a quantitative version of UniFrac that we call “weighted UniFrac.” We show that weighted UniFrac behaves similarly to the F_ST test in situations where both are applicable. However, weighted UniFrac has a major advantage over F_ST because it can be used to combine data in which different parts of the 16S rRNA were sequenced (e.g., when nonoverlapping sequences can be combined into a single tree using full-length sequences as guides). We use two different data sets to illustrate how analyses with quantitative and qualitative β diversity measures can lead to dramatically different conclusions about the main factors that structure microbial diversity. Specifically, qualitative measures that disregard relative abundance can better detect effects of different founding populations, such as the source of bacteria that first colonize the gut of newborn mice and the effects of factors that are restrictive for microbial growth such as temperature. In contrast, quantitative measures that account for the relative abundance of microbial lineages can reveal the effects of more transient factors such as nutrient availability.

MATERIALS AND METHODS

Weighted UniFrac. Weighted UniFrac is a new variant of the original unweighted UniFrac measure that weights the branches of a phylogenetic tree based on the abundance of information (Fig. 1B). Weighted UniFrac is thus a quantitative measure of β diversity that can detect changes in how many sequences from each lineage are present, as well as detect changes in which taxa are present. This ability is important because the relative abundance of different kinds of bacteria can be critical for describing community changes. In contrast, the original, unweighted UniFrac (Fig. 1A) is a qualitative β diversity measure because duplicate sequences contribute no additional branch length to the tree (by definition, the branch length that separates a pair of duplicate sequences is zero, because no substitutions separate them).

The first step in applying weighted UniFrac is to calculate the raw weighted UniFrac value (u), according to the first equation:

\[ u = \sum_{i}^{n} b_i \times \left| \frac{A_i}{A_T} - \frac{B_i}{B_T} \right| \]

Here, n is the total number of branches in the tree, b_i is the length of branch i, A_i and B_i are the numbers of sequences that descend from branch i in communities A and B, respectively, and A_T and B_T are the total numbers of sequences in communities A and B, respectively. In order to control for unequal sampling effort, A_i and B_i are divided by A_T and B_T.

If the phylogenetic tree is not ultrametric (i.e., if different sequences in the sample have evolved at different rates), clustering with weighted UniFrac will place more emphasis on communities that contain quickly evolving taxa. Since these taxa are assigned more branch length, a comparison of the communities that contain them will tend to produce higher values of u. In some situations, it may be desirable to normalize u so that it has a value of 0 for identical communities and 1 for nonoverlapping communities. This is accomplished by dividing u by a scaling factor (D), which is the average distance of each sequence from the root, as shown in the equation as follows:

\[ D = \sum_{j}^{n} d_j \times \left( \frac{A_j}{A_T} + \frac{B_j}{B_T} \right) \]

Here, d_j is the distance of sequence j from the root, A_j and B_j are the numbers of times the sequences were observed in communities A and B, respectively, and A_T and B_T are the total numbers of sequences from communities A and B, respectively.
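
The two equations in this excerpt (the raw value u and the scaling factor D) translate fairly directly into code. The following Python sketch is illustrative only: the tree encoding as (branch length, descendant leaves) pairs, the leaf names, and the counts are all hypothetical, and it is not taken from any UniFrac implementation.

```python
def weighted_unifrac(branches, counts_a, counts_b, normalize=False):
    """Raw weighted UniFrac u, or u / D when normalize is True.

    `branches` is a list of (length, descendant_leaves) pairs; `counts_a` and
    `counts_b` map leaf names to the number of sequences seen in each community.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both communities need at least one sequence")

    # u = sum over branches of b_i * |A_i/A_T - B_i/B_T|
    u = 0.0
    for length, descendants in branches:
        a_i = sum(counts_a.get(leaf, 0) for leaf in descendants)
        b_i = sum(counts_b.get(leaf, 0) for leaf in descendants)
        u += length * abs(a_i / total_a - b_i / total_b)
    if not normalize:
        return u

    # d_j: distance of leaf j from the root = total length of the branches above it
    root_distance = {}
    for length, descendants in branches:
        for leaf in descendants:
            root_distance[leaf] = root_distance.get(leaf, 0.0) + length

    # D = sum over leaves of d_j * (A_j/A_T + B_j/B_T)
    d = sum(dist * (counts_a.get(leaf, 0) / total_a + counts_b.get(leaf, 0) / total_b)
            for leaf, dist in root_distance.items())
    return u / d


# Hypothetical tree ((a:1,b:1):1,(c:1,d:1):1) and two nonoverlapping communities.
branches = [
    (1.0, {"a"}), (1.0, {"b"}), (1.0, {"a", "b"}),
    (1.0, {"c"}), (1.0, {"d"}), (1.0, {"c", "d"}),
]
community_a = {"a": 2, "b": 2}
community_b = {"c": 1, "d": 1}

print(weighted_unifrac(branches, community_a, community_b))
print(weighted_unifrac(branches, community_a, community_b, normalize=True))
```

On this toy tree the two communities share no lineages, so the normalized value reaches its maximum, while comparing a community with itself gives 0, as the text describes.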

Clustering with normalized u values treats each sample equally instead of

TABLE 1. Measurements of diversity

Measure | Measurement of α diversity | Measurement of β diversity
Only presence/absence of taxa considered | Qualitative (species richness) | Qualitative
Additionally accounts for the no. of times that each taxon was observed | Quantitative (species richness and evenness) | Quantitative

FIG. 1. Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments. (a) In unweighted UniFrac, the distance between the circle and square communities is calculated as the fraction of the branch length that has descendants from either the square or the circle environment (black) but not both (gray). (b) In weighted UniFrac, branch lengths are weighted by the relative abundance of sequences in the square and circle communities; square sequences are weighted twice as much as circle sequences because there are twice as many total circle sequences in the data set. The width of branches is proportional to the degree to which each branch is weighted in the calculations, and gray branches have no weight. Branches 1 and 2 have heavy weights since the descendants are biased toward the square and circles, respectively. Branch 3 contributes no value since it has an equal contribution from circle and square sequences after normalization.

VOL. 73, 2007 PHYLOGENETICALLY COMPARING MICROBIAL COMMUNITIES 1577

Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214

Wednesday, October 23, 13

Page 90: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Kembel Copy # Correction

Kembel SW, Wu M, Eisen JA, Green JL (2012) Incorporating 16S Gene Copy Number Information Improves Estimates of Microbial Diversity and Abundance. PLoS Comput Biol 8(10): e1002743. doi:10.1371/journal.pcbi.1002743

Wednesday, October 23, 13

Page 91: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

alignment used to build the profile, resulting in a multiple sequence alignment of full-length reference sequences and metagenomic reads. The final step of the alignment process is a quality control filter that 1) ensures that only homologous SSU-rRNA sequences from the appropriate phylogenetic domain are included in the final alignment, and 2) masks highly gapped alignment columns (see Text S1).

We use this high quality alignment of metagenomic reads and reference sequences to construct a fully-resolved, phylogenetic tree and hence determine the evolutionary relationships between the reads. Reference sequences are included in this stage of the analysis to guide the phylogenetic assignment of the relatively short metagenomic reads. While the software can be easily extended to incorporate a number of different phylogenetic tools capable of analyzing metagenomic data (e.g., RAxML [27], pplacer [28], etc.), PhylOTU currently employs FastTree as a default method due to its relatively high speed-to-performance ratio and its ability to construct accurate trees in the presence of highly-gapped data [29]. After construction of the phylogeny, lineages representing reference sequences are pruned from the tree. The resulting phylogeny of metagenomic reads is then used to compute a PD distance matrix in which the distance between a pair of reads is defined as the total tree path distance (i.e., branch length) separating the two reads [30]. This tree-based distance matrix is subsequently used to hierarchically cluster metagenomic reads via MOTHUR into OTUs in a fashion similar to traditional PID-based analysis [31]. As with PID clustering, the hierarchical algorithm can be tuned to produce finer or coarser clusters, corresponding to different taxonomic levels, by adjusting the clustering threshold and linkage method.

To evaluate the performance of PhylOTU, we employed statistical comparisons of distance matrices and clustering results for a variety of data sets. These investigations aimed 1) to compare PD versus PID clustering, 2) to explore overlap between PhylOTU clusters and recognized taxonomic designations, and 3) to quantify the accuracy of PhylOTU clusters from shotgun reads relative to those obtained from full-length sequences.

PhylOTU Clusters Recapitulate PID Clusters

We sought to identify how PD-based clustering compares to commonly employed PID-based clustering methods by applying the two methods to the same set of sequences. Both PID-based clustering and PhylOTU may be used to identify OTUs from overlapping sequences. Therefore we applied both methods to a dataset of 508 full-length bacterial SSU-rRNA sequences (reference sequences; see above) obtained from the Ribosomal Database Project (RDP) [25]. Recent work has demonstrated that PID is more accurately calculated from pairwise alignments than multiple sequence alignments [32–33], so we used ESPRIT, which implements pairwise alignments, to obtain a PID distance matrix for the reference sequences [32]. We used PhylOTU to compute a PD distance matrix for the same data. Then, we used MOTHUR to hierarchically cluster sequences into OTUs based on both PID and PD. For each of the two distance matrices, we employed a range of clustering thresholds and three different definitions of linkage in the hierarchical clustering algorithm: nearest-neighbor, average, and furthest-neighbor.

To statistically evaluate the similarity of cluster composition between each pair of clustering results, we used two summary statistics that together capture the frequency with which sequences are co-clustered in both analyses: true conjunction rate (i.e., the proportion of pairs of sequences derived from the same cluster in the first analysis that also are clustered together in the second analysis) and true disjunction rate (i.e., the proportion of pairs of sequences derived from different clusters in the first analysis that also are not clustered together in the second analysis) (see Methods

Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in this generalized workflow of PhylOTU. See Results section for details. doi:10.1371/journal.pcbi.1001061.g001
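
The two summary statistics defined in the excerpt, true conjunction rate and true disjunction rate, can be computed from any two cluster assignments of the same sequences. The Python below is a minimal sketch written from those definitions alone; the sequence names and cluster labels are hypothetical, and it is not PhylOTU's own code.

```python
from itertools import combinations


def conjunction_disjunction_rates(first, second):
    """Compare two clusterings given as {sequence: cluster label} dictionaries.

    Returns (true conjunction rate, true disjunction rate): the share of pairs
    clustered together in `first` that are also together in `second`, and the
    share of pairs separated in `first` that are also separated in `second`.
    """
    together_first = together_both = 0
    apart_first = apart_both = 0
    for x, y in combinations(sorted(first), 2):
        same_first = first[x] == first[y]
        same_second = second[x] == second[y]
        if same_first:
            together_first += 1
            together_both += same_second
        else:
            apart_first += 1
            apart_both += not same_second
    conjunction = together_both / together_first if together_first else float("nan")
    disjunction = apart_both / apart_first if apart_first else float("nan")
    return conjunction, disjunction


# Hypothetical OTU assignments for four sequences under two distance measures.
pid_clusters = {"s1": 0, "s2": 0, "s3": 1, "s4": 1}
pd_clusters = {"s1": "a", "s2": "a", "s3": "a", "s4": "b"}

print(conjunction_disjunction_rates(pid_clusters, pd_clusters))
```

Both rates equal 1 only when the two clusterings group the sequences identically, so together they summarize how closely one clustering recapitulates the other.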


Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O'Dwyer JP, Green JL, Eisen JA, Pollard KS. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061

Sharpton PhylOTU

Wednesday, October 23, 13

Page 92: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

NMF in Metagenomes

Characterizing the niche-space distributions of components

[Figure: three aligned panels, (a), (b), and (c), with one row per site. Rows are GOS sampling sites labeled by region, sample ID, and habitat (for example "North American East Coast_GS005_Embayment" and "Sargasso Sea_GS001c_Open Ocean"). Panel columns include Component 1 to Component 5 and the environmental variables Salinity, Sample Depth, Chlorophyll, Temperature, Insolation, and Water Depth. Color scales run from 0.1 to 0.6 and from 0.2 to 1.0. Legends: General (High, Medium, Low, NA); Water depth (>4000m, 2000-4000m, 900-2000m, 100-200m, 20-100m, 0-20m).]

Figure 3: a) Niche-space distributions for our five components (H^T); b) the site-similarity matrix (H^T H); c) environmental variables for the sites. The matrices are aligned so that the same row corresponds to the same site in each matrix. Sites are ordered by applying spectral reordering to the similarity matrix (see Materials and Methods). Rows are aligned across the three matrices.

Figure 3a shows the estimated niche-space distribution for each of the five components. Components 2 (Photosystem) and 4 (Unidentified) are broadly distributed; Components 1 (Signalling) and 5 (Unidentified) are largely restricted to a handful of sites; and component 3 shows an intermediate pattern. There is a great deal of overlap between niche-space distributions for different components.

Figure 3b shows the pattern of filtered similarity between sites. We see clear patterns of grouping, that do not emerge when we calculate functional distances without filtering, or using PCA rather than NMF filtering (Figure 3 in Text S1). As with the Pfams, we see clusters roughly associated with our components, but there is more overlapping than with the Pfam clusters (Figure 2b).

Figure 3c shows the distribution of environmental variables measured at each site. Inspection of Figure 3 reveals qualitative correspondence between environmental factors and clusters of similar sites in the similarity matrix. For example, the “North American East Coast” samples are divided into two groups, one in the top left and the other in the bottom right of the similarity matrix. Inspection of the environmental features suggests that the split in these samples could be mostly due to the differences in insolation and water depth.
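
For readers unfamiliar with NMF, the sketch below shows the general idea behind the matrices named in the Figure 3 caption: a non-negative matrix V is approximated as W H, and a site-by-site similarity matrix is then formed as H^T H. Everything specific here is an assumption for illustration: the toy matrix, the orientation (rows as functions, columns as sites), the rank, and the use of standard multiplicative updates in plain Python. The excerpt does not say how the paper's factorization was actually computed.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(column) for column in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor V (n x m) into non-negative W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
    return w, h


# Toy data: 4 hypothetical functions (rows) by 4 hypothetical sites (columns).
v = [[1, 1, 0, 0], [2, 2, 0, 0], [0, 0, 3, 3], [0, 0, 1, 1]]
w, h = nmf(v, rank=2)
site_similarity = matmul(transpose(h), h)   # sites x sites, i.e. H^T H
print(round(frobenius_error(v, w, h), 6))
```

Because every entry of W and H stays non-negative, each component can be read as an additive profile, which is what makes per-component distributions across sites interpretable.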

We can also examine patterns of similarity between the components themselves, using niche-site distributions or functional profiles (see Figure 5 in Text S1). All 5

Functional biogeography of ocean microbes revealed through non-negative matrix factorization. Jiang et al. In press PLoS One. Comes out 9/18.

w/ Weitz, Dushoff, Langille, Neches, Levin, etc

Wednesday, October 23, 13

Page 93: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Phylosift - Mining the Global Metagenome

Jonathan Eisen

Students and other staff: - Eric Lowe, John Zhang, David Coil

Open source community: - BLAST, LAST, HMMER, Infernal, pplacer, Krona, metAMOS, Bioperl, Bio::Phylo, JSON, etc. etc.

PhyloSift is open source software:
- http://phylosift.wordpress.org
- http://github.com/gjospin/phylosift

Erick Matsen, FHCRC

Todd Treangen, BNBI, NBACC

Holly Bik

Tiffanie Nelson

Mark Brown

Aaron Darling

Guillaume Jospin

Supported by DHS Grant

Page 94: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

PhyloSift/pplacer Workflow

Aaron Darling, Guillaume Jospin, Holly Bik, Erick Matsen, Eric Lowe, and others

Workflow diagram (labels recovered from the figure):

Input Sequences: each input sequence is scanned against both workflows (rRNA workflow and protein workflow); parallel option

Steps shown: search input against references; LAST fast candidate search; profile HMMs used to align candidates to reference alignment; hmmalign multiple alignment; Infernal multiple alignment; a branch on sequence length (<600 bp, >600 bp); pplacer phylogenetic placement

Outputs shown: Taxonomic Summaries; Sample Analysis & Comparison; Krona plots, number of reads placed for each marker gene; Edge PCA, tree visualization, Bayes factor tests


Page 95: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Markers

• PMPROK – Dongying Wu’s Bac/Arch markers
• Eukaryotic Orthologs – Parfrey 2011 paper
• 16S/18S rRNA
• Mitochondria – protein-coding genes
• Viral Markers – Markov clustering on genomes
• Codon Subtrees – finer scale taxonomy
• Extended Markers – plastids, gene families


Page 96: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Output 1: Taxonomy

Taxonomic summary plots in Krona (Ondov et al 2011)


Page 97: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Output 2: Phylogenetic Tree of Reads

Placement tree from 2 week old infant gut data


Page 98: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

QIIME and Edge PCA on 110 fecal metagenomes from Yatsunenko et al. 2012, Nature.

Sequenced with 454, to about 150 Mbp/metagenome

Darling et al. Submitted.

Edge PCA vs. UniFrac PCA

Edge PCA: Matsen and Evans 2013
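For readers unfamiliar with edge PCA, the idea can be sketched as follows: for every edge of the reference tree, each sample is scored by the difference between the placement mass on the two sides of that edge, and ordinary PCA is then run on the resulting samples-by-edges matrix. The code below is a toy illustration only. The four-leaf tree, the mass values, the function names, and the simplification that all mass sits on leaves are assumptions, and this is not the pplacer/guppy implementation.

```python
import numpy as np

# Hypothetical rooted tree ((A,B),(C,D)). Each edge is described by the set of
# leaves on its distal (away-from-root) side.
EDGES = {
    "A": {"A"}, "B": {"B"}, "C": {"C"}, "D": {"D"},
    "AB": {"A", "B"}, "CD": {"C", "D"},
}

def edge_mass_differences(mass):
    """mass: dict leaf -> fraction of placed reads (fractions sum to 1).

    Returns one value per edge: distal mass minus proximal mass.
    """
    diffs = []
    for leaves in EDGES.values():
        distal = sum(mass[leaf] for leaf in leaves)
        diffs.append(distal - (1.0 - distal))
    return np.array(diffs)

def edge_pca(samples, n_components=2):
    """PCA (via SVD) on the samples-by-edges matrix of mass differences."""
    X = np.vstack([edge_mass_differences(m) for m in samples])
    Xc = X - X.mean(axis=0)                        # center each edge column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates
    loadings = Vt[:n_components]                     # per-edge weights
    return scores, loadings

# Made-up samples: two dominated by the (A,B) clade, two by the (C,D) clade.
samples = [
    {"A": 0.7, "B": 0.3, "C": 0.0, "D": 0.0},
    {"A": 0.6, "B": 0.4, "C": 0.0, "D": 0.0},
    {"A": 0.0, "B": 0.0, "C": 0.5, "D": 0.5},
    {"A": 0.0, "B": 0.0, "C": 0.4, "D": 0.6},
]
scores, loadings = edge_pca(samples)
```

Because each principal component is a vector of per-edge weights, the loadings can be drawn back onto the tree to show which lineages drive the separation between samples.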


Page 99: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Improving Phylogenomics III

• Better Data Sets


Page 100: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

More Markers

Phylogenetic group      Genome Number   Gene Number   Marker Candidates
Archaea                 62              145415        106
Actinobacteria          63              267783        136
Alphaproteobacteria     94              347287        121
Betaproteobacteria      56              266362        311
Gammaproteobacteria     126             483632        118
Deltaproteobacteria     25              102115        206
Epsilonproteobacteria   18              33416         455
Bacteroides             25              71531         286
Chlamydiae              13              13823         560
Chloroflexi             10              33577         323
Cyanobacteria           36              124080        590
Firmicutes              106             312309        87
Spirochaetes            18              38832         176
Thermi                  5               14160         974
Thermotogae             9               17037         684

Wu D, Jospin G, Eisen JA (2013) Systematic Identification of Gene Families for Use as “Markers” for Phylogenetic and Phylogeny-Driven Ecological Studies of Bacteria and Archaea and Their Major Subgroups. PLoS ONE 8(10): e77033. doi:10.1371/journal.pone.0077033


Page 101: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Sifting Families

Representative Genomes → Extract Protein Annotation → All v. All BLAST → Homology Clustering (MCL) → SFams → Align & Build HMMs → HMMs

New Genomes → Extract Protein Annotation → Screen for Homologs

Figure 1, Sharpton et al. 2012. BMC Bioinformatics, 13(1), 264.
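The Homology Clustering (MCL) step can be illustrated with a toy version of Markov clustering: alternate expansion (matrix squaring) and inflation (element-wise powering followed by column renormalization) on a column-stochastic similarity matrix until it stops changing. This is a minimal sketch, not the MCL program used in the pipeline. The similarity values, the self-loop weight, the thresholds, and the function name are assumptions made for illustration.

```python
import numpy as np

def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-8):
    """Toy Markov clustering; returns a sorted list of clusters (lists of node indices)."""
    M = np.asarray(similarity, dtype=float).copy()
    np.fill_diagonal(M, 1.0)                  # add self-loops (assumed weight 1)
    M /= M.sum(axis=0, keepdims=True)         # make columns sum to 1
    for _ in range(max_iter):
        prev = M
        M = M @ M                             # expansion: spread flow along paths
        M = M ** inflation                    # inflation: strengthen strong flow
        M /= M.sum(axis=0, keepdims=True)
        if np.abs(M - prev).max() < tol:
            break
    # Nodes whose columns point at the same attractor rows form one cluster.
    groups = {}
    for node in range(M.shape[1]):
        attractors = frozenset(np.flatnonzero(M[:, node] > 1e-3).tolist())
        groups.setdefault(attractors, []).append(node)
    return sorted(groups.values())

# Hypothetical pairwise similarities among five proteins (e.g. scaled BLAST scores):
# proteins 0-2 are mutually similar, 3-4 are similar, and 2-3 share a weak hit.
similarity = np.array([
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
])
families = mcl(similarity)
```

A higher inflation value generally yields more, smaller clusters, which is the main knob when tuning family granularity.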


Page 102: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Better Reference Tree

Lang JM, Darling AE, Eisen JA (2013) Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees and Supermatrices. PLoS ONE 8(4): e62510. doi:10.1371/journal.pone.0062510


Page 103: "Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan Eisen at U. Washington on

Acknowledgements

• GEBA:
  • $$: DOE-JGI, DSMZ
  • Eddy Rubin, Phil Hugenholtz, Hans-Peter Klenk, Nikos Kyrpides, Tanya Woyke, Dongying Wu, Aaron Darling, Jenna Lang
• GEBA Cyanobacteria
  • $$: DOE-JGI
  • Cheryl Kerfeld, Dongying Wu, Patrick Shih
• Haloarchaea
  • $$$ NSF
  • Marc Facciotti, Aaron Darling, Erin Lynch
• Phylosift
  • $$$ DHS
  • Aaron Darling, Erick Matsen, Holly Bik, Guillaume Jospin
• iSEEM:
  • $$: GBMF
  • Katie Pollard, Jessica Green, Martin Wu, Steven Kembel, Tom Sharpton, Morgan Langille, Guillaume Jospin, Dongying Wu
• aTOL
  • $$: NSF
  • Naomi Ward, Jonathan Badger, Frank Robb, Martin Wu, Dongying Wu
• Others (not mentioned in detail)
  • $$: NSF, NIH, DOE, GBMF, DARPA, Sloan
  • Frank Robb, Craig Venter, Doug Rusch, Shibu Yooseph, Nancy Moran, Colleen Cavanaugh, Josh Weitz
  • EisenLab: Srijak Bhatnagar, Russell Neches, Lizzy Wilbanks, Holly Bik
