Upload
others
View
20
Download
0
Embed Size (px)
Citation preview
Genome-Based Microbial TaxonomyComing of Age
Philip Hugenholtz, Adam Skarshewski, and Donovan H. Parks
Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The Universityof Queensland, St Lucia QLD 4072, Australia
Correspondence: [email protected]
Reconstructing the complete evolutionary history of extant life on our planet will be one ofthe most fundamental accomplishments of scientific endeavor, akin to the completion of theperiodic table, which revolutionized chemistry. The road to this goal is via comparativegenomics because genomes are our most comprehensive and objective evolutionary docu-ments. The genomes of plant and animal species have been systematically targeted over thepast decade to provide coverage of the tree of life. However, multicellular organisms onlyemerged in the last 550 million years of more than three billion years of biological evolutionand thus comprise a small fraction of total biological diversity. The bulk of biodiversity, bothpast and present, is microbial. We have only scratched the surface in our understanding of themicrobial world, as most microorganisms cannot be readily grown in the laboratory andremain unknown to science. Ground-breaking, culture-independent molecular techniquesdeveloped over the past 30 years have opened the door to this so-called microbial dark matterwith an accelerating momentum driven byexponential increases in sequencing capacity. Weare on the verge of obtaining representative genomes across all life for the first time. However,historical use of morphology, biochemical properties, behavioral traits, and single-markergenes to infer organismal relationships mean that the existing highly incomplete tree isriddled with taxonomic errors. Concerted efforts are now needed to synthesize and integratethe burgeoning genomic data resources into a coherent universal tree of life and genome-based taxonomy.
SETTING THE STAGE FOR A TAXONOMICCLASSIFICATION BASED ONEVOLUTIONARY RELATIONSHIPS
Closely following on from Darwin’s thesisthat all life forms on our planet arose
from a common ancestor (Darwin 1859), havebeen biologists’ attempts to classify life natural-istically according to evolution. Initially andunderstandably, phenotype (morphology, de-velopment, etc.) was the primary basis for in-
ferring relationships between organisms result-ing in a schema that lumped all microbial life(which had been discovered 200 years earlierthrough the advent of the microscope) into asingle “primitive” kingdom at the base of thetree (Fig 1A) (Haeckel 1866). The discovery ofthe structure of DNA in the latter half of the20th century and its role as the heritable blue-print of life led to the proposal that genesare a more objective basis than phenotype forinferring evolutionary (phylogenetic) relation-
Editor: Howard Ochman
Additional Perspectives on Microbial Evolution available at www.cshperspectives.org
Copyright # 2016 Cold Spring Harbor Laboratory Press; all rights reserved; doi: 10.1101/cshperspect.a018085
Cite this article as Cold Spring Harb Perspect Biol 2016;8:a018085
1
on November 26, 2020 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from
ships (Zuckerkandl and Pauling 1965). CarlWoese ran with this idea focusing on compari-son of universally conserved components of theprotein manufacturing machinery in the cell,the ribosomal RNA (rRNA) genes, which hecorrectly reasoned would produce an objectivetree of life. His first trees and all subsequentefforts turned the phenotype-based tree on itshead; instead of microorganisms occupyinga lowly corner of the tree, all multicellular lifeclustered together in a corner of one of threenewly described primary lines of descent (Fig.1B) (Woese and Fox 1977). Small subunit ribo-somal RNA- (16S rRNA)-based classification ofbacteria and archaea was enthusiastically em-braced by microbiologists following Woese’sdiscoveries, in large part because natural rela-tionships between microbes are virtually un-detectable using phenotypic properties (Stanierand Van Niel 1962). Thirty years on, 16S rRNAsequences form the basis of microbial classifi-cation; however, vast numbers of discrepanciesexist between taxonomy and phylogeny withmany currently defined taxa not forming evo-lutionarily coherent (monophyletic) groups. Aconspicuous case in point is the genus Clostrid-ium, which is superficially united by a commonmorphology and ability to produce endospores,but represents dozens of phylogenetically dis-tinct groups within the phylum Firmicutes (Yu-tin and Galperin 2013). This greatly impedesour ability to understand the ecology andevolution of ecosystems, such as mammalianguts, where clostridia are important functionalpopulations. The large number of unresolvedtaxonomic errors is due to a combination ofhistorical artifacts (phenotypic classification)and limitations with rRNA gene trees such aspoor phylogenetic resolution, inadequate refer-ence sequences, and sequencing artifacts (no-tably chimeras, as discussed below). Genometrees inferred that using multiple marker genesoffer greater phylogenetic resolution than16S rRNA and other single-marker gene trees(Ciccarelli et al. 2006; Lang et al. 2013) and are,therefore, a more reliable basis for taxono-mic classification. However, publicly availablegenome sequences are still far from represen-tative of microbial diversity as a whole, as
revealed by the development of culture-inde-pendent methods.
MOST MICROBIAL DIVERSITY ISUNCULTURED BUT HAS RECENTLY BECOMEREADILY ACCESSIBLE GENOMICALLY
Culture-independent molecular techniquesfounded on 16S rRNA in the mid-1980s (Olsenet al. 1986) highlighted our ignorance of most ofthe tree of life by crudely outlining its borders.This was achieved by sequencing 16S rRNAgenes from bulk DNAs extracted directly fromenvironmental sources (Pace 1997). This typeof microbial community profiling has improvedwith increased sequencing and computing ca-pacity. The startling conclusion from morethan two decades of such culture-independentenvironmental sequence surveys is that .80%of microbial evolutionary diversity is represent-ed by uncultured microorganisms distributedacross upward of 100 major lines of descentwithin the Bacteria and Archaea (Harris et al.2013) and that the amount of recognized micro-bial dark matter is still increasing (Fig. 2).
The task to obtain representative genomiccoverage of all recognized microbial diversity,estimated conservatively to represent hundredsof thousands of species (Curtis and Sloan2005), is daunting. However, two promisingculture-independent approaches have emergedto achieve this goal. The first is metagenomics,the application of high-throughput sequencingof DNA extracted directly from environmentalsamples, and the second is single-cell genomics,the physical separation and amplification ofcells before sequencing. Initially, it was unfeasi-ble to extract genomes of individual popula-tions from metagenomic data (a bioinformaticprocess called binning) because of insufficientsequencing depth and inadequate binningtools. Only in some instances could completeor near-complete genomes be reconstructedfrom environmental sequences, and these weretypically of dominant populations with mini-mal genomic heterogeneity (Tyson et al. 2004;Garcia Martin et al. 2006; Elkins et al. 2008).In contrast, most populations in early meta-genomic studies remained unidentified, being
P. Hugenholtz et al.
2 Cite this article as Cold Spring Harb Perspect Biol 2016;8:a018085
on November 26, 2020 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from
Flexibac
terFla
vobact
erium
Planctomyces
Agrobacterium
Mitochondrion
Rhodocycl
us
Escher
ichia
Desulfovib
rio
Chloroplas
t
Synechococcus
Gloeobacter
Chlam
ydia
Chlorob
ium
Clostrid
ium
Bac
illus
Helioba
cterium
Arth
roba
cter
pOP
S19
pOPS66
Roo
t
pJP 78pSL 50
Thermo
filum
Thermoproteus
Pyrod
ictium
Sulfolobu
s
Gp.
3 lo
w te
mp
Gp.
2 lo
w te
mp
Gp.
1 lo
w te
mp
Mar
ine
Gp.
1 lo
w te
mp
Metha
nosp
irillum
Methano
bacteiu
m
Methanothermus
Methanopyrus
Haloferax
Archaeoglobus
Therm
oplas
ma
Marine low temp
Methanococcus
Thermococc
uspS
L 22pS
L 12
pJP 27
Coprinu
sHom
o Zea
Cry
ptom
onas
Ach
lya
Costaria
Porphyra
Param
ecium Babesia
Dictyostelium
Entamoeba
Naegleria
Euglena
Encephali
tozoon
Trichom
onas
Vairim
orpha
Giardia
Trypanosom
aPhysaru
m
Chloroflexus
Thermus
Aquifex
EM17
Thermotoga
Lepton
ema
Bac
teria
Archae
a
Euca
rya
0.1
chan
ges/
site
AB
Figu
re1.
Two
rep
rese
nta
tio
ns
of
the
tree
of
life
.T
he
firs
t(A
)b
ased
on
ph
eno
typ
icco
mp
aris
on
sre
sult
ing
inlu
mp
ing
of
mic
roo
rgan
ism
sin
toa
sin
gle
un
dif
fere
nti
ated
mas
sat
the
bas
eo
fth
etr
ee(c
ircl
edin
red
),an
dth
ese
con
d(B
)b
ased
on
gen
oty
pic
(rR
NA
gen
es)
com
par
iso
ns
reve
alin
gth
atm
ost
div
ersi
ty(i
ncl
ud
ing
the
Eu
cary
a)is
actu
ally
mic
rob
ial
wit
hm
ult
icel
lula
rli
fefo
rms
on
lyem
ergi
ng
rela
tive
lyre
cen
tly
inev
olu
tio
nar
yh
isto
ry(c
ircl
edin
red
).(P
anel
Ais
rep
rod
uce
dfr
om
Hae
kcel
1866
,an
dis
free
lyav
aila
ble
inth
ep
ub
lic
do
mai
nan
dfr
eeo
fkn
own
rest
rict
ion
su
nd
erco
pyr
igh
tla
w.P
anel
Bis
bas
edo
nF
igu
re2
inB
arn
set
al.
1996
.)
Genome-Based Microbial Taxonomy Coming of Age
Cite this article as Cold Spring Harb Perspect Biol 2016;8:a018085 3
on November 26, 2020 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from
represented by a smattering of anonymous se-quence data. Single-cell genomics overcomesthis problem by separating environmental sam-ples into their component populations up front,most efficiently via cell sorting, thereby avoid-ing the need for binning and allowing se-quencing resources to be focused on individualgenomes (Lasken and McLean 2014). The po-tential of this approach for articulating micro-bial dark matter was shown in a recent studydescribing 200 single-cell genomes representing29 major uncultured bacterial and archaeal lin-eages that challenge established boundaries be-tween the three domains of life. These includean archaeal-type purine synthesis in bacteriaand complete sigma factors in Archaea similarto those in Bacteria (Rinke et al. 2013). Despitethis promising start, we need thousands ofmicrobial dark matter genomes, rather thanhundreds, to gain a comprehensive picture ofmicrobial evolution and ecology. Technicalchallenges associated with single-cell genomics,including the need for whole-genome amplifi-
cation and the high potential for contamina-tion, currently preclude this level of scale-up.Moreover, the average completeness of a sin-gle-cell genome is currently only 35% (Parkset al. 2015) because of large biases in genomecoverage introduced during the whole-genomeamplification step (Lasken and McLean 2014).Recently, a new and effective binning strategybased on differing relative abundance patternsof populations between related microbial com-munities has been developed by multiple groups(Albertsen et al. 2013; Alneberg et al. 2013; Sha-ron and Banfield 2013; Imelfort et al. 2014; Kanget al. 2014; Nielsen et al. 2014). This approachuses differential sequencing coverage of a givenpopulation between metagenomic data sets asa proxy of its relative abundance, and leveragesthe much higher genome coverage afforded bymodern sequencers, which produce tens ofbillions of sequence base pairs per run. Unlikesingle-cell genomics, differential coverage bin-ning will scale to produce tens of thousands ofpopulation genomes in a short time frame. For
600016S sequences from clones
16S sequences from named isolates5000
4000
3000
2000
Cum
ulat
ive
phyl
ogen
etic
div
ersi
ty
1000
2000 2001 2002 2003 2004 2005 2006 2007
Year
2008 2009 2010 2011 2012 2013
19%19%
19%20%
22%24%
30%34%
40%46%
51%55%
60%68%
0
Phylogenetic diversityPhylogenetic diversitymissed by isolate sequencesmissed by isolate sequences
Phylogenetic diversitymissed by isolate sequences
Figure 2. Increase in recognized phylogenetic diversity as measured by rRNA sequence novelty (y-axis) over time(x-axis). Recognized diversity has increased by an order of magnitude in the past 10 years, most of which(�80%) is a result of newly identified uncultured lineages (microbial dark matter). Note that the apparentleveling off of novel phylogenetic diversity discovery in 2013 is likely a consequence of the recent move from full-length 16S rRNA sequencing to partial gene sequencing on next-generation sequencing platforms, which are notincluded in this estimate (adapted from Rinke et al. 2013).
P. Hugenholtz et al.
4 Cite this article as Cold Spring Harb Perspect Biol 2016;8:a018085
on November 26, 2020 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from
example, �50 Gbp of sequence data (a singlelane of HiSeq Illumina data) from more thanthree related environmental samples will typi-cally produce 50 to 100 high-quality populationbins (.80% complete, ,10% contaminated)(Imelfort et al. 2014; Parks et al. 2015). Fur-thermore, the method does not require spe-cialized equipment beyond access to high-throughput sequencing and high-performancecomputing, and therefore is being rapidlyadopted by research groups worldwide. It isentirely feasible that .500,000 population ge-nomes will be generated in the next few yearsproviding the necessary volume of data to ad-equately cover all of the major lines of descentin the bacterial and archaeal domains. This willform the basis of a comprehensive genome-based classification framework of microbialdiversity.
RECONCILING TAXONOMY WITHPHYLOGENY
Although phylogenetic inference is an objectiveapproximation of evolutionary history, micro-bial systematics—the classification of micro-organisms into hierarchical taxonomic groups(nominally domain, phylum, class, order, fam-ily, genus, species) based on phylogenetic infer-ence (or other criteria), is a largely subjectivehuman construct to help organize information.Few would argue the natural evolutionary dis-tinction between Bacteria and Archaea so therank of Domain (Kingdom) is not especiallycontroversial, with the exception of the place-ment of the Eucarya (Spang et al. 2015). How-ever, the circumscription of all ranks sub-ordinate to Domain is often fiercely debatedbetween taxonomic “lumpers” and “splitters”(Endersby 2009), which unfortunately dimin-ishes the enterprise in the eyes of other scientistsas they consider it subjective and arbitrary.Microbial systematics has been divorced fromevolutionary theory for much of its history(Doolittle 2015), so the process of reconcilingtaxonomy with phylogeny will serve to unite thetwo. This involves two main tasks.
The first is to identify polyphyletic taxa at allranks (phylum to species) in a phylogenetic tree
and to reclassify them according to a standardset of rules. Figure 3 is a genome-based recon-struction of the Gammaproteobacteria high-lighting some of the polyphyletic orders inthis class. In this tree, the order Alteromonadalescomprises four distinct lines of descent. To cor-rect this conflict between phylogeny and taxon-omy, the group containing the type genus afterwhich the order was named, Alteromonas, is firstidentified (starred in tree) and retains the ordername. The other groups can then be renamedafter the oldest validly described genus in eachgroup (shown in parentheses in Fig. 3), in thiscase Shewanellales after Shewanella, Psychromo-nadales after Psychromonas, and Cellvibrionalesafter Cellvibrio. The three other polyphyleticorders in Figure 3, Aeromonadales, Oceanospir-illales, and Pseudomonadales, can be similarlycorrected. This same approach can be used forall taxonomic ranks above genus, and if no val-idly named isolate exists for a given group,which is indeed the case for the majority ofbranches because of microbial dark matter,groups can be named after environmental 16SrRNA sequences or population genomes, withthe caveat that the name should be unique inthe taxonomy (McDonald et al. 2012). Usingthis approach, we found that 85% of 635 namedisolates belonging to the class Clostridia re-quired reassignment at one or more ranks toreconcile existing taxonomic classificationswith a genome-based phylogeny (Table 1).
The second task concerns the unevenness ofrank assignments according to sequence-basedmetrics. Konstantinidis and Tiedje proposedthe average amino acid identity (AAI) of sharedgenes between two genomes as a measure ofrelatedness (Konstantinidis and Tiedje 2005).They plotted taxonomic ranks as a function ofAAI for 175 fully sequenced strains revealing ahigh degree of overlap between different ranks,even nonadjacent ranks. We repeated this anal-ysis with nearly 6000 genomes and the currentNational Center for Biotechnology Informa-tion (NCBI) taxonomy with much the sameresult—up to five different ranks overlapped ata given AAI and the range of AAIs for eachrankoften exceeded 20% (Fig. 4). Recently, Yarzaet al. (2014) proposed using 16S rRNA gene
Genome-Based Microbial Taxonomy Coming of Age
Cite this article as Cold Spring Harb Perspect Biol 2016;8:a018085 5
on November 26, 2020 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from
sequence identity thresholds to rationally cir-cumscribe different taxonomic ranks. However,like AAIs, this does not take into account differ-ent evolutionary tempos across the tree of life,which can result in an uneven application of tax-onomic ranks as fast-evolving lineages will havelower sequence identities thantheir slowerevolv-ing counterparts for the same divergence times.Ideally, ancestors belonging to the same rank
should have been contemporaries in the past.Therefore, defining and normalizing rank distri-butions using estimated evolutionarydivergenc-es is a more naturalistic approach for assigningranks. Although accurately estimating diver-gence times has proven challenging (Arbogastet al. 2002), branch lengths in a genome-basedphylogeny may provide a suitable approxima-tion if these lengths are appropriately normal-
1639
Pasteurellales
Enterobacteriales
Vibrionales205
105
21 Aeromonadales *
Aeromonadales (Succinivibrionales)
Pseudomonadales *
Alteromonadales (Cellvibrionales)
Pseudomonadales (Moraxellales)
Oceanospirillales (Alcanivorales)
Oceanospirillales *
Oceanospirillales (Kangiellales)
9
164
46
42
314
2
6
60
12
40 Alteromonadales (Shewanellales)
Alteromonadales (Psychromonadales)
Alteromonadales *
Figure 3. A tree of isolate genomes belonging to the class Gammaproteobacteria inferred from a concatenatedalignment of 83 single-copy marker genes. Lineages are collapsed at the order level showing that some aremonophyletic (in black), but many others are polyphyletic (colored), the most extreme case in this instancebeing the Alteromonadales. Renaming of orders to remove polyphyly is shown in parentheses. Numbers insidecollapsed groups indicate the number of genomes comprising a given order.
P. Hugenholtz et al.
6 Cite this article as Cold Spring Harb Perspect Biol 2016;8:a018085
on November 26, 2020 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from
ized to take into account varying evolutionaryrates.
TRANSITIONING FROM A 16S rRNA-TO GENOME-BASED TAXONOMY
To date, 16S rRNA has been the most widelyused gene for inferring evolutionary relation-ships between microorganisms and serves asthe primary basis for microbial taxonomy. Itshigh degree of sequence conservation (Woese1987) has the dual benefits of providing a do-
main-level overview of microbial diversity, andenables culture-independent environmentalsurveys using near universal polymerase chainreaction (PCR) priming sites. The public repos-itory of 16S rRNA gene sequences number inthe millions and it is, therefore, also the mostcomprehensively sequenced marker gene avail-able to us. However, this same sequence conser-vation also leads to chimera formation in envi-ronmental surveys. Chimeras are PCR-inducedartifacts whereby an incompletely synthesized16S rRNA amplicon acts as a primer on a dif-ferent template producing a hybrid molecule.Such mispriming only requires short stretchesof sequence identity between the 30-end ofthe incomplete template and homologous re-gion of the foreign template, which aboundsin 16S rRNA because of its high degree of con-servation. Moreover, very similar chimeras canbe formed in independent studies of similarhabitats in which parent sequences are presentin approximately similar ratios (e.g., Bacter-oides–Clostridium chimeras produced in hu-man gut surveys) (Haas et al. 2011). Chimeras
Table 1 Extent of taxonomic reassignments requiredto reconcile existing taxonomy with genome-basedphylogeny for members of the class Clostridia
No. ranks requiring
reassignment
No. changes/
635
% of
total
0 92 14.51 123 19.42 309 48.73 111 17.5
Only changes in three ranks are included: order, family,
and genus.
20
40
60
80
100
0
38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98
Average amino acid identity (AAI)
Per
cent
dis
trib
utio
n
Diff. domains
Domain
Phylum
Class
Order
Family
Genus
Species
Figure 4. A histogram of pairwise average amino acid identities (AAI) between 290 archaeal and 5582 bacterialreference genomes colored by rank affiliation showing the uneveness of current taxonomic classifications.Extreme outliers include isolates classified in the same species (red) with only 68% AAI (Bdellovibrio bacter-iovorus) and others classified in different genera (green) sharing as high as 94% AAI (e.g., Tannerella andCoprobacter) (adapted from Konstantinidis and Tiedje 2005).
Genome-Based Microbial Taxonomy Coming of Age
Cite this article as Cold Spring Harb Perspect Biol 2016;8:a018085 7
on November 26, 2020 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from
have been recognized as a problem associatedwith PCR-based surveys since the 1990s andalarms have been raised about the growingnumber of undetected chimeras in the publicdatabases (Hugenholtz and Huber 2003; Ashel-ford et al. 2005). Environmental sequences nowdominate the public databases in both numberand phylogenetic diversity coverage, unlike theearly 2000s when they still represented a minor-ity of available data (Fig. 1). It is very likely thatthe problem of undetected chimeras has in-creased accordingly as suggested by the pooroverlap between different chimera-detectiontools (Haas et al. 2011). All such tools detectchimeric sequences by comparison to a refer-ence set of 16S rRNA sequences. If the referenceset is compromised by undetected chimeras,detection of new chimeric sequences is alsocompromised. In short, chimeras are a recog-nized but underestimated problem in 16S rRNAdata sets that artificially inflate diversity esti-mates and introduce noise into phylogenetictrees, ultimately compromising taxonomic clas-sifications based on these trees.
Concatenated protein marker gene trees de-rived from isolate and population genomesare much less susceptible to chimeric artifactsand provided completeness and contaminationchecks used for the latter (Parks et al. 2015).Combined with the higher phylogenetic resolu-tion afforded by concatenated markers, suchtrees are the logical choice for a phylogeneticallyreconciled taxonomy. However, the 16S rRNAdatabase is currently two orders of magnitudelarger than the genome database and significantefforts have been invested into curating 16SrRNA trees for taxonomic classification (Kimet al. 2012; McDonald et al. 2012; Quast et al.2012; Cole et al. 2013). Therefore, and despiteambiguities arising from chimeric sequences,efforts should be made to incorporate 16SrRNA-based classifications into genome-basedtaxonomy to provide taxonomic continuity inthe literature. For microbial isolates, the processof connecting 16S rRNA to genome sequencesto allow transfer of taxonomic informationshould be straightforward. However, for manyisolates, 16S rRNA genes were sequenced beforetheir genomes, introducing the risk of connect-
ing different taxa between trees. Sequencing oftype material is very important in this context asit provides unambiguous signposts in phyloge-netic trees, not only for transfer of taxonomicinformation, but also to enable correction of thenumerous polyphyly errors that presently existin microbial taxonomy (Fig. 3; Table 1) (Kyrpi-des et al. 2014). For most microbial diversity,however, we are reliant on single-cell or popu-lation genomes to connect 16S rRNA-basedclassifications to their corresponding locationsin genome trees. Unfortunately, rRNA genes areoften difficult to recover in genomes derivedfrom metagenomic data. This is because of thehigh conservation of these genes, which oftenresults in chimeric assembly, and the fact thatthe rRNA operon is typically the largest repeatin a microbial genome if present in greater thanone copy. This confounds differential coveragebinning because the rRNA genes do not binwith the rest of the genome as a result of havinga higher coverage if present in the assembly as acollapsed repeat. Development of new methodsto obtain full-length non-chimeric 16S rRNAgenes for population genomes should be a pri-ority as this will enable a smooth transition to agenome-based microbial taxonomy. Moreover,16S rRNA is likely to remain a valuable tool inthe microbiologist’s toolbox for the foreseeablefuture, for example, 16S rRNA-targeted fluores-cence in situ hybridization (Amann et al. 2001),so it cannot be easily ignored as we transition toa genome-based taxonomy.
APPLICATIONS FOR A REPRESENTATIVEGENOME TREE AND GENOME-BASEDTAXONOMY
One important and largely open question is theextent to which lateral gene transfer (LGT) playsa role in microbial evolution and ecology. Thistopic was first broached in a holistic mannerwith the availability of the first microbial ge-nome sequences. Phylogenetic analysis of nu-merous gene families revealed a high level ofinferred discordance with the 16S rRNA-basedtree over evolutionary timescales, raising thequestion as to whether the universal tree of lifewould be better represented as a network be-
P. Hugenholtz et al.
8 Cite this article as Cold Spring Harb Perspect Biol 2016;8:a018085
on November 26, 2020 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from
cause of LGT blurring organismal boundaries(Doolittle 1999). Despite the inference that LGTis a relatively common phenomenon, a compar-ison of 56 genomes selected to maximize 16SrRNA-defined phylogenetic diversity with ran-domly selected genome data sets of the samesize showed that tree-based selection resultedin higher rates of discovery of novel gene fami-lies, gene fusions, and operon arrangements(Wu et al. 2009). These findings argue againstLGT overriding vertical inheritance. However,few would argue against LGT being a potentevolutionary force, the most public face ofwhich is transfer of antibiotic resistance be-tween bacteria (Arber 2014). An analysis ofmicrobial isolate genomes to identify recentlylaterally transferred genes (.99% nucleotideidentity) in humans and the environmentpoints to ecology being a more important driverof LGT than phylogeny, particularly in humans(Smillie et al. 2011). This analysis only consid-ered isolate genomes, however, which do notadequately represent the ecosystems from whichthey were obtained. Population genomicsprovides the means to obtain a representativesampling of the component members of a givenecosystem, and a genome-based taxonomy thenbecomes the essential framework for determin-ing the frequency and mode of LGT betweenorganisms of varying evolutionary relatedness(Beiko et al. 2005).
Current concepts on many topics have like-ly been oversimplified by limited and highlyskewed sampling of microbial diversity, andstand to be overhauled by microbial dark mattergenome sequencing. Recent examples includethe discovery of nonphotosynthetic basal Cya-nobacteria (Di Rienzi et al. 2013; Soo et al.2014) and a large radiation of candidate bacte-rial phyla with unusual ribosome structures andbiogenesis mechanisms that may actually bemore representative of the “average” bacteriumthan our current preconceptions (Rinke et al.2013; Brown et al. 2015). Similarly, categoriza-tion of the bacterial cell envelope as either single(monoderm) or double (diderm) layer struc-tures, based on studies of mostly Firmicutesand Proteobacteria, will become more sophisti-cated with greater representation of candidate
phyla (Sutcliffe 2010). In summary, our knowl-edge of the microbial world is set to expanddramatically in the coming years as microbialdark matter is rapidly illuminated through ge-nome sequencing. Coupling these advances to asystematized genome-based classification willserve to organize this valuable resource into acoherent framework that will greatly facilitateanalysis and communication.
REFERENCES
Albertsen M, Hugenholtz P, Skarshewski A, Nielsen KL,Tyson GW, Nielsen PH. 2013. Genome sequences ofrare, uncultured bacteria obtained by differential cover-age binning of multiple metagenomes. Nat Biotechnol31: 533–538.
Alneberg J, Bjarnason BS, de Bruijn I, Schirmer M, Quick J,Ijaz UZ, Lahti L, Loman NJ, Andersson AF, Quince C.2013. CONCOCT: Clustering cONtigs on COverage andComposiTion. arXiv:1312.4038.
Amann R, Fuchs BM, Behrens S. 2001. The identification ofmicroorganisms by fluorescence in situ hybridisation.Curr Opin Biotechnol 12: 231–236.
Arber W. 2014. Horizontal gene transfer among bacteriaand its role in biological evolution. Life (Basel) 4: 217–224.
Arbogast BS, Edwards SV, Wakeley J, Beerli P, Slowinski JB.2002. Estimating divergence times from molecular dataon phylogenetic and population genetic timescales. AnnuRev Ecol Syst 33: 707–740.
Ashelford KE, Chuzhanova NA, Fry JC, Jones AJ, Weight-man AJ. 2005. At least 1 in 20 16S rRNA sequence recordscurrently held in public repositories is estimated to con-tain substantial anomalies. Appl Environ Microbiol 71:7724–7736.
Beiko RG, Harlow TJ, Ragan MA. 2005. Highways of genesharing in prokaryotes. Proc Natl Acad Sci 102: 14332–14337.
Brown CT, Hug LA, Thomas BC, Sharon I, Castelle CJ, SinghA, Wilkins MJ, Wrighton KC, Williams KH, BanfieldJF. 2015. Unusual biology across a group comprisingmore than 15% of domain bacteria. Nature 523: 208–211.
Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B,Bork P. 2006. Toward automatic reconstruction of a high-ly resolved tree of life. Science 311: 1283–1287.
Cole JR, Wang Q, Fish JA, Chai B, McGarrell DM, Sun Y,Brown CT, Porras-Alfaro A, Kuske CR, Tiedje JM. 2013.Ribosomal Database Project: Data and tools for highthroughput rRNA analysis. Nucleic Acids Res 42: D633–D642.
Curtis TP, Sloan WT. 2005. Microbiology. Exploring micro-bial diversity—A vast below. Science 309: 1331–1333.
Darwin CR. 1859. On the origin of species by means of naturalselection, or the preservation of favoured races in the strug-gle for life, 1st ed. John Murray, London.
Genome-Based Microbial Taxonomy Coming of Age
Cite this article as Cold Spring Harb Perspect Biol 2016;8:a018085 9
on November 26, 2020 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from
Di Rienzi SC, Sharon I, Wrighton KC, Koren O, Hug LA,Thomas BC, Goodrich JK, Bell JT, Spector TD, BanfieldJF, et al. 2013. The human gut and groundwater harbornon-photosynthetic bacteria belonging to a new candi-date phylum sibling to cyanobacteria. eLife 2: e01102.
Doolittle WF. 1999. Phylogenetic classification and the uni-versal tree. Science 284: 2124–2129.
Doolittle WF. 2015. Rethinking the tree of life. Microbe Mag,Oct, www.microbemagazine.org.
Elkins JG, Podar M, Graham DE, Makarova KS, Wolf Y,Randau L, Hedlund BP, Brochier-Armanet C, Kunin V,Anderson I, et al. 2008. A korarchaeal genome revealsinsights into the evolution of the Archaea. Proc NatlAcad Sci 105: 8102–8107.
Endersby J. 2009. Lumpers and splitters: Darwin, Hooker,and the search for order. Science 326: 1496–1499.
Garcıa Martın H, Ivanova N, Kunin V, Warnecke F, BarryKW, McHardy AC, Yeates C, He S, Salamov AA, Szeto E,et al. 2006. Metagenomic analysis of two enhanced bio-logical phosphorus removal (EBPR) sludge communi-ties. Nat Biotechnol 24: 1263–1269.
Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, Gian-noukos G, Ciulla D, Tabbaa D, Highlander SK, SodergrenE, et al. 2011. Chimeric 16S rRNA sequence formationand detection in Sanger and 454-pyrosequenced PCRamplicons. Genome Res 21: 494–504.
Haeckel H. 1866. Generelle morphologie der organismen: All-gemeine grundzuge der organischen Formen-Wissenschaft,mechanisch begrundet durch die von Charles Darwin re-formirte descendenztheorie, von Ernst Haeckel. Reimer,Berlin.
Harris JK, Caporaso JG, Walker JJ, Spear JR, Gold NJ,Robertson CE, Hugenholtz P, Goodrich J, McDonald D,Knights D, et al. 2013. Phylogenetic stratigraphy in theGuerrero Negro hypersaline microbial mat. ISME J 7:50–60.
Hugenholtz P, Huber T. 2003. Chimeric 16S rDNA sequenc-es of diverse origin are accumulating in the public data-bases. Int J Syst Evol Microbiol 53: 289–293.
Imelfort M, Parks D, Woodcroft BJ, Dennis P, Hugenholtz P,Tyson GW. 2014. GroopM: An automated tool for therecovery of population genomes from related metage-nomes. PeerJ 2: e603.
Kang DD, Froula J, Egan R, Wang Z. 2014. MetaBAT: Meta-genome binning based on abundance and tetranucleo-tide frequency. Presented at the Ninth Annual DOE JointGenome Institute User Meeting. Walnut Creek, CA.
Kim OS, Cho YJ, Lee K, Yoon SH, Kim M, Na H, Park SC,Jeon YS, Lee JH, Yi H, et al. 2012. Introducing EzTaxon-e:A prokaryotic 16S rRNA gene sequence database withphylotypes that represent uncultured species. Int J SystEvol Microbiol 62: 716–721.
Konstantinidis KT, Tiedje JM. 2005. Towards a genome-based taxonomy for prokaryotes. J Bacteriol 187: 6258–6264.
Kyrpides NC, Hugenholtz P, Eisen JA, Woyke T, Goker M,Parker CT, Amann R, Beck BJ, Chain PS, Chun J, et al.2014. Genomic encyclopedia of bacteria and archaea:Sequencing a myriad of type strains. PLoS Biol 12:e1001920.
Lang JM, Darling AE, Eisen JA. 2013. Phylogeny of bacterialand archaeal genomes using conserved genes: Supertreesand supermatrices. PLoS ONE 8: e62510.
Lasken RS, McLean JS. 2014. Recent advances in genomicDNA sequencing of microbial species from single cells.Nat Rev Genet 15: 577–584.
McDonald D, Price MN, Goodrich J, Nawrocki EP, DeSantisTZ, Probst A, Andersen GL, Knight R, Hugenholtz P.2012. An improved Greengenes taxonomy with explicitranks for ecological and evolutionary analyses of bacteriaand archaea. ISME J 6: 610–618.
Nielsen HB, Almeida M, Juncker AS, Rasmussen S, Li J,Sunagawa S, Plichta DR, Gautier L, Pedersen AG, LeChatelier E, et al. 2014. Identification and assembly ofgenomes and genetic elements in complex metagenomicsamples without using reference genomes. Nat Biotechnol32: 822–828.
Olsen GJ, Lane DJ, Giovannoni SJ, Pace NR, Stahl DA. 1986.Microbial ecology and evolution: A ribosomal RNA ap-proach. Annu Rev Microbiol 40: 337–365.
Pace NR. 1997. A molecular view of microbial diversity andthe biosphere. Science 276: 734–740.
Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, TysonGW. 2015. CheckM: Assessing the quality of microbialgenomes recovered from isolates, single cells, and meta-genomes. Genome Res 25: 1043–1055.
Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P,Peplies J, Glockner FO. 2012. The SILVA ribosomal RNAgene database project: Improved data processing andweb-based tools. Nucleic Acids Res 41: D590–D596.
Rinke C, Schwientek P, Sczyrba A, Ivanova NN, Anderson IJ,Cheng JF, Darling A, Malfatti S, Swan BK, et al. 2013.Insights into the phylogeny and coding potential of mi-crobial dark matter. Nature 499: 431–437.
Sharon I, Banfield JF. 2013. Microbiology. Genomes frommetagenomics. Science 342: 1057–1058.
Smillie CS, Smith MB, Friedman J, Cordero OX, David LA,Alm EJ. 2011. Ecology drives a global network of geneexchange connecting the human microbiome. Nature480: 241–244.
Soo RM, Skennerton CT, Sekiguchi Y, Imelfort M, Paech SJ,Dennis PG, Sheen JA, Parks DH, Tyson GW, HugenholtzP. 2014. An expanded genomic representation of the phy-lum Cyanobacteria. Genome Biol Evol 6: 1031–1045.
Spang A, Saw JH, Jørgensen SL, Zaremba-Niedzwiedzka K,Martijn J, Lind AE, van Eijk R, Schleper C, Guy L, EttemaTJ. 2015. Complex archaea that bridge the gap betweenprokaryotes and eukaryotes. Nature 521: 173–179.
Stanier RY, Van Niel CB. 1962. The concept of a bacterium.Arch Mikrobiol 42: 17–35.
Sutcliffe IC. 2010. A phylum level perspective on bacterialcell envelope architecture. Trends Microbiol 18: 464–470.
Tyson GW, Chapman J, Hugenholtz P, Allen EE, Ram RJ,Richardson PM, Solovyev VV, Rubin EM, Rokhsar DS,Banfield JF. 2004. Community structure and metabolismthrough reconstruction of microbial genomes from theenvironment. Nature 428: 37–43.
Woese CR. 1987. Bacterial evolution. Microbiol Rev 51: 221–271.
P. Hugenholtz et al.
10 Cite this article as Cold Spring Harb Perspect Biol 2016;8:a018085
on November 26, 2020 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from
Woese CR, Fox GE. 1977. Phylogenetic structure of the pro-karyotic domain: The primary kingdoms. Proc Natl AcadSci 71: 5088–5090.
Wu D, Hugenholtz P, Mavromatis K, Pukall R, Dalin E,Ivanova NN, Kunin V, Goodwin L, Wu M, Tindall BJ,et al. 2009. A phylogeny-driven genomic encyclopaediaof Bacteria and Archaea. Nature 462: 1056–1060.
Yarza P, Yilmaz P, Pruesse E, Glockner FO, Ludwig W, Schlei-fer KH, Whitman WB, Euzeby J, Amann R, Rossello-
Mora R. 2014. Uniting the classification of cultured anduncultured bacteria and archaea using 16S rRNA genesequences. Nat Rev Microbiol 12: 635–645.
Yutin N, Galperin MY. 2013. A genomic update on clos-tridial phylogeny: Gram-negative spore formers andother misplaced clostridia. Environ Microbiol 15: 2631–2641.
Zuckerkandl E, Pauling L. 1965. Molecules as documents ofevolutionary history. J Theor Biol 8: 357–366.
Genome-Based Microbial Taxonomy Coming of Age
Cite this article as Cold Spring Harb Perspect Biol 2016;8:a018085 11
on November 26, 2020 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from
March 17, 20162016; doi: 10.1101/cshperspect.a018085 originally published onlineCold Spring Harb Perspect Biol
Philip Hugenholtz, Adam Skarshewski and Donovan H. Parks Genome-Based Microbial Taxonomy Coming of Age
Subject Collection Microbial Evolution
Genetics, and RecombinationNot So Simple After All: Bacteria, Their Population
William P. HanageAgeGenome-Based Microbial Taxonomy Coming of
Donovan H. ParksPhilip Hugenholtz, Adam Skarshewski and
Realizing Microbial EvolutionHoward Ochman
Horizontal Gene Transfer and the History of LifeVincent Daubin and Gergely J. Szöllosi
EvolutionThe Importance of Microbial Experimental Thoughts Toward a Theory of Natural Selection:
Daniel Dykhuizen
Early Microbial Evolution: The Age of AnaerobesWilliam F. Martin and Filipa L. Sousa
Prokaryotic GenomesCoevolution of the Organization and Structure of
Marie Touchon and Eduardo P.C. Rocha
Microbial SpeciationB. Jesse Shapiro and Martin F. Polz
Mutation and Its Role in the Evolution of BacteriaThe Engine of Evolution: Studying−−Mutation
Ruth HershbergCampylobacter coli
and Campylobacter jejuniThe Evolution of
Samuel K. Sheppard and Martin C.J. Maiden
Mutation)Natural Selection Mimics Mutagenesis (Adaptive The Origin of Mutants Under Selection: How
Sophie Maisnier-Patin and John R. Roth
EvolutionPaleobiological Perspectives on Early Microbial
Andrew H. Knoll
Preexisting GenesEvolution of New Functions De Novo and from
Joakim NäsvallDan I. Andersson, Jon Jerlström-Hultqvist and
http://cshperspectives.cshlp.org/cgi/collection/ For additional articles in this collection, see
Copyright © 2016 Cold Spring Harbor Laboratory Press; all rights reserved
on November 26, 2020 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from