23
Streamlining and Simplification of Microbial Genome Architecture Michael Lynch Department of Biology, Indiana University, Bloomington, Indiana 47405; email: [email protected] Annu. Rev. Microbiol. 2006. 60:327–49 The Annual Review of Microbiology is online at micro.annualreviews.org This article’s doi: 10.1146/annurev.micro.60.080805.142300 Copyright c 2006 by Annual Reviews. All rights reserved 0066-4227/06/1013-0327$20.00 Key Words genome evolution, genomic streamlining, mutation, prokaryotes, random genetic drift, recombination Abstract The genomes of unicellular species, particularly prokaryotes, are greatly reduced in size and simplified in terms of gene structure relative to those of multicellular eukaryotes. Arguments proposed to explain this disparity include selection for metabolic efficiency and elevated rates of deletion in microbes, but the evidence in sup- port of these hypotheses is at best equivocal. An alternative expla- nation based on fundamental population-genetic principles is pro- posed here. By increasing the mutational target sizes of associated genes, most forms of nonfunctional DNA are opposed by weak selec- tion. Free-living microbial species have elevated effective population sizes, and the consequent reduction in the power of random genetic drift appears to be sufficient to enable natural selection to inhibit the accumulation of excess DNA. This hypothesis provides a po- tentially unifying explanation for the continuity in genomic scaling from prokaryotes to multicellular eukaryotes, the divergent patterns of mitochondrial evolution in animals and land plants, and various aspects of genomic modification in microbial endosymbionts. 327 Annu. Rev. Microbiol. 2006.60:327-349. Downloaded from arjournals.annualreviews.org by INDIANA UNIVERSITY - Bloomington on 08/31/07. For personal use only.

Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

Embed Size (px)

Citation preview

Page 1: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

Streamlining andSimplification of MicrobialGenome ArchitectureMichael LynchDepartment of Biology, Indiana University, Bloomington, Indiana 47405;email: [email protected]

Annu. Rev. Microbiol. 2006. 60:327–49

The Annual Review of Microbiology is onlineat micro.annualreviews.org

This article’s doi:10.1146/annurev.micro.60.080805.142300

Copyright c© 2006 by Annual Reviews.All rights reserved

0066-4227/06/1013-0327$20.00

Key Words

genome evolution, genomic streamlining, mutation, prokaryotes,random genetic drift, recombination

AbstractThe genomes of unicellular species, particularly prokaryotes, aregreatly reduced in size and simplified in terms of gene structurerelative to those of multicellular eukaryotes. Arguments proposedto explain this disparity include selection for metabolic efficiencyand elevated rates of deletion in microbes, but the evidence in sup-port of these hypotheses is at best equivocal. An alternative expla-nation based on fundamental population-genetic principles is pro-posed here. By increasing the mutational target sizes of associatedgenes, most forms of nonfunctional DNA are opposed by weak selec-tion. Free-living microbial species have elevated effective populationsizes, and the consequent reduction in the power of random geneticdrift appears to be sufficient to enable natural selection to inhibitthe accumulation of excess DNA. This hypothesis provides a po-tentially unifying explanation for the continuity in genomic scalingfrom prokaryotes to multicellular eukaryotes, the divergent patternsof mitochondrial evolution in animals and land plants, and variousaspects of genomic modification in microbial endosymbionts.

327

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 2: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

Contents

INTRODUCTION. . . . . . . . . . . . . . . . . 328THE SCALING OF GENOME

CONTENT AND GENOMESIZE . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329

THE EVOLUTIONARYMAINTENANCE OFCOMPACT GENOMES. . . . . . . . . 329Directional Mutation Pressure . . . . 329The Selfish-DNA Hypothesis . . . . . 332Bulk DNA as a Phenotypic

Modulator . . . . . . . . . . . . . . . . . . . . 332The Mutational-Burden

Hypothesis . . . . . . . . . . . . . . . . . . . . 333THE POPULATION-GENETIC

ENVIRONMENT OFMICROBIAL SPECIES . . . . . . . . . . 334

CONSEQUENCES OFEFFICIENT REMOVALOF MUTATIONALLYHAZARDOUS DNA. . . . . . . . . . . . . 337The Immunity of Microbial

Species to Intron andMobile-Element Proliferation. . 337

The Demise of Operons and theEmergence of Modular GeneStructure in Eukaryotes . . . . . . . . 338

INSIGHTS FROMENDOSYMBIONTS . . . . . . . . . . . . 339Mitochondrial Genomes . . . . . . . . . . 339Arthropod Endosymbionts . . . . . . . . 341

INTRODUCTION

Although it has long been known that uni-cellular species generally have much smallergenomes than multicellular species, the un-derlying reasons for these differences haveonly recently become clear. From the com-pletely sequenced genomes of ∼300 prokary-otes and several dozen eukaryotes, we nowknow that genome-size expansion in multi-cellular eukaryotes is largely a consequenceof noncoding-DNA proliferation (86, 87). Itis commonly assumed that genomic expan-

sion in eukaryotes is a reflection of natural se-lection promoting various phenotypic effects,and that genomic streamlining in microbes isa consequence of selection for metabolic ef-ficiency. However, there is yet no direct ev-idence that adaptive processes are sufficientto explain genome architectural diversity, andthere are compelling reasons to think that theyare not.

If expansions in genome size have adap-tive roots, given that unicellular species haveenormous population sizes by multicellularstandards and hence a higher population-levelpotential for adaptive mutation, why are thegenomes of such species kept at such diminu-tive sizes? If there is a metabolic advantage tosmall genome size, why are large multicellu-lar species relatively immune to such costs?Given the exceptional requirements for gene-expression coordination in developmentallycomplex organisms, why have nearly all multi-cellular eukaryotes abandoned the use of oper-ons? And if there is a selective advantage tobulk DNA, why are most increases in genomesize accomplished largely by expansions ofmobile elements and introns, both of whichare mutational, transcriptional, and metabolicburdens? Although the accumulation of fur-ther data from poorly studied groups willundoubtedly promote refinements in our hy-potheses on these matters, the candidatemechanisms responsible for the phylogeneticdiversification of genome architecture mustreside in the universal population-geneticprocesses that drive evolution—mutation, re-combination, random genetic drift, and natu-ral selection.

This review (a) summarizes the existinginformation bearing on the issues outlinedabove, (b) highlights the limitations of adap-tive explanations for genome-size variation,and (c) demonstrates why a mature field ofevolutionary genomics will ultimately requirean integration of population-genetic princi-ples. Comparative genomics is a wide-rangingfield, and it is impossible to cover the fullmenu of its target subjects here. The goalinstead is to review a subset of observations

328 Lynch

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 3: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

bearing on the development of a unifying the-ory for the structural features of prokaryoticand eukaryotic genomes. Microbial genomesoccupy an extraordinarily broad range ofpopulation-genetic environments and hencetake center stage in such considerations.

THE SCALING OF GENOMECONTENT AND GENOME SIZE

Across all cellular life, including DNA viruses,there is a general increase in both coding andnoncoding DNA with total genome size, butthe proportional contributions of these com-ponents behave differently (Figure 1). Withfew exceptions, viral and prokaryotic genomesconsist of >85% coding DNA, and a sim-ilar situation is found in the smallest uni-cellular eukaryotic genomes. In contrast, thelargest well-characterized genomes of unicel-lular eukaryotes contain up to 60% noncodingDNA, with a particularly dramatic increasein this contribution occurring with genomes>10 Mb in size. For the largest genomes ofvertebrates and land plants, the amount ofcoding DNA plateaus at ∼100 Mb, leadingto a progressive decline in the fractional con-tribution of this source to <5% of the totalgenome. Both intronic and intergenic DNAcontribute to eukaryotic genome growth. Acentral challenge for evolutionary genomicsis to determine the extent to which the expan-sion of eukaryotic gene and genomic complex-ity was a necessary prerequisite or an indirectconsequence of the evolution of complex mor-phologies (86, 87).

Two general conclusions emerge from theenormous phylogenetic breadth of the pat-terns in Figure 1. First, the common assertionthat there is essentially no correlation betweengenome size and organismal complexity (16,57, 122) is a misleading consequence of focus-ing on extreme outliers rather than measuresof central tendency. Although there is consid-erable variation in genomic features amongspecies with similar levels of cellular and or-ganismal complexity, there is a clear rankingfrom viruses to prokaryotes to unicellular eu-

karyotes to multicellular eukaryotes in termsof genome size, gene number, mobile-elementnumber, intron number and size, size of in-tergenic spacer DNA, and complexity of reg-ulatory regions (86, 87). Second, there are noabrupt transitions in the scaling of genomecontent with genome size across functionallydivergent groups of organisms. The transi-tions between viruses, prokaryotes, unicellu-lar eukaryotes, and multicellular eukaryotesare all continuous and overlapping. This sug-gests that the primary forces influencing theevolution of genomic architecture are un-likely to be direct consequences of organis-mal differences in lifestyles, cell structures, orphysiologies.

THE EVOLUTIONARYMAINTENANCE OF COMPACTGENOMES

Genome-size expansion and contraction is afunction of the distribution of insertion anddeletion sizes produced by mutation and thesubsequent differential filter imposed by natu-ral selection. If natural selection were entirelyineffective, genome size would progressivelyexpand if the rate of insertion exceeded thatof deletion and would progressively decline toa minimum level compatible with the main-tenance of gene function if deletions werenumerically dominant. If, however, virtuallyevery insertion and deletion were subject tonatural selection, the size of a genome (andthe architecture of each gene within it) wouldevolve toward an optimal state for organismalfitness. Thus, one of the keys to understand-ing genome-size evolution resides in the spec-trum of changes produced by mutation.

Directional Mutation Pressure

In many species, mobile-genetic element ac-tivity encourages genome expansion. Trans-posons and retrotransposons are capable ofproliferating to new genomic sites at rates>10−5 per element per generation (48, 93,99) and only rarely are lost by spontaneous

www.annualreviews.org • Microbial Genome Architecture 329

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 4: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

Figure 1The scaling of genome content with haploid genome size. The diagonal lines plot points of equalproportional contribution to the total genome, and values plotted as 10−5 are actually equal to zero. Ineach case, a data point denotes an individual genome, except in the case of viruses, for which averages arepresented for each major family. Most data for prokaryotes and eukaryotes are from Lynch (86), althoughthe prokaryotic intron data are taken from the Zimmerly Group-II intron database(http://www.fp.ucalgary.ca/group2introns/ ). Data for double-stranded DNA viruses are deriveddirectly from the NCBI database, and in this case the intronic-DNA category excludes intron materialthat is deployed as coding DNA in alternative transcripts.

excision. Retrotransposon activity can alsoresult in the insertion of pseudogenes (dead-on-arrival copies of otherwise normally func-tioning genes) either via accidental incorpo-ration of downstream sequence into element

transcripts (95) or via reverse transcriptionand genomic integration of normal mRNAs(38). With these kinds of large-scale (generally> 1 kb) insertions arising on a stochastic basis,the prevention of runaway genome expansion

330 Lynch

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 5: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

requires the direct opposition of the fixationof insertions by natural selection and/or addi-tional mutational mechanisms for subsequentdeletion. Some forms of cellular activity, suchas the repair of double-strand breaks by non-homologous end joining, may bias the muta-tional spectrum in the direction of deletions(29, 107), but numerous other pathways leadto small-scale insertions (110, 111).

Comparative surveys of pseudogenes havebeen used to infer the relative rates and sizesof small (<500 bp) insertions and deletions.In animals, inferred deletions outnumber in-sertions by factors of two to six, with the aver-age sizes of each being approximately equal (9,66, 102, 105, 127, 129). A substantially higherrate of deletion than insertion has also beenobserved in pseudogenes in prokaryotes (94).Taken at face value, such observations suggestthat, with no assistance from selection, small-scale deletions might be sufficient to preventrunaway genome expansion. However, this isthe case only if the net erosion of DNA bysmall-scale deletion exceeds the rate of ex-pansion by large-scale events, and even in thiscase, an equilibrium level of noncoding DNAwould be defined by the relative intensities ofthese two activities.

One concern with these indirect studiesinvolves the assumption that observations onpseudogenes provide an unbiased perspectiveof the mutation process. Any insertion-associated disadvantages and/or deletion-associated benefits will tilt the observed spec-trum of effects toward deletions relative tothe input by mutation (20). As discussed be-low, the energetic effects of small insertionsor deletions on a pseudogene may be sosmall to be immune to selection, but the pri-mary burden of excess DNA need not involvemetabolic effects but rather the potential forcausing gene-expression problems. Even if in-ert spacer DNA is immune to selection againstloss-of-function mutations, this need not im-ply an absence of selection against gain-of-function mutations. For example, noncod-ing regions are known to be depauperate inshort motifs with potential for generating in-

appropriate transcription-factor binding (59),microRNA hybridization (40), and translationinitiation (90, 112), and sequences containedwithin defective mobile elements can influ-ence the regulation of adjacent genes (71, 84,117, 119). That the majority of eukaryoticgenomic DNA may be transcribed (18, 67),at least at low levels, raises the question ofwhether any segment of nonfunctional DNAis truly neutral.

Are such potential selective effects likelyto be large enough to significantly bias ourview of the insertion or deletion process? Thetentative answer is yes, given that more di-rect estimates of the mutational spectrum (lessbiased by selection) often reveal an excess ofinsertions. For example, a massive sequenceanalysis of long-term mutation-accumulationlines of the nematode Caenorhabditis eleganstaken through individual bottlenecks eachgeneration revealed a 15:4 ratio of inser-tions to deletions (none of which were as-sociated with mobile-element activity) (33).In Drosophila melanogaster, spontaneous in-sertions <4 kb in length are approximatelyfour times more abundant than deletions(128), and reporter-construct experiments inthe yeast Saccharomyces cerevisiae suggest asimilar insertion and deletion disparity (60,72, 101). These direct assays imply an in-nate mutational tendency for genome-size ex-pansion above and beyond that caused bymobile-element activity and segmental dupli-cations. Such observations may not be univer-sal, however, as reporter-construct studies inEscherichia coli suggest a slight deletion bias(115, 116), as does a survey of de novo dom-inant mutations for human genetic disorders(70), although these studies will be subject tobias if deletions and insertions are not equallylikely to produce the phenotype used to inferthe presence of a mutation. Despite the needfor more data, these observations shed sig-nificant doubt on the notion that mutationaldeletion processes are universally sufficient toprevent runaway genome size. If this view iscorrect, then some form of natural selectionis necessary for genome-size stabilization.

www.annualreviews.org • Microbial Genome Architecture 331

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 6: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

The Selfish-DNA Hypothesis

Soon after the ubiquity of mobile-genetic ele-ments became apparent, Doolittle & Sapienza(35) and Orgel & Crick (103) suggested thatnoncoding DNA is largely a byproduct of“selfish” elements proliferating within hostgenomes until opposed by natural selection.However, although the verbal logic under-lying this idea appears to be formally cor-rect with respect to transposons and retro-transposons (19), the selfish-DNA hypothesisfails to explain the large fraction of noncodingDNA that is incapable of self-replication (in-cluding spliceosomal introns and small repet-itive DNAs), nor does it explain why nearlyall types of excess DNA mutually expand insome genomes (e.g., multicellular eukaryotes)but not in others.

It is commonly argued that the compactgenomes of microbial species is a consequenceof selection for high replication rates, result-ing in the elimination of all nonessential DNA(17, 35, 54, 103, 108, 113). The implicit as-sumption here is that the cost of maintain-ing and replicating a DNA segment of a fewbase pairs (the typical size of an intergenicinsertion or deletion) is significant enoughto be perceived by natural selection. As dis-cussed below, the large effective populationsizes of most unicellular species enhance theefficiency of natural selection, so this possibil-ity cannot be ruled out entirely. However, itis unclear whether the rate of cell replicationis ever limited by DNA metabolism.

First, within and among prokaryoticspecies, there is no correlation between celldivision rate and genome size (6, 94). Sec-ond, during rapid-growth phases, prokaryoticchromosomes are often present in a nestedseries of replication stages (15), and somespecies harbor tens to hundreds of chromo-somal copies at various stages of the lifecycle (69, 92). Third, in E. coli and other eu-bacteria, DNA replication forks proceed ap-proximately 10 to 20 times faster than mRNAelongation rates, and both processes occur si-multaneously (11, 27, 45). Finally, energetic

arguments suggest that cell replication is lim-ited by factors other than DNA metabolism.Even after accounting for genomic multiplic-ity, DNA constitutes ∼2% to 5% of the totaldry weight of a typical prokaryotic cell (26,27), and the estimated cost of genomic repli-cation relative to a cell’s entire energy bud-get is even smaller (64). Similar conclusionsemerge for eukaryotic cells (114).

Bulk DNA as a PhenotypicModulator

The selfish-DNA hypothesis represented asignificant break with the common (and un-justified) tradition of attempting to explain allpatterns of biodiversity in terms of adaptiveprocesses. Indeed, prior to the development ofthis hypothesis, a number of authors had sug-gested that bulk DNA in eukaryotes is selec-tively promoted for reasons entirely unasso-ciated with gene content (5, 16, 23). Drawingfrom an impressive number of correlations be-tween nuclear volume, cell size, and cell divi-sion rates, Cavalier-Smith (16) suggested thatthe evolution of cell size imposes secondaryselection on nuclear genome size, which inturn modulates the area of the nuclear en-velope and hence the flow of transcripts tothe cytoplasm. Noting that cell division ratesare related to life-history features such as de-velopmental rate and size at maturity, it wasa small step to conclude that selection onfitness-related traits encourages the expan-sion and contraction of noncoding DNA, justas it drives advantageous nucleotide substitu-tions in coding regions. Because the postu-lated effects of bulk DNA are irrelevant tospecies without a nuclear envelope, with itsadherence to adaptive arguments, the bulk-DNA hypothesis again invokes metabolic ef-ficiency arguments to explain the streamlinedgenomes of prokaryotes (17).

A major difficulty with the bulk-DNAhypothesis became clear with the annota-tion of fully sequenced genomes. Invariably,a substantial fraction of the growth of large

332 Lynch

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 7: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

genomes is derived from the proliferation ofmobile-genetic elements (87). Given the con-siderable mutational burden imposed by suchactivity (48, 93, 99), this form of genomicgrowth is more likely to be opposed than en-couraged by positive selection at the level ofthe host genome (19).

A second concern with the bulk-DNAhypothesis involves the matter of causal-ity. The known mechanisms for controllingcell growth include modifications of gene-copy number, ribosome number, transcrip-tion rates, nuclear membrane porosity andexport rates, and transcript longevity. Thebulk-DNA hypothesis provides no explana-tion for why natural selection on life-historyfeatures would be forced to accomplish cellmetabolic changes via risky expansions orcontractions of noncoding DNA rather thanthrough conventional allelic changes in geneexpression and/or function. An alternativepossibility is that there is no direct causal con-nection between genome size and cell fea-tures, but rather that genome-size expansionis largely a passive response to life-history fea-tures that promote population-genetic prop-erties that reduce the ability of naturalselection to eradicate excess DNA (86, 87).

A key point to be detailed below is thatthe long-term effective size of a populationis a fundamental determinant of the possiblepathways of genomic evolution in a lineage.Suppose that natural selection favors an in-crease in cell or body size in a particular lin-eage. In general, a doubling in organismal sizeresults in a ∼50% reduction in the number ofindividuals within a species (below), and byincreasing the power of random genetic drift,this reduces the efficiency of natural selec-tion, thereby facilitating the accumulation ofmildly deleterious insertions. Thus, genomesize is expected to expand in larger organisms,not necessarily because of an intrinsic insen-sitivity to excess DNA (as postulated by theselfish-DNA hypothesis) or because of a di-rect advantage of such DNA (as postulated bythe bulk-DNA hypothesis), but because of areduced ability to eradicate it.

The Mutational-Burden Hypothesis

An alternative explanation for the phyloge-netic diversity of gene and genome architec-ture, with broad explanatory power, derivesfrom fundamental principles of populationgenetics and the known constraints on genefunction. As noted above, nearly all forms ofexcess DNA are a mutational burden, evenwhen they have no direct functional signif-icance. For example, genes carrying intronsmust retain specific sequences for properspliceosomal recognition of splice sites (85).Upstream regions of genes that are tran-scribed but not translated (5′ UTRs) providesubstrate for the mutational origin of pre-mature translation-initiation codons, whichcan result in defective frameshifted products(90). Complex, modular regulatory regionsincrease a gene’s mutational target size rel-ative to simpler, overlapping regulatory ele-ments (44), and nonfunctional DNA can alsoincrease the rate of origin of harmful regula-tory mutations (40, 59).

Under the hypothesis that the primaryburden of excess DNA is its mutational li-ability, minimization of genome size (withinthe constraints dictated by the maintenanceof gene function) is expected to be uni-formly favorable (86). However, the prevail-ing population-genetic environment will de-termine which lineages are most capable ofpurging excess DNA by natural selection.Before considering a semiformal explanationfor this hypothesis, two additional points areworth making. First, although mutation andselection are often viewed as independent pro-cesses, variation in the degenerative mutationrate among alleles is equivalent to a selectiveprocess, as the descendants of some alleles areremoved from the population more rapidlythan others. Second, although random geneticdrift is often viewed as evolutionary noise thatsimply renders the outcome of natural se-lection less predictable, this is a gross over-simplification of the situation at the genomiclevel, where the degree of drift predictably de-fines the types of pathways that are open toevolution.

www.annualreviews.org • Microbial Genome Architecture 333

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 8: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

Imagine a newly arisen mutation that re-duces the fitness of its carriers by the frac-tion s (the selection coefficient). In an infinitepopulation, such a mutation would virtuallyalways be eliminated by natural selection, butfinite populations experience random tempo-ral fluctuations in allele frequency owing tothe chance variation in gamete or zygote sam-pling. The magnitude of such fluctuations isdetermined by the effective number of genesresiding at a locus at the time of reproduc-tion (Ng), which is equivalent to the effectivenumber of individuals in a haploid populationand roughly twice that in a randomly mat-ing diploid population. The stochastic com-ponent of allele-frequency change is definedby 1/Ng, and natural selection will be effec-tive at eliminating a deleterious allele only ifthe directional change dictated by s exceedsthis value. If 1/Ng is substantially larger thans, a mutant allele will behave as though it wereentirely neutral, as its dynamics are almost en-tirely dictated by chance events.

In the context of long-term genome evo-lution, the most relevant consideration is theprobability that a new mutation will eventu-ally displace the descendants of all other genecopies in the population. For a completelyneutral mutation, this fixation probability issimply equal to the initial frequency, as allgene copies have equal chances of stochasticspread throughout the population. However,relative to the neutral expectation, the prob-ability of fixation of a deleterious mutation isθ f ≈ 2Ngs/(e 2Ngs − 1) (68, 86). For sufficientlysmall 2Ngs, θ f approaches the neutral expec-tation of 1.0 (e.g., θ f ≈ 0.951 when 2Ngs =0.1, and 0.995 when 2Ngs = 0.01), whereasθ f rapidly approaches zero with large 2Ngs(θ f ≈ 0.582 when 2Ngs = 1, and 0.00045when 2Ngs = 10). Thus, 2Ngs � 1 (or equiv-alently 2s � 1/Ng) implies conditions of ef-fective neutrality, whereas if 2Ngs � 1, a dele-terious mutation has a negligible chance ofdrifting to fixation. A key to understandingphylogenetic variation in genomic architec-ture resides in the degree to which 2Ngs variesamong lineages.

Taken at face value, this simple theoreti-cal principle would appear to introduce insur-mountable practical difficulties, as the tech-nical barriers to obtaining direct estimatesof both Ng and s are enormous. However,progress can be made by recalling the mu-tational basis of the selective disadvantage ofexcess DNA. Suppose that the addition of asegment of DNA within or around an es-sential gene imposes the requirement that nnucleotides must remain unaltered to main-tain gene function, e.g., the requirements forspliceosomal-intron recognition imply n ≈ 30(85). If each of the n hazardous sites mutates atrate u (per generation), a piece of excess DNAwill elevate the rate of production of defectivealleles by an amount nu. Thus, substituting nufor s, the criterion for effective neutrality of ahazardous DNA segment becomes 2Ngnu �1 (86). Rearranging, a new DNA insertionthat imposes n hazardous sites at a locus willbe capable of drifting to fixation in an effec-tively neutral fashion provided 2Ngu � 1/n,whereas lineages with 2Ngu � 1/n are essen-tially immune to such colonization.

This simple relationship has considerableutility because the composite quantity 2Nguis readily estimated by measurements of nu-cleotide diversity at silent sites in natural pop-ulations (below). Moreover, the quantity 2Nguhas a simple mechanistic interpretation, as itis equivalent to twice the ratio of the forcesof mutation (u) and genetic drift (1/Ng) oper-ating at a nucleotide position. The remainderof this paper explores whether 2Ngu in mi-crobes is sufficiently large to insure a path to-ward genomic streamlining as postulated bythe mutational-burden hypothesis.

THE POPULATION-GENETICENVIRONMENT OF MICROBIALSPECIES

A major gap in our understanding of molec-ular genetics concerns the rate at which mu-tations arise. Most estimates for microbes arederived from laboratory reporter constructs,whereas most estimates for multicellular

334 Lynch

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 9: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

species are derived from phylogenetic com-parisons of silent sites in protein-codinggenes. Neither approach is without prob-lems: the first relying on various assump-tions regarding mutational detectability, andthe second on assumptions regarding neutral-ity. Nevertheless, with these caveats in mind,some qualitative generalities have begun toemerge for u (Table 1). On a per-cell divi-sion basis, prokaryotes have very low muta-tion rates, approximately three times lowerthan those for unicellular eukaryotes. Theper-generation mutation rates of multicellulareukaryotes are approximately 10 to 50 timesgreater than those for unicellular eukaryotes, adegree that correlates with the number of celldivisions per generation in the former (86).The rates for DNA viruses are enormous, ap-proximately 10 to 100 times greater than thosefor prokaryotes.

Put in the context of the previous theory,these estimates provide some indication of thepopulation-size constraints on genomic evo-lution. For example, noting that the thresh-old Ng below which hazardous excess DNA islikely to accumulate is ∼1/(2nu), for a mod-erately sized insert with n = 10, the thresholdNg is ∼106 for viruses and multicellular eu-karyotes, ∼107 for unicellular eukaryotes, and∼108 for prokaryotes. Because 108 is a minis-cule population size for a unicellular species(41–43), if taken at face value, these bench-marks might suggest that few microbes shouldever be on a road to genomic obesity. How-ever, Ng is not simply the absolute number ofindividuals within a species, but rather reflectsthe genetic effective population size. The lat-ter is influenced by numerous demographicand ecological factors as well as by the mag-nitude of linkage within a genome, which to-gether lead to a pronounced reduction of Ng

below the numerical abundance of a species(12, 21, 52).

Although there are well-established meth-ods for estimating Ng from observationson pedigree structure, life-history features,and/or temporal fluctuations in allele fre-quencies, none of these approaches incorpo-

Table 1 Mutation rate estimates

Mean (SE) Range n

Mammalian virus 12.41 1Bacteriophage 37.35 (18.63) 14.76–74.30 3Prokaryotes 0.50 (0.17) 0.06–1.44 8Unicellular eukaryotes, nu 1.60 (0.41) 0.83–2.52 4Invertebrates, nu 12.62 (3.17) 9.45–15.78 2Mammals, nu 53.64 (23.88) 23.23–100.75 3Land plant, nu 7.30Vertebrate, mt 193.04 (57.67) 8.10–648.00 14Land plant, mt 0.91 (0.33) 0.34–1.50 3

Rates are given for base substitution mutations, in units of 10−9 pernucleotide site per generation, where a generation is equivalent to asingle-genome replication in the case of microbes and to a completegeneration for multicellular species (including organelles). The sample size(n) denotes the number of species for which data are available. All estimatesfor nonmulticellular species are derived from laboratory experiments withreporter constructs. All but one estimate (a direct sequence-based estimatefrom a mutation-accumulation experiment) for animal nuclear genomes arederived from phylogenetic comparisons of silent sites in protein-codinggenes. The land-plant estimate for the nuclear genome is based on a broadphylogenetic comparison, assumes an annual life cycle, and should bemultiplied by x for species with generation times of x years. Estimates fororganelles are obtained from phylogenetic comparisons. Data fordouble-stranded DNA viruses are taken from Drake (36) and Drake &Hwang (37), while those for the remaining taxa are derived fromcompilations in Lynch (86) and Lynch et al. (89). nu, nuclear; mt,mitochondrion.

rates the long-term effects of linkage. Thematter is important because nucleotide posi-tions that are tightly linked to other selectedsites are subject to reduction in Ng via Hill-Robertson (62) effects. This is because selec-tion reduces the effective number of segmentsof a particular chromosomal region transmit-ted to the next generation, decreasing theefficiency of selection throughout the entireregion (52). In principle, a large microbialpopulation could have a small Ng if selectivesweeps during prolonged periods of asexualreproduction periodically purged most lin-eages from the population.

The only known way to estimate a speciesNg relevant to long-term genomic evolution isto evaluate the amount of standing variation atnucleotide sites assumed to be neutral. Withnew divergence between sites in two randomalleles arising at rate 2u per generation, and

www.annualreviews.org • Microbial Genome Architecture 335

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 10: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

old heterozygosity being lost at rate 1/Ng,the average divergence at neutral sites is 2Nguat mutation-drift equilibrium. The traditionalestimator of this quantity is the average num-ber of substitutions separating the silent sitesof randomly sequenced protein-coding genes(π s), with the most notable studies of this sortin microbiology being MLST (multi-locus se-quence typing) surveys of global isolates ofprokaryotes and fungi (24, 51). On average, πs

is ∼0.104 in prokaryotic species, and ∼0.057and ∼0.014 in the nuclear genes of unicellu-lar and multicellular eukaryotes, respectively(Table 2). Substantial variation among esti-mates exists within each of these groups as aconsequence of sampling error, but the rank-ing of average values is highly significant, andbecause most species can be expected to fluc-tuate in density over long timescales, thesevalues are much more meaningful than in-dividual estimates. There are, nevertheless,a number of caveats with respect to thesedata.

First, because the mean time to fixationof a neutral mutation is 2Ng generations,for constant u, larger 2Ngu estimates inte-grate the series of events within a lineageover longer time spans. Second, a substan-tial fraction of 2Ngu estimates for microbial

Table 2 Estimates of 2Ngu

Mean (SE)Standarddeviation n

Prokaryotes 0.1044 (0.0161) 0.1113 48Unicellular eukaryotes, nu 0.0573 (0.0150) 0.0777 27Animals, nu 0.0134 (0.0025) 0.0149 36Land plants, nu 0.0152 (0.0027) 0.0134 24Unicellular eukaryotes, mt 0.0175 (0.0072) 0.0177 6Animals, mt 0.0345 (0.0036) 0.0288 66Land plants, mt <0.0004 (0.0004) 0.0008 4

All estimates are of intraspecific diversities for silent sites ofprotein-coding genes, taken from compilations in Lynch (86) and Lynch et al.(89). The unicellular eukaryote category includes some taxa with a smallnumber of cell types (e.g., filamentous fungi and slime molds). The sample size(n) denotes the number of genera in the analysis, with the estimate for eachgenus being an average of several species in some cases. nu, nuclear; mt,mitochondrion.

species (prokaryotic and eukaryotic) are de-rived from pathogens, the population struc-ture of which must be largely dictated bytheir substantially less abundant multicellularhosts. As a consequence, pathogens have re-duced levels of variation relative to free-livingspecies (86). Third, numerous difficulties ex-ist in identifying species boundaries of mi-crobial species. Although microbial sequencesurveys often subdivide species into clades,and these were generally adhered to with theestimates leading to the results in Table 1,such distinctions are often no more defen-sible than drawing phylogenetic boundariesbetween populations of multicellular speciesdiffering in local adaptations. Fourth, the in-terpretation of silent-site diversity as an es-timate of 2Ngu critically depends on the as-sumption of neutrality of synonymous sitesof protein-coding genes. This is a signif-icant concern because the composition ofsilent sites of mRNAs may be under selectionfor features influencing transcript processingand/or translation efficiency. Any downwardbias in estimates of 2Ngu imposed by such se-lection may be greatest in populations withlarge Ng, and the fact that prokaryotic diver-gence rates of silent sites can be 5 to 10 timeslower than actual mutation rates may reflectsuch an effect (81, 100).

Finally, although prokaryotes do not en-gage in conventional meiotic recombination,which could greatly magnify their vulnerabil-ity to selective sweeps, such species do engagein a number of parasexual processes, includ-ing conjugation, transformation, and trans-duction. The net effects of such activities willdetermine whether prokaryotes are more vul-nerable to hitchhiking effects than multicel-lular eukaryotes. Insight into this matter canbe gained from the levels of linkage disequilib-rium among silent sites in natural populations,which provide an estimate of 2Ngc, where c isthe rate of recombination per site per gener-ation. Dividing such a measure by 2Ngu pro-vides an estimate of c/u, the recombinationrate scaled to the mutation rate, a parameterthat dictates the vulnerability of a species to

336 Lynch

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 11: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

selective sweeps. The average estimates of thisquantity in animals and land plants, 2.3 (0.9)and 1.2 (0.5), respectively, are comparable tothe average for the five prokaryotic specieswith available data, 4.3 (1.6) (86). Thus, al-though there is considerable room for morework in this area, there appears to be no for-mal justification for the idea that prokaryoticspecies are unusually vulnerable to hitchhik-ing effects.

Taking all of these issues into considera-tion, it is difficult to escape the conclusionthat the average 2Ngu for prokaryotes is atleast 10 times that in multicellular eukaryotes,despite the relatively low prokaryotic muta-tion rate. It is notable, however, that even if2Ngu in prokaryotes is 10 times greater thanthe estimates in Table 2 suggest, the aver-age prokaryotic Ng would still be only ∼1010.As the total numbers of individuals withinspecies increase nearly 20 orders of magnitudefrom large vertebrates to miniscule prokary-otes, this finding demonstrates that linkageeffects are much more important than demo-graphic effects in defining long-term geneticeffective population sizes.

CONSEQUENCES OFEFFICIENT REMOVALOF MUTATIONALLYHAZARDOUS DNA

In an earlier review on the hazardous na-ture of insertional DNA, Lawrence et al.(78) suggested that prokaryotes have evolvedmutational processes biased toward deletionsto prevent such accumulations. However, asnoted above, there is yet no convincing evi-dence for deletion bias at the level of mutationin any organism, and were such a bias to ex-ist, it would impose a substantial challenge tothe maintenance of essential genes in a com-pact genome. The argument promoted here,with two examples, is that mutational dele-tion bias need not be invoked at all to explainthe sanitized genomes of microbial species, asNg in most such species is large enough fornatural selection to inhibit the accumulation

of excess DNA. Moreover, purifying selectionin prokaryotes appears to be so efficient thatevolutionary pathways that are unavoidablein multicellular eukaryotes are essentially un-available for exploration. Contrary to the ar-gument that the relative constancy in genomearchitecture among free-living prokaryotes isa deep mystery (58), this pattern is entirelycompatible with population-genetic theory.

The Immunity of Microbial Speciesto Intron and Mobile-ElementProliferation

Soon after the discovery that eukaryotic genesare often subdivided into pieces, it was pro-posed that introns contributed to the forma-tion of new proteins by promoting the recom-binational shuffling of modular domains intocombinations with novel functions (30, 34,50). However, although it is now clear thatevery well-studied eukaryotic genome con-tains introns as well as the spliceosomal ma-chinery for removing them from precursormRNAs (88), it is equally clear that allprokaryotic genomes are devoid of them.There is no direct evidence that prokaryotesever harbored spliceosomal introns, and thetheory presented above suggests why.

Because the known molecular require-ments for spliceosomal processing imply thereservation of n ≈ 20 to 40 nucleotide sitesfor the precise recognition of each intron(85, 86), the approximate condition for themaintenance of an intron is 2Ngu � 1/20.Thus, because the average estimate of 2Ngufor prokaryotes, π s = 0.10, is probably down-wardly biased, natural selection appears tobe efficient enough to effectively immunizeprokaryotic genomes from intron coloniza-tion. In contrast, average π s for unicellulareukaryotes (0.06) is close to the thresholdfor spliceosomal intron colonization, and suchspecies exhibit a wide range of variation in in-tron numbers, with some approaching zero(Figure 1). Finally, with average π s ≈ 0.01,multicellular species provide a highly permis-sive environment for intron colonization, and

www.annualreviews.org • Microbial Genome Architecture 337

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 12: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

all such species average four to seven intronsper protein-coding gene (86, 87).

Some prokaryotic genomes do containGroup-II “self-splicing” introns, which, in-stead of relying on an extrinsic spliceosome,moderate their own excision by use of sec-ondary RNA structures in combination withself-encoded multifunctional proteins (74).Because of the larger number of nucleotidesites involved in Group-II intron function,such elements are expected to impose a sub-stantially greater mutational burden on hostgenes than do spliceosomal introns and shouldbe easily purged from prokaryotic genomes.In fact, the few Group-II introns that do residein prokaryotes avoid this fate by inhabiting in-tergenic regions (in which case, they are nottechnically introns at all) or mobile elements(which may facilitate horizontal transfer) (28,109, 124, 130).

The phylogenetic distribution of mobileelements closely resembles that of introns. Al-though prokaryotic genomes can harbor var-ious forms of transposons (often referred toas insertion sequences), most of them con-tain 10 or fewer elements, which togetherusually comprise <0.1% of the host genome(91). Retrotransposons are entirely unknownin prokaryotes, although a poorly understoodtype of element is found at very low lev-els in some species (75). In eukaryotes, thereappears to be a threshold genome size of∼10 Mb below which mobile elements arenearly absent, an intermediate range in whichonly a fraction of species harbor them, andan upper threshold (∼100 Mb) above whichall species (entirely multicellular) are heavilyinfected with multiple element families (87).

This general pattern is readily understoodin terms of the demographic requirements forthe maintenance of a mobile-element family.Once inserted into a host genome, all mo-bile elements accumulate degenerative muta-tions in an effectively neutral manner, suchthat the long-term viability of an elementfamily within a host species requires that anaverage element produce at least one success-ful offspring insertion prior to complete inac-

tivation. Because mobile-element insertionshave a wide range of deleterious effects onthe host (8, 25, 63), many of them are elimi-nated by natural selection, the exact numberdepending on the host’s Ng. Insertions with ef-fects on host fitness >1/Ng are rapidly purged,whereas those with effects �1/Ng are free togo to fixation. Thus, for any mobile-elementfamily, there exists a critical host Ng abovewhich the maintenance of insert number isunsustainable.

The Demise of Operons and theEmergence of Modular GeneStructure in Eukaryotes

Coregulation of the expression of genes mutu-ally engaged in metabolic and developmentalprocesses is critical to all organisms. Prokary-otes often achieve this end by using operonsto cotranscribe adjacent genes, a process onlyrarely seen in eukaryotes. Nearly all genesin trypanosomes are transcribed into poly-cistronic units prior to trans splicing (13), butthe genomes of such species are remarkablyprokaryote-like in that they are nearly devoidof introns and mobile elements. Some genesof nematodes (10), flatworms (31), and nu-cleomorphs of chlorarachniophyte algae (53)are also arranged into operons, but the vastmajority of well-studied eukaryotes (includingfungi, plants, and other metazoans) appear tolack operons entirely. Given that the eukary-otic lifestyle does not impose an insurmount-able barrier to operon function and the po-tential advantages of operons for coordinatedgene expression, why have operons becomedisassembled in most eukaryotic genomes?

Following the logic developed above, acase can be made that the loss of eukary-otic operons was not a consequence of pos-itive selection for single-gene control, but apassive response to a reduction in the effi-ciency of selection for operon maintenance.Operons minimize the mutational vulnera-bility of genetic pathways by eliminating thenecessity of maintaining gene-specific regula-tory elements. However, because small local

338 Lynch

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 13: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

duplications are common and most transcrip-tion factor binding sites are simple enough(<10 bp in length) to arise at an apprecia-ble rate by single-base substitutions, all genesare vulnerable to the chance invasion of reg-ulatory elements capable of driving indepen-dent expression. By increasing the size of themutational target, the accumulation of gene-specific regulatory elements imposes a selec-tive disadvantage equivalent to ∼10u for eachregulatory element (assuming 10 nucleotidesper element). Thus, unless 2Ngu > 0.1, whichis rarely the case in eukaryotes, selection is in-capable of preventing the fixation of a mutantself-regulated allele. In contrast, prokaryoticpopulations, where 2Ngu > 0.1 is common,are resistant to the break up of operons bydrift-mutation processes.

Thus, direct selection for genomic stream-lining need not be responsible for the as-sembly and maintenance of operons, whichinstead may be simple consequences ofpopulation-genetic environments that dis-courage the emergence of single-gene con-trol by mutational processes. A similar ideahas been suggested by Price et al. (106), al-though these authors put more emphasis onpositive selection for the coordinated expres-sion of genes than on the mutational cost ofmaintaining redundant regulatory machineryfor individual genes.

This hypothesis is a radical departure fromthe selfish-operon model of Lawrence & Roth(79), which postulates that operons evolve tofacilitate the joint horizontal transfer of func-tionally integrated genes to alternative hostspecies. The selfish-operon model was moti-vated by the belief that (a) the metabolic ad-vantages of operons are too weak to maintainthem in the face of stochastic microchromo-somal rearrangements, and (b) the chance ori-gin of gene-specific promoters will eventuallyoverwhelm the benefits of coregulation. How-ever, the fact that functionally related genescan remain aggregated to some extent evenoutside operons, in both prokaryotes (76, 104)and eukaryotes (22, 80, 82, 83, 120), is incon-sistent with the first point, and the ability of

selection to oppose the mutational cost of in-dependent gene regulation in microbes is in-consistent with the second point.

INSIGHTS FROMENDOSYMBIONTS

Prokaryotes have entered stable coevolution-ary relationships with larger eukaryotic hostson numerous occasions. Such interactions candramatically alter the population-genetic en-vironment by changing the colonist’s trans-mission dynamics and/or mutational features.Thus, the genomes of endosymbionts pro-vide unique resources for evaluating the gen-eral principles outlined above. The degreeto which an endosymbiont’s mutation rateis altered depends on unpredictable gainsand/or losses of DNA-repair pathways experi-enced subsequent to colonization as well as onmodifications in environmental mutagenicityimposed by host cells, but in general en-dosymbionts are expected to experience sub-stantial effective population size reductions.Once such species have become so entrenchedin their host’s biology that transmission is en-tirely vertical (strictly between host parent andoffspring), Ng is expected to approximate thatof the host species (7).

Mitochondrial Genomes

One of the defining features of eukaryotesis the presence of a mitochondrion, appar-ently acquired in the stem eukaryote viathe colonization of an α-Proteobacteriumthat subsequently propagated throughout theentire eukaryotic domain (55, 56). Relativeto their eubacterial ancestors, all mitochon-drial genomes are extraordinarily degeneratewith respect to gene content. None containmore than 70 genes and most considerablyfewer. Although the scaling of mitochondrialgenome content with genome size is similarto that for nuclear and prokaryotic genomes,the placement of phylogenetic groups is dra-matically different (89). Most notably, in con-trast to the similar architectures of the nuclear

www.annualreviews.org • Microbial Genome Architecture 339

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 14: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

genomes of animals and land plants, the mito-chondrial genomes of these two lineages haveevolved down radically different pathways.Animal mitochondrial genomes are minis-cule and even more compact than those ofprokaryotes, ranging from 14 to 20 kb insize and consisting of 90% to 95% codingDNA with frequent overlaps between theinitiation and termination codons of adja-cent genes. In contrast, land-plant mitochon-drial genomes are ∼180 to 600 kb in sizewith >90% noncoding DNA. Most unicel-lular eukaryotes (including the green algae,from which land plants emerged) contain mi-tochondrial genomes with intermediate levelsof noncoding DNA, presumably reflecting theancestral state from which animals and landplants diverged.

If nothing else, these observations recon-firm the view that eubacterial ancestry im-poses few constraints on the architectural fea-tures of descendant lineages. But why is thegenomic content of animal and land-plant mi-tochondria different, whereas that of their nu-clear genomes is similar? Surveys of silent-site variation in mitochondrial genes providea clue, as the average 2Ngu for most majorlineages of animals is in the range of 0.01 to0.07, whereas that for land plants is gener-ally <0.0004 (Table 2). This disparity is dra-matically different from the situation in thenuclear genome, where the average 2Ngu forthese two groups is similar. Thus, by the cri-teria outlined above, these observations implythat the plant mitochondrion is an extraordi-narily hospitable habitat for mutationally haz-ardous noncoding DNA, whereas the animalmitochondrion imposes the opposite environ-ment.

Although the divergence in mitochondrial2Ngu between animals and land plants couldbe due to differences in Ng or u, mutationaldifferences appear to be the primary factor.Based on a broad survey of interspecific diver-gence at silent sites, the ratio of mitochondrialto nuclear mutation rates in most animals is es-timated to be ∼10 to 25, whereas that in landplants is ∼0.05, and that in unicellular species

is ∼1.0 (89). When measures of silent-site di-versity (π s) for both mitochondrial (m) andnuclear (n) genes within a species are availablealong with measures of silent-site divergencebetween species, the ratio of diversity to diver-gence, (2Ngmum/2Ngnun)/(um/un), provides anestimate of the ratio of Ng in the two ge-nomic compartments, which averages ∼1.3in animals and ∼0.5 in unicellular eukary-otes (89). Unfortunately, data are not avail-able for a comparable computation in landplants, but given that Ng for nuclear genesis comparable in plants and animals, thereis no compelling reason to think that land-plant mitochondria would have unusuallylow Ng.

Thus, even though organelle genomes arepresent in multiple copies per host cell, therapid loss of heteroplasmy within individualscauses the overall genealogical depth of themitochondrion to be similar to that for nu-clear genes of the host species (7). Because ufor plant mitochondria is comparable to thatfor prokaryotes (on a per-generation basis)(Table 1), were it not for differences in Ng,plant mitochondrial genomes might be ex-pected to evolve a prokaryote-like genomestructure, but the tremendous decrease(�250 times) in Ng in land plants causes asubstantial decline in the ability of selection toremove mutationally hazardous excess DNArelative to the situation in microbes.

With the differences in 2Ngu between an-imal and land-plant mitochondria in mind,a broad array of superficially disconnecteddifferences in the genomic features of thesetwo lineages can be accommodated by themutational-burden hypothesis (89). For ex-ample, although mitochondria do not harborspliceosomal introns, land-plant mitochon-dria typically contain 20 to 30 Group-II in-trons, substantially more than the 0 to 8 foundin green algal species. As noted above, such in-trons are expected to have stringent sequencerequirements for proper splicing, and the highlevel of average 2Ngu for animal mitochondriaappears to rule out the possibility of their res-idence. Consistent with this hypothesis, the

340 Lynch

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 15: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

only animal mitochondria that harbor intronsare contained within cnidarians, which appar-ently have low (plant-like) mitochondrial mu-tation rates.

Arthropod Endosymbionts

Although the mitochondrion represents themost spectacularly successful case of en-dosymbiosis, many more phylogenetically re-stricted endosymbionts are known, the eu-bacterial inhabitants of various insect lineageshaving received the most attention. Becausemost such species are strictly vertically inher-ited, like mitochondria, their Ng is expectedto be comparable to that of their hosts. Thisidea can be checked with data for the γ-Proteobacterium Buchnera aphidicola, an ob-ligate endosymbiont of aphids. The estimateof π s for B. aphidicola derived from a widegeographic sample of its host Uroleucon am-brosiae is low and not greatly different fromthat for the host’s mitochondrion, 0.0042 ver-sus 0.0036, whereas the ratio of silent-site di-vergence for B. aphidicola versus mitochon-dria in different host species is 0.60 (47, 61).Employing the approach described above,B. aphidicola Ng is ∼1.9 times that of the hostmitochondrion. Thus, for this species at least,Ng is indeed substantially reduced with re-spect to the average free-living prokaryote.However, given the exceptionally low π s forthe host species, it is possible that 2Ngu for theB. aphidicola–U. ambrosiae assemblage is not anaccurate reflection of the historical conditionsunder which this genome evolved, so the sur-vey of other such species would be useful.

All arthropod endosymbiont genomes ex-hibit a suite of modifications that likely reflectthe consequences of small Ng (97). For exam-ple, three sequenced B. aphidicola genomes,each from a different aphid lineage, havean extraordinarily high A:T content (∼74%),and comparative analyses suggest elevatedlong-term rates of molecular evolution andthe accumulation of mildly deleterious muta-tions in protein-coding and ribosomal-RNAgenes (4, 61, 73, 96, 118, 121, 125). Con-

taining just 550 to 630 protein-coding genes,B. aphidicola genomes are among the smallestknown in prokaryotes (77), and many hun-dreds of genes (including those involved inDNA repair) were lost early in the establish-ment of the lineage, presumably because ofthe reduced efficiency of selection for genemaintenance in a stable host cell environment.A similar syndrome of events is recorded inother related lineages, Blochmannia (an en-dosymbiont of ants) and Wigglesworthia (anendosymbiont of tsetse flies) (1, 14, 32, 46,49).

Noting that 2Ngu for B. aphidicola is ap-proximately 25 times smaller than the aver-age for other prokaryotes and approximately8 times smaller than the average for ani-mal mitochondria (Table 2), the genomes ofγ-proteobacterial endosymbionts might beexpected to evolve elevated levels of noncod-ing DNA (58), but they are often nearly de-void of mobile elements and contain approx-imately the same proportion of noncodingDNA as other prokaryotes (97). Although thegenomes in this group are less streamlinedthan animal mitochondrial genomes, neithercomparison should be taken at face value assupporting or conflicting with the mutational-burden hypothesis. Because relaxed selectionneed not always promote the evolution ofgenomic obesity, particularly in this uniquegroup of endosymbionts, the preceding obser-vations are by no means incompatible with theidea that patterns of microbial genome evolu-tion are inconsistent with basic population-genetic principles, contrary to the claims ofsome authors (58).

For example, species with a strictly clonalmode of inheritance impose an exception-ally unstable habitat for mobile elements.An overly aggressive element may cause theextinction of its host, whereas an overlyquiescent element eventually becomes com-pletely inactivated by mutations (2). Thisfiltering process will likely bias the surviv-ing pool of asexual host lineages toward themobile-element free state, a hypothesis con-sistent with the observation that prokaryotes

www.annualreviews.org • Microbial Genome Architecture 341

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 16: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

in early (facultative) stages of host associationsgenerally have exceptionally high levels ofmobile-element insertions (97). In addition,Wolbachia, a broadly distributed endosym-biont with a proclivity for horizontal transferand recombination, contains large numbers ofinsertion elements (3, 65). Furthermore, ge-nomic expansion in the face of relaxed selec-tion requires that the mutational spectrum bebiased toward insertions. Although the ideathat deletion mutations outnumber insertionswas questioned above, the matter remains tobe resolved for B. aphidicola and allies, whichhave the unique attribute of having lost con-ventional mismatch repair pathways. By pro-moting recombination between nonorthol-ogous sequences, defective mismatch repairenhances the deletion process (98).

Two final examples demonstrate thatgenome size reduction is by no means uni-versal in arthropod endosymbionts. First, So-dalis glossinidius, like Wigglesworthia species,is an endosymbiont of tsetse flies, but, un-like the latter, has only 51% coding DNA(123), by far the lowest value for any prokary-ote except for the intracellular parasite My-coplasma leprae (48%). The A/T content ofthis species (45%) is much higher than that inother γ-proteobacterial endosymbionts, andthe genome harbors a much higher density of

pseudogenes than any other endosymbiont.Perhaps it is no coincidence that unlike thespecies described above, S. glossinidius has re-tained its DNA-repair genes.

Second, thousands of parasitoid waspspecies use endosymbiotic viruses (from thePolydnaviridae family) to facilitate progenydevelopment within parasitized lepidopteranhosts. Most viruses are pathogens with sub-stantial capacity for horizontal transmissionand presumably high Ng, as reflected in theirstreamlined genomes (Figure 1). However,because polydnaviral chromosomes are inte-grated directly into the wasp genome, theyshould be subject to the same drift and mu-tation processes as the host. Consistent withthe hypothesis that relatively low 2Ngu en-courages the accumulation of excess DNA inanimals, the genomes of these viruses are ex-traordinarily large (∼200 to 600 kb), with only17% to 29% coding DNA (39, 126). The non-coding DNA of these species contains nu-merous mobile elements, and 10% to 20% ofthe protein-coding genes contain spliceoso-mal introns (both highly unusual features forDNA viruses). These observations illustratethat when placed in a small Ng environment,even viral genomes eventually succumb to apredictable suite of “pathological” responsesto random genetic drift.

SUMMARY POINTS

1. The scaling of most aspects of genomic architecture and gene structure with genomesize fall on a continuum from viruses to prokaryotes to single-celled eukaryotes tomulticellular eukaryotes. This suggests that general population-genetic mechanisms,transcending cellular and metabolic features, are the predominant drivers of interspe-cific divergence in genome architecture.

2. It is commonly thought that genomic streamlining in microbes is a direct consequenceof biased mutation pressure toward deletions or an indirect consequence of selectionfor high metabolic efficiency, but neither hypothesis enjoys convincing empirical sup-port.

3. An alternative hypothesis is that nonfunctional DNA is opposed by weak naturalselection because it increases the degenerative mutation rate of associated genes. Theextent to which such selection can operate efficiently depends on the ratio of thepower of mutation (u) to that of genetic drift (1/2Ng), 2Ngu.

342 Lynch

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 17: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

4. In microbial species, particularly prokaryotes, 2Ngu is generally large enough to pre-vent significant colonization of introns, mobile elements, and excess intergenic DNA,whereas 2Ngu in multicellular eukaryotes is sufficiently low to provide a highly permis-sive environment for the expansion of nonfunctional DNA. The quantitative nature ofthese results strongly implies the population-genetic environment of a species definesthe genomic pathways that are open to evolutionary exploration.

5. This general theory also provides a potentially unifying explanation for genome struc-ture modifications observed in other contexts, including the divergent evolutionarytrajectories taken by animal versus land-plant mitochondria and the unique genomicattributes of many microbial endosymbionts.

ACKNOWLEDGMENTS

This work has been supported by grants from the National Institutes of Health, NationalScience Foundation, and the Indiana MetaCyte Initiative funded by the Lilly Foundation.

LITERATURE CITED

1. Akman L, Yamashita A, Watanabe H, Oshima K, Shiba T, et al. 2002. Genome sequenceof the endocellular obligate symbiont of tsetse flies, Wigglesworthia glossinidia. Nat. Genet.32:402–7

2. Arkhipova I, Meselson M. 2000. Transposable elements in sexual and ancient asexual taxa.Proc. Natl. Acad. Sci. USA 97:14473–77

3. Baldo L, Bordenstein S, Wernegreen JJ, Werren JH. 2006. Widespread recombinationthroughout Wolbachia genomes. Mol. Biol. Evol. 23:437–49

4. Bastolla U, Moya A, Viguera E, van Ham RC. 2004. Genomic determinants of proteinfolding thermodynamics in prokaryotic organisms. J. Mol. Biol. 343:1451–66

5. Bennett MD. 1972. Nuclear DNA content and minimum generation time in herbaceousplants. Proc. R. Soc. London Sci. Ser. B 181:109–35

6. Bergthorsson U, Ochman H. 1998. Distribution of chromosome length variation innatural isolates of Escherichia coli. Mol. Biol. Evol. 15:6–16

7. A key theoreticalanalysis of thepopulation-geneticenvironment withinorganelle genomes.

7. Birky CWJ, Maruyama T, Fuerst P. 1983. An approach to population and evo-lutionary genetic theory for genes in mitochondria and chloroplasts, and someresults. Genetics 103:513–27

8. Blanc VM, Adams J. 2004. Ty1 insertions in intergenic regions of the genome of Saccha-romyces cerevisiae transcribed by RNA polymerase III have no detectable selective effect.FEMS Yeast Res. 4:487–91

9. Blumenstiel JP, Hartl DL, Lozovsky ER. 2002. Patterns of insertion and deletion incontrasting chromatin domains. Mol. Biol. Evol. 19:2211–25

10. Blumenthal T, Gleason KS. 2003. Caenorhabditis elegans operons: form and function. Nat.Rev. Genet. 4:112–20

11. Bremer H, Dennis PP. 1996. Modulation of chemical composition and other parametersof the cell by growth rate. In Escherichia coli and Salmonella: Cellular and MolecularBiology, ed. FC Neidhardt, R Curtiss III, JL Ingraham, ECC Lin, KB Low, et al., pp.1553–69. Washington, DC: ASM Press. 2nd ed.

www.annualreviews.org • Microbial Genome Architecture 343

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 18: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

12. Caballero A. 1994. Developments in the prediction of effective population size. Heredity73:657–79

13. Campbell DA, Thomas S, Sturm NR. 2003. Transcription in kinetoplastid protozoa:Why be normal? Microb. Infect. 5:1231–40

14. Canback B, Tamas I, Andersson SG. 2004. A phylogenomic study of endosymbioticbacteria. Mol. Biol. Evol. 21:1110–22

15. Casjens S. 1998. The diverse and dynamic structure of bacterial genomes. Annu. Rev.Genet. 32:339–77

16. Cavalier-Smith T. 1978. Nuclear volume control by nucleoskeletal DNA, selection forcell volume and cell growth rate, and the solution of the DNA C-value paradox. J. Cell.Sci. 34:247–78

17. Cavalier-Smith T. 2005. Economy, speed and size matter: evolutionary forces drivingnuclear genome miniaturization and expansion. Ann. Bot. 95:147–75

18. Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, et al. 2004. Unbiased mappingof transcription factor binding sites along human chromosomes 21 and 22 points towidespread regulation of noncoding RNAs. Cell 116:499–99

19. Charlesworth B. 1985. The population genetics of transposable elements. In PopulationGenetics and Molecular Evolution, ed. T Ohta, K Aoki, pp. 213–32. New York: Springer

20. Charlesworth B. 1996. The changing sizes of genes. Nature 384:315–1621. A simplelayman’sdescription ofeffectivepopulation size.

21. Charlesworth B. 2002. Effective population size. Curr. Biol. 12:R716–1722. Cohen BA, Mitra RD, Hughes JD, Church GM. 2000. A computational analysis of whole-

genome expression data reveals chromosomal domains of gene expression. Nat. Genet.26:183–86

23. Commoner B. 1964. Roles of deoxyribonucleic acid in inheritance. Nature 202:960–6824. Cooper JE, Feil EJ. 2004. Multilocus sequence typing—what is resolved? Trends Microbiol.

12:373–7725. Cooper TF, Lenski RE, Elena SF. 2005. Parasites and mutational load: an experimental

test of a pluralistic theory for the evolution of sex. Proc. R. Soc. London Biol. Sci. 272:311–1726. Cox RA. 2003. Correlation of the rate of protein synthesis and the third power of the

RNA: protein ratio in Escherichia coli and Mycobacterium tuberculosis. Microbiology 149:729–37

27. Cox RA. 2004. Quantitative relationships for specific growth rates and macromolecularcompositions of Mycobacterium tuberculosis, Streptomyces coelicolor A3(2) and Escherichia coliB/r: an integrative theoretical approach. Microbiology 150:1413–26

28. Dai L, Zimmerly S. 2002. Compilation and analysis of group II intron insertions inbacterial genomes: evidence for retroelement behavior. Nucleic Acids Res. 30:1091–102

29. Daley JM, Palmbos PL, Wu D, Wilson TE. 2005. Nonhomologous end joining in yeast.Annu. Rev. Genet. 39:431–51

30. Darnell JEJ. 1978. Implications of RNA-RNA splicing in evolution of eukaryotic cells.Science 202:1257–60

31. Davis RE, Hodgson S. 1997. Gene linkage and steady state RNAs suggest trans-splicingmay be associated with a polycistronic transcript in Schistosoma mansoni. Mol. Biochem.Parasitol. 89:25–39

32. Degnan PH, Lazarus AB, Wernegreen JJ. 2005. Genome sequence of Blochmannia penn-sylvanicus indicates parallel evolutionary trends among bacterial mutualists of insects.Genome Res. 15:1023–33

33. The only directestimate of thegenomic mutationrate in anymulticellularspecies.

33. Denver DR, Morris K, Lynch M, Thomas WK. 2004. High mutation rate andpredominance of insertions in the Caenorhabditis elegans nuclear genome. Nature

430:679–82

344 Lynch

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 19: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

34. Doolittle WF. 1978. Genes in pieces: Were they ever together? Nature 272:581–8235. Doolittle WF, Sapienza C. 1980. Selfish genes, the phenotype paradigm and genome

evolution. Nature 284:601–336. Drake JW. 1991. A constant rate of spontaneous mutation in DNA-based microbes. Proc.

Natl. Acad. Sci. USA 88:7160–6437. Drake JW, Hwang CB. 2005. On the mutation rate of herpes simplex virus type 1. Genetics

170:969–7038. Esnault C, Maestre J, Heidmann T. 2000. Human LINE retrotransposons generate

processed pseudogenes. Nat. Genet. 24:363–6739. Espagne E, Dupuy C, Huguet E, Cattolico L, Provost B, et al. 2004. Genome sequence

of a polydnavirus: insights into symbiotic virus evolution. Science 306:286–8940. Farh KK, Grimson A, Jan C, Lewis BP, Johnston WK, et al. 2005. The widespread impact

of mammalian microRNAs on mRNA repression and evolution. Science 310:1817–2141. Finlay BJ. 2002. Global dispersal of free-living microbial eukaryote species. Science

296:1061–6342. Finlay BJ, Esteban GF, Clarke KJ, Olmo JL. 2001. Biodiversity of terrestrial protozoa

appears homogeneous across local and global spatial scales. Protist 152:355–6643. Finlay BJ, Monaghan EB, Maberly SC. 2002. Hypothesis: the rate and scale of dispersal

of freshwater diatom species is a function of their global abundance. Protist 153:261–7344. Outlines whythe evolution ofcomplex, modularstructures for generegulation requiresa small effectivepopulation size.

44. Force A, Cresko W, Pickett FB, Proulx S, Amemiya C, Lynch M. 2005. The originof gene subfunctions and modular gene regulation. Genetics 170:433–46

45. French S. 1992. Consequences of replication fork movement through transcription unitsin vivo. Science 258:1362–65

46. Fry AJ, Wernegreen JJ. 2005. The roles of positive and negative selection in the molecularevolution of insect endosymbionts. Gene 355:1–10

47. Funk DJ, Wernegreen JJ, Moran NA. 2002. Intraspecific variation in symbiont genomes:bottlenecks and the aphid-Buchnera association. Genetics 157:477–89

48. Garfinkel DJ, Nyswaner KM, Stefanisko KM, Chang C, Moore SP. 2005. Ty1 copynumber dynamics in Saccharomyces. Genetics 169:1845–57

49. Gil R, Silva FJ, Zientz E, Delmotte F, Gonzalez-Candelas F, et al. 2003. The genomesequence of Blochmannia floridanus: comparative analysis of reduced genomes. Proc. Natl.Acad. Sci. USA 100:9388–93

50. Gilbert W. 1987. The exon theory of genes. Cold Spring Harbor. Symp. Quant. Biol. 52:901–5

51. Gil-Lamaignere C, Roilides E, Hacker J, Muller FM. 2003. Molecular typing for fungi –a critical review of the possibilities and limitations of currently and future methods. Clin.Microbiol. Infect. 9:172–85

52. A classicalpaperdemonstrating therole that linkageplays in definingthe long-termeffectivepopulation size.

52. Gillespie JH. 2000. Genetic drift in an infinite population: the pseudohitchhikingmodel. Genetics 155:909–19

53. Gilson PR, McFadden GI. 1996. The miniaturized nuclear genome of eukaryotic en-dosymbiont contains genes that overlap, genes that are cotranscribed, and the smallestknown spliceosomal introns. Proc. Natl. Acad. Sci. USA 93:7737–42

54. Giovannoni SJ, Tripp HJ, Givan S, Podar M, Vergin KL, et al. 2005. Genome streamliningin a cosmopolitan oceanic bacterium. Science 309:1242–45

55. Gray MW, Burger G, Lang F. 1999. Mitochondrial evolution. Science 283:1476–8156. Gray MW, Lang BF, Cedergren R, Golding GB, Lemieux C, et al. 1998. Genome

structure and gene content in protist mitochondrial DNAs. Nucleic Acids Res. 26:865–78

www.annualreviews.org • Microbial Genome Architecture 345

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 20: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

57. Gregory TR. 2001. Coincidence, coevolution, or causation? DNA content, cell size, andthe C-value enigma. Biol. Rev. 76:65–101

58. Gregory TR, DeSalle R. 2005. Comparative genomics in prokaryotes. In The Evolutionof the Genome, ed. TR Gregory, pp. 585–75. Boston: Elsevier/Academic. 740 pp.

59. Hahn MW, Stajich JE, Wray GA. 2003. The effects of selection against spurious tran-scription factor binding sites. Mol. Biol. Evol. 20:901–6

60. Hawk JD, Stefanovic L, Boyer JC, Petes TD, Farber RA. 2005. Variation in efficiency ofDNA mismatch repair at different sites in the yeast genome. Proc. Natl. Acad. Sci. USA102:8639–43

61. Herbeck JT, Funk DJ, Degnan PH, Wernegreen JJ. 2003. A conservative test of ge-netic drift in the endosymbiotic bacterium Buchnera: slightly deleterious mutations in thechaperonin groEL. Genetics 165:1651–60

62. Hill WG, Robertson A. 1966. The effect of linkage on limits to artificial selection. Genet.Res. 8:269–94

63. Houle D, Nuzhdin SV. 2004. Mutation accumulation and the effect of copia insertions inDrosophila melanogaster. Genet. Res. 83:7–18

64. Ingraham JL, Maaløe O, Neidhardt FC. 1983. Growth of the Bacterial Cell. Sunderland,MA: Sinauer. 172 pp.

65. Jiggins FM. 2002. The rate of recombination in Wolbachia bacteria. Mol. Biol. Evol.19:1640–43

66. Johnson KP. 2004. Deletion bias in avian introns over evolutionary timescales. Mol. Biol.Evol. 21:599–602

67. Kampa D, Cheng J, Kapranov P, Yamanaka M, Brubaker S, et al. 2004. Novel RNAsidentified from an in-depth analysis of the transcriptome of human chromosomes 21 and22. Genome Res. 14:331–42

68. Kimura M. 1962. On the probability of fixation of mutant genes in populations. Genetics47:713–19

69. Komaki K, Ishikawa H. 2000. Genomic copy number of intracellular bacterial symbiontsof aphids varies in response to developmental stage and morph of their host. Insect. Biochem.Mol. Biol. 30:253–58

70. Kondrashov AS. 2003. Direct estimates of human per nucleotide mutation rates at 20 locicausing Mendelian diseases. Hum. Mutat. 21:12–27

71. Kreahling J, Graveley BR. 2004. The origins and implications of Aluternative splicing.Trends Genet. 20:1–4

72. Kunz BA, Ramachandran K, Vonarx EJ. 1998. DNA sequence analysis of spontaneousmutagenesis in Saccharomyces cerevisiae. Genetics 148:1491–505

73. Lambert JD, Moran NA. 1998. Deleterious mutations destabilize ribosomal RNA inendosymbiotic bacteria. Proc. Natl. Acad. Sci. USA 95:4458–62

74. Lambowitz AM, Zimmerly S. 2004. Mobile group II introns. Annu. Rev. Genet. 38:1–3575. Lampson BC, Inouye M, Inouye S. 2005. Retrons, msDNA, and the bacterial genome.

Cytogenet. Genome Res. 110:491–9976. Lathe WCIII, Snel B, Bork P. 2000. Gene context conservation of a higher order than

operons. Trends Biochem. Sci. 25:474–7977. Latorre A, Gil R, Silva FJ, Moya A. 2005. Chromosomal stasis versus plasmid plasticity

in aphid endosymbiont Buchnera aphidicola. Heredity 95:339–4778. Lawrence JG, Hendrix RW, Casjens S. 2001. Where are the pseudogenes in bacterial

genomes? Trends Microbiol. 9:535–4079. Lawrence JG, Roth JR. 1996. Selfish operons: horizontal transfer may drive the evolution

of gene clusters. Genetics 143:1843–60

346 Lynch

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 21: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

80. Lee JM, Sonnhammer EL. 2003. Genomic gene clustering analysis of pathways in eu-karyotes. Genome Res. 13:875–82

81. Lenski RE, Winkworth CL, Riley MA. 2003. Rates of DNA sequence evolution in exper-imental populations of Escherichia coli during 20,000 generations. J. Mol. Evol. 56:498–508

82. Lercher MJ, Blumenthal T, Hurst LD. 2003. Coexpression of neighboring genes inCaenorhabditis elegans is mostly due to operons and duplicate genes. Genome Res. 13:238–43

83. Lercher MJ, Urrutia AO, Hurst LD. 2002. Clustering of housekeeping genes provides aunified model of gene order in the human genome. Nat. Genet. 31:180–83

84. Lev-Maor G, Sorek R, Shomron N, Ast G. 2003. The birth of an alternatively splicedexon: 3’ splice-site selection in Alu exons. Science 300:1288–91

85. Lynch M. 2002. Intron evolution as a population-genetic process. Proc. Natl. Acad. Sci.USA 99:6118–23

86. Lynch M. 2006. The origins of eukaryotic gene structure. Mol. Biol. Evol. 23:450–6887. Lynch M, Conery JS. 2003. The origins of genome complexity. Science 302:1401–488. Lynch M, Hong X, Scofield DG. 2006. Nonsense-mediated decay and the evolu-

tion of eukaryotic gene structure. In Nonsense-Mediated mRNA Decay, ed. LE Maquat,pp. 197–211. Georgetown, TX: Landes Biosci.

89. Lynch M, Koskella B, Schaack S. 2006. Mutation pressure and the evolution of organellegenomic architecture. Science 311:1727–30

90. Lynch M, Scofield DG, Hong X. 2005. The evolution of transcription-initiation sites.Mol. Biol. Evol. 22:1137–46

91. Mahillon J, Leonard C, Chandler M. 1999. IS elements as constituents of bacterialgenomes. Res. Microbiol. 150:675–87

92. Maldonado R, Jimenez J, Casadesus J. 1994. Changes of ploidy during the Azotobactervinelandii growth cycle. J. Bacteriol. 176:3911–19

93. Maside X, Bartolome C, Assimacopoulos S, Charlesworth B. 2001. Rates of movementand distribution of transposable elements in Drosophila melanogaster: in situ hybridizationvs Southern blotting data. Genet. Res. 78:121–36

94. Mira A, Ochman H, Moran NA. 2001. Deletional bias and the evolution of bacterialgenomes. Trends Genet. 17:589–96

95. Moran JV, DeBerardinis RJ, Kazazian HHJ. 1999. Exon shuffling by L1 retrotransposi-tion. Science 283:1530–34

96. The firstattempt todemonstrate thesusceptibility ofbacterialendosymbionts tothe accumulation ofmildly deleteriousmutations.

96. Moran NA. 1996. Accelerated evolution and Muller’s rachet in endosymbiotic bac-teria. Proc. Natl. Acad. Sci. USA 93:2873–78

97. Moran NA, Plague GR. 2004. Genomic changes following host restriction in bacteria.Curr. Opin. Genet. Dev. 14:627–33

98. Nilsson AI, Koskiniemi S, Eriksson S, Kugelberg E, Hinton JC, et al. 2005. Bacterialgenome size reduction by experimental evolution. Proc. Natl. Acad. Sci. USA 102:12112–16

99. Nuzhdin SV, Mackay TF. 1995. The genomic rate of transposable element movementin Drosophila melanogaster. Mol. Biol. Evol. 12:180–81

100. Ochman H. 2003. Neutral mutations and neutral substitutions in bacterial genomes. Mol.Biol. Evol. 20:2091–96

101. Ohnishi G, Endo K, Doi A, Fujita A, Daigaku Y, et al. 2004. Spontaneous mutagenesis inhaploid and diploid Saccharomyces cerevisiae. Biochem. Biophys. Res. Commun. 325:928–33

102. Ophir R, Graur D. 1997. Patterns and rates of indel evolution in processed pseudogenesfrom humans and murids. Gene 205:191–202

www.annualreviews.org • Microbial Genome Architecture 347

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 22: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

103. Orgel LE, Crick FH. 1980. Selfish DNA: the ultimate parasite. Nature 284:604–7104. Overbeek R, Fonstein M, D’Souza M, Pusch GD, Maltsev N. 1999. The use of gene

clusters to infer functional coupling. Proc. Natl. Acad. Sci. USA 96:2896–901105. Petrov DA. 2002. Mutational equilibrium model of genome size evolution. Theor. Popul.

Biol. 61:531–44106. Price MN, Huang KH, Arkin AP, Alm EJ. 2005. Operon formation is driven by coregu-

lation and not by horizontal gene transfer. Genome Res. 15:809–19107. Puchta H. 2005. The repair of double-strand breaks in plants: mechanisms and conse-

quences for genome evolution. J. Exp. Bot. 56:1–14108. Ranea JA, Grant A, Thornton JM, Orengo CA. 2005. Microeconomic principles explain

an optimal genome size in bacteria. Trends Genet. 21:21–25109. Rest JS, Mindell DP. 2003. Retroids in archaea: phylogeny and lateral origins. Mol. Biol.

Evol. 20:1134–42110. Ripley LS. 1990. Frameshift mutation: determinants of specificity. Annu. Rev. Genet.

24:189–213111. Ripley LS. 1999. Predictability of mutant sequences. Relationships between mutational

mechanisms and mutant specificity. Ann. N. Y. Acad. Sci. 870:159–72112. Rogozin IB, Kochetov AV, Kondrashov FA, Koonin EV, Milanesi L. 2001. Presence of

ATG triplets in 5’ untranslated regions of eukaryotic cDNAs correlates with a ‘weak’context of the start codon. Bioinformatics 17:890–900

113. Rogozin IB, Makarova KS, Natale DA, Spiridonov AN, Tatusov RL, et al. 2002. Con-gruent evolution of different classes of noncoding DNA in prokaryotic genomes. NucleicAcids Res. 30:4264–71

114. Rolfe DF, Brown GC. 1997. Cellular energy utilization and molecular origin of standardmetabolic rate in mammals. Physiol. Rev. 77:731–58

115. Sargentini NJ, Smith KC. 1994. DNA sequence analysis of spontaneous and gamma-radiation (anoxic)-induced lacId mutations in Escherichia coli umuC122::Tn5: differentialrequirement for umuC at G.C vs A.T sites and for the production of transversions vstransitions. Mutat. Res. 311:175–89

116. Schaaper RM, Dunn RL. 1991. Spontaneous mutation in the Escherichia coli lacI gene.Genetics 129:317–26

117. Shankar R, Grover D, Brahmachari SK, Mukerji M. 2004. Evolution and distributionof RNA polymerase II regulatory sites from RNA polymerase III dependant mobile Aluelements. BMC Evol. Biol. 4:37

118. Shigenobu S, Watanabe H, Hattori M, Sakaki Y, Ishikawa H. 2000. Genome sequenceof the endocellular bacterial symbiont of aphids Buchnera sp. APS. Nature 407:81–86

119. Sorek R, Ast G, Graur D. 2002. Alu-containing exons are alternatively spliced. GenomeRes. 12:1060–67

120. Spellman PT, Rubin GM. 2002. Evidence for large domains of similarly expressed genesin the Drosophila genome. J. Biol. 1:5

121. Tamas I, Klasson L, Canback B, Naslund AK, Eriksson AS, et al. 2002. 50 million yearsof genomic stasis in endosymbiotic bacteria. Science 296:2376–79

122. Thomas CA. 1971. The genetic organization of chromosomes. Annu. Rev. Genet. 5:237–56

123. Toh H, Weiss BL, Perkin SA, Yamashita A, Oshima K, et al. 2006. Massive genomeerosion and functional adaptations provide insights into the symbiotic lifestyle of Sodalisglossinidius in the tsetse host. Genome Res. 16:149–56

124. Toro N. 2003. Bacteria and Archaea group II introns: additional mobile genetic elementsin the environment. Environ. Microbiol. 5:143–51

348 Lynch

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.

Page 23: Streamlining and Simplification of Microbial Genome ...lynchlab/PDF/Lynch152.pdfSimplification of Microbial Genome Architecture Michael Lynch ... and eukaryotic genomes. ... likely

ANRV286-MI60-15 ARI 9 August 2006 11:23

125. van Ham RC, Kamerbeek J, Palacios C, Rausell C, Abascal F, et al. 2003. Reductivegenome evolution in Buchnera aphidicola. Proc. Natl. Acad. Sci. USA 100:581–86

126. Webb BA, Strand MR, Dickey SE, Beck MH, Hilgarth RS, et al. 2006. Polydnavirusgenomes reflect their dual roles as mutualists and pathogens. Virology 347:160–74

127. Witherspoon DJ, Robertson HM. 2003. Neutral evolution of ten types of mariner trans-posons in the genomes of Caenorhabditis elegans and C. briggsae. J. Mol. Evol. 56:751–69

128. Yang HP, Tanikawa AY, Kondrashov AS. 2001. Molecular nature of 11 spontaneous denovo mutations in Drosophila melanogaster. Genetics 157:1285–92

129. Zhang Z, Gerstein M. 2003. Patterns of nucleotide substitution, insertion and deletionin the human genome inferred from pseudogenes. Nucleic Acids Res. 31:5338–48

130. Zimmerly S, Hausner G, Wu X. 2001. Phylogenetic relationships among group II intronORFs. Nucleic Acids Res. 29:1238–50

www.annualreviews.org • Microbial Genome Architecture 349

Ann

u. R

ev. M

icro

biol

. 200

6.60

:327

-349

. Dow

nloa

ded

from

arj

ourn

als.

annu

alre

view

s.or

gby

IN

DIA

NA

UN

IVE

RSI

TY

- B

loom

ingt

on o

n 08

/31/

07. F

or p

erso

nal u

se o

nly.