DNA Sequencing and the Modern Revolution in Studies of Microbial Diversity

Talk by Jonathan Eisen at the California Academy of Sciences December 16, 2010

DNA Sequencing and the Modern Revolution in Studies of Microbial Diversity

Jonathan A. Eisen, UC Davis

Talk at Calacademy, December 17, 2010

Monday, November 26, 12

Social Networking in Science

Bacteria evolve

Outline

• Introduction: Diversity of microbes

• I: The Tree of Life

• II: Genome Sequencing

• III: Microbes in the Field

• IV: Metagenomics

Introduction

Diversity of Microbes

Diversity of function

D. Diversity of form

Many major pathogens are bacteria

Bacteria and archaea are key commensals of many eukaryotes

Extreme conditions are dominated by bacteria and archaea

Microbes run global cycles

The first photosynthetic cells were similar to cyanobacteria.

Photosynthetic Organisms Changed Earth’s Atmosphere

6.15 Metabolic Pathways

Diversity of form: prokaryotes

More shape diversity

Diversity of form II: complexity and size

Fruiting bodies

Photo 26.24 Fruiting body of gliding bacterium Stigmatella aurantiaca. SEM.

Diversity of form III: biofilms

[Figure: biofilm formation. Labels: free-swimming prokaryotes; binding to surface; irreversible attachment; growth and division, formation of matrix; single-species biofilm; signal molecules; attraction of other organisms; mature biofilm; matrix.]

Diversity of form: microbial eukaryotes

Part I:

The Tree of Life

Darwin and a Single Tree of Life

George Richmond. Darwin Heirlooms Trust

Darwin Origin of Species 1859

Set stage for “tree thinking”

Ernst Haeckel 1866

www.mblwhoilibrary.org

Plantae, Protista, Animalia

The Microbe Problem

Most trees of life did not deal with microbes very well

Trees were not based on comparing homologous traits between all organisms

http://mcb.illinois.edu/faculty/profile/1204

Carl Woese

12.3 From Gene to Protein

The Ribosome

rRNA Systematics

• All cellular organisms have ribosomes

• All have homologous subunits of the ribosomes including specific ribosomal proteins and ribosomal RNAs (i.e., these are universally homologous genes)

• Woese determined the sequences of ribosomal RNAs from different species

• The sequences are highly similar but have some variation

• Each position in an rRNA can be considered a distinct character trait

• Each position has multiple possible character states (A, C, U, G)

Alignments

• Method of assigning homology to individual residues in different sequences

• Allows one to have multiple traits within individual genes

• Each column in alignment = a different character

• Each residue (ACTG) = state

• Similar in concept to lining up bones from different species
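The idea that each alignment column is a character and each residue is a state can be sketched in a few lines of Python. This is only an illustration: the three sequences and the function names below are invented for the example and do not come from the talk.

```python
# Toy alignment: rows are sequences, columns are characters, residues are states.
# The sequences are invented for illustration only.
alignment = {
    "species_A": "ACGUAC",
    "species_B": "ACGUGC",
    "species_C": "AUGUGC",
}


def columns(aln):
    """Return the alignment as a list of columns (one tuple of states per character)."""
    rows = list(aln.values())
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return list(zip(*rows))


def variable_characters(aln):
    """Indices of columns in which at least two sequences differ in state."""
    return [i for i, col in enumerate(columns(aln)) if len(set(col)) > 1]


if __name__ == "__main__":
    for i, col in enumerate(columns(alignment)):
        print(f"character {i}: states {col}")
    print("variable characters:", variable_characters(alignment))
```

Only the variable columns carry information about relationships; columns in which every sequence has the same state do not distinguish the organisms.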

Woese 1987 - rRNA

Microbiological Reviews 51:221

4.7 Eukaryotic Cells (Part 1)

4.4 A Prokaryotic Cell

26.23 Some Would Call It Hell; These Archaea Call It Home

The Tree of Life 2006

adapted from Baldauf, et al., in Assembling the Tree of Life, 2004

Why is the tree useful?

• Reclassification of many organisms, including diversity of pathogens
  Changes how to design treatments

• Interpret comparative data
  Convergence vs. homology

Part II:

Genome Sequencing

Fleischmann et al. 1995

Whole Genome Shotgun Sequencing

shotgun

sequence

Warner Brothers, Inc.

Assemble Fragments

sequencer output

assemble fragments

Closure & Annotation
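The shotgun, sequence, and assemble steps can be illustrated with a toy example. The sketch below is an assumption-laden simplification, not the method used for any real genome: it samples error-free reads from an invented 15-base "genome" and merges them with a naive greedy overlap rule. Real assemblers also handle sequencing errors, repeats, both strands, and paired ends.

```python
import random


def shotgun(genome, read_len, n_reads, rng):
    """Sample n_reads substrings of length read_len at random positions."""
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(frags):
    """Remove duplicates and fragments that lie wholly inside another fragment."""
    unique = list(dict.fromkeys(frags))
    return [f for f in unique if not any(f != g and f in g for g in unique)]


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of fragments with the largest suffix-prefix overlap."""
    frags = drop_contained(reads)
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining fragments are separate contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        rest = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags = drop_contained(rest + [merged])
    return frags


if __name__ == "__main__":
    genome = "ATGCGTACCTGAAGT"  # invented toy sequence with no repeated 3-mers
    reads = shotgun(genome, read_len=8, n_reads=30, rng=random.Random(0))
    for contig in greedy_assemble(reads):
        print(contig)
```

Any gaps left between contigs after this step correspond to the "closure" stage on the slide, which in practice requires additional targeted sequencing.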

Microbial genomes

From http://genomesonline.org

General Steps in Analysis of Complete Genomes

• Identification/prediction of genes

• Characterization of gene features

• Characterization of genome features

• Prediction of gene function

• Prediction of pathways

• Integration with known biological data

• Comparative genomics
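As a rough illustration of the first step, identification/prediction of genes, the sketch below scans the three forward reading frames for open reading frames (an ATG followed in frame by a stop codon). This is a deliberately naive stand-in chosen for the example: production gene finders use statistical models, scan both strands, and allow alternative start codons. The function name and the `min_codons` parameter are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start is the index of the ATG, end is exclusive and includes the stop
    codon, and min_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    toy = "CCATGAAATTTGGGTAACC"  # invented sequence
    for start, end, frame in find_orfs(toy):
        print(frame, start, end, toy[start:end])
```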

Vibrio cholerae Metabolism

Genome Sequences Have Revolutionized Microbiology

• Predictions of metabolic processes

• Better vaccine and drug design

• New insights into mechanisms of evolution

• Genomes serve as template for functional studies

• New enzymes and materials for engineering and synthetic biology

Genome Size

Genome Structure: More Variable than Once

Figure 7.6 - Gene content

Figure 7.7 - Gene content E. coli

Figure 7.10 - K12 vs O157:H7

Lateral Transfer

from Doolittle, 1999

from Lerat et al

Part III:

Microbes in the field

A. Studying microbes

How to study microbes

• Key questions about microbes in environment:
  Who are they? (i.e., what kinds of microbes are they)
  What are they doing? (i.e., what functions and processes do they possess)

Figure 26.24 Extreme Halophiles

Deep Sea Ecosystems

• For any particular environment, there are many different ways one could go about characterizing the microbes there

• 1. Observe directly in the field

• 2. Grow in the laboratory

• 3. CSI Microbiology (collect & analyze DNA from field)

Method 1: Observe in the field

A. Method 1

Field Observations an Important Tool

Method 2: Culturing

B. Method 2

Examples of Benefits of Culturing:

• Allows one to connect processes and properties to single types of organisms

• Enhances ability to do experiments, from genetics to physiology to genomics

• Provides possibility of large volumes of uniform material for study

• Can supplement appearance-based classification with other types of data. Many types are useful, though the standard is analysis of rRNA sequences.

Optimal salt concentration for different species

Halophile adaptations

• Some stresses of high salt
  Osmotic pressure on cells
  Desiccation

• Halophile adaptations
  Increased osmolarity inside cell
    Proteins
    Carbohydrates
    Salts - only done in extremely halophilic archaea
  Membrane pumps
  Desiccation resistance

High internal salt requires ALL cellular components to be adapted to salt, charge. For example, all proteins must change surface charge and other properties.

Extreme halophiles are a monophyletic group

Uses of extremophiles

Type of environment           Examples                      Example of mechanism of survival  Practical uses
High temp (thermophiles)      Deep sea vents, hot springs   Amino acid changes                Heat stable enzymes
Low temp (psychrophile)       Antarctic ocean, glaciers     Antifreeze proteins               Enhancing cold tolerance of crops
High pressure (barophile)     Deep sea vents, hot springs   Solute changes                    Industrial processes
High salt (halophiles)        Evaporating pools             Incr. internal osmolarity         Soy sauce production
High pH (alkaliphiles)        Soda lakes                    Transporters                      Detergents
Low pH (acidophiles)          Mine tailings                 Transporters                      Bioremediation
Desiccation (xerophiles)      Deserts                       Spore formation                   Freeze-drying additives
High radiation (radiophiles)  Nuclear reactor waste sites   Absorption, repair damage         Bioremediation, space travel

Method III: CSI Microbiology

Great Plate Count Anomaly

Culturing count <<<< Microscopy count

Problem because appearance not effective for “who is out there?” or “what are they doing?”

Solution?

DNA

Collect from environment

Analysis of uncultured microbes

Polymerase Chain Reaction (PCR)

PCR and phylogenetic analysis of rRNA genes

[Figure: workflow from DNA extraction, to PCR (makes lots of copies of the rRNA genes in sample), to sequencing of rRNA genes, to sequence alignment (= data matrix), to phylogenetic tree. The sequence recovered from the sample is aligned with sequences from E. coli, humans, and yeast and placed on a tree with them.]

rRNA1 5’ ...TACAGTATAGGTGGAGCTAGCGATCGATCGA... 3’

Deep Sea Ecosystems

Chemosymbionts

Analysis of uncultured microbes

[Figure 4: unrooted phylogenetic tree of 16S rRNA gene sequences from Proteobacteria]

FIG. 4. Unrooted phylogenetic tree showing the position of the S. velum symbionts in relation to that of other Proteobacteria species on the basis of 16S rRNA gene sequences. The tree was constructed from evolutionary distances in Table 1. Members of the alpha and beta subclasses of the Proteobacteria are bracketed; all others are of the gamma subclass. Chemoautotrophic symbionts (sym) are listed in boldface type. Full species names listed in Table 1. Scale bar represents percent similarity.

dicted size bands for S. velum genomic DNA (Fig. 3A): AvaI and BclI, 1,080 bp; EcoNI, 1,109 bp; and NcoI and StuI, 998 bp (data not shown). We suggest that this technique is generally useful for the confirmation of the presence of PCR-generated sequences in cells with multiple types of DNA.

The restriction patterns of 16S rRNA coding regions for DNA extracted from S. velum gills were identical for all nine clams examined; representative results are shown in Fig. 3. This, along with the lack of variability in the partial sequence of 16S rDNA for three individuals, suggests that there is a single dominant bacterial species within S. velum and that the host-symbiont association is species specific. This result is in agreement with the findings of Distel et al. (12) for lamellibranch bivalve and tubeworm chemoautotrophic symbionts.

Single bands were evident for all enzymes predicted to cut outside or near the ends of the gene such as AvaI, BclI, EcoNI, PvuII, XhoI (Fig. 3), and NcoI (band size, 9,600 bp; data not shown). Some of these enzymes generated restriction fragments larger than that of a typical bacterial ribosomal operon (which includes the 5S, 16S, and 23S rRNA genes [~5 kb]), indicating that the single bands observed were not generated by double cuts within multiple operons. Furthermore, only two bands were observed for enzymes predicted to cut near the middle of the 16S rRNA gene such as EcoRI (Fig. 3B) and StuI (bands of 4,400 and 19,500 bp; data not shown). Thus, all enzymes in all animals generated patterns consistent with the presence of only one copy of the 16S rRNA gene in the symbiont genome (Fig. 3). However, it should be noted that a large duplication of the region containing the rRNA operon with no subsequent changes at any of the nine restriction sites could escape detection by this analysis.

These results suggest that the symbiont genome contains but a single rRNA operon. Bacterial rRNA operons (rrn), which include the 5S, 16S, and 23S rRNA genes, vary considerably in number among bacteria. In contrast to free-living species of Proteobacteria, which have 4 to 7 rrn loci (18), only one copy has been detected in other endosymbionts including both the primary (P) and secondary (S) symbionts of the pea aphid, Acyrthosiphon pisum (33) (included in Fig. 4). Multiple rRNA operons have generally been thought necessary to support a high rate of rRNA synthesis in rapidly dividing cells (3, 22). Unterman and Baumann (32) suggested that the aphid symbionts therefore grow slowly, with doubling times of 2 days to parallel the growth rate of the aphid host. They further speculated that the single rRNA operon in the aphid symbiont genome is a consequence of the adaptation to a symbiotic existence, which necessitates a slow growth rate. Although the division rate of S. velum symbionts is not known, it is unlikely that they grow slowly, since they must produce all of the biomass for their invertebrate host. Studies of rrn copy number and growth rates of endosymbionts and their free-living relatives from a variety of phylogenetic groups may help resolve the significance of rRNA operon redundancy.

Phylogenetic analysis of the S. velum symbionts. Phylogenetic analysis was conducted using the Genetic Data Environment program (Steve Smith, Harvard Genome Laborato-

JOURNAL OF BACTERIOLOGY, May 1992, p. 3416-3421, Vol. 174, No. 10
0021-9193/92/103416-06$02.00/0
Copyright © 1992, American Society for Microbiology

Phylogenetic Relationships of Chemoautotrophic Bacterial Symbionts of Solemya velum Say (Mollusca: Bivalvia) Determined by 16S rRNA Gene Sequence Analysis

JONATHAN A. EISEN,1† STEVEN W. SMITH,2 AND COLLEEN M. CAVANAUGH1*

Department of Organismic and Evolutionary Biology,1 and Harvard Genome Laboratory,2 Biological Laboratories, Harvard University, Cambridge, Massachusetts 02138

Received 4 November 1991/Accepted 9 March 1992

The protobranch bivalve Solemya velum Say (Mollusca: Bivalvia) houses chemoautotrophic symbionts intracellularly within its gills. These symbionts were characterized through sequencing of polymerase chain reaction-amplified 16S rRNA coding regions and hybridization of an Escherichia coli gene probe to S. velum genomic DNA restriction fragments. The symbionts appeared to have only one copy of the 16S rRNA gene. The lack of variability in the 16S sequence and hybridization patterns within and between individual S. velum organisms suggested that one species of symbiont is dominant within and specific for this host species. Phylogenetic analysis of the 16S sequences of the symbionts indicates that they lie within the chemoautotrophic cluster of the gamma subdivision of the eubacterial group Proteobacteria.

Procaryote-eucaryote associations in which marine invertebrates harbor chemoautotrophic bacteria as endosymbionts appear to be widespread in marine habitats such as deep-sea hydrothermal vents and coastal sediments (8, 15). In such symbioses, the procaryotes utilize the energy released by the oxidation of reduced inorganic substrates, such as hydrogen sulfide, to fix carbon dioxide via the Calvin-Benson cycle (7, 13). The hosts appear to derive nutrition from their endosymbionts and in turn provide the symbionts simultaneous access to the substrates from anoxic and oxic environments which are necessary for energy generation. Maintenance of such intracellular symbionts presents a novel metazoan acquisition of procaryotic energy generation and autotrophic carbon fixation.

While the existence of chemoautotroph-invertebrate symbioses is now generally accepted, little is actually known about the symbionts observed in the tissues of any of the hosts because none have been cultured. Comparison of rRNA sequences has greatly facilitated the identification of bacteria, including unculturable microorganisms, and the elucidation of their natural relationships (38). Phylogenetic analysis of 16S rRNA sequences enabled Distel et al. (12) to establish that the chemoautotrophic symbionts of the hydrothermal vent tubeworm and five species of bivalves of the subclass Lamellibranchia are related and cluster in the gamma subdivision of the Proteobacteria (formerly purple photosynthetic bacteria), one of the 11 major groups of eubacteria (30).

In this investigation we sought to establish the phylogenetic relationships and the species specificities of the symbionts of the protobranch bivalve Solemya velum Say, an Atlantic coast clam which has been studied as a shallow-water model of invertebrate-chemoautotroph associations (7, 9, 10). The phylogenetic placement of the S. velum symbionts, to date limited to sequence analysis of the 5S rRNA, indicates that these symbionts also fall in the Proteobacteria gamma subdivision (31). However, the small size of the 5S rRNA molecule (~120 bp) precludes resolution that can be attained with larger molecules such as 16S rRNA (~1,550 bp) (16). Species of the genus Solemya are, to date, the only bivalves of the subclass Protobranchia in which chemoautotrophic symbiosis has been documented. The protobranchs represent an important component of studies of chemoautotrophic symbioses, since they may be the closest living group to the ancestral bivalve condition, because they dominate the deep sea and are present along a gradient from the deep sea bottom to the shore (1).

PCR amplification. We used the polymerase chain reaction (PCR) (28) to amplify 16S rRNA coding regions from a mixture of procaryotic and eucaryotic DNA extracted from the symbiont-containing gills of S. velum. S. velum were collected from eelgrass beds near Woods Hole, Mass., and placed in filtered (passed through filters with a pore size of 0.2 µm) seawater to cleanse body surfaces prior to dissection. The gills, which contain ~10^9 bacterial symbionts per g (wet weight), and feet, in which symbionts have not been observed (7), were dissected, frozen in liquid nitrogen, and stored at -85°C. Frozen tissue was homogenized in lysis buffer, and DNA was isolated by using hexadecyltrimethylammonium bromide (4). DNA from Escherichia coli JM109, prepared by the miniprep method (4), was used as a positive control.

Amplification of 16S rRNA genes by PCR was carried out essentially by the method of Weisburg et al. (34) using eubacterial universal primers and 200 ng of template DNA. DNA products (Fig. 1) amplified from S. velum gill tissue (lane 1) and from the positive-control E. coli (lane 4) were prominent single bands of approximately 1,500 bp. Amplification was not detected when DNA template was not added (lane 2), nor when DNA from S. velum foot tissue was used as the template (lane 3).

The strong amplification from gill tissue DNA and lack of amplification from foot tissue DNA (Fig. 1) supports the conclusions from studies of enzyme activity, electron microscopy (9), and 5S rRNA sequences (31) that the bacteria are abundant within, and specific to, the gill tissue. This conclusion was further supported by lack of hybridization of

* Corresponding author.
† Present address: Department of Biological Sciences, Stanford University, Stanford, CA 94305.

Collect from environment

Analysis of uncultured microbes

PCR and phylogenetic analysis of rRNA genes

[Figure: workflow from DNA extraction, to PCR (makes lots of copies of the rRNA genes in sample), to sequencing of rRNA genes, to sequence alignment (= data matrix), to phylogenetic tree. Sequences recovered from the sample (rRNA1 to rRNA4) are aligned with sequences from E. coli, humans, and yeast and placed on a tree with them.]

rRNA1 5’ ...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’
rRNA2 5’ ...TACAGTATAGGTGGAGCTAGCGACGATCGA... 3’
rRNA3 5’ ...ACGGCAAAATAGGTGGATTCTAGCGATATAGA... 3’
rRNA4 5’ ...ACGGCCCGATAGGTGGATTCTAGCGCCATAGA... 3’

Data matrix (excerpt):
rRNA3  C A C T G T
rRNA4  C A C A G T
Yeast  T A C A G T
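To make the step from data matrix to tree more concrete, the sketch below computes pairwise differences between the three matrix rows that are given explicitly on the slide (rRNA3, rRNA4, Yeast). Counting mismatched columns (Hamming distance) is an assumption made for this illustration; the slide does not say which distance measure or tree-building method was used.

```python
from itertools import combinations

# Data-matrix rows as shown on the slide.
rows = {
    "rRNA3": "CACTGT",
    "rRNA4": "CACAGT",
    "Yeast": "TACAGT",
}


def hamming(a, b):
    """Number of aligned positions (characters) at which two rows differ in state."""
    if len(a) != len(b):
        raise ValueError("rows must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(matrix):
    """Pairwise distances keyed by (name1, name2), in insertion order of the rows."""
    return {(p, q): hamming(matrix[p], matrix[q]) for p, q in combinations(matrix, 2)}


if __name__ == "__main__":
    for (p, q), d in distance_matrix(rows).items():
        print(f"{p} vs {q}: {d} difference(s)")
```

Distance-based tree methods take a matrix like this as input and group the sequences with the fewest differences; with only six columns the result is illustrative at best.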

Major phyla of bacteria and archaea (as of 2002)

No cultures

Some cultures

Uses of rDNA PCR

Bohannan and Hughes 2003

Hugenholtz 2002

Censored

Part IV:

Metagenomics

4.

Microbes in the world I: rRNA PCR

Perna et al. 2003

Metagenomics

shotgun

clone

Novel Form of Phototrophy

Beja et al. 2000

Acid Mine Drainage 2004

environmental sample, however, variation within each species population might complicate assembly. If intraspecies variation is dominated by limited local polymorphism or homologous recombination, it should be possible to define a composite genome for each species population. Conversely, if the genomic heterogeneity within a species is dominated by large rearrangements, deletions, or insertions, it may be impossible to define composite genomes for species populations from natural communities.

A small insert plasmid library (average insert size 3.2 kilobases (kb)) was constructed from the biofilm DNA for random shotgun sequencing (see Supplementary Information). A total of 76.2 million base pairs (bp) of DNA sequence was generated from 103,462 high-quality reads (averaging 737 bp per read). Analysis of raw shotgun data (Supplementary Figs S1–5) indicated the presence of both bacterial and archaeal genomes at sequence coverages of up to 10×, which would be sufficient to produce a high-quality assembly from a conventional microbial genome project (refs 20, 21). The shotgun data set was assembled with JAZZ, a whole-genome shotgun assembler (ref. 22). Anticipating polymorphisms, we permitted alignment discrepancies beyond those expected from sequencing error if they were consistent with end-pairing constraints. Over 85% of the shotgun reads were assembled into scaffolds longer than 2 kb (a scaffold is a reconstructed genomic region that may contain gaps of a known size range). The combined length of the 1,183 scaffolds is 10.83 megabases (Mb). The assembly is internally self consistent, with 97.2% of end pairs from the same clone assembled with the appropriate orientation and separation, as expected for a low rate of mispairing error (tracking and chimaeric clones).

The first step in assignment of scaffolds to organism types was to separate the scaffolds by average G+C content. These were subsequently subdivided using read depth (coverage). Dinucleotide frequencies did not allow for further subdivision. Notably, separation of scaffolds into low G+C (<43.5%; Supplementary Fig. S3a) and high G+C (≥43.5%) content ‘bins’ was not significantly compromised by local heterogeneities in G+C content because the scaffolds were binned after assembly. As the scaffolds are typically tens of kilobases long, local fluctuations in G+C content are averaged over the length of each scaffold, allowing, in most cases (>99%), clear assignment to bins of high or low G+C content.
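The two-step binning described here (G+C content first, then read depth) can be sketched as follows. The 43.5% G+C threshold is taken from the text; everything else is an assumption for illustration, including the `Scaffold` fields, the function names, and the depth cutoff of 6× used to separate the roughly 10× and roughly 3× groups.

```python
from dataclasses import dataclass

GC_THRESHOLD = 43.5  # percent G+C; the threshold quoted in the text


@dataclass
class Scaffold:
    name: str
    sequence: str
    read_bases: int  # hypothetical field: total read bases assembled into the scaffold


def gc_percent(seq):
    """Average G+C content of a sequence, in percent."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def coverage(scaffold):
    """Mean read depth: assembled read bases divided by scaffold length."""
    return scaffold.read_bases / len(scaffold.sequence)


def assign_bin(scaffold, depth_cutoff=6.0):
    """Bin by G+C content first, then by read depth (depth_cutoff is an assumed value)."""
    gc = "high G+C" if gc_percent(scaffold.sequence) >= GC_THRESHOLD else "low G+C"
    depth = "high depth" if coverage(scaffold) >= depth_cutoff else "low depth"
    return f"{gc}, {depth}"


if __name__ == "__main__":
    toy = [
        Scaffold("s1", "GGCCGGCCAT", read_bases=100),  # invented data
        Scaffold("s2", "ATATATATGC", read_bases=30),
    ]
    for s in toy:
        print(s.name, round(gc_percent(s.sequence), 1), round(coverage(s), 1), assign_bin(s))
```

Averaging G+C over a whole scaffold, rather than over individual reads, is what makes the bins clean: short reads fluctuate around the genome-wide value, while scaffolds tens of kilobases long do not.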

The high G+C scaffolds at approximately 10× coverage (70 scaffolds up to 137 kb in length, totalling 2.23 Mb) were identified by the presence of a single 16S rRNA gene as belonging to the genome of a Leptospirillum group II species. The average G+C content (55.8%) is comparable to the G+C content (54.9–58%) of L. ferriphilum (ref. 19). The total high G+C scaffold length is close to the estimated genome size of Leptospirillum ferrooxidans (ref. 23) (1.9 Mb). This suggests that essentially the entire Leptospirillum group II genome was recovered from the community DNA.

The low G+C scaffolds at approximately 10× coverage were assembled into 59 scaffolds of up to 138 kb in length, totalling 1.82 Mb. The single 16S rRNA gene identified in these scaffolds was 99% identical to that of the fer1 isolate; however, alignment of the scaffolds to the fer1 genome revealed an average of 22% divergence at the nucleotide level (Supplementary Fig. S6). The total scaffold length is close to the genome size of fer1 (1.9 Mb; Allen et al., unpublished data), and local gene order and content are highly conserved (Supplementary Fig. S7). Therefore, these 59 scaffolds represent a nearly complete genome of a previously unknown, uncultured Ferroplasma species distinct from fer1. We designate this as Ferroplasma type II. The dominance of this organism type was unexpected before the genomic analysis.

We assigned the roughly 3× coverage, high G+C scaffolds to Leptospirillum group III on the basis of rRNA markers (474 scaffolds up to 31 kb, totalling 2.66 Mb). Comparison of these scaffolds with those assigned to Leptospirillum group II indicates significant sequence divergence and only locally conserved gene order, confirming that the scaffolds belong to a relatively distant relative of Leptospirillum group II. A partial 16S rRNA gene sequence from Sulfobacillus thermosulfidooxidans was identified in the unassembled reads, suggesting very low coverage of this organism. If any Sulfobacillus scaffolds >2 kb were assembled, they would be grouped with the Leptospirillum group III scaffolds.

We compared the 3× coverage, low G+C scaffolds (580 scaffolds, 4.12 Mb) to the fer1 genome in order to assign them to organism types (Supplementary Fig. S6). Scaffolds with ≥96% nucleotide identity to fer1 were assigned to an environmental Ferroplasma type I genome (170 scaffolds up to 47 kb in length and comprising 1.48 Mb of sequence). The remaining low-coverage, low G+C scaffolds are tentatively assigned to G-plasma. The largest scaffold in this bin (62 kb) contains the G-plasma 16S rRNA gene. The 410 scaffolds assigned to G-plasma comprise 2.65 Mb of sequence. A partial 16S rRNA gene sequence from A-plasma was identified in the unassembled reads, suggesting low coverage of this organism. Any scaffolds from A-plasma >2 kb would be included in the G-plasma bin. Although eukaryotes are present in the AMD system, they were in low abundance in the biofilm studied. So far, no scaffolds from eukaryotes have been detected.

As independent evidence that the Leptospirillum group II and Ferroplasma type II genomes are nearly complete, we located a full complement of transfer RNA synthetases in each genome data set. An almost complete set of these genes was also recovered from Leptospirillum group III. The G-plasma bin contains more than a full set of tRNA synthetases, consistent with inclusion of some A-plasma scaffolds. In addition, we established that the Leptospirillum group II, Leptospirillum group III, Ferroplasma type I, Ferroplasma type II and G-plasma bins contained only one set of rRNA genes.

Figure 1 The pink biofilm. a, Photograph of the biofilm in the Richmond mine (hand included for scale). b, FISH image of a. Probes targeting bacteria (EUBmix; fluorescein isothiocyanate (green)) and archaea (ARC915; Cy5 (blue)) were used in combination with a probe targeting the Leptospirillum genus (LF655; Cy3 (red)). Overlap of red and green (yellow) indicates Leptospirillum cells and shows the dominance of Leptospirillum. c, Relative microbial abundances determined using quantitative FISH counts.

NATURE | doi:10.1038/nature02340 | www.nature.com/nature © 2004 Nature Publishing Group

inputs of fixed carbon or nitrogen from external sources. As with Leptospirillum group I, both Leptospirillum group II and III have the genes needed to fix carbon by means of the Calvin–Benson–Bassham cycle (using type II ribulose 1,5-bisphosphate carboxylase–oxygenase). All genomes recovered from the AMD system contain formate hydrogenlyase complexes. These, in combination with carbon monoxide dehydrogenase, may be used for carbon fixation via the reductive acetyl coenzyme A (acetyl-CoA) pathway by some, or all, organisms. Given the large number of ABC-type sugar and amino acid transporters encoded in the Ferroplasma type

Figure 4 Cell metabolic cartoons constructed from the annotation of 2,180 ORFs identified in the Leptospirillum group II genome (63% with putative assigned function) and 1,931 ORFs in the Ferroplasma type II genome (58% with assigned function). The cell cartoons are shown within a biofilm that is attached to the surface of an acid mine drainage stream (viewed in cross-section). Tight coupling between ferrous iron oxidation, pyrite dissolution and acid generation is indicated. Rubisco, ribulose 1,5-bisphosphate carboxylase–oxygenase. THF, tetrahydrofolate.

Metagenomics Challenge

Who is out there? What are they doing?

Glassy Winged Sharpshooter

• Feeds on xylem sap

• Vector for Pierce’s Disease

• Potential bioterror agent

• Collaboration with Nancy Moran to sequence symbiont genomes

• Funded by NSF

• Published in PLOS Biology 2006

Sharpshooter Shotgun Sequencing

shotgun

Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s lab

ABCDEFG

TUVWXYZ

Binning challenge

No reference genome? What do you do?

Phylogeny ....

CFB Phyla

Wu et al. 2006 PLoS Biology 4: e188.

Baumannia makes amino acids

Sulcia makes vitamins and cofactors

Part V:

Knowing What We Don’t Know
