Comparative sequence analysis of the icm/dot genes in Legionella
Irina Morozova,a,1 Xiaoyan Qu,a,1 Shundi Shi,a,1 Gifty Asamani,a Joseph E. Greenberg,a Howard A. Shuman,b and James J. Russoa,*
a Columbia Genome Center, Columbia University College of Physicians and Surgeons, 1150 St. Nicholas Avenue, New York, NY 10032, USA
b Department of Microbiology, Columbia University College of Physicians and Surgeons, 701 W. 168th Street, New York, NY 10032, USA
Received 18 August 2003, revised 21 November 2003
Plasmid 51 (2004) 127–147
www.elsevier.com/locate/yplas
Abstract
The icm/dot genes in Legionella pneumophila are essential for the ability of the bacteria to survive within macrophages in lung infections such as Legionnaires' disease, or amoebae in nature. The 22 genes of the complex, thought to encode a transport apparatus for transfer of effector molecules into the host cell cytoplasm, are located in two chromosomal loci. We demonstrate that these genes are present in all the L. pneumophila strains examined herein, but display a wide range of sequence variation among the different strains, none of which are clearly associated with virulence potential. The strains fall within seven phylogenetic groups, but discrepancies among the gene trees indicate a complicated evolutionary history for the icm/dot loci, with perhaps two independent gene acquisition events and subsequent genomic rearrangements. Significant findings include a probable t-SNARE domain in IcmG that may indicate a direct role for this putative inner membrane protein in altering the host's membrane fusion machinery, a potential functional domain in the central hydrophobic portion of IcmK that may allow it to participate in forming the pore of the secretion complex, and strict conservation of the amino acid physicochemical characteristics in the IcmP region corresponding to the trbA domain that could play a role in molecular transfer.
© 2004 Elsevier Inc. All rights reserved.
Keywords: Legionella pneumophila; icm/dot genes; Evolution; Phylogenetic analysis; Virulence
* Corresponding author. Fax: +212-851-5215. E-mail address: [email protected] (J.J. Russo).
1 These three authors contributed equally to the work.
0147-619X/$ - see front matter © 2004 Elsevier Inc. All rights reserved.
doi:10.1016/j.plasmid.2003.12.004
1. Introduction
Legionella pneumophila, the causative agent of Legionnaires' disease, an occasionally fatal pneumonia, as well as much more common mild flu-like lung infections, is found in fresh water throughout the world. The Philadelphia 1 isolate of L. pneumophila, named after the site of the originally described outbreak in 1976 (Fraser et al., 1977), is a member of the most prevalent serogroup 1 (Fields et al., 2002). Isolates associated with at least 15 other serogroups have been described in the ensuing years (Helbig et al., 2002; Yu et al., 2002). In addition, L. pneumophila is one of about 42 known species within the genus Legionella (Fields et al., 2002; Yu et al., 2002), many of which can be associated with clinical symptoms.
As part of their life cycle, Legionella bacteria are taken up and survive within phagocytic cells (e.g., amoebae in the environment, macrophages in the human lung). The bacteria replicate within intracellular vacuoles, and eventually kill the original host cell, whereupon they may infect nearby phagocytes (Swanson and Hammer, 2000). Among other virulence genes, two regions with some features of pathogenicity islands, the so-called icm/dot gene clusters, appear to be essential for their ability to survive within and kill macrophages and amoebae (Andrews et al., 1998; Berger and Isberg, 1993; Sadosky et al., 1993; Segal and Shuman, 1997; Vogel et al., 1998).
The icm/dot² cluster I includes seven genes (dotA–D, icmV,W,X) and the larger cluster II contains the remaining 17 members (icmT,S,R,Q,P,O,N,M,L,K,E,G,C,D,J,B,F). Their encoded proteins are thought to translocate effector molecules into the host cell that somehow prevent the latter from killing the proliferating bacteria, probably by preventing phagosome–lysosome fusion in macrophages (Christie, 2001; Nagai et al., 2002). The icm/dot loci are highly similar to the transfer region of plasmid R64 and other IncI1 plasmids, and it has previously been suggested that icm/dot virulence genes share a common ancestor with plasmid conjugation genes (Komano et al., 2000; Segal and Shuman, 1999; Segal et al., 1998; Sexton and Vogel, 2002). It is unclear whether the icm/dot genes derive from a single plasmid, after which they separated into the two gene clusters, or whether there were multiple gene transfer events. Of the more than 100 bacteria for which complete genome sequence is available, only Coxiella burnetii has homologs of the full set of icm/dot genes; in Coxiella, all the icm/dot genes are contained in a single locus (Seshadri et al., 2003). Besides the icm/dot genes, Legionella, like many other bacteria, contain most members of a Type IV secretion system, the lvh/lvr genes, which have virulence properties in some organisms, though apparently not in L. pneumophila (Segal et al., 1999).
² Many of these genes were discovered at about the same time in two laboratories and referred to as either icm (intracellular multiplication) or dot (defective in organellar trafficking). Where a particular gene has two names, we use the icm designation in this paper.
Since most of the icm/dot proteins are clearly implicated in the ability of Legionella to grow and survive within macrophages, it would be of interest to know if any of them are missing or considerably different in strains of lower pathogenicity. In the present work, we demonstrated the presence of several of the icm/dot genes in at least seven Legionella species other than L. pneumophila and one or more lvh/lvr genes in most of the Legionella species tested.
The evolution of the icm/dot genes may be distinct from the majority of the other Legionella genes, particularly the housekeeping genes, either because they are part of the bacteria's virulence gene set, or because of their presumed plasmid origin. Virulence genes are often subject to diversifying selection and evolve faster than the rest of the genome to avoid the host's response to the infection. But Legionella, with their largely intracellular lifestyle, are only briefly exposed to the mammalian immune system, and are not known to establish infections in serial hosts; therefore they are unlikely to undergo adaptive evolution. Indeed, the mip gene, which encodes a possible virulence factor, was found to have relatively few polymorphisms in L. pneumophila strains even though it encodes an outer membrane protein (Bumbaugh et al., 2002). In addition, although the quite variable dotA gene product was found to be less conservative in its outer domains, the ratio of synonymous and nonsynonymous nucleotide substitutions in this region did not indicate adaptive evolution according to these same investigators (Bumbaugh et al., 2002). In this study we addressed the question of whether the remaining genes of the icm/dot loci show the same elevated level of variability as dotA relative to other portions of the genome, particularly housekeeping genes.
A possible plasmid origin of the icm/dot genes would account for differences in evolutionary history between these genes as a group and the rest of Legionella's chromosomal genes. According to this hypothesis, the icm/dot loci would constitute relatively pliable regions susceptible to repeated regional gene rearrangements. Indications of multiple rearrangements in the icm/dot region have indeed been described (Ko et al., 2002a,b).
We sequenced 18 icm/dot genes and 4 other genes in 18 different strains of L. pneumophila. Comparative sequence analysis reveals a wide spectrum of variability among the icm/dot genes, some as conservative as mip and housekeeping genes, others more variable than dotA. A search for protein functional motifs, together with the distribution of variable and conservative regions along the gene sequence, gave additional information on the location of functionally important domains in some icm/dot gene products. We did not observe clear-cut associations of any particular gene variations with either known serogroups or virulence phenotypes. Phylogenetic analysis indicated that different L. pneumophila strains displayed distinct acquisition histories for some subsets of the icm/dot genes, consistent with an evolutionary scenario for the L. pneumophila species encompassing gene rearrangements as well as repeated horizontal gene transfer events.
2. Materials and methods
2.1. Bacterial strains
The Legionella species and L. pneumophila
strains used in this study are enumerated in
Table 1.
Table 1
Legionella strains

L. pneumophila
Leg #    Serogroup   Isolate
Leg 1    1           Bellingham 1
Leg 2    11
Leg 3    1           Philadelphia 1
Leg 4    2           Togus 1
Leg 5    3           Bloomington 2
Leg 6    4           Los Angeles 1
Leg 7    5           Dallas
Leg 8    6           Chicago 2
Leg 9    7           Chicago 8
Leg 10   8           Concord 3
Leg 11   9           IN-23-G1-C2
Leg 30   13          82A31053
Leg 31   1           Knoxville 1
Leg 32   10          Leiden 1
Leg 33   11          797-PA-H
Leg 34   12          570-CO-H
Leg 35   1           Amsterdam B-1
Leg 36   1           Amsterdam B-2

Other Legionella species
Leg #    Legionella species    Isolate
Leg 12   L. dumoffii           NY-23
Leg 13   L. longbeachae 1      LB-4
Leg 14   L. longbeachae 2      Tucker 1
Leg 15   L. gormanii           LS-13
Leg 16   L. micdadei           TATLOCK
Leg 17   L. wadsworthii        81-716
Leg 18   L. oakridgensis       Oak Ridge-10
Leg 19   L. feeleii 1          WO-44C-C3
Leg 20   L. feeleii 2          691-WI-H
Leg 21   L. sainthelensis      Mt.St.Helens-4
Leg 22   L. jordanis           BL-54D
Leg 23   L. spiritensis        Mt.St.Helens-9
Leg 25   L. jamestowniensis    JA-26-G1-E2
Leg 26   L. cherrii            ORW
Leg 27   L. steigerwaltii      SC-18-C9
Leg 28   L. parisiensis        PF-209C-C2
Leg 29   L. rubrilucens        WA-270A-C2

Sources of strains: Leg 35 and 36, provided by Dr. R. van Ketel, were derived from the Legionnaires' disease outbreak at the 1999 Netherlands flower show. Leg 35 was identified in 28 and Leg 36 in 1 out of 29 patients. The rest of the strains were obtained from Dr. B. Fields at the CDC.

2.2. Hybridization
Specific primers were designed to amplify all or a portion of each gene in the Philadelphia 1 strain of L. pneumophila. In general, PCR was carried out using 150–200 ng DNA, 1.5 mM MgCl2, 1× reaction buffer, 0.2 µM each dNTP, 10 pmol each primer, and 2 U Taq polymerase (Invitrogen) in 50 µl reaction volumes with the following PCR profile (5 min at 95 °C; 35 cycles of 95 °C, 30 s; 52 °C, 30 s; 72 °C, 30 s; 7 min at 72 °C). The PCR product was radiolabeled using random primer labeling kits RTS RadPrime (Life Technologies) or Redi-Prime 2 (Amersham) with [α-32P]dATP according to the manufacturers' instructions. EcoRI-digested Southern blots of 15 species of Legionella other than pneumophila (17 strains in all), and 18 strains of L. pneumophila were probed with each gene-specific amplimer under standard conditions (overnight hybridization at 65 °C in 0.5 M NaHPO4, pH 7.2, 7% sodium dodecyl sulfate (SDS), 1% bovine serum albumin, 1 mM ethylenediamine tetraacetic acid, 125 µg/ml sheared single-stranded salmon sperm DNA), and then washed to a relatively low stringency (75 mM NaCl/7.5 mM Na3citrate/0.1% SDS, 65 °C). In a few cases, hybridization temperature was reduced (45 °C) and washing eliminated (just a single 300 mM NaCl/30 mM Na3citrate/0.1% SDS room temperature rinse).
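The two wash buffers in the hybridization protocol can be expressed as multiples of SSC, which makes the difference in stringency easier to compare. The sketch below assumes the common laboratory definition of 1× SSC as 150 mM NaCl plus 15 mM trisodium citrate; that definition and the function name `ssc_strength` are not taken from the text and are used for illustration only.

```python
# Assumption (general lab convention, not stated in the paper):
# 1x SSC = 150 mM NaCl + 15 mM trisodium citrate.
SSC_1X_NACL_MM = 150.0
SSC_1X_CITRATE_MM = 15.0


def ssc_strength(nacl_mm, citrate_mm, tol=1e-6):
    """Return the SSC multiple for a NaCl/citrate pair.

    Raises ValueError if the two components do not describe the same multiple.
    """
    by_nacl = nacl_mm / SSC_1X_NACL_MM
    by_citrate = citrate_mm / SSC_1X_CITRATE_MM
    if abs(by_nacl - by_citrate) > tol:
        raise ValueError("NaCl and citrate do not match a single SSC multiple")
    return by_nacl


# Buffers as given in the hybridization protocol.
standard_wash = ssc_strength(75.0, 7.5)    # wash at 65 degrees C
relaxed_rinse = ssc_strength(300.0, 30.0)  # single room-temperature rinse

print(f"standard wash: {standard_wash}x SSC")
print(f"relaxed rinse: {relaxed_rinse}x SSC")
```

Under that assumption the standard wash corresponds to 0.5× SSC and the relaxed rinse to 2× SSC, so the relaxed procedure lowers stringency through both higher salt and lower temperature.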
2.3. PCR and sequencing
The same primer pairs were used to attempt to amplify genes from various Legionella strains and species. When amplification failed, at least one additional attempt was made with alternative primer pairs; moreover, in some cases it was necessary to adjust the annealing temperatures for individual strains or genes. After PCR, oligonucleotides were dephosphorylated and primers degraded with shrimp alkaline phosphatase and exonuclease I, respectively (incubation at 37 °C, 90 min; enzymatic denaturation at 72 °C, 15 min). The same primers were then used for bidirectional sequencing. With genes longer than about 500 bases, additional internal oligonucleotides were designed for priming sequencing reactions. Most sequencing reactions were done with ABI big dye terminator kits or Amersham 377 energy transfer kits, according to the manufacturers' instructions, and following isopropanol precipitation, the sequencing products were separated on ABI 377 gel systems (PerkinElmer). Individual sequence reads were assembled into contigs using the Phrap assembler (Green P., http://bozeman.mbt.washington.edu/) or SeqMan (Lasergene System, DNASTAR, Madison, WI). The quality of each base in each sequence was checked both automatically and manually. In a few cases where there was uncertain base calling even after repeated sequencing attempts, these positions were eliminated from the analyses.
The sequences obtained in this study have been submitted to GenBank (National Center for Biotechnology Information, Bethesda, MD). Accession numbers and the sequences themselves are available at http://genome3.cpmc.columbia.edu/~legion/comp_proj.htm.
2.4. Additional gene sequences
Homologs of the icm/dot genes from C. burnetii [sequence data were provided by The Institute for Genomic Research (TIGR) under an academic license agreement] and Legionella longbeachae (GenBank Accession No. gi18693262) were used as outgroups in L. pneumophila gene analysis.
2.5. Sequence alignment and analysis
The ClustalW (Thompson et al., 1994) program was selected for aligning nucleotide or translated amino acid sequences. BioEdit version 5.0.6 (TA Hall, http://www.mbio.ncsu.edu/BioEdit/bioedit.html) and GeneDoc programs (H.B. Nicholas Jr, http://www.psc.edu/biomed/genedoc/) were used to manipulate the alignments and to build the protein hydrophilicity profiles. The numbers of nonsynonymous (leading to amino acid substitutions) (Kn) and synonymous (silent) (Ks) nucleotide substitutions were calculated using the MEGA 2.1 package (Kumar et al., 2001, http://www.megasoftware.net/). Both Kn and Ks were calculated per corresponding site to avoid the influence of overall gene composition. Four physico-chemical properties (volume, polarity, charge, and hydrophobicity) were used to characterize the results of amino acid substitutions in comparisons of translated homologous sequences (Bogardt et al., 1980; Kawashima and Kanehisa, 2000). Corresponding dG values were obtained using Miyata's matrix (Miyata et al., 1979) and were calculated per one amino acid substitution so that they would not depend on the rates of nucleotide substitutions per se. Protein secondary structure prediction was done using SSpro (Pollastri et al., 2002), APSSP2 (Raghava, 2000), and PHD programs (Rost, 1996).
2.6. Homology search
The generic BLAST program (Altschul et al., 1990) and PARACEL BLASTER machine were used for the searches against the TIGR databases for completed bacterial genomes (http://www.tigr.org) and the NCBI nonredundant databases (http://www.ncbi.nlm.nih.gov).
2.7. Domain search
The SMART server (http://smart.embl-heidelberg.de (Letunic et al., 2002)) and PFAM database (Bateman et al., 2002) were used to search for protein functional domains and coiled-coil structures.
2.8. Phylogenetic analyses
Tree reconstruction and visualization were accomplished using the MEGA 2.1 package (Kumar et al., 2001). The Li distance approach (Li et al., 1985) was used for building the distance matrix. The neighbor-joining (NJ) tree-building algorithm (Saitou and Nei, 1987), which builds a branching tree diagram from the distance matrix by successively clustering pairs together, was used for phylogenetic inference. Confidence levels of inferred relationships were estimated following 1000 bootstrap iterations. To address uncertainties of tree branching, the split decomposition method of the program SplitsTree (Huson, 1998) was utilized. Unlike most tree building methods, which force data into a tree-like phylogeny, this method portrays the data in a mesh-like graph allowing conflicting phylogenetic information to be visualized, estimated, and compared.
3. Results
3.1. Gene composition of Legionella species
Low stringency hybridization to EcoRI-digested Legionella DNA was carried out using labeled amplified regions of selected genes from the Philadelphia 1 strain of L. pneumophila. The left side of the autoradiogram shown in Fig. 1 depicts some typical results. A table (Supplementary Table SI) compiling the results for every gene and strain is available online at http://genome3.cpmc.columbia.edu/~legion/comp_project/comp_proj.htm. In addition to 16S rRNA genes, positive signals were obtained in most species for housekeeping (e.g., asd) genes; in contrast, only some icm/dot genes were detected in the non-pneumophila species (see right side of autoradiogram in Fig. 1), chiefly in L. longbeachae (Leg14), and to a lesser extent in Legionella dumoffii, Legionella wadsworthii, Legionella gormanii, Legionella micdadei, Legionella feeleii, and Legionella sainthelensis. Intermediate results were obtained for the lvh/lvr genes (lvrA–E, lvhB2–11, D4). For the most part, they generated strong hybridization signals in L. dumoffii, L. longbeachae (Tucker 1), L. wadsworthii, L. oakridgensis, and L. cherrii, but weak or no signals in the remaining species tested. These genes have previously been shown not to be essential for growth of L. pneumophila in macrophages (Segal et al., 1999). As an alternative to hybridization, primer pairs from Philadelphia 1 were used in an attempt to amplify genes directly from the other strains and species. The results were usually consistent with those obtained using hybridization; with few exceptions, if negative results were obtained using one approach, negative results were also obtained with the alternative procedure (see supplementary Table SII at the above URL for all the PCR results). Still, it is important to realize that under the stringency conditions we utilized for hybridization, we would not expect to identify genes with less than 70% identity at the nucleic acid level; similarly, at least 90% conservation of primer sequence would be required for consistently successful PCR amplification. Thus, an unobserved signal may be due either to true absence of a gene, or perhaps more likely, substantial variation in the gene's sequence compared to that of Philadelphia 1.
Among the L. pneumophila strains, high signal strength was obtained for nearly every gene (housekeeping, icm/dot, and lvr/lvh). This would imply that different strain virulence phenotypes are not accounted for by simple presence or absence of these orthologs. Therefore, to determine if more subtle genetic features were involved, we carried out a comparative sequence analysis on all these genes in the different L. pneumophila strains as well as the L. longbeachae icm/dot genes available from GenBank. (Although we were able to amplify a few of the icm/dot genes in the non-pneumophila species, the comparative sequencing described below was restricted to the pneumophila strains.)
Fig. 1. Gene distribution in different legionellae as scored by hybridization. Comparison of hybridization results using Philadelphia 1 PCR amplified probes in L. pneumophila and other Legionella species for 16S rRNA, aspartate β-semialdehyde dehydrogenase (asd), one of the lvh and three of the icm genes. From left to right: strains Leg 7–17. Presence of multiple bands for 16S rRNA likely due to multiple copies of the rDNA in the bacterial species (there are at least three partial or complete loci in the Philadelphia 1 strain of L. pneumophila based on the genomic sequence). Variation in banding patterns for other genes could be due to presence or absence of paralogs, or to EcoRI restriction site polymorphisms within or near the gene, in the different organisms. In the case of the lvhB4 and other lvh/lvr genes, variable patterns could reflect the fact that they can be located on a plasmid, as supported by recent data from our laboratory (not shown).
3.2. Level of interstrain and interspecies variation in L. pneumophila
There are now about 48 known Legionella species (Perez-Luz et al., 2002) and about 15 L. pneumophila serogroups, with approximately 70 known serogroups in the genus overall. Despite detailed analyses, there are complications in some of the assignments (see review by Benson and Fields, 1998). Appreciating that taxonomic positioning cannot always accurately reflect evolutionary distance (Rosello-Mora and Amann, 2001), Table 2 summarizes icm and non-icm sequence diversity based on our data for different strains of L. pneumophila, and in a smaller number of cases, other Legionellae, as well as published gene sequence data. The differences between Legionella species are within or close to the standard boundaries of speciation (for review, see Rosello-Mora and Amann, 2001): 95% homology for 16S rRNA and 70% for other genes. As can be seen from the table, icm/dot genes have a higher level of interspecies diversity (62–79% homology, with a mean value of 70%) than non-icm/dot genes, though as of today, only L. pneumophila vs L. longbeachae comparisons for several icm genes are available.

Table 2
Variations among Legionella genes
                          16S rRNAᵃ       Non-icm genesᵇ               icm/dot genesᶜ
                                          DNA             Protein      DNA             Protein
Within L. pneumophila     99.2%           89–100% (96%)   97.9%        96–98% (97%)    94–100% (98%)
Between Legionella spp.   91–99% (96%)    69–99%          75–99%       62–79% (70%)    58–91% (74%)
With Coxiellaᵈ            85.5%                                        39–66%          23–63%
The data in the above table represent averages or ranges (and averages) of the percent homology for different genes.
ᵃ Based on our data and that of Adeleke et al. (1996).
ᵇ Based on information available for 8 non-icm/dot genes, our data plus that of Ratcliff et al. (1997), Doyle et al. (1998), Ratcliff et al. (1998), and Avison and Simm (2002).
ᶜ Based on our data plus gene sequences for L. longbeachae submitted to GenBank by Rogers et al. (2002) (AF288617). Ko et al. (2002b) have shown a wider range of nucleic acid homology for dotA within L. pneumophila (78–100%), when they include in the comparisons the L. pneumophila subsp. fraseri and subsp. pneumophila.
ᵈ TIGR data compared with the Philadelphia 1 strain of L. pneumophila.
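Percent homology figures of the kind summarized in Table 2 are usually derived from pairwise comparisons of aligned sequences. The sketch below shows one simple way to compute percent identity over the ungapped columns of a pairwise alignment. It is an illustration under stated assumptions, not the procedure used by the authors: the function name `percent_identity`, the gap-handling rule, and the toy sequences are all hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs must already be aligned (equal length). Columns containing
    a gap in either sequence are skipped, which is one of several possible
    conventions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned sequences (hypothetical, not from the study).
ungapped = percent_identity("ATGCATGCAT", "ATGCTTGCAT")  # 9 of 10 columns match
gapped = percent_identity("ATG-CA", "ATGGCT")            # 4 of 5 ungapped columns match

print(ungapped, gapped)
```

With a function like this, a 16S rRNA comparison could be checked against the 95% boundary and other genes against the 70% boundary quoted in the text.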
There is a considerable range of variability for the different icm genes among the L. pneumophila strains examined both at the nucleotide and protein levels (Table 3). Some genes have a very low percentage of variable positions, and even silent substitutions are rare, while others, such as icmX, have many polymorphic sites. There were no major insertions or deletions in the sequenced genes, though there were a few 1 or 2 amino acid insertions and deletions (e.g., in icmG in Leg7 and Leg31; icmX in several strains).
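The counts of polymorphic sites reported per gene can be illustrated with a short sketch that scans the columns of a multiple alignment and reports those with more than one observed state. This is a minimal illustration, not the authors' pipeline; the function name `polymorphic_sites`, the choice to ignore gap and unknown characters, and the toy alignment are assumptions.

```python
def polymorphic_sites(aligned, ignore=frozenset("-?")):
    """Return 0-based column indices with more than one observed state.

    `aligned` is a list of equal-length sequences (nucleotide or amino acid).
    Characters in `ignore` (gaps, unknowns) are not counted as states.
    """
    if not aligned:
        raise ValueError("alignment is empty")
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    sites = []
    for index, column in enumerate(zip(*aligned)):
        states = {char.upper() for char in column} - ignore
        if len(states) > 1:
            sites.append(index)
    return sites


# Toy alignment of three hypothetical strains.
toy_alignment = ["ATGCAA", "ATGTAA", "ATGCAG"]
variable_positions = polymorphic_sites(toy_alignment)
percent_variable = 100.0 * len(variable_positions) / len(toy_alignment[0])

print(variable_positions, round(percent_variable, 1))
```

Running the same function on the translated alignment would give the amino acid counterpart of the nucleotide count.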
The number of synonymous (Ks) and nonsynonymous (Kn) nucleotide substitutions was determined per corresponding site, and the mean of all pairwise strain comparisons was calculated for each gene. The icmX, W, V, and dotA genes, which are all members of icm/dot region I (the small icm locus), are quite variable, showing consistently higher Ks and Kn values (with the exception of Kn in the case of icmW), compared to most of the genes from icm/dot region II. The icmX, V and dotA genes have Kn values approximately 10 times higher than most of the rest of the icm/dot genes.

Table 3
Sequence variations in icm/dot and non-icm genes
Gene      Number of   Gene length   % finished   Number of            Mean pairwise      Kn/Ks    dG per one
          strains     in Leg 3                   polymorphic sites    value per site     ratio    aa change
          sequenced                              nucl      aa         Kn       Ks
icmF      15          2922          100          278       50         0.005    0.075     0.067    0.923
icmB      16          3030          100          276       16         0.002    0.107     0.019    0.615
icmJ      17          627           99           46        5          0.002    0.105     0.019    1.167
icmD      14          399           99           39        4          0.002    0.092     0.022    0.566
icmC      18          582           100          52        10         0.006    0.111     0.054    0.688
icmG      18          807           99           94        27         0.012    0.131     0.092    1.063
icmK      18          1083          98           141       30         0.008    0.169     0.047    0.373
icmL      18          639           100          57        3          0.001    0.087     0.011    0.633
icmM      17          285           100          20        6          0.008    0.063     0.127    1.433
icmN      15          570           88           59        7          0.004    0.07      0.057    0.120
icmP      14          1131          90           95        12         0.002    0.077     0.026    0.034
icmQ      16          576           98           34        3          0.002    0.079     0.025    0.135
icmR      17          363           100          40        10         0.01     0.083     0.120    1.148
icmS      17          345           100          32        3          0.003    0.164     0.018    1.299
icmT      17          261           100          19        2          0.002    0.097     0.021    0.850
Mean for the locus                                                    0.005    0.101     0.048    0.736

icmV      17          456           100          56        19         0.024    0.147     0.163    0.927
icmW      17          456           100          34        3          0.002    0.119     0.017    0.998
icmX      18          1404          99           298       78         0.036    0.307     0.117    1.131
dotA      5           3189          100          557       139        0.042    0.352     0.118
Mean for the locus                                                    0.026    0.231     0.104    1.019

tphA      17          1257          99           159       37         0.008    0.097     0.082    0.987
asd       35          1020          99           29        3          0.001    0.031     0.032    0.374
rpp       17          690           100          74        11         0.006    0.147     0.040    1.084
RNAseH    11          573           100          22        7          0.006    0.046     0.130    1.769
mip       17          699           100          54        4          0.002    0.070     0.025
Both nonsynonymous (Kn) and synonymous (Ks) nucleotide substitutions are calculated per corresponding site to avoid the influence of gene composition and dG values are calculated per one amino acid change. Numbers in bold vary the most from the remainder. Abbreviations: asd, aspartate β-semialdehyde dehydrogenase; rpp, flagellar L-ring protein precursor; mip, macrophage inflammatory peptide.
*Data from Bumbaugh et al., 2002.

The ratio of nonsynonymous to synonymous nucleotide substitutions is usually taken as an indicator of the functional and structural restrictions on gene variability and is independent of the time of gene diversification. The icm/dot genes show a wide distribution in their Kn/Ks ratios, with, for example, icmV having a ratio nearly 15 times higher than that of icmL. The highly conserved genes (icmL, W, S, B, J, T, D, P, and Q) have lower Kn/Ks ratios than even the very conservative housekeeping gene encoding aspartate β-semialdehyde dehydrogenase (asd), which shows as much as 62% homology even with its relatively distant Vibrio cholerae ortholog. In contrast, the most variable genes (icmV, M, R, and X) have Kn/Ks ratios close to or even higher than dotA, which is considered a relatively variable gene (Bumbaugh et al., 2002).
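The Kn/Ks comparison can be checked by recomputing the ratio from the mean per-site Kn and Ks values listed in Table 3. The sketch below does this for four genes; the dictionary and function names are illustrative. Because the published Kn and Ks values are rounded to three decimals, ratios recomputed this way agree with the published ratios only to within rounding.

```python
# Mean pairwise (Kn, Ks) per site, copied from Table 3.
TABLE3_KN_KS = {
    "icmV": (0.024, 0.147),
    "icmL": (0.001, 0.087),
    "icmF": (0.005, 0.075),
    "asd": (0.001, 0.031),
}


def kn_ks_ratio(kn, ks):
    """Ratio of nonsynonymous to synonymous substitutions per site."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    return kn / ks


ratios = {gene: kn_ks_ratio(kn, ks) for gene, (kn, ks) in TABLE3_KN_KS.items()}

# Fold difference between the most and least variable icm genes by this measure.
fold_icmV_over_icmL = ratios["icmV"] / ratios["icmL"]

for gene, value in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{gene}: Kn/Ks = {value:.3f}")
print(f"icmV / icmL = {fold_icmV_over_icmL:.1f}-fold")
```

From the rounded inputs the icmV to icmL fold difference comes out slightly above 14, in line with the "nearly 15 times" stated in the text, and icmL falls below asd as described.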
Not all amino acid substitutions in the genes with low Kn/Ks ratios are conservative, as assessed by changes in amino acid physico-chemical properties, and there are cases of genes with relatively conservative amino acid substitutions that nonetheless have a high level of gene variability as judged by Kn/Ks ratios (Fig. 2). For example, the IcmJ, S, and W protein products, despite displaying relatively low Kn/Ks values, have amino acid substitutions that result in drastic changes in their properties; on the other hand, three genes (icmN, P, and Q) have Kn/Ks ratios close to the locus II average (0.05), but their encoded proteins have extremely low dG values, indicating that only substitutions in amino acids with similar physico-chemical properties have been permitted. Since nucleotide substitutions may exert their influence on the function of the final protein product at any of several levels (e.g., DNA, mRNA or protein), Kn/Ks ratios reflect general restrictions on gene and protein variability. On the other hand, dG values reflect variation purely in protein structural and functional features, indicating some restrictions on the amino acid substitutions at the level of the final functioning product. In this sense, icmN, P, and Q may be considered the most conservative of the icm/dot genes.
Fig. 2. Comparison of Kn/Ks and dG values for icm/dot and non-icm/dot genes. Kn/Ks ratios and dG values for the icm/dot genes shown in order from highest (icmV) to lowest (icmL) Kn/Ks values. Dashed lines correspond to locus II mean values.
There is no obvious correlation between the predicted cell localization of the protein products of these genes and their variability levels. While IcmN is thought to be an outer membrane protein and not necessary for macrophage killing, IcmK is an indispensable periplasmic or outer membrane protein, IcmP is an inner membrane protein, and IcmQ is a soluble cytoplasmic protein required for pore formation (Andrews et al., 1998; Coers et al., 2000; Dumenil and Isberg, 2001; Segal and Shuman, 1998a; Watarai et al., 2001).
Overall, the levels of sequence variation found among the non-icm genes in L. pneumophila strains (last group of genes in Table 3) and most of the icm genes from locus II were comparable to the level of diversity in, for example, Salmonella enterica housekeeping genes reported by Boyd et al. (1997) (where the mean nonsynonymous to synonymous nucleotide substitution ratio was 0.032). The level of polymorphism among icm genes from locus I (second group in Table 3) and some locus II members exceeds significantly that for both Legionella and Salmonella housekeeping genes and most of the genes from icm/dot locus II, and corresponds to the variability level for the spaM and spaN genes of the S. enterica inv-spa pathogen invasion complex (Boyd et al., 1997).
The order of the icm/dot genes was apparently the same in all 18 strains we examined, as assessed by our ability to amplify these genes using primers from the expected surrounding genes.
3.3. Paralogs of icm/dot genes in Philadelphia strain of L. pneumophila
It is not unusual to find distant homologs among the genes of a single organism. These may represent members of a gene family that carry out related but not identical functions, or they may no longer have any functional properties in common. Among the icm/dot genes, four partial homologs (paralogs) for the 3′ part of icmL (134 aa), one for its 5′ portion (79 aa), and one for icmC were identified in a search of the now essentially complete Philadelphia 1 genome (http://genome3.cpmc.columbia.edu/~legion/). The icmL paralogs are located in different regions of the genome, and the icmC1 paralog is separated from the locus II icmC gene by 23 kbp. In each case, the paralogs are surrounded by genomic housekeeping genes. The average protein sequence homology between IcmL and its 3′ paralogs is relatively low but clear: 31% identity and 52% similarity over an approximately 120 aa (amino acid) stretch. For comparison, the L. pneumophila IcmL has 91% amino acid identity with L. longbeachae IcmL over a 220 aa stretch; 39% identity to C. burnetii IcmL over 200 aa; and 25–30% identity to traM genes (Klebsiella oxytoca, Pseudomonas syringae, Escherichia coli, and Salmonella typhimurium plasmids) over 160–180 aa. IcmC and IcmC1 have 40% identity over 171 aa.
3.4. Further analysis of individual icm/dot genes
Multiple alignments of the icm/dot genes in all the L. pneumophila strains under study permitted more detailed sequence analyses. The sequence variation patterns at both the nucleotide and amino acid levels, and dG and hydrophilicity profiles along the length of each ORF were determined, as well as potential structural and functional motifs. In Fig. 3, the distribution of nucleotide and amino acid substitutions along the nucleotide and corresponding amino acid sequences are compared for all the sequenced icm genes in L. pneumophila strains. Apparently, in many cases, nonsynonymous substitutions (leading to amino acid changes in encoded proteins) are distributed unevenly along the sequence. The gene regions with low or no nonsynonymous substitutions and close to average number of synonymous substitutions are of special interest since the observed conservatism cannot be explained merely by too little evolutionary time for the compared sequences to diverge. These regions, conservative at the protein level, especially those preserved also in distant homologous proteins, may correspond to important protein domains, so where possible, comparisons were made with distant homologs in other bacteria in conjunction with the functional motifs predictions. A more detailed description of some of the icm genes (icmP, G, N, and K) follows.
Fig. 3. Distribution of nucleotide and amino acid substitutions in the icm genes among L. pneumophila strains. Every nucleotide (bottom halves of each bar) and amino acid substitution (upper bar halves) along the icm sequences from all the L. pneumophila strains is indicated with a vertical hatchmark.
3.5. IcmP
IcmP is believed to be an inner membrane protein, possibly involved in DNA transfer, and
absolutely indispensable for macrophage killing (Segal and Shuman, 1998a). The gene product
is predicted to have a signal peptide (aa 1–35), transmembrane regions (aa 17–39 and 92–114)
and a trbA domain (aa 204–372). trbA is one of the genes found within the transfer region of
IncI1 plasmids such as R64, and is absolutely required for conjugal transfer of these plasmids
(Furuya and Komano, 1996). Although distant homologs of icmP are found in Coxiella,
Pseudomonas, and Salmonella, they display a low overall level of sequence similarity (18–35%
identity at the protein level); slightly increased homology is found only in the region of the
trbA domain. Nonetheless, all the homologs have very comparable hydrophilicity profiles over
their entire lengths (Fig. 4). Since the gene has not been allowed to accumulate significant
numbers of variable amino acid positions, it is likely to share a closely related function in
these fairly diverse genera.
Taking the 15 L. pneumophila strains as a group, both synonymous and nonsynonymous
substitutions are distributed evenly along the icmP sequence, but the gene appears to be very
conservative, both at the nucleotide and amino acid levels, with the lowest dG value of all
the icm and housekeeping proteins sequenced, especially in the trbA region.

Fig. 4. Hydrophilicity profiles of IcmP and distant homologs. Red: L. pneumophila IcmP;
blue: Coxiella IcmP homolog; green: Pseudomonas sp. PyR19 plasmid conjugal-transfer related
sequence SAT (gi 2642198); brown: Salmonella typhimurium R64 plasmid trbA gene (gi 20521502).

Fig. 5. Hopp and Woods hydrophilicity profiles for IcmG and its homologs. Blue: L. pneumophila
IcmG; red: TraP of plasmid R64 gi 4903119; green: C. burnetii IcmG homolog.

Fig. 6. Alignment of t-SNARE domains in assorted proteins. Sequences (from top to bottom):
L. pneumophila IcmG; Bradyrhizobium japonicum Blr2548 protein (BAC47813); Clostridium
acetobutylicum methyl-accepting chemotaxis protein (AE007559); Pseudomonas aeruginosa
probable chemotaxis transducer (AE004706) (t-SNARE domains predicted by SMART system);
human SNAP25 C-terminal end (D21267); SNAP25 N-terminal end (D21267); Saccharomyces
cerevisiae SEC9p protein, putative t-SNARE (NP_011523). The conservative amino acids are
highlighted.

Fig. 9. Hydrophilicity profile of IcmK and distant homologs. Red: L. pneumophila; blue:
L. longbeachae; green: C. burnetii; brown: Shigella TraN; black: Klebsiella TraN.
3.6. IcmG
IcmG has also been predicted to be an inner membrane protein; mutation of this gene leads to
a partial reduction in the bacterium's ability to kill macrophages (Segal et al., 1998). When
Legionella pneumophila strains are compared, IcmG shows elevated variability, both at the
nucleotide and protein levels. Variable positions are almost evenly distributed along the
sequence, except in the vicinity of the C- and N-termini, which lack even synonymous
substitutions.
Fig. 5 shows hydrophilicity profile comparisons for icmG in Legionella and two distant
homologs, C. burnetii IcmG and plasmid TraP. Despite relatively low sequence homology among
the three genes (less than 20% at the protein level), their predicted secondary structures
(not shown) and hydrophilicity profiles display significant similarity. Preservation of the
protein structure in some cases may be more important for a protein's function than the amino
acid sequence itself, and probably because of this, structure-based methods of searching for
distant homologs are more efficient than sequence-based approaches (Pawlowski et al., 2001;
Sauder et al., 2000). Examples of related bacterial proteins with very low sequence identity
but nearly identical structures are not uncommon (Bauer et al., 2001;
Ginalski et al., 2000; Girardeau et al., 2000).
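The hydrophilicity comparison in Fig. 5 rests on a sliding-window average of per-residue hydrophilicity values. A minimal Python sketch of such a profile follows, assuming the published Hopp and Woods scale and a six-residue window. The function name `hopp_woods_profile`, the window length, and the example peptide are illustrative assumptions and are not taken from the settings used for the figures.

```python
# Hopp and Woods hydrophilicity values, keyed by one-letter residue code.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def hopp_woods_profile(protein, window=6):
    """Return the mean hydrophilicity of every full window along a protein.

    The window length of 6 is an assumption for this sketch.
    """
    seq = protein.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    # A residue outside the 20 standard amino acids raises KeyError.
    values = [HOPP_WOODS[residue] for residue in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical peptide, used only to show the call.
    profile = hopp_woods_profile("MKKLLVIAGDEERS")
    for position, score in enumerate(profile, start=1):
        print(position, round(score, 2))
```

Profiles of two homologs can only be overlaid position by position after the sequences have been aligned, which is why the positions quoted for Fig. 5 refer to the aligned sequences.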
For the IcmC–TraQ (not shown) and IcmG–TraP comparisons, the protein similarity at these
higher structural levels is indeed stronger than at the sequence level. Thus, despite sequence
discrepancies, the major function of these distant homologs may remain intact. Local
dissimilarities of the protein profiles, as in the case of IcmG–TraP at positions 165–185
(Fig. 5), require additional analysis. The Legionella and Coxiella IcmG proteins, unlike their
TraP homolog, are predicted to have a t-SNARE domain precisely in this region (aa 142–210 in
the Legionella IcmG protein; aa 95–194 in the C. burnetii homolog, which correspond to
positions 153–221 in the aligned sequences in Fig. 5), and this similarity extends beyond the
coiled-coil structural features predicted for all three homologs in this area (positions
123–179) (Segal and Shuman, 1998b). [Weimbs et al. (1997, 1998) even screen out coiled-coil
features when performing t-SNARE domain searches.] Proteins with t-SNARE domains play
important roles in membrane fusion in eukaryotes (Weber et al., 1998). While the t-SNARE
domains are highly diverse, they usually possess a central glutamine (Q) residue and preserve
the overall domain structure (Gotte and von Mollard, 1998; Weimbs et al., 1998). There are
only a few bacterial proteins known to have similarity to the t-SNARE domain (SMART
Accession No. SM0397); most of these are bacterial sensor and chemotaxis integral membrane
proteins. Several examples of these are aligned with IcmG in Fig. 6. It will be interesting
to see if the t-SNARE domain is conserved in non-pneumophila Legionella species with icm/dot
loci. If it is required for IcmG function during infection, this feature may differentiate
the global function of the Legionella icm/dot system from that of its homologs in other
organisms.
3.7. IcmN
IcmN is a putative outer membrane lipoprotein, containing a signal peptide, and is dispensable
for macrophage killing (Segal et al., 1998). The sequence is well conserved, especially at the
protein level: amino acid substitutions among L. pneumophila strains occur only in the
N-terminal half of the protein, and the alternative amino acids always have very similar
physico-chemical properties (Figs. 2 and 3). An alignment of L. pneumophila and
L. longbeachae sequences also reveals that the C-terminal half (after aa 90) is more conserved
than the N-terminal portion (Fig. 7). Starting at aa 83, the IcmN protein shows weak homology
to the OmpA domain (Pfam PF00691), which is found in bacterial porin-like integral-membrane
proteins and lipoproteins, most of which, like IcmN, have a conserved OmpA domain within the
C-terminal half and a variable N-terminal portion. Some members of this protein group have
antigenic determinants, but IcmN does not display obvious hypervariable regions. The alignment
with several distant homologs reveals two extremely conserved motifs: QGVD at aa 147 and
RVEIT at the C-terminus (boxed in Fig. 7).

Fig. 7. Alignment of IcmN gene product with distant homologs. Dot… (top to bottom with NCBI
accession numbers): L. pneumophila IcmN … hypothetical protein (NP_249524); E. coli putative
outer membrane pr…
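The statement that alternative amino acids in IcmN "always have very similar physico-chemical properties" can be checked mechanically once a property grouping is chosen. The Python sketch below is one way to do so. The grouping of residues into classes is an assumption made for illustration (the text does not say which classes were used), and `substitution_report` is a hypothetical helper name.

```python
# A simplified grouping of amino acids by side-chain character.
# This grouping is an assumption for the sketch, not the paper's scheme.
PROPERTY_CLASS = {}
for label, residues in {
    "hydrophobic": "AVLIMC",
    "aromatic": "FWY",
    "polar": "STNQ",
    "basic": "KRH",
    "acidic": "DE",
    "special": "GP",
}.items():
    for residue in residues:
        PROPERTY_CLASS[residue] = label


def substitution_report(reference, variant):
    """List the differences between two aligned protein sequences.

    Returns (position, ref_residue, var_residue, is_conservative) tuples with
    1-based positions. Alignment gaps ('-') are skipped. A substitution is
    called conservative when both residues fall in the same property class.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    report = []
    pairs = zip(reference.upper(), variant.upper())
    for position, (a, b) in enumerate(pairs, start=1):
        if a != b and a != "-" and b != "-":
            same_class = PROPERTY_CLASS[a] == PROPERTY_CLASS[b]
            report.append((position, a, b, same_class))
    return report


if __name__ == "__main__":
    # Hypothetical aligned fragments from two strains.
    for row in substitution_report("MKILD", "MRIFD"):
        print(row)
```

With a different grouping the same substitution can change category, so any count of conservative changes should be reported together with the classes that were used.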
3.8. IcmK
The IcmK product is putatively a periplasmic or outer membrane protein, and possesses a
secretion signal peptide (Andrews et al., 1998); the protein is needed for pore formation
(Kirby et al., 1998) and is indispensable for macrophage killing, but not necessary for
conjugation (Andrews et al., 1998; Segal and Shuman, 1998a). It is homologous to the plasmid
traN gene product. According to the Pfam database, the TraN domain starts at position 62 of
both the protein alignment (Fig. 8) and the hydrophilicity profile for L. pneumophila icmK
and its distant homologs (Fig. 9); the alignment shown before that point is uncertain owing
to very low homology.
As seen in the alignment, the homology level between the orthologs is quite low with …
… hydrophobic portion among distant homologs suggests that this region is a functionally
important domain.
3.9. Phylogenetic relationships between strains
based on icm gene sequence
The dot/icm genes were presumably introduced into the Legionella genomes from a plasmid
(Komano et al., 2000; Segal and Shuman, 1998a), possibly prior to their separating into two
loci. It is unknown, though, if this was a one-time event or the region(s) were lost and
re-introduced repeatedly during Legionella evolution. Often when gene transfer occurs from a
distant organism with different nucleotide content, the transferred region is evident due to
its different GC content compared to the rest of the genome. In the case of Legionella's
icm/dot loci, their GC content is equivalent to the genome average (38%). Moreover, the
regions are distinct from those of their homologs in Coxiella and the R64 plasmid where most
icm/dot gene homologs are around 44 and 50% GC, respectively. It is possible that the transfer
occurred from a different plasmid with similar GC content to that of Legionella. Based on the
differences between phylogenetic trees built for mip and dotA (Bumbaugh et al., 2002) and dotA
and rpoB genes (Ko et al., 2002a,b), it has been suggested that repeated events of genetic
exchange or loss and acquisition led to the current complex composition of these loci.

Fig. 8. Alignment of IcmK and TraN gene products. Dots represent identical amino acids and
dashes are gaps in the alignment. Sequences from top to bottom: L. pneumophila,
L. longbeachae (AF288617), and C. burnetii IcmK; E. coli (Shigella sonnei) plasmid ColIb-P9
TraN (BAA75158) (has only 1 aa difference with Salmonella typhimurium IncI1 plasmid R64 TraN,
BAB91663); Klebsiella oxytoca plasmid pACM1 primase (AF139719).

Fig. 10. icmK variability profiles. A window size of 15 amino acids or codons was used. See
text for details.

To determine if the rates of molecular evolution of icm genes are disparate in different
L. pneumophila strains, a comparison of icm genes from all available strains was undertaken,
using their C. burnetii orthologs as outgroups (Sexton and Vogel, 2002). The distances in
synonymous and nonsynonymous substitutions per corresponding site were analyzed separately,
as was done by Whittam and Bumbaugh (2002). All analyzed genes from all the L. pneumophila
strains showed approximately equal relative substitution rates (data not shown). It is
probable, though, that minor differences were missed, using such distant homologs from
Coxiella. In the future, when more of the closer homologs, e.g., icm/dot genes from other
Legionella species, are available, it should be possible to obtain a finer resolution.

A detailed phylogenetic analysis was carried out. Phylogenetic trees were built for 18 icm
genes, 3 housekeeping genes, and the icmB/tphA intergenic region as well as combined trees
built for icm locus subregions (i.e., concatenated icm genes from extensive portions of the
two loci or the entire loci). The presented trees were built by two methods: NJ, with 1000
bootstrap iterations to estimate confidence level for the tree topology, and the split
decomposition method which displays branching alternatives in a single representation.
While trees were built for each gene and several icm/dot subregions, only some representative
examples are included in Fig. 11.
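The bootstrap step mentioned for the NJ trees resamples alignment columns with replacement and asks how often a grouping recurs. The Python sketch below shows only that resampling logic, under simplifying assumptions: distances are plain p-distances, and instead of rebuilding a full NJ tree per replicate it records how often a chosen pair of strains remains the closest pair. The strain names and sequences are hypothetical, and none of the function names come from the software used in the study.

```python
import random


def p_distance(seq_a, seq_b):
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(alignment):
    """Pairwise p-distances for a dict {strain_name: aligned_sequence}."""
    names = sorted(alignment)
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x in names for y in names
    }


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one pseudo-alignment)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[c] for c in columns)
        for name, seq in alignment.items()
    }


def bootstrap_support(alignment, closest_pair, replicates=1000, seed=0):
    """Fraction of replicates in which closest_pair has the smallest distance.

    This is a stand-in for tree-based bootstrap support: a real analysis
    would rebuild an NJ tree from each replicate instead.
    """
    rng = random.Random(seed)
    target = tuple(sorted(closest_pair))
    hits = 0
    for _ in range(replicates):
        dist = distance_matrix(bootstrap_replicate(alignment, rng))
        off_diagonal = {p: d for p, d in dist.items() if p[0] < p[1]}
        if off_diagonal[target] == min(off_diagonal.values()):
            hits += 1
    return hits / replicates


if __name__ == "__main__":
    # Hypothetical three-strain alignment.
    toy = {"Leg1": "ACGTACGT", "Leg9": "ACGTACGT", "Leg7": "TTGTACCA"}
    print(bootstrap_support(toy, ("Leg9", "Leg1"), replicates=100))
```

Low support for a grouping in such a resampling is the situation described for the cluster relocations, where it becomes difficult to tell a real rearrangement from sampling noise.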
Based on the combined phylogenetic trees, the strains consistently group into seven subsets:
[Leg 5, 1, 9], [6, 11, 32], [{36, 10}, {30, 35}], [3, 4, 8, 34], [2, 33], and [7, 31], though
the separation between groups {36, 10} and {30, 35} is less consistent (cf. Figs. 11A and C).
This clustering is almost identical, with a few exceptions, for the icm genes of the two loci,
housekeeping genes and the icmB/tphA intergenic region and is supported by high bootstrap
values on almost all of the trees. Exceptions to this clustering were most frequently found
with the Leg 6 strain, which, for 6 icm and 3 housekeeping genes, merges with the (5, 1, 9)
group (see for example the tree for icmK in Fig. 11D). It appears that in the case of trees
built for genes of the small locus (icmV, W, and X) and those at one end of the large icm
locus (icmF, tphA, icmB, J, D, C, and G), Leg 6 belongs to the (11, 32) group, whereas based
on trees built for many of the genes at the other end of locus II (icmK, L, N, R, S, and T),
this strain falls into the (5, 1, 9) group. In only a very few cases was the clustering
violated by other strains (e.g., Leg 30 and 35 are in separate branches in icmV, W, X, F, and
B individual gene trees).

Despite the largely consistent strain clustering, the relationship between clusters is not as
clear, that is, the groups as a whole can switch their relative positions in different trees
and sometimes cannot be positioned unambiguously (for example, see Fig. 11D). In many cases,
these cluster relocations have low bootstrap values, making it difficult to judge whether they
correspond to actual gene transfer or to recombination events.

In all the trees Leg 7 and 31 constitute a separate group, so distant from the remaining
strain clusters that it almost has the appearance of an outgroup. But when strains 7 and 31
are considered independently, they seem to be almost as distant from each other as from the
remaining strains (Fig. 11C). Thus, they probably do not form an actual group, but are merely
the two most divergent strains of L. pneumophila examined. It was previously shown that
L. pneumophila strain Dallas, serogroup 5, which corresponds to our Leg 7 strain, belongs to
L. pneumophila subspecies fraseri (Brenner et al., 1988) and that the dotA and mip genes from
this strain were most distant from their homologs in other L. pneumophila strains
(Bumbaugh et al., 2002).

The observed strain clustering does not correlate with serogroups. Thus, while both Leg 2 and
33 belong to serogroup 11 and also to one cluster, none of the five strains of serogroup 1 for
which we have sequences (Leg 1, 3, 31, 35, and 36) group together.

Trees built for the locus I icm genes vary the most from the locus II genes (compare the two
combined trees in Fig. 11B, left and right). The initial trees were aligned by rotating
branches around internal nodes, while preserving the branching pattern, to accentuate the
differences between the two resulting topologies. Branches corresponding to strains Leg 5, 35,
1, and 6 could not be aligned.

Fig. 11. Phylogenetic trees. The gene sets do not include dotA,B,C, icmO or icmE. Numbers at
the nodes are bootstrap values. (A) Combined NJ tree for all icm genes. (B) Aligned NJ trees
for the two icm loci. Notable differences in the tree topologies are detailed in the text.
Left: locus II (all locus II icm genes except icmF). Right: locus I (icmX, W, and V only).
(C) Combined split decomposition tree for all icm genes. This figure shows the strain
clustering, emphasizing the divergence of Leg 7 and 31 from each other and from the rest of
the strains. (D) Split decomposition tree for IcmK, demonstrating the complicated picture of
group branching. Rectangles represent alternative branching.

4. Discussion

The icm/dot gene loci are present in each of the L. pneumophila serogroups and strains we
sequenced. Moreover, based on our ability to amplify and sequence across genes of interest
using primers in expected surrounding adjacent genes, it appears that gene order within these
clusters is also retained within the L. pneumophila strains. Other investigators have shown
that among nine strains of L. pneumophila, eight from serogroup 1 including three commonly
used in laboratory studies (AA100, JR32, and Lp01), the presence or absence of two loci
involved in Type IV secretion (traI and lvh) and the rtxA locus may correlate to some extent
with the strains' pathogenicities (Samrakandi et al., 2002). More specifically, the lvh and
rtxA loci were found more commonly in strains generally associated with disease, whereas the
traI locus was not. These authors also were able to detect and discriminate these genes by
hybridization in some non-pneumophila species. More recently, dissection of an expanded locus
surrounding a set of the so-called tra/trb genes, presumably involved in pilus assembly,
distinct from the traI locus of the AA100 strain, as well as from the icm/dot and lvr/lvh
loci, revealed it to be a likely pathogenicity island, containing additional genes for
putative virulence factors such as methionine sulfoxide reductases, as well as plasmid
mobility elements; while present in Philadelphia 1-derived strains, it appears to be missing
in part or in its entirety from JR32 and several clinical isolates (Brassinga et al., 2003).
Interestingly, this locus contains paralogs of the lvrA, B, and C genes of the lvr/lvh Type IV
secretion locus. Perhaps the most intriguing finding was the presence of a 30 kb unstable
genetic element in strain Olda but not in Philadelphia 1 strains, possibly phage derived,
involved in phase variation (Luneberg et al., 2001). When integrated into the chromosome, the
strain is virulent, but when excised and replicating as a high-copy plasmid, it results in a
mutant phenotype with a modified lipopolysaccharide O-antigen epitope associated with reduced
virulence.
At this point, we have insufficient evidence to determine if the icm/dot genes are absent or
present in most other Legionella species, with the exception of L. longbeachae where good
hybridization signals were obtained for icm C, D, G, K, L, M, O, P, and T; weak signals with
J, Q, R, S, V, and X; and no signal for icm B, E, and F. Six L. longbeachae icm/dot genes from
the center of locus II have been submitted to GenBank by T. Rogers, S. List, R.M. Doyle, and
M.W. Heuzenroeder. In cases where we do not obtain positive hybridization signals using
L. pneumophila probes, it is probable that the orthologs are too dissimilar in their sequence,
at least in the region between where the primers were designed, to be detected by even the
reduced stringency hybridization or amplification used in this study. Their characterization
thus awaits large-scale sequencing of other species, or the use of degenerate
oligonucleotide-based PCR. Terry Alli et al. (2003) recently reported the presence of the
icm/dot loci in every Legionella species they examined based on hybridization, even under high
stringency conditions, using pooled regional probes. While we did get weak signals with many
icm/dot genes in non-pneumophila species similar to the ones they displayed in their paper, we
are unable to explain the several cases of disagreement, except that we used single gene
probes which might have been too species-specific. Since we probed the same blots subsequently
with several other probes for 16S rRNA, housekeeping or lvh/lvr genes and obtained excellent
signals, the absence of hybridization with those icm/dot gene probes cannot be due to the
quality of the DNA itself.
The two icm/dot clusters may have been subject to substantial changes in the course of their
intra-species evolution. Given that the icm/dot loci are present in all the L. pneumophila
strains from the 15 different serogroups we examined and the fact that these strains have a
100-fold range in their ability to replicate within macrophages (data not shown), it might be
expected that the strains' differences in virulence depend on sequence variations within the
genes, especially in functionally important gene and protein regions, such as those
responsible for efficient transport of effector molecules. Of course, it is also possible
that altered regulation of these genes (when and where they are expressed), or changes in the
effector molecules themselves, can contribute to the pathogenic phenotype. It is worth noting
that even though the entire icm gene set is present in the Coxiella genome (Seshadri et al.,
2003), its lifestyle is very different from that of Legionella. In particular, Coxiella does
not seem to depend on the disruption of phagosome–lysosome fusion for its survival, which is
considered to be the main function of the icm/dot system in Legionella. In the current study,
we assessed the level of diversity among genes of the dot/icm loci, focusing on the putative
functional domains that are preserved even in distant homologs.
The dot/icm genes display a wide range of variability, some being more conservative than an
average housekeeping gene (icmP, Q, D, T, J, B, S, W, and L), while others are 5–10 times more
variable (icmM, R, V, X, and dotA), as indicated by the ratio of nonsynonymous to synonymous
nucleotide substitutions. Low variability at the sequence level, though, does not necessarily
mean that all the observed amino acid substitutions are conservative with regard to their
physico-chemical properties. For example, it appears that IcmT, J, S, and W proteins are
permitted rather dramatic amino acid substitutions. In contrast, IcmN, P, and Q are extremely
conservative at this level, but not as much at the sequence level. In general, genes from
locus I show higher diversity compared to locus II, both at the gene and protein levels.

A second category of intra-species variation is positional, with some portions of the genes
and their products more dissimilar than others. For instance, the IcmK and IcmV proteins have
many more amino acid substitutions in their N-terminal than their C-terminal portions. Most
variation at the amino acid level is found at the ends of IcmP, but centrally in IcmG. At the
same time, the silent nucleotide changes are often distributed evenly along the gene,
indicating that the preservation of amino acid sequence in some regions is not simply due to
time of gene divergence, but rather to the presence of important functional domains,
especially when the sequence, or at least the protein structure, is preserved in distant
orthologs. It is interesting in this regard that remote homology detection by structural
methods has helped predict the function of many otherwise uncharacterized proteins in several
sequenced genomes (Pawlowski et al., 1999, 2001; Rychlewski et al.,
1998).
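The contrast drawn here, silent changes spread evenly while amino acid changes cluster, can be made concrete by labelling each codon difference between two aligned coding sequences and counting the labels in windows. The Python sketch below does this under stated assumptions: it classifies whole codons rather than estimating substitutions per site, it uses the standard genetic code, and it counts in non-overlapping 15-codon windows (the window size quoted in the Fig. 10 caption). The function names and test sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify_codons(ref, other):
    """Label each codon pair of two aligned coding sequences.

    '.' = identical, 'S' = synonymous difference, 'N' = nonsynonymous
    difference, '?' = codon with a gap or ambiguous base.
    """
    if len(ref) != len(other) or len(ref) % 3:
        raise ValueError("sequences must be equal length, a multiple of 3")
    labels = []
    for i in range(0, len(ref), 3):
        a, b = ref[i:i + 3].upper(), other[i:i + 3].upper()
        if a not in CODON_TABLE or b not in CODON_TABLE:
            labels.append("?")
        elif a == b:
            labels.append(".")
        elif CODON_TABLE[a] == CODON_TABLE[b]:
            labels.append("S")
        else:
            labels.append("N")
    return labels


def window_counts(labels, window=15):
    """(synonymous, nonsynonymous) counts in consecutive windows of codons."""
    return [
        (labels[i:i + window].count("S"), labels[i:i + window].count("N"))
        for i in range(0, len(labels), window)
    ]


if __name__ == "__main__":
    # Hypothetical 4-codon example: AAA->AAG is silent, GGT->GAT is not.
    labels = classify_codons("ATGAAACTGGGT", "ATGAAGCTGGAT")
    print(labels, window_counts(labels, window=2))
```

A window with several 'S' codons and no 'N' codons is the pattern the text treats as evidence of constraint at the protein level, since the silent changes show that enough time has passed for differences to accumulate.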
For some icm/dot genes (icmP, G, N, and K) the combination of relatively high regional
sequence conservatism, the presence of predicted domains, and sequence and/or structure
preservation in distant homologs in the same areas serves as an indicator of the presence of
a functional domain, though these predictions await experimental proof. Features such as the
t-SNARE-like domain in IcmG and its Coxiella homolog occur rarely enough in bacterial genes as
to make them noteworthy. If the t-SNARE domain is functional in IcmG, it may compete with the
host's membrane fusion SNARE system, potentially altering its normal vesicular trafficking
pathways, and preventing phagosome–lysosome fusion, for the bacterium's own ends. Thus these
findings may provide the impetus for future experimental studies to more directly determine
the function of these proteins.
Phylogenetic analyses for individual genes as well as locus subregions largely reveal similar
strain groupings, as in Fig. 11C. However, some branches either switch their positions on
different trees or cannot be unambiguously positioned. Though it is tempting to speculate
that these represent instances of lateral transfer within the locus, it is not possible to
determine this with any certainty.
Not only are the locus I genes more variable than most of locus II, but interestingly, genes
of the smaller locus (icmW, V, X, and dotA) have accumulated more silent nucleotide
substitutions per site (Ks values) than most of those from locus II. If both loci were
acquired, probably from a plasmid, at the same time, this may mean that locus I is evolving at
a higher rate. Alternatively, under the assumption that the evolutionary rates have been the
same and unchanged for both loci, genes from the smaller locus must be older than most of
those in the large icm cluster. This, taken with the fact that the most disparate branching
patterns are observed when either individual or combined trees for icm/dot locus I vs locus II
are compared, leads to the assumption that the icm/dot region has a rather complex history of
gene acquisition and rearrangement events. In Coxiella all the icm genes are located next to
each other whereas in L. pneumophila they are split into two icm/dot loci that are located on
opposite sides of the circular genome (http://genome3.cpmc.columbia.edu/~legion/index.html).
This may serve as an additional indication that the two loci in Legionella were acquired
separately or rearranged afterwards.
So far, full icm/dot gene sets have only been found in two relatively close species
(Legionella and Coxiella), and this system differs substantially from the known Type IV
systems. Nonetheless, given the presence of limited but obvious homology of most icm/dot genes
from both loci and tra/trb genes, it is possible to suggest that they may have derived from
the same ancestor. This ancestor may be of plasmid origin or assembled from various
chromosomal components in ancestral bacteria; in the latter case, these genes may subsequently
have been incorporated into a plasmid, support for which would come from the fact that many
different bacteria possess tra-like genes (e.g., Type IV secretion systems).

Other researchers have also pointed out that the icm/dot region may have a complicated
evolutionary history in L. pneumophila. Bumbaugh et al. (2002) compared dotA and mip (a 24 kDa
surface protein with peptidyl-prolyl-cis/trans isomerase activity that may be involved in
establishment of infections, but not intracellular survival (Cianciotto et al., 1990)), in 17
clinical and environmental isolates. Compared to mip, DotA, a cytoplasmic membrane spanning
protein, was extremely and perhaps unexpectedly variable, and the neighbor-joining trees
produced for the two genes were discordant at several branch points with high bootstrap
values. The authors considered this an indication of lateral gene transfer and recombination
and relatively recent gene dispersal. Ko et al. (2002b) compared the dotA and rpoB alleles in
79 Korean isolates of L. pneumophila from six clonal populations. The most parsimonious tree
produced using rpoB distinguished four closely related L. pneumophila pneumophila subspecies
and two closely related L. pneumophila fraseri subspecies. In contrast, in the case of dotA,
one of the pneumophila subspecies seemed more closely related to the fraseri subspecies than
to the other three pneumophila. Some caution should be exercised, however, in that these
authors previously showed that the rpoB trees, themselves, differed substantially from 16S
rRNA and mip trees, which was the basis for distinguishing the six clonal populations (Ko
et al., 2002a). Our comparisons, taking into consideration nearly all the members of the
icm/dot loci, may point out additional subpopulations, especially for those genes showing
substantial variation.
In the future, comparisons with icm and lvh plasmid gene orthologs may be especially
interesting. Since the lvh/lvr locus is likely to have been inherited as a plasmid unit, as we
discovered during the sequencing of the Philadelphia 1 genome (manuscript in preparation),
with a substantially higher GC content (43%) than the rest of the genome (Segal et al., 1999),
we intend to compare its history with that of the icm/dot islands, which have only some of the
classic features of pathogenicity islands (apparent absence of essential genes, all-or-none
presence of the complete gene set), but not others (GC content the same as the remainder of
the genome, separation into two subsets). The separate tra/trb locus also appears to be a
pathogenicity island, the central core of which has an elevated GC content (Brassinga et al.,
2003), and is thus another good candidate for such comparative sequence analysis.
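GC content is used throughout this discussion as a marker of foreign origin: 38% for the genome and the icm/dot loci, 43% for lvh/lvr. The Python sketch below shows one simple way such a comparison could be run along a sequence. The window and step sizes, the tolerance, and the function names are assumptions chosen for illustration and do not describe the procedure used in the study.

```python
def gc_content(sequence):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / counted


def gc_windows(sequence, window=1000, step=500):
    """GC fraction in sliding windows, as (start_index, gc_fraction) pairs."""
    last_start = max(len(sequence) - window, 0)
    return [
        (start, gc_content(sequence[start:start + window]))
        for start in range(0, last_start + 1, step)
    ]


def flag_atypical(sequence, genome_gc, tolerance=0.05, window=1000, step=500):
    """Windows whose GC fraction deviates from genome_gc by more than tolerance."""
    return [
        (start, gc)
        for start, gc in gc_windows(sequence, window, step)
        if abs(gc - genome_gc) > tolerance
    ]


if __name__ == "__main__":
    # Hypothetical sequence: an AT-rich stretch followed by a GC-rich one.
    toy = "ATTATAATGC" * 20 + "GCGGCCATGC" * 20
    print(flag_atypical(toy, genome_gc=0.38, window=100, step=100))
```

A region that matches the genome average, as reported for the icm/dot loci, would produce no flagged windows, which is why GC content alone cannot settle whether those loci were acquired by transfer.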
base. Nucleic Acids Res. 30, 276280.
Bauer, F., Schweimer, K., Kluver, E., Conejo-Garcia, J.-R.,
I. Morozova et al. / Plasmid 51 (2004) 127147 145Forssmann, W.-G., Rosch, P., Adermann, K., Sticht, H.,
2001. Structure determination of human and murine b-defensins reveals structural conservation in the absence of
signicant sequence similarity. Protein Sci. 10, 24702479.
Benson, R., Fields, B., 1998. Classication of the genus
Legionella. Semin. Respir. Infect. 13, 9099.
Berger, K.H., Isberg, R.R., 1993. Two distinct defects in
intracellular growth complemented by a single genetic locus
in Legionella pneumophila. Mol. Microbiol. 7, 719.
Bogardt, R.A., Jones, B.N., Dwulet, F.E., Garner, W.H.,
Lehman, L.D., Gurd, F.R., 1980. Evolution of the amino
acid substitution in the mammalian myoglobin gene. J. Mol.
Evol. 15, 197218.
Boyd, E.F., Li, J., Ochman, H., Selander, R.K., 1997. Com-
parative genetics of the inv-spa invasion gene complex of
Salmonella enterica. J. Bacteriol. 179, 19851991, id: 0021-Acknowledgments
Strains Leg 1Leg 34 were kindly provided by
Dr. Barry Fields at the CDC; Leg 35 and Leg 36,
specimens from an outbreak at a Dutch owershow, were a generous gift from Dr. Ruud van
Ketel at the University of Amsterdam. We thank
Huitao Sheng for assistance in sequence submis-
sion and Dr. Pavel Morozov for helpful comments
throughout the course of this work. This work was
supported by NIH Grant U01 1 AI 44371 awarded
to J.J.R., and funds generously provided by the
Columbia Genome Center.
References
Adeleke, A., Pruckler, J., Benson, R., Rowbotham, T., Hala-
blab, M., Fields, B., 1996. Legionella-like amebal patho-
gensphylogenetic status and possible role in respiratory
disease. Emerg. Infect Dis. 2, 225230.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman,
D.J., 1990. Basic local alignment search tool. J. Mol. Biol.
215, 403410, doi: 10.1006/jmbi.1990.9999.
Andrews, H.L., Vogel, J.P., Isberg, R.R., 1998. Identication of
linked Legionella pneumophila genes essential for intracellu-
lar growth and evasion of the endocytic pathway. Infect.
Immun. 66, 950958, id: 0019-9567/98/$04.00+0.
Avison, M.B., Simm, A.M., 2002. Sequence and genome
context analysis of a new molecular class D b-lactamasegene from Legionella pneumophila. J. Antimicrob. Chemo-
ther. 50, 331338, doi: 10.1093/jac/dkf135.
Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L.,
Eddy, S.R., Griths-Jones, S., Howe, K.L., Marshall, M.,
Sonnhammer, E.L., 2002. The Pfam protein families data-9193/97/$04.00+0.Brassinga, A.K.C., Hiltz, M.F., Sisson, G.R., Morash, M.G.,
Hill, N., Garduno, E., Edelstein, P.H., Garduno, R.A.,
Homan, P.S., 2003. A 65-kilobase pathogenicity island is
unique to Philadelphia-1 strains. J. Bacteriol. 185, 4630
4637, doi: 10.1128/JB185.15.4630-4637.2003.
Brenner, D.J., Steigerwalt, A.G., Epple, P., Bibb, W.F.,
McKinney, R.M., Starnes, R.W., Colville, J.M., Selander,
R.K., Edelstein, P.H., Moss, C.W., 1988. Legionella pneu-
mophila serogroup lansing 3 isolated from a patient with
fatal pneumonia, and descriptions of L. pneumophila subsp.
pneumophila subsp. nov., L. pneumophila subsp. fraseri subsp.
nov., and L. pneumophila subsp. pascullei subsp. nov. J. Clin.
Microbiol. 26, 16951703.
Bumbaugh, A.C., McGraw, E.A., Page, K.L., Selander, R.K.,
Whittam, T.S., 2002. Sequence polymorphism of dotA and
mip alleles mediating invasion and intracellular replication
of Legionella pneumophila. Curr. Microbiol. 44, 314322,
doi: 10.1007/s0024-01-0024-6.
Christie, P.J., 2001. Type IV secretion: intercellular transfer of
macromolecules by systems ancestrally related to conjuga-
tion machines. Mol. Microbiol. 40, 294305, doi: 10.1046/
j.1365-2958.
Cianciotto, N.P., Eisenstein, B.I., Mody, C.H., Engleberg,
N.C., 1990. A mutation in the mip gene results in an
attenuation of Legionella pneumophila virulence. J. Infect.
Dis. 162, 121–126.
Coers, J., Kagan, J.C., Matthews, M., Nagai, H., Zuckman,
D.M., Roy, C.R., 2000. Identification of Icm protein
complexes that play distinct roles in the biogenesis of an
organelle permissive for Legionella pneumophila intracellular
growth. Mol. Microbiol. 38, 719–736,
doi: 10.1046/j.1365-2958.2000.02176.x.
Doyle, R.M., Steele, T.W., McLennan, A.M., Parkinson, I.H.,
Manning, P.A., Heuzenroeder, M.W., 1998. Sequence
analysis of the mip gene of the soilborne pathogen Legionella
longbeachae. Infect. Immun. 66, 1492–1499,
id: 0019-9567/98/$04.00+0.
Dumenil, G., Isberg, R., 2001. The Legionella pneumophila
IcmR protein exhibits chaperone activity for IcmQ by
preventing its participation in high-molecular-weight complexes.
Mol. Microbiol. 40, 1113–1127,
doi: 10.1046/j.1365-2958.2001.02454.x.
Fields, B.S., Benson, R.F., Besser, R.E., 2002. Legionella and
Legionnaires' disease: 25 years of investigation. Clin. Microbiol.
Rev. 15, 506–526, doi: 10.1128/CMR.15.3.506-526.2002.
Fraser, D.W., Tsai, T.R., Orenstein, W., Parkin, W.E.,
Beecham, H.J., Sharrar, R.G., Harris, J., Mallison, G.F.,
Martin, S.M., McDade, J.E., Shepard, C.C., Brachman,
P.S., 1977. Legionnaires' disease: description of an
epidemic of pneumonia. N. Engl. J. Med. 297, 1189–1197.
Furuya, N., Komano, T., 1996. Nucleotide sequence and
characterization of the trbABC region of the IncI1 plasmid
R64: existence of the pnd gene for plasmid maintenance
within the transfer region. J. Bacteriol. 178, 1491–1497,
id: 0021-9193/96/$04.00+0.
I. Morozova et al. / Plasmid 51 (2004) 127–147
Ginalski, K., Venclovas, C., Lesyng, B., Fidelis, K., 2000.
Structure-based sequence alignment for the beta-trefoil
subdomain of the clostridial neurotoxin family provides
residue level information about the putative ganglioside
binding site. FEBS Lett. 482, 119–124,
doi: 10.1016/S0014-5793(00)01954-2.
Girardeau, J.P., Bertin, Y., Callebaut, I., 2000. Conserved
structural features in class I major fimbrial subunits (Pilin)
in gram-negative bacteria. Molecular basis of classification
in seven subfamilies and identification of intrasubfamily
sequence signature motifs which might be implicated in
quaternary structure. J. Mol. Evol. 50, 424–442,
ISSN: 0022-2844.
Gotte, M., von Mollard, G.F., 1998. A new beat for the
SNARE drum. Trends Cell Biol. 8, 215–218,
doi: 10.1016/S0962-8924(98)01272-0.
Helbig, J.H., Bernander, S., Castellani Pastoris, M., Etienne, J.,
Gaia, V., Lauwers, S., Lindsay, D., Luck, P.C., Marques,
T., Mentula, S., Peeters, M.F., Pelaz, C., Struelens, M.,
Uldum, S.A., Wewalka, G., Harrison, T.G., 2002. Pan-European
study on culture-proven legionnaires' disease: distribution
of Legionella pneumophila serogroups and
monoclonal subgroups. Eur. J. Clin. Microbiol. Infect. Dis.
21, 710–716, doi: 10.1007/s10096-002-0820-3.
Huson, D., 1998. SplitsTree: analyzing and visualizing evolutionary
data. Bioinformatics 14, 68–73.
Kawashima, S., Kanehisa, M., 2000. AAIndex: amino acid
index database. Nucleic Acids Res. 28, 374.
Kirby, J.E., Vogel, J.P., Andrews, H.L., Isberg, R.R., 1998.
Evidence for pore-forming ability by Legionella pneumophila.
Mol. Microbiol. 27, 323–336,
doi: 10.1046/j.1365-2958.1998.00680.x.
Ko, K.S., Lee, H.K., Park, M.Y., Lee, K.-H., Yun, Y.-J., Woo,
S.-Y., Miyamoto, H., Kook, Y.-H., 2002a. Application of
RNA polymerase beta-subunit gene (rpoB) sequences for
the molecular differentiation of Legionella species. J. Clin.
Microbiol. 40, 2653–2658,
doi: 10.1128/JCM.40.7.2653-2658.2002.
Ko, K.S., Lee, H.K., Park, M.-Y., Park, M.-S., Lee, K.-H.,
Woo, S.-Y., Yun, Y.-J., Kook, Y.-H., 2002b. Population
genetic structure of Legionella pneumophila inferred from
RNA polymerase gene (rpoB) and DotA gene (dotA) sequences.
J. Bacteriol. 184, 2123–2130,
doi: 10.1128/JB.184.8.2123-2130.2002.
Komano, T., Yoshida, S., Narahara, K., Furuya, N., 2000. The
transfer region of IncI1 plasmid R64: similarities
between R64 tra and Legionella icm/dot genes. Mol.
Microbiol. 35, 1348–1359,
doi: 10.1046/j.1365-2958.2000.01769.x.
Kumar, S., Tamura, K., Jakobsen, I.B., Nei, M., 2001.
MEGA2: molecular evolutionary genetics analysis software.
Bioinformatics 17, 1244–1245.
Letunic, I., Goodstadt, L., Dickens, N.J., Doerks, T., Schultz,
J., Mott, R., Ciccarelli, F., Copley, R.R., Ponting, C.P.,
Bork, P., 2002. Recent improvements to the SMART
domain-based sequence annotation resource. Nucleic Acids
Res. 30, 242–244.
Li, W.H., Wu, C.I., Luo, C.C., 1985. A new method for
estimating synonymous and nonsynonymous rates of nucleotide
substitution considering the relative likelihood of
nucleotide and codon changes. Mol. Biol. Evol. 2, 150–174,
id: 0737-4038/85/0202-0201$02.00.
Luneberg, E., Mayer, B., Daryab, N., Koolstra, O., Zahringer,
U., Rohde, M., Swanson, J., Frosch, M., 2001. Chromosomal
insertion and excision of a 30 kb unstable genetic
element is responsible for phase variation of lipopolysaccharide
and other virulence determinants in Legionella
pneumophila. Mol. Microbiol. 39, 1259–1271,
doi: 10.1046/j.1365-2958.2001.02314.x.
Miyata, T., Miyazawa, S., Yasunaga, T., 1979. Two types of
amino acid substitutions in protein evolution. J. Mol. Evol.
12, 219–236.
Nagai, H., Kagan, J.C., Zhu, X., Kahn, R.A., Roy, C.R., 2002.
A bacterial guanine nucleotide exchange factor activates
ARF on Legionella phagosomes. Science 295, 679–682.
Pawlowski, K., Rychlewski, L., Zhang, B., Godzik, A., 2001.
Fold predictions for bacterial genomes. J. Struct. Biol. 134,
219–231, doi: 10.1006/jsbi.2001.4394.
Pawlowski, K., Zhang, B., Rychlewski, L., Godzik, A., 1999.
The Helicobacter pylori genome: from sequence analysis to
structural and functional predictions. Proteins: Struct.,
Funct., Genet. 36, 20–30,
doi: 10.1002/(SICI)1097-0134(19990701)36.13.0.CO;2-X.
Perez-Luz, S., Fernandez, J., Rodriguez-Valera, F., Pascual, L.,
Moreno, C., Amo, A., Apraiz, D., Catalan, V., 2002.
Sequence diversity of the internal transcribed spacer (ITS)
region of the rRNA operons among different serogroups of
Legionella pneumophila isolates. Syst. Appl. Microbiol. 25,
212–219, doi: 10.1078/072320202320386370.
Pollastri, G., Przybylski, D., Rost, B., Baldi, P., 2002. Improving
the prediction of protein secondary structure in three
and eight classes using recurrent neural networks and
profiles. Proteins 47, 228–235, online ISSN: 1097-0134;
print ISSN: 0887-3585.
Raghava, G.P.S., 2000. Protein secondary structure prediction
using nearest neighbor and neural network approach. CASP
4, 75–76.
Ratcliff, R., Donnellan, S.C., Lanser, J.A., Manning, P.A.,
Heuzenroeder, M.W., 1997. Interspecies sequence differences
in the Mip protein from the genus Legionella:
implications for function and evolutionary relatedness.
Mol. Microbiol. 25, 1149–1158.
Ratcliff, R.M., Lanser, J.A., Manning, P.A., Heuzenroeder,
M.W., 1998. Sequence-based classification scheme for the
genus Legionella targeting the mip gene. J. Clin. Microbiol.
36, 1560–1567, id: 0095-1137/98/$04.00+0.
Rosello-Mora, R., Amann, R., 2001. The species concept for
prokaryotes. FEMS Microbiol. Rev. 25, 39–67, doi:
10.1016/S0168-6445(00)00040-1.
Rost, B., 1996. PHD: predicting one-dimensional protein
structure by profile based neural networks. Methods Enzymol.
266, 525–539.
Rychlewski, L., Zhang, B., Godzik, A., 1998. Fold and function
predictions for Mycoplasma genitalium proteins. Fold Des.
3, 229–238, ISSN: 1359-0278.
Sadosky, A., Wiater, L.A., Shuman, H.A., 1993. Identification
of Legionella pneumophila genes required for growth within
and killing of human macrophages. Infect. Immun. 61,
5361–5373.
Saitou, N., Nei, M., 1987. The Neighbor-Joining Method: a
new method for reconstructing phylogenetic trees. Mol.
Biol. Evol. 4, 406–425, id: 0737-4038/87/0.
Samrakandi, M.M., Cirillo, S.L.G., Ridenour, D.A., Bermudez,
L.E., Cirillo, J.D., 2002. Genetic and phenotypic
differences between Legionella pneumophila strains. J. Clin.
Microbiol. 40, 1352–1362,
doi: 10.1128/JCM.40.4.1352-1362.2002.
Sauder, J.M., Arthur, J.W., Dunbrack Jr., R.L., 2000.
Large-scale comparison of protein sequence alignment
algorithms with structure alignments. Proteins: Struct.,
Funct., Genet. 40, 6–22, online ISSN: 1097-0134, print
ISSN: 0887-3585.
Segal, G., Shuman, H.A., 1997. Characterization of a new
region required for macrophage killing by Legionella
pneumophila. Infect. Immun. 65, 5057–5066, id: 0019-9567/
$04.00+0.
Segal, G., Shuman, H.A., 1998a. Intracellular multiplication
and human macrophage killing by Legionella pneumophila
are inhibited by conjugal components of IncQ plasmid
RSF1010. Mol. Microbiol. 30, 197–208.
Segal, G., Shuman, H.A., 1998b. How is the intracellular fate of
the Legionella pneumophila phagosome determined? Trends
Microbiol. 6, 253–255, doi: 10.1016/S0966-842X(98)01308-0.
Segal, G., Shuman, H.A., 1999. Possible origin of the Legionella
pneumophila virulence genes and their relation to Coxiella
burnetii. Mol. Microbiol. 33, 669–670,
doi: 10.1046/j.1365-2958.1999.01511.x.
Segal, G., Purcell, M., Shuman, H.A., 1998. Host cell
killing and bacterial conjugation require overlapping sets
of genes within a 22-kb region of the Legionella
pneumophila genome. Proc. Natl. Acad. Sci. USA 95,
1669–1674.
Segal, G., Russo, J.J., Shuman, H.A., 1999. Relationships
between a new type IV secretion system and the icm/dot
virulence system of Legionella pneumophila. Mol. Microbiol.
34, 799–809, doi: 10.1046/j.1365-2958.1999.01642.x.
Seshadri, R., Paulsen, I.T., Eisen, J.A., Read, T.D., Nelson,
K.E., Nelson, W.C., Ward, N.L., Tettelin, H., Davidsen,
T.M., Beanan, M.J., Deboy, R.T., Daugherty, S.C.,
Brinkac, L.M., Madupu, R., Dodson, R.J., Khouri, H.M.,
Lee, K.H., Carty, H.A., Scanlan, D., Heinzen, R.A.,
Thompson, H.A., Samuel, J.E., Fraser, C.M., Heidelberg,
J.F., 2003. Complete genome sequence of the Q-fever
pathogen Coxiella burnetii. Proc. Natl. Acad. Sci. USA
100, 5455–5460, doi: 10.1073.
Sexton, J.A., Vogel, J.P., 2002. Type IVB secretion by
intracellular pathogens. Traffic 3, 178–185,
doi: 10.1034/j.1600-0854.2002.030303.x.
Swanson, M.S., Hammer, B.K., 2000. Legionella pneumophila
pathogenesis: a fateful journey from amoebae to macrophages.
Annu. Rev. Microbiol. 54, 567–613.
Terry Alli, O.A., Zink, S., von Lackum, N.K., Abu-Kwaik, Y.,
2003. Comparative assessment of virulence traits in Legionella
spp. Microbiology 149, 631–641,
doi: 10.1099/mic.0.25980-0.
Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL
W: improving the sensitivity of progressive multiple
sequence alignment through sequence weighting, position-specific
gap penalties and weight matrix choice. Nucleic
Acids Res. 22, 4673–4680.
Vogel, J.P., Andrews, H.L., Wong, S.K., Isberg, R.R., 1998.
Conjugative transfer by the virulence system of Legionella
pneumophila. Science 279, 873–876.
Watarai, M., Andrews, H.L., Isberg, R.R., 2001. Formation of
a fibrous structure on the surface of Legionella pneumophila
associated with exposure of DotH and DotO proteins after
intracellular growth. Mol. Microbiol. 39, 313–329, doi:
10.1046/j.1365-2958.2001.02193.x.
Weber, T., Zemelman, B.V., McNew, J.A., Westermann, B.,
Gmachl, M., Parlati, F., Sollner, T.H., Rothman, J.E., 1998.
SNAREpins: minimal machinery for membrane fusion. Cell
92, 759–772.
Weimbs, T., Low, S.H., Chapin, S.J., Mostov, K.E., Bucher, P.,
Hofmann, K., 1997. A conserved domain is present in
different families of vesicular fusion proteins: a new superfamily.
Proc. Natl. Acad. Sci. USA 94, 3046–3051.
Weimbs, T., Mostov, K., Low, S.H., Hofmann, K., 1998.
A model for structural similarity between different
SNARE complexes based on sequence relationships.
Trends Cell Biol. 8, 260–262,
doi: 10.1016/S0962-8924(98)01285-9.
Whittam, T.S., Bumbaugh, A.C., 2002. Inferences from whole-genome
sequences of bacterial pathogens. Curr. Opin.
Genet. Dev. 12, 719–725,
doi: 10.1016/S0959-437X(02)0036-1.
Yu, V.L., Plouffe, J.F., Castellani Pastoris, M., Stout, J.E.,
Schousboe, M., Widmer, A., Summersgill, J., File, T.,
Heath, C.M., Paterson, D.L., Chereshsky, A., 2002. Distribution
of Legionella species and serogroups isolated by
culture in patients with sporadic community-acquired legionellosis:
an international collaborative survey. J. Infect.
Dis. 186, 127–128, id: 0022-1899/2002/18601-0020$15.00.
Communicated by R. Novick