Research in Microbiology 163 (2012) 316–322

www.elsevier.com/locate/resmic

Variability of the tandem repeat region of the Escherichia coli tolA gene

Kai Zhou, Kristof Vanoirbeek, Abram Aertsen, Chris W. Michiels*

Laboratory of Food Microbiology and Leuven Food Science and Nutrition Research Centre (LFoRCe), Katholieke Universiteit Leuven, Kasteelpark Arenberg 22, B-3001 Leuven, Belgium

Received 9 February 2012; accepted 3 May 2012

Available online 1 June 2012

Abstract

An intragenic tandem repeat (TR) region has been previously reported in the tolA gene of Escherichia coli. In silico analysis of 123 E. coli tolA sequences from GenBank and PCR analysis of the tolA TR region from 111 additional E. coli strains revealed that this TR region is highly variable. Nine different TR sizes, with 8 up to 16 repeat units, were found by in silico analysis, and 6 of these were also found by PCR analysis. The 13-unit TR emerged as the predominant type using both approaches (47.2% and 86.5%, respectively). Remarkably, TRs in pathogenic strains appeared to be more variable than those in non-pathogens. To demonstrate the occurrence of TR variation in a clonal population, a selection system for TR deletion events was constructed by inserting the 13-unit TR region of MG1655 in frame into a plasmid-borne chloramphenicol acetyltransferase (cat) gene. The resulting cat gene no longer conferred chloramphenicol (Cm) resistance unless the insert size was reduced by TR contraction. Using this system, Cm-resistant revertants with a TR contraction were recovered at a frequency of 1.1 × 10⁻⁷, and contraction was shown to be recA-dependent and enhanced in a DNA repair-deficient mutS background.

© 2012 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

Keywords: Escherichia coli; tolA; Tandem repeats; Variability

* Corresponding author. Tel.: +32 16 321578; fax: +32 16 321960.

E-mail address: [email protected] (C.W. Michiels).

0923-2508/$ - see front matter © 2012 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

http://dx.doi.org/10.1016/j.resmic.2012.05.003

1. Introduction

DNA repeats occur in the non-coding and coding genome regions of both eukaryotic and prokaryotic organisms. Generally, they are more abundant in complex genomes and, as such, they were first recognized and studied in eukaryotes. Repeat regions often show reduced stability, and their variability has been associated with some diseases in humans, such as myotonic dystrophy, Huntington's disease, fragile X syndrome and colon cancer (Hannan, 2010). The most common type of DNA repeats found in bacteria are non-spaced direct repeats of two or more sequences varying in length from one to several hundred base pairs, designated tandem repeats (TRs). Bacterial TRs, in particular the so-called variable number tandem repeats (VNTRs), have received much attention because of their polymorphic character, which has proven useful in the development of powerful typing schemes in a wide range of bacterial pathogens (Lindstedt, 2005). In general, the functional significance of TRs has been less well studied in bacteria than in eukaryotes, a notable exception being the contingency loci, a subgroup of TRs with small repeat units, typically less than 6 bp (Moxon et al., 2006). Contingency loci are commonly found in the coding region or in the promoters of genes involved in bacterial interaction with a host or in stress adaptation, and can modulate gene function by expansion or contraction in the number of repeat units due to strand slippage mispairing. A high number of functional contingency loci have been documented in Haemophilus influenzae, Neisseria meningitidis and Helicobacter pylori. In contrast, Escherichia coli appears to have only a few contingency loci affecting gene function. Noteworthy examples are a stretch of G residues in the xylB gene of E. coli J93 that is necessary for the capacity to utilize xylose (Funchain et al., 2000), a heptanucleotide TR in a sialic acid acetyltransferase that modifies capsular polysaccharide in E. coli K1 (Deszo et al., 2005) and a triplet TR in the peroxiredoxin gene ahpC (Ritz et al., 2001).

TRs with large repeat units are also quite common in bacterial genomes. For example, about 75 TRs with a unit size larger than 9 bp are readily found in the genome of E. coli MG1655 with the tandem repeats finder (TRF), a tool specifically developed to detect TRs (Benson, 1999). Interesting studies on the effects of variable TRs with large repeat units on gene function have been conducted in Legionella pneumophila. The copy number of an 18-bp repeat unit in the TR region of the fimV gene in this organism was found to be important in twitching motility, pigment production and morphology (Coil and Anne, 2010), and the copy number of a 45-bp unit in the TR region of the lcl gene that encodes a collagen-like protein was demonstrated to modulate bacterial adherence to host cells (Vandersmissen et al., 2010). Other examples of experimentally observed variations in large intragenic TRs include a surface-associated antigen in group B streptococci (Madoff et al., 1996), an autotransporter protein with adhesin function in Haemophilus (Sheets and St Geme, 2011) and a pathogenicity/avirulence protein in Xanthomonas citri (Yang and Gabriel, 1995); interestingly, these have been demonstrated to modulate pathogenicity. Also, in several other pathogens, intragenic TRs are a notable feature of cell wall or membrane-associated proteins that has been speculated to be linked to pathogenicity (Bierne and Cossart, 2007; Guo and Mrazek, 2008; Jordan et al., 2003; McCarthy and Lindsay, 2010; van Belkum et al., 1998). However, in E. coli and other Enterobacteriaceae, the occurrence, variability and functional significance of intragenic TRs with large repeat units have not yet been systematically studied.

In the current work, we focus on TRs in the tolA gene of E. coli. TolA is a cytoplasmic membrane protein that is part of the so-called Tol–Pal system, a complex of at least five proteins in the cell envelope that is essential for outer membrane stability. TolA plays a crucial role in this complex, since it forms a bridge from the cytoplasmic membrane to the Pal protein, which is located in the outer membrane. TolA is composed of three structural domains: the N-terminal domain (domain I, ca. 42 amino acids) anchors the protein in the inner membrane via a single membrane-spanning region (Levengood et al., 1991); the C-terminal domain (domain III, ca. 120 amino acids) is responsible for interaction with the N-terminal domain of colicin A and filamentous phage g3p protein (Lubkowski et al., 1999); the central helical domain (domain II, ca. 250 amino acids) connects domains I and III. Biochemical analysis showed domain II to be essential in the interaction with porins (Derouiche et al., 1996). The infection efficiency of the f1 phage was attenuated when domain II was deleted, but not if only the N-terminal half of domain II was deleted (Click and Webster, 1997). Moreover, it was also suggested that this domain is involved in resistance to detergents and group A colicins (Schendel et al., 1997). Domain II is linked to domain I by a short polyglycine region; in addition, it contains a 13-fold repeated KA₃(D/E) motif near its C-terminus, which constitutes one-fourth of the α-helical region in E. coli strain K17(DE3) and which is reflected by a TR region in the tolA gene (Levengood et al., 1991; Schendel et al., 1997). However, whether the TRs in this region are subject to contraction and expansion, and whether such events have physiological or ecological significance, remains unknown.

The objective of the current work was to study the variability of the tolA TR region in E. coli. First, we analyzed the variability of this repetitive region by in silico analysis of tolA sequences from 123 E. coli strains and by PCR analysis of an additional 111 strains. Next, we constructed a plasmid-based selection system to study the nature and frequency of repeat deletions in the tolA TR region and the dependence of such events on DNA recombination and repair pathways in E. coli.

2. Materials and methods

2.1. E. coli strains and culture conditions

A collection of E. coli strains (Table S1) comprising 21 avian pathogenic E. coli (APEC) strains obtained from Dr. B. Goddeeris (KU Leuven, Belgium), 20 cytotoxic necrotizing factor (CNF)-producing (type 1 and 2), 8 necrotoxic (NTEC II), 8 enterotoxigenic (ETEC) and 13 enteropathogenic (EPEC) strains received from Dr. J. Mainil (ULg, Liege, Belgium), 20 ECOR strains from the E. coli Reference Collection (http://foodsafe.msu.edu/whittam/ecor/, all isolated from healthy mammals) and 24 strains from our laboratory collection isolated from the sewer system of the city of Antwerp were used for PCR analysis of tolA tandem repeats. Knock-outs of genes involved in DNA recombination and repair were introduced in E. coli MG1655 by phage P1 transduction from the following E. coli donor strains: QC2411 ΔrecA306 srl::Tn10 (Dukan et al., 1999), AB1157 mutS::Tn10 (Wagner and Nohmi, 2000) and CC104 mutY::mini Tn10 (Zhao et al., 2001).

Bacteria were propagated at 37 °C in Luria-Bertani (LB) broth (10.0 g of bacto-tryptone, 5.0 g of yeast extract, 5.0 g of NaCl per l) or on LB agar plates (12.0 g/l of agar). Antibiotics were purchased from Applichem (Darmstadt, Germany) and added at the following concentrations when appropriate: 100 µg/ml ampicillin (Ap); 25 µg/ml chloramphenicol (Cm); 20 µg/ml tetracycline (Tc); 50 µg/ml kanamycin (Km).

2.2. DNA techniques

Plasmids were isolated with the Miniprep kit (Fermentas, St. Leon-Rot, Germany). Restriction endonuclease digestion, ligation, agarose gel electrophoresis and electroporation were performed by standard protocols. PCR primers were designed using NetPrimer (http://www.premierbiosoft.com/netprimer/). PCR was performed in a 20 µl volume with DreamTaq polymerase (Fermentas) for screening the copy number of TR regions from the E. coli strain collection, and in a 50 µl volume with Phusion polymerase (Finnzymes, Vantaa, Finland) for cloning and sequencing purposes. PCR products were visualized on 1–2.5% agarose gels. DNA purification from agarose after electrophoresis was performed using a gel extraction kit (Fermentas). DNA sequencing was done by the Sanger method using the BigDye terminator V3.1 cycle sequencing kit (ABI, Foster City, CA, US) and the ABI PRISM 3100 genetic analyzer.

2.3. In silico analysis of tandem repeats

Intragenic tandem repeats were identified in E. coli tolA sequences retrieved from GenBank (updated until Nov. 2011) by BLAST analysis with the tolA sequence from E. coli MG1655. The sequences were screened with the TRF program (Benson, 1999) using the following settings: alignment parameters: match (2), mismatch (5), indels (7); score threshold (50). The smallest repeated motif that was found had a size of 15 bp, and this was further considered the standard repeat unit. However, it should be noted that the repeat size was not uniform and that some repeat units were larger (16 bp, 17 bp and 18 bp).
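As an illustration of how the settings listed above map onto a run of the stand-alone TRF program, the sketch below assembles a command line in the positional order that TRF expects (file, match, mismatch, indel, PM, PI, minimum score, maximum period). The match, mismatch, indel and score values are taken from the text; the probability parameters (PM = 80, PI = 10), the maximum period (500), the executable name `trf` and the FASTA file name are assumptions made for the example and are not reported in this work.

```python
import shlex


def build_trf_command(fasta_path, match=2, mismatch=5, indel=7,
                      pm=80, pi=10, min_score=50, max_period=500,
                      executable="trf"):
    """Return the TRF argument list for one FASTA file.

    match, mismatch, indel and min_score follow the settings in the text.
    pm, pi, max_period and executable are assumed values for illustration.
    """
    numeric = (match, mismatch, indel, pm, pi, min_score, max_period)
    args = [executable, fasta_path] + [str(value) for value in numeric]
    # -d writes a plain-text data file, -h suppresses the HTML report.
    args += ["-d", "-h"]
    return args


# Hypothetical input file holding the retrieved tolA alleles.
command = build_trf_command("tolA_alleles.fasta")
print(shlex.join(command))
```

The list form can be passed directly to `subprocess.run` on a machine where TRF is installed.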

2.4. PCR analysis of tolA tandem repeats

The tandem repeat region of the tolA genes from different E. coli strains was amplified by colony PCR using primers tolA_TR_Fw (5′-CCGAGTTAAAGCAGAAGCAA-3′) and tolA_TR_Rev (5′-TTAGCTCACCGAAAATATCA-3′), which were designed to anneal to the conserved regions flanking the TRs in MG1655 tolA (Fig. 1). PCR products were then sized by electrophoresis on 2.5% agarose gels and, in some cases, selected for sequencing.

2.5. Construction of reporter plasmid pUC18 cat::TRtolA

The TR region of tolA was amplified from the E. coli MG1655 genome by colony PCR with primers tolA_rep_fus_Fw (5′-GCCCGATATCAAAAAGCCAAAGCAGAAGC-3′) and tolA_rep_fus_Rev (5′-GAGGGATATCTTAGCTCACCGAAAATAT-3′). Plasmid pKD3 (Datsenko and Wanner, 2000) was opened by PCR with primer set pKD3_cat_fus_Fw (5′-GTCAGATATCCATTTTAGCTTCCTTAGCTCCTG-3′) and Rev (5′-GGCGGATATCGAGAAAAAAATCACTGG-3′). Both amplicons were digested with EcoRV (Fermentas; recognition sequence GATATC) and ligated, resulting in pKD3 cat::TRtolA with correct orientation of the TR region, in which the cat gene, and thus the ability to confer Cm resistance, is disrupted. Then, the cat::TRtolA gene was amplified from this plasmid using primer set pKD3/4-P1_Fw (5′-GTGTAGGCTGGAGCTGCTTC-3′) and Rev (5′-CATATGAATATCCTCCTTAG-3′), and inserted into the SmaI site of pUC18 by blunt ligation, resulting in pUC18 cat::TRtolA. This construct was confirmed by sequencing to be correctly assembled and free of point mutations, and was transformed into E. coli MG1655 for studying the TR deletion events.
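A quick consistency check on the fusion primers listed above can be sketched as follows: each of the four primers should carry the EcoRV recognition sequence GATATC near its 5′ end, and the site should be palindromic. The primer sequences are copied from the text with line-break hyphens removed; the label `pKD3_cat_fus_Rev` is a name chosen here for the reverse primer of that set, which the text only calls "Rev".

```python
ECORV_SITE = "GATATC"  # EcoRV recognition sequence

# Fusion primers from the text, written 5' to 3'.
FUSION_PRIMERS = {
    "tolA_rep_fus_Fw": "GCCCGATATCAAAAAGCCAAAGCAGAAGC",
    "tolA_rep_fus_Rev": "GAGGGATATCTTAGCTCACCGAAAATAT",
    "pKD3_cat_fus_Fw": "GTCAGATATCCATTTTAGCTTCCTTAGCTCCTG",
    "pKD3_cat_fus_Rev": "GGCGGATATCGAGAAAAAAATCACTGG",  # label assumed
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def site_positions(primers, site=ECORV_SITE):
    """Map each primer name to the 0-based index of the first site (-1 if absent)."""
    return {name: seq.find(site) for name, seq in primers.items()}


positions = site_positions(FUSION_PRIMERS)
for name, index in positions.items():
    print(f"{name}: EcoRV site starts at index {index}")
```

In all four primers the site follows a 4-nucleotide 5′ overhang.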

2.6. TR variation frequency measurement

pUC18 cat::TRtolA was transformed into MG1655 ΔrecA306 srl::Tn10, MG1655 mutS::Tn10 and MG1655 mutY::miniTn10, and the frequency of TR variation events resulting in Cm resistance was evaluated as follows. Strains harboring pUC18 cat::TRtolA were grown from a single colony in 4 ml LB with Ap until exponential phase (OD600 = 0.28 ± 0.02). Serial dilutions of this culture were plated on LB plates with Ap to determine the viable cell count, and on LB plates with Ap and Cm to count Cm-resistant revertants. At least three independent cultures of each strain were analyzed, and the reversion frequency was calculated as the mean fraction of Cm-resistant revertants in the total population, corrected for the occurrence of Cm-resistant revertants without an apparent TR deletion, as determined by PCR analysis of about 50 revertants for each independent culture.
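The calculation described in this section can be sketched as below. This is one plausible reading of the procedure, not the authors' script: the raw fraction of Cm-resistant cells is multiplied by the share of PCR-checked revertants that actually show a TR deletion, and the corrected values are averaged over cultures. All plate counts in the example are invented for illustration.

```python
from statistics import mean


def corrected_reversion_frequency(cm_resistant_cfu_per_ml, viable_cfu_per_ml,
                                  revertants_checked, revertants_with_deletion):
    """Fraction of cells that are Cm-resistant AND carry a TR deletion."""
    if viable_cfu_per_ml <= 0 or revertants_checked <= 0:
        raise ValueError("counts must be positive")
    raw_fraction = cm_resistant_cfu_per_ml / viable_cfu_per_ml
    deletion_share = revertants_with_deletion / revertants_checked
    return raw_fraction * deletion_share


def mean_reversion_frequency(cultures):
    """Average the corrected frequency over independent cultures."""
    return mean(corrected_reversion_frequency(*culture) for culture in cultures)


# Hypothetical culture: 50 CmR CFU/ml, 2.0e8 viable CFU/ml,
# 50 revertants checked by PCR, 40 of them with a TR deletion.
example = corrected_reversion_frequency(50, 2.0e8, 50, 40)
print(f"{example:.2e}")
```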

3. Results

3.1. In silico screening for tandem repeat variations in the E. coli tolA gene

A total of 127 E. coli tolA sequences were retrieved from GenBank, of which four truncated alleles lacking the distal end of the gene, including the repeat region, were excluded from further analysis. The remaining 123 alleles were screened for tandem repeats using the TRF program, and a consensus repeat unit of 15 bp was identified in the region that corresponds to the C-terminal part of domain II of the TolA protein (Fig. 1). Nine different TR lengths were found, with a TR copy number ranging from 8 to 16. It should be noted here that the consensus sequence as well as the number of TRs detected was dependent on the parameter settings in the program, because the repeats at the N-terminal side of the repeat region become gradually more imperfect, and it is not clear whether these should be considered as repeats or not. The settings used in this work exclude some of these putative weakly conserved repeats. The frequency distribution showed that 47.2% (58/123) of the tolA alleles harbored 13 TRs (Fig. 2A). TRs with other repeat numbers occurred at frequencies between 0.81% and 20.3%. Based on available information concerning pathogenicity of the strains from which the tolA alleles are derived (Table S1), 72.4% of the non-pathogens (21/29) versus 40% of the pathogens (35/85) carried the 13-TR allele. Notably, tolA from strains of the EHEC/STEC group (which are closely related) showed higher than average variability, with 5 different TR copy numbers, only 37.5% of the strains having the 13-TR allele (12/32), and the longest tolA type, with 16 TRs, being found only in these strains.
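The shares quoted for the 13-TR allele follow directly from the counts given in parentheses. The short sketch below recomputes three of them; the group labels are chosen here for readability and the counts are those stated in the text.

```python
# (strains carrying the 13-TR allele, strains in the group), from the text.
THIRTEEN_TR_COUNTS = {
    "all GenBank alleles": (58, 123),
    "non-pathogens": (21, 29),
    "EHEC/STEC": (12, 32),
}


def percent(part, total):
    """Percentage rounded to one decimal place."""
    return round(100 * part / total, 1)


shares = {group: percent(*counts) for group, counts in THIRTEEN_TR_COUNTS.items()}
for group, share in shares.items():
    print(f"{group}: {share}%")
```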

Fig. 1. Schematic representation of tolA of E. coli MG1655 showing corresponding protein domain structure and location of tandem repeat region. Each gray bar represents one repeat unit. Primers used for PCR analysis of TR copy number are shown as arrows.

3.2. PCR analysis of tolA TR variability in E. coli isolates

In addition to the in silico screen, the tolA TR copy number of 114 E. coli isolates (different from those retrieved from GenBank) was determined by PCR analysis. Primers were designed to anneal in conserved parts of the MG1655 tolA gene flanking the 5′ and 3′ ends of the TR region, as shown in Fig. 1. Three strains did not yield a PCR product, possibly because the repeat regions were truncated, as found for some tolA sequences in GenBank (see above). The number of TRs was inferred from the apparent length of the amplicons determined by agarose gel electrophoresis. Sequencing of a number of amplicons with different lengths showed that it was possible to discriminate single repeat copy differences and to accurately determine TR copy numbers based on the size of the amplicon. An example of such a gel is shown in Fig. 3. In this way, six tolA types having a different TR copy number could be identified among the remaining 111 E. coli isolates. The predominant tolA allele (86.5%) found in the isolates also counted 13 repeat units, but other repeat lengths were generally less abundant than was the case for the in silico analyzed GenBank sequences (Fig. 2B). However, some groups of strains showed relatively high variability. For example, among the 20 ECOR strains, the 13-TR allele accounted for only 60% (12/20) and four different types were found. Furthermore, three different alleles were found among the 13 EPEC strains as well as among the 20 CNF-producing pathogenic strains.
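The inference of copy number from amplicon length can be sketched as a comparison with the MG1655 amplicon, which carries 13 repeat units. The sketch assumes a uniform 15-bp unit, whereas the text notes that some units are 16 to 18 bp, so results close to a half unit should be checked by sequencing. The amplicon sizes used in the example are invented, since the reference amplicon length is not stated in the text.

```python
def infer_copy_number(amplicon_bp, reference_amplicon_bp,
                      reference_copies=13, unit_bp=15):
    """Estimate TR copy number from amplicon size relative to a reference.

    reference_amplicon_bp is the size of the amplicon from a strain with a
    known copy number (MG1655, 13 units). unit_bp=15 is the consensus unit.
    """
    unit_difference = round((amplicon_bp - reference_amplicon_bp) / unit_bp)
    return reference_copies + unit_difference


# Hypothetical sizes read from a gel, with an assumed 300-bp reference band.
estimates = [infer_copy_number(size, 300) for size in (225, 300, 332)]
print(estimates)
```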

Fig. 2. TR screening of tolA in E. coli strains. (A): Frequency distribution of tandem repeat copy number in 123 E. coli tolA sequences (GenBank) determined in silico with the TRF; (B): Frequency distribution of tandem repeat copy number in tolA from 111 E. coli isolates as determined by PCR analysis.

Fig. 3. PCR products of tolA TR region in 6 selected E. coli isolates. Lane M: size markers; lane 1: ECOR 41 (8 repeats); lane 2: CNF2-33KH89 (10 repeats); lane 3: NTEC II-B56 (11 repeats); lane 4: CNF2-B177 (12 repeats); lane 5: MG1655 (13 repeats); lane 6: ECOR 43 (15 repeats).

3.3. Experimental observation of TR variability in a plasmid-based selection system

While the above observations provide indirect evidence for variability of the tolA TR region in E. coli, our next objective was to demonstrate changes in TR copy number directly in a clonal population. Because contraction or expansion of the tolA TR region is not associated with any documented selectable phenotype and is likely to occur at low frequency, the TR region of MG1655 tolA was inserted in-frame immediately after the ATG start codon of a chloramphenicol acetyltransferase (cat) gene and cloned in the pUC18 plasmid. This modified cat gene, designated cat::TRtolA, fails to confer discernible Cm resistance unless the number of repeats is diminished. A similar strategy has been developed and validated with a different TR in a previous study (Hashem et al., 2002). When a culture of E. coli MG1655 carrying this plasmid was plated on LB medium containing 25 µg/ml Cm, several Cm-resistant colonies were obtained. When these were purified and checked by PCR, a majority of them displayed deletions in the TR region, changing the number of repeat units from 13 to 10, 8, 6 or 4, with 6 being the predominant allele (49.4% of 176 Cm-resistant colonies analyzed) (Fig. 4). Among the different types of revertants, the allele with 6 repeats also conferred the highest level of Cm resistance (data not shown), which likely explains this bias. Since not all Cm-resistant clones had TR deletions, we assume that Cm resistance can also be restored by certain other mutations or perhaps by an elevated expression level, but this was not further investigated. Cm-resistant revertants showing a TR contraction occurred at a frequency of 1.1 × 10⁻⁷ in exponential phase culture, as calculated from 12 independent experiments. Additionally, we determined the sequence of 36 alleles that had undergone a TR contraction. Since the individual repeats in the tolA TR region are unique, mostly because of codon degeneracy in the DNA sequence, we were able to determine which repeat units were lost by aligning the sequences of the alleles with a TR contraction with that of the wild-type tolA allele. This analysis revealed that deletions consistently comprised contiguous repeats centered around repeat units 5–6 of the wild-type allele (Fig. 5). Interestingly, some of the alleles generated in this way had exactly the same deletion start and endpoints as some of the alleles retrieved from GenBank. For example, the TR region of tolA from E. coli H263 is identical to allele 5, and that of E. coli 1180 and O111:H- str. 11128 is identical to allele 6 generated in this experiment.
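Because each repeat unit is distinguishable at the DNA level, the lost units can in principle be located by comparing the ordered list of units in a revertant with that of the wild type. The sketch below assumes the units have already been extracted as ordered lists and that a single contiguous block was deleted; the labels `u1` to `u13` are placeholders standing in for the unique unit sequences, and the example revertant is hypothetical.

```python
def deleted_units(wild_type_units, revertant_units):
    """Return 1-based positions of the contiguous block missing from the revertant.

    Returns None when no single contiguous deletion explains the revertant.
    """
    n_deleted = len(wild_type_units) - len(revertant_units)
    if n_deleted < 0:
        raise ValueError("revertant has more units than the wild type")
    # Try every possible start position of the deleted block.
    for start in range(len(revertant_units) + 1):
        remaining = wild_type_units[:start] + wild_type_units[start + n_deleted:]
        if remaining == revertant_units:
            return list(range(start + 1, start + n_deleted + 1))
    return None


# Placeholder labels for the 13 unique repeat units of the wild-type allele.
wild_type = [f"u{i}" for i in range(1, 14)]
# Hypothetical 6-unit revertant that kept units 1-2 and 10-13.
revertant = wild_type[:2] + wild_type[9:]
lost = deleted_units(wild_type, revertant)
print(lost)
```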

Fig. 4. Frequency distribution of TR copy number in 176 Cm-resistant revertants selected from a non-Cm resistance conferring 13-unit cat::TRtolA allele. TR copy number was determined by PCR. Revertants without apparent TR contraction were excluded.

Fig. 5. Maps showing deleted TR units (hatched) in 36 different revertant cat::TRtolA alleles conferring Cm resistance. The frequency of occurrence of each allele is indicated in brackets. The original wild-type (MG1655) non-Cm resistance-conferring cat::TRtolA construct comprised 13 repeat units.

3.4. Effect of selected DNA repair and recombination pathways on TR variation

To investigate which cellular functions are involved in TR contraction, the plasmid carrying the cat::TRtolA allele was transformed into different MG1655 mutants affected in key DNA repair and recombination pathways, and frequencies of reversion to Cm resistance by TR variation were compared (Table 1). A significantly reduced reversion frequency was found in recA mutants. In fact, no Cm-resistant colonies appeared during our selection experiments, demonstrating that the repeat rearrangements in this plasmid are RecA-dependent. In contrast, a mutS-negative background enhanced repeat deletion about 11-fold. Not unexpectedly, the frequency of revertants without repeat deletion increased in this background as well, probably because the deficiency in methylation-directed mismatch repair (MMR) increases the mutation frequency in these strains. Loss of the MutY-dependent mismatch repair pathway, however, had no statistically significant effect on the recombination frequency.

4. Discussion

Bacterial genomes contain a wide diversity of small and large intra- and intergenic TRs. However, except for a number of so-called contingency loci (i.e. simple sequence repeats of 1–6 bp that cause hypervariability in a locus), the role of TRs in bacterial variability has been poorly studied. This is particularly the case for TRs with repeat units > 6 bp, like that in the E. coli tolA gene studied in this work. Although the occurrence and possible importance of a TR in tolA had already been recognized more than 20 years ago (Levengood et al., 1991; Schendel et al., 1997), studies of its variability and function are lacking to date.

In this work, in silico analysis of 123 intact tolA sequences from GenBank and PCR analysis of the TR repeat number of an additional 111 strains of different origins revealed that the TR region of tolA is highly variable, with between 8 and 16 repeat units depending on the strain, and with 13 repeat units being the predominant allele emerging from both analyses. Although these observations are indirect, they suggest not only that the TR region is subject to contraction and expansion in the species E. coli, but also that the different TolA variants generated in this dynamic process may confer differing fitnesses to the strains expressing them. Interestingly, in the in silico analysis, variants with a number of repeat units different from 13 were more frequent among pathogens (60%) than among non-pathogens (27.6%). In particular, five different TR variants occurred among the 32 EHEC/STEC strains included in the in silico analysis in spite of the overall genetic relatedness of these strains, and, remarkably, none of the 20 O157:H7 strains in this group had 13 repeat units. In addition, for pathogens belonging to the EPEC (3 types in 13 strains) or CNF-producing groups (3 types in 20 strains), higher than average variability was observed, in this case by PCR analysis. On the other hand, APEC pathogens showed limited variability (2 TR lengths among 21 strains). Clearly, it will be of interest to conduct more extensive studies to investigate whether the TolA TR number can be correlated with pathogenicity, habitat or any other ecological or virulence-related property of E. coli strains.

Several systems for facilitating detection of TR rearrangements have been described (reviewed in Bichara et al., 2006). In this work, we customized a previously described approach that makes use of the ability of a chloramphenicol resistance (i.e. cat) gene, rendered inactive by an in-frame insertion of a TR, to regain functionality when the insert is reduced in size by TR contraction (Hashem et al., 2002). Using this system, CmR revertants could be easily selected, and most of these had TR deletions reducing the repeat number from 13 to 4, 6, 8 or 10. It is remarkable that out of 176 CmR revertants analyzed, only even-numbered TRs were obtained in this experiment. The reason for this is not clear, since odd-numbered TRs show no systematic underrepresentation compared to the even-numbered ones in the natural TolA alleles (Fig. 2). Possibly, the molecular mechanism of TR variation is different when the TR is located on a multicopy plasmid (as in our cat::TRtolA allele) than when it is on the chromosome. A difference between our own results and those of Hashem et al. (2002) is that we did not find CmR revertants from which the entire TR had been deleted. Another notable feature is that the hotspot for deletion is centered around repeat units 5 and 6, i.e. somewhat left of the center of the original TR (Fig. 5). Also, the ultimate and penultimate repeat units were never involved in any deletion event in our experiment, possibly because they are relatively distant from the apparent deletion hotspot. It can be deduced from sequence alignment that the two ultimate repeat units of all the analyzed natural tolA sequences are very similar to the MG1655 repeat units 12 and 13 (data not shown), suggesting that they are not easily deleted or that they may be functionally important. Indeed, the C-terminal part of TolA domain II (residues 280–293), which contains these repeats, was recently suggested to be essential in binding the tetratricopeptide repeat (TPR) domain of YbgF in the Tol–Pal complex (Krachler et al., 2010).

Table 1
Frequency of TR contraction in a 13-repeat cat::TRtolA allele leading to Cm resistance in different genetic backgrounds. Results are expressed as average from n independent experiments. Asterisk indicates significant difference from wild type at the 95% confidence level using unpaired Student's t-test (P-values in column 3).

Genetic background (all in MG1655)    Reversion frequency (×10⁻⁷)    P
Wild-type                             1.1 ± 0.85 (n = 12)
ΔrecA                                 <0.23 ± 0.01 (n = 18)*         0.025
mutS::Tn10                            12 ± 20 (n = 14)*              0.041
mutY::mini Tn10                       0.87 ± 0.74 (n = 14)           0.51

Homologous recombination and strand slippage during replication are thought to be the primary mechanisms affecting instability of TRs, but the relative importance of the two mechanisms may depend on the size of the repeat units. Since the former mechanism involves inter- or intramolecular cross-over events between homologous stretches of DNA, it is predicted to become more prominent with increasing repeat length. In bacteria, most recombination events have been found to be dependent on RecA. We tested the effect of some key recombination and DNA repair functions on TR deletion in the plasmid-based cat::TRtolA system. RecA was found to be essential in this assay, since no CmR revertants could be isolated in the recA background. Although this observation does not clarify the mechanism of TR deletion in detail, it clearly implies involvement of homologous recombination. Interestingly, variation in tolA repeats was stimulated in the absence of the MutS-dependent, but not the MutY-dependent, mismatch repair pathway, although the reasons for this remain unclear.
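The relative effects of the three genetic backgrounds can be read off Table 1 by dividing each mean reversion frequency by the wild-type mean. The sketch below does only that; it uses the mean values from Table 1 (in units of 10⁻⁷), treats the recA value as an upper bound as indicated by the "<" sign, and does not attempt to reproduce the reported P-values.

```python
# Mean reversion frequencies from Table 1, in units of 1e-7.
MEAN_FREQUENCY = {
    "wild-type": 1.1,
    "recA": 0.23,   # reported as an upper bound (<0.23)
    "mutS": 12.0,
    "mutY": 0.87,
}


def fold_change(background, reference="wild-type", table=MEAN_FREQUENCY):
    """Ratio of a background's mean frequency to the reference mean."""
    return table[background] / table[reference]


fold = {name: fold_change(name) for name in ("recA", "mutS", "mutY")}
for name, value in fold.items():
    print(f"{name}: {value:.2f}-fold relative to wild type")
```

The mutS ratio of roughly 10.9 corresponds to the "about 11-fold" enhancement mentioned in section 3.4.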

To conclude, we demonstrated in this work that E. coli tolA contains a highly variable TR region. In view of the central role of TolA in the Tol-Pal complex, its ability to bind specifically to other proteins of this complex and the importance of this system for stability and functionality of the E. coli outer membrane, it is remarkable that such a variable repeat region has been maintained during evolution. We anticipate that modulation of the size of this TR region, as demonstrated in this work, may contribute to the fitness of E. coli under specific stress conditions or in specific niches.

K. Zhou et al. / Research in Microbiology 163 (2012) 316-322

Acknowledgments

This work was supported by research grants from the Research Foundation Flanders (G.0289.06N) and from the KU Leuven Research Fund (METH/07/03). We thank Dr. R. Lavigne for providing DNA sequencing service.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.resmic.2012.05.003.

References

Benson, G., 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573-580.

Bichara, M., Wagner, J., Lambert, I.B., 2006. Mechanisms of tandem repeat instability in bacteria. Mutat. Res. 598, 144-163.

Bierne, H., Cossart, P., 2007. Listeria monocytogenes surface proteins: from genome predictions to function. Microbiol. Mol. Biol. Rev. 71, 377-397.

Click, E., Webster, R.E., 1997. Filamentous phage infection: required interactions with the TolA protein. J. Bacteriol. 179, 6464-6471.

Coil, D.A., Anne, J., 2010. The role of fimV and the importance of its tandem repeat copy number in twitching motility, pigment production, and morphology in Legionella pneumophila. Arch. Microbiol. 192, 625-631.

Datsenko, K., Wanner, B., 2000. One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc. Natl. Acad. Sci. 97, 6640-6645.

Derouiche, R., Gavioli, M., Benedetti, H., Prilipov, A., Lazdunski, C., Lloubes, R., 1996. TolA central domain interacts with Escherichia coli porins. EMBO J. 15, 6408-6415.

Deszo, E.L., Steenbergen, S.M., Freedberg, D.I., Vimr, E.R., 2005. Escherichia coli K1 polysialic acid O-acetyltransferase gene, neuO, and the mechanism of capsule form variation involving a mobile contingency locus. Proc. Natl. Acad. Sci. 102, 5564-5569.

Dukan, S., Belkin, S., Touati, D., 1999. Reactive oxygen species are partially involved in the bacteriocidal action of hypochlorous acid. Arch. Biochem. Biophys. 367, 311-316.

Funchain, P., Yeung, A., Stewart, J.L., Lin, R., Slupska, M.M., Miller, J.H., 2000. The consequences of growth of a mutator strain of Escherichia coli as measured by loss of function among multiple gene targets and loss of fitness. Genetics 154, 959-970.

Guo, X., Mrazek, J., 2008. Long simple sequence repeats in host-adapted pathogens localize near genes encoding antigens, housekeeping genes, and pseudogenes. J. Mol. Evol. 67, 497-509.

Hannan, A., 2010. TRPing up the genome: tandem repeat polymorphisms as dynamic sources of genetic variability in health and disease. Discov. Med. 10, 314-321.

Hashem, V.I., Rosche, W.A., Sinden, R.R., 2002. Genetic assays for measuring rates of (CAG)·(CTG) repeat instability in Escherichia coli. Mutat. Res. 502, 25-37.

Jordan, P., Snyder, L.A., Saunders, N.J., 2003. Diversity in coding tandem repeats in related Neisseria spp. BMC Microbiol. 3, 23.

Krachler, A.M., Sharma, A., Cauldwell, A., Papadakos, G., Kleanthous, C., 2010. TolA modulates the oligomeric status of YbgF in the bacterial periplasm. J. Mol. Biol. 403, 270-285.

Levengood, S.K., Beyer Jr., W.F., Webster, R.E., 1991. TolA: a membrane protein involved in colicin uptake contains an extended helical region. Proc. Natl. Acad. Sci. 88, 5939-5943.

Lindstedt, B., 2005. Multiple-locus variable number tandem repeats analysis for genetic fingerprinting of pathogenic bacteria. Electrophoresis 26, 2567-2582.

Lubkowski, J., Hennecke, F., Pluckthun, A., Wlodawer, A., 1999. Filamentous phage infection: crystal structure of g3p in complex with its coreceptor, the C-terminal domain of TolA. Structure 7, 711-722.

Madoff, L.C., Michel, J.L., Kling, D., Gong, E.W., Kasper, D.L., 1996. Group B streptococci escape host immunity by deletion of tandem repeat elements of the alpha C protein. Proc. Natl. Acad. Sci. 93, 4131-4136.

McCarthy, A.J., Lindsay, J.A., 2010. Genetic variation in Staphylococcus aureus surface and immune evasion genes is lineage associated: implications for vaccine design and host-pathogen interactions. BMC Microbiol. 10, 173.

Moxon, R., Bayliss, C., Hood, D., 2006. Bacterial contingency loci: the role of simple sequence DNA repeats in bacterial adaptation. Annu. Rev. Genet. 40, 307-333.

Ritz, D., Lim, J., Reynolds, C.M., Poole, L.B., Beckwith, J., 2001. Conversion of a peroxiredoxin into a disulfide reductase by a triplet repeat expansion. Science 294, 158-160.

Schendel, S.L., Click, E.M., Webster, R.E., Cramer, W.A., 1997. The TolA protein interacts with colicin E1 differently than with other group A colicins. J. Bacteriol. 179, 3683-3690.

Sheets, A.J., St Geme 3rd, J.W., 2011. Adhesive activity of the Haemophilus Cryptic Genospecies Cha autotransporter is modulated by variation in tandem peptide repeats. J. Bacteriol. 193, 329-339.

van Belkum, A., Scherer, S., van Alphen, L., Verbrugh, H., 1998. Short-sequence DNA repeats in prokaryotic genomes. Microbiol. Mol. Biol. Rev. 62, 275-293.

Vandersmissen, L., De Buck, E., Saels, V., Coil, D.A., Anne, J., 2010. A Legionella pneumophila collagen-like protein encoded by a gene with a variable number of tandem repeats is involved in the adherence and invasion of host cells. FEMS Microbiol. Lett. 306, 168-176.

Wagner, J., Nohmi, T., 2000. Escherichia coli DNA polymerase IV mutator activity: genetic requirements and mutational specificity. J. Bacteriol. 182, 4587-4595.

Yang, Y.O., Gabriel, D.W., 1995. Intragenic recombination of a single plant pathogen gene provides a mechanism for the evolution of new host specificities. J. Bacteriol. 177, 4963-4968.

Zhao, J., Leung, H.E., Winkler, M.E., 2001. The miaA mutator phenotype of Escherichia coli K-12 requires recombination functions. J. Bacteriol. 183, 1796-1800.