Systematic and Applied Microbiology 33 (2010) 427–435
Contents lists available at ScienceDirect
Journal homepage: www.elsevier.de/syapm

Malate dehydrogenase: A useful phylogenetic marker for the genus Aeromonas

Maribel Farfán*, David Miñana-Galbis, Albert Garreta, J. Gaspar Lorén, M. Carmen Fusté

Departament de Microbiologia i Parasitologia Sanitàries, Facultat de Farmàcia, Universitat de Barcelona, Avda. Joan XXIII s/n, 08028 Barcelona, Spain

* Corresponding author. Tel.: +34 93 402 44 97; fax: +34 93 402 44 98. E-mail address: [email protected] (M. Farfán).

Article history: Received 13 July 2010
Keywords: Malate dehydrogenase; mdh; Aeromonas; Phylogeny

Nucleotide sequence data reported are available in the GenBank database under accession numbers HM163292 to HM163327.

0723-2020/$ – see front matter © 2010 Elsevier GmbH. All rights reserved. doi:10.1016/j.syapm.2010.09.005

Abstract

The reconstruction of correct genealogies among biological entities, the estimation of the divergence time between organisms or the study of the different events that occur along evolutionary lineages are not always based on suitable genes. For reliable results, it is necessary to look at full-length sequences of genes under stabilizing selection (neutral or purifying) and behaving as good molecular clocks. In bacteria it has been proved that the malate dehydrogenase gene (mdh) can be used to determine the inter- and intraspecies divergence, and hence this gene constitutes a potential marker for phylogeny and bacterial population genetics. We have sequenced the full-length mdh gene in 36 type and reference strains of Aeromonas. The species grouping obtained in the phylogenetic tree derived from mdh sequences was in agreement with that currently accepted for the genus Aeromonas. The maximum likelihood models applied to our sequences indicated that the mdh gene is highly conserved among the Aeromonas species and the main evolutionary force acting on it is purifying selection. Only two sites under potential diversifying selection were identified (T 108 and S 193). In order to determine if these two residues could have an influence on the MDH structure, we mapped them in a three-dimensional model constructed from the sequence of A. hydrophila using the human mitochondrial MDH as a template. The presence of purifying selection together with the linear relationship between substitutions and gene divergence makes the mdh an excellent candidate gene for a phylogeny of Aeromonas and probably for other bacterial groups.

© 2010 Elsevier GmbH. All rights reserved.

Introduction

Automated fluorescent DNA sequencing technologies open access to a new phylogenetic approach in bacterial taxa. This technology can generate sequences rapidly, accurately and reproducibly and it is also inexpensive and widely available. For all these reasons, researchers have been using partial sequences of different protein coding genes to generate phylogenies among different bacterial species, developing multilocus sequence typing (MLST) schemes. MLST is a typing scheme based on the DNA sequence first proposed in 1998 by Maiden and colleagues to identify and classify bacterial strains and to unravel population genetics, molecular evolution, and epidemiology of the species the strains belong to [11]. This methodology was later substituted by Multilocus Sequence Analysis (MLSA), a method proposed for comparing full-length gene sequences. Moreover, MLST is usually applied to strains that belong to a well-defined species, while MLSA is more often used when species boundaries are not well known and MLSA data are used to improve species descriptions [26].

The objectives of phylogenetic studies are to reconstruct the correct genealogical ties among biological entities, to estimate the time of divergence between organisms and to chronicle the sequence of events along evolutionary lineages. Nevertheless, the genes chosen for this purpose are not always suitable. As mentioned by the ad hoc committee for the re-evaluation of species definition in bacteriology [24], “genes used for obtaining phylogenetic data have to be under stabilizing selection (neutral or purifying)”. In addition, the level of information obtained depends on the amount of genetic diversity present within a given taxon, and genes suspected of horizontal transfer episodes or present in multiple copies should be excluded. Furthermore, genes are frequently used in phylogenetic studies without considering the synonymous and non-synonymous evolution rates, or when the accumulated substitutions have become saturated, which results in more recent substitutions making little or no impact on the number of observed differences between the sequences. Finally, the methods for calculating evolutionary distances between sequences assume that all the nucleotide sites in a sequence are equally likely to undergo a substitution, but in most genes, and particularly those coding for proteins, different regions of DNA may have quite different probabilities of change. As a consequence, when analyzing partial gene sequences we are probably over- or under-estimating the divergence of the gene.

The way to measure the natural selection acting on a protein is to determine the ω ratio (non-synonymous substitutions


(dN)/synonymous substitutions (dS)). Values of ω < 1, ω = 1 and ω > 1 indicate purifying selection (amino acid changes are selected against), neutral evolution and positive selection (amino acid changes are fixed), respectively. Since positive selection is unlikely to affect all sites over a prolonged time, Yang [28] has developed a software package (PAML4) focused on detecting positive selection that affects only some lineages or some sites. The “branch models” allow the ω ratio to vary among branches in the phylogeny and are useful for detecting positive selection acting on particular lineages. Meanwhile, the “site models” allow the ω ratio to vary among sites (among codons or amino acids in the protein).

In bacteria it has been proved that the malate dehydrogenase gene (mdh) can be used for determining the inter- and intraspecies divergence [3,8], and hence that gene constitutes a potential marker to be used in phylogeny and bacterial population genetics.

Malate dehydrogenase (MDH) catalyzes the interconversion of malate and oxaloacetate. MDHs are multimeric enzymes consisting of identical subunits usually organized as either dimers or tetramers [1]. Each subunit functions independently in terms of catalysis. MDHs are widely distributed and have been characterized in the Bacteria and Archaea domains, in fungi, plants and mammals, as well as in mitochondria and chloroplasts. The MDHs from some bacteria (e.g. Escherichia coli and Salmonella enterica subsp. enterica bv. Typhimurium) have a relatively high sequence identity with the mitochondrial isozymes of eukaryotes, whereas other bacterial MDHs (Thermus spp.) are more closely related to the cytoplasmic isozymes of eukaryotic MDHs [7].

Two regions can be distinguished in MDHs, the N-terminal region corresponding to the coenzyme binding region (residues 1–150) and the catalytic domain (residues 151 to the end). The primary structure of MDHs shows a generally low degree of similarity. Nevertheless, there are some residues that are absolutely conserved, which are related to the coenzyme binding and catalysis sites [13].

In Aeromonas, several genes have been used to obtain phylogenies [9,15,17,21,23] and although in some cases the phylogenies obtained are not consistent [9,21], until now no one has questioned those divergences. We have studied the usefulness of the mdh gene in the phylogeny of the genus Aeromonas, a group of Gammaproteobacteria that remains complex from a taxonomic point of view.

We report here the results of a comparative analysis of sequence variation in the polymorphic malate dehydrogenase gene among 36 type and reference strains of Aeromonas and the role of the selective pressure acting on this gene. In addition, we have constructed a structural model and mapped the amino acid positions that interact with the functional domains of the protein in an MDH structure determined from the sequence of the type strain of Aeromonas hydrophila subsp. hydrophila using the human mitochondrial MDH as a template.

Materials and methods

Bacterial strains

Aeromonas strains used in this study are indicated in Table 1. They include the type and reference strains of all the recognized species of Aeromonas, even the most recently accepted, and also some strains considered as synonyms.

Table 1
Aeromonas strains used for mdh sequencing.

Strain (a)                                          GenBank accession number
A. allosaccharophila CECT 4199T                     HM163292
A. aquariorum LMG 24688T                            HM163293
A. bestiarum CECT 4227T                             HM163294
A. bivalvium CECT 7113T                             HM163295
A. caviae CECT 838T                                 HM163296
A. culicicola CIP 107763T                           HM163297
A. diversa CDC 2478-85T                             HM163298
A. encheleia LMG 16330T                             HM163299
A. encheleia HG11 LMG 13075                         HM163300
A. enteropelogenes CECT 4487T                       HM163301
A. eucrenophila CECT 4224T                          HM163302
A. fluvialis CECT 7401T                             HM163303
A. hydrophila subsp. anaerogenes CECT 4221R         HM163304
A. hydrophila subsp. dhakensis CECT 5744R           HM163305
A. hydrophila subsp. hydrophila CECT 839T           HM163306
A. hydrophila subsp. ranae CIP 107985R              HM163307
A. ichthiosmia CECT 4486T                           HM163308
A. jandaei CECT 4228T                               HM163309
A. media CECT 4232T                                 HM163310
A. molluscorum CECT 5864T                           HM163311
A. piscicola CECT 7443T                             HM163312
A. popoffii LMG 17541T                              HM163313
A. salmonicida subsp. achromogenes CIP 104001R      HM163314
A. salmonicida subsp. masoucida CECT 896R           HM163315
A. salmonicida subsp. pectinolytica CECT 5752R      HM163316
A. salmonicida subsp. salmonicida CECT 894T         HM163317
A. salmonicida subsp. smithia CECT 5179R            HM163318
A. sanarellii CECT 7402T                            HM163319
A. schubertii CECT 4240T                            HM163320
A. simiae CIP 107798T                               HM163321
A. sobria CECT 4245T                                HM163322
A. taiwanensis CECT 7403T                           HM163323
A. tecta CECT 7082T                                 HM163324
A. trota CECT 4255T                                 HM163325
A. veronii bv. Sobria CECT 4246R                    HM163326
A. veronii bv. Veronii CECT 4257T                   HM163327

(a) CECT, Spanish Type Culture Collection; CIP, Collection de l’Institut Pasteur; LMG, Belgian Co-ordinated Collections of Microorganisms; T, type strain; R, reference strain; HG, DNA hybridization group.

Gene sequences

For each strain a segment including the 936 bp coding region of the mdh gene was PCR-amplified from extracted genomic DNA (Realpure® Genomic DNA extraction kit, Durviz). Table 2 shows the primers used for PCR and sequencing reactions, which were designed from the published complete genome sequences of A. hydrophila ATCC 7966 and A. salmonicida A449 (GenBank accession numbers: NC_008570 and NC_009348, respectively). Amplicons of approximately 1200 bp were obtained by using the primer pair mdhF1/mdhR1.

PCR reactions were carried out in a 50 µL volume, containing 0.5–5 µL of genomic DNA as a template, 1× PCR buffer I [10× PCR buffer I: 500 mM KCl, 15 mM MgCl2, 100 mM Tris–HCl (pH 8.3)], 2 mM MgCl2, 0.2 mM of each dNTP, 30 ng/µL of each primer and 2 U of AmpliTaq Gold® DNA polymerase (Applied Biosystems). Amplifications were performed in an Applied Biosystems 2720 thermal cycler using the following program: initial denaturation at 95 °C for 5 min, followed by 35 cycles of denaturation at 95 °C for 45 s, annealing at 47–49 °C for 1 min and elongation at 72 °C for 2 min, and a final extension at 72 °C for 10 min. PCR products were resolved by electrophoresis, and amplicons were purified with an MSB® Spin PCRapace kit (Invitek). Purified PCR products were directly sequenced on both strands using either the PCR primers or internal primers. Sequencing reactions were performed with the BigDye® Terminator v3.1 Cycle Sequencing kit (Applied Biosystems) and analyzed on an ABI PRISM 3700 DNA sequencer (Applied Biosystems) by the Genomics Unit of the Scientific and Technical Services of the University of Barcelona (SCT-UB).

Table 2
List of primers used for PCR amplification and DNA sequencing.

Primer   Sequence 5′–3′          Orientation   Sequence position
mdhF1    TTATTGGCTCACAGAACC      F             −122
mdhF2    ACCAACCCGGTNACCAC       F             +351
mdhR1    GTAANGGGTTTCATCATAAG    R             +1077
mdhR2    GTGTTNACCGGGTTGGT       R             +368
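The primer data in Table 2 can be checked with a few lines of Python. The following is only an illustrative sketch: the function names are hypothetical, the degenerate base N is taken to mean any of A, C, G or T, and the positions are assumed to be inclusive coordinates relative to the start codon on a scale without a position 0. None of these assumptions is stated in the text.

```python
# Primer sequences copied from Table 2.
PRIMERS = {
    "mdhF1": "TTATTGGCTCACAGAACC",
    "mdhF2": "ACCAACCCGGTNACCAC",
    "mdhR1": "GTAANGGGTTTCATCATAAG",
    "mdhR2": "GTGTTNACCGGGTTGGT",
}

# Complement table restricted to the symbols that occur in Table 2.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a primer (N stays N)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def expand_degenerate(seq):
    """List every concrete oligonucleotide encoded by a primer containing N."""
    variants = [""]
    for base in seq.upper():
        choices = "ACGT" if base == "N" else base
        variants = [v + c for v in variants for c in choices]
    return variants


def span_length(start, end):
    """Inclusive length between two coordinates, assuming no position 0."""
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # the scale jumps from -1 to +1
    return length
```

Under these assumptions, `span_length(-122, 1077)` returns 1199, which agrees with the amplicon of approximately 1200 bp reported for the mdhF1/mdhR1 pair, and each degenerate primer expands to four concrete oligonucleotides.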

Data analysis

The nucleotide sequences obtained were examined for open reading frames (ORFs) with the ORF finder program and were also compared to the GenBank database using a BLASTN analysis, both available at the NCBI [http://www.ncbi.nlm.nih.gov/]. Sequence data were translated, aligned using ClustalX [27] according to the system default parameters, and back translated to obtain the nucleotide alignments. The DnaSP software [20] was used to obtain the DNA polymorphism data. The identity of the sequences, the amino acid substitutions and G + C content were calculated by using the MEGA 4 package software [25].
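The back-translation step can be illustrated with a short Python sketch. This is not the procedure actually used by the authors, which the text does not detail; it simply threads each unaligned coding sequence onto its aligned protein so that every gap in the protein alignment becomes a three-nucleotide gap. The function name and the gap character are assumptions, and the coding sequence is expected without its stop codon.

```python
def back_translate(aligned_protein, cds, gap="-"):
    """Thread an unaligned coding sequence onto its aligned protein sequence."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) % 3 != 0 or residues != len(codons):
        raise ValueError("protein and coding sequence lengths disagree")
    next_codon = iter(codons)
    aligned_codons = []
    for aa in aligned_protein:
        # A protein gap becomes a codon-sized gap; a residue consumes one codon.
        aligned_codons.append(gap * 3 if aa == gap else next(next_codon))
    return "".join(aligned_codons)
```

For example, `back_translate("M-K", "ATGAAA")` returns `"ATG---AAA"`, keeping the reading frame intact across the gap.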

The synonymous substitution rate (synonymous substitutions per synonymous site, dS), non-synonymous substitution rate (non-synonymous substitutions per non-synonymous site, dN) and dN/dS ratio (ω) were calculated as previously described [5]. Standard codon models were fitted to the data set with the PAML4 software [28], and likelihood ratio tests (LRT) were used to determine the relative fit of the hierarchically nested models [5].

Models tested included those that account for selection in different codon sites, within specific lineages, or a combination of selection in different sites within specific lineages. Parameters and other details involved in these models are explained in the ‘Results’ section. To verify convergence, all PAML4 analyses were run at least three times from distinct starting parameter values and seeds.

Other statistical analyses and graphics were done using Microsoft® Excel. Unless otherwise indicated, uncertainty is expressed as the standard error.

Prediction of the protein structure

Structural models for malate dehydrogenase from A. hydrophila subsp. hydrophila used in this study have been obtained with ModWeb, a server for comparative protein structure modeling, using as a template the human mitochondrial malate dehydrogenase (PDB 2dfd). ModWeb depends on the large scale protein structure modeling pipeline, ModPipe, for its functionality [4]. The structural templates used to build models in ModPipe consist of a set of non-redundant chains extracted from structures in the Protein Data Bank [PDB; http://www.pdb.org/pdb/home/home.do]. Sequence-structure matches are established using multiple variations of sequence–sequence, profile–sequence, sequence–profile and profile–profile alignment methods. Significant alignments (E-value better than 1.0) covering at least 30 amino acid residues are selected for modeling. Models are built for each one of the sequence-structure matches using comparative modeling by satisfaction of spatial restraints as implemented in Modeller [22]. Finally, the resulting models are evaluated using several model assessment schemes and the best scoring model is chosen. The predicted protein structures were visualized using PyMOL software [19].

Nucleotide sequence accession numbers

The nucleotide sequences determined in this study have been deposited in the GenBank database under accession numbers HM163292 to HM163327 (Table 1).

Results

Sequence analysis of the Aeromonas mdh gene

We have sequenced a total of 36 mdh genes corresponding to the full-length malate dehydrogenase of all the type and reference strains of Aeromonas (Table 1). The results of the BLASTN search in the GenBank database verified that all the sequences obtained showed high homology with the Aeromonas malate dehydrogenase gene. The multiple sequence alignment displayed 936 sites, without gaps. Complete deletion of the terminal stop codons resulted in an alignment of 933 nucleotides with 618 invariable sites (66.24%) and 318 variable (polymorphic) sites (34.01%), 256 of which were parsimony informative. The G + C content varied from 58.3% (A. salmonicida subsp. pectinolytica) to 62.4% (A. diversa), with an average of 60.4 mol% G + C. The average nucleotide frequencies were 19.9%, 29.8%, 19.6% and 30.6% for T, C, A and G, respectively.
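As a quick consistency check, the average G + C content follows directly from the reported nucleotide frequencies (29.8% C plus 30.6% G gives 60.4%). The sketch below is illustrative only: the function names are hypothetical, and the study itself obtained these values with MEGA 4.

```python
def gc_percent(seq):
    """G + C content of a nucleotide sequence, in percent of A/C/G/T bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def gc_from_frequencies(freqs):
    """G + C content from a mapping of base -> frequency in percent."""
    return freqs["G"] + freqs["C"]


# Average nucleotide frequencies reported for the mdh alignment.
MDH_FREQUENCIES = {"T": 19.9, "C": 29.8, "A": 19.6, "G": 30.6}
```

Calling `gc_from_frequencies(MDH_FREQUENCIES)` and rounding to one decimal place reproduces the reported average of 60.4 mol% G + C.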

Phylogenetic analysis

Fig. 1 shows the unrooted phylogenetic tree derived from mdh gene sequences. Bootstrap values higher than 50 are indicated on the corresponding nodes. The species grouping obtained was in agreement with that currently accepted for the genus Aeromonas. Five main clusters were detected: cluster 1, which includes the A. hydrophila species complex (A. hydrophila, A. bestiarum and A. salmonicida) as well as A. popoffii and one of the last species described in this genus, A. piscicola; cluster 2, which groups A. caviae/A. hydrophila subsp. anaerogenes, A. media, A. eucrenophila, A. encheleia/A. encheleia HG11 and A. tecta, together with the latest species described, A. sanarellii and A. taiwanensis; cluster 3, comprising A. bivalvium and A. molluscorum; cluster 4, constituted by A. enteropelogenes/A. trota together with A. fluvialis, a recently described species of Aeromonas, A. jandaei, A. sobria, A. veronii/A. ichthiosmia and A. culicicola, and A. allosaccharophila; and finally, cluster 5, which includes A. simiae, A. schubertii and A. diversa. All the main clusters, except cluster 1, exhibited bootstrap values higher than 90%.

The amino acid sequence alignments showed a high similarity among the MDHs from all the Aeromonas species, the catalytic domain (amino acid residues 151–311) being the region that exhibited the highest diversity (Fig. S1). In addition, we identified the positions described as important for substrate binding (positions 81, 87 and 153) or related to coenzyme binding (position 34), which have been described as highly conserved in all MDHs.

In order to clarify the role of the selective pressures operating on nucleotide diversity and sequence divergence of mdh in Aeromonas, we performed a maximum likelihood analysis of mdh sequence data using the models developed by Nielsen and Yang [18] and Yang et al. [29], extending the model of codon substitution of Goldman and Yang [6] as implemented in the PAML4 package.

Fig. 1. Phylogenetic tree based on the mdh gene sequences of 36 Aeromonas strains. The tree was constructed with the Jukes–Cantor distance model and clustering with the neighbour-joining algorithm by using the MEGA software. Clusters that group the phylogenetically related species of Aeromonas are indicated in bold on the right. Bootstrap values (>50%) based on 1000 replicates are shown as percentages at branch nodes. The scale bar represents 2 nucleotide substitutions per 100 nucleotides.

Estimation of dS and dN

Estimates of synonymous (dS) and non-synonymous (dN) substitution rates in the mdh sequences were determined using the maximum likelihood (ML) and the Nei–Gojobori (NG) methods [6,16]. In Fig. 2, synonymous, non-synonymous and total (dS + dN) substitution rates are plotted against the distance (t), defined as the expected number of nucleotide substitutions per codon, which is about 3 × (0.261 dS + 0.739 dN), where 0.261 and 0.739 are the proportions of the synonymous and non-synonymous sites in the mdh genes. Both methods showed very similar plots with a linear relation between substitutions (dS, dN, dS + dN) and gene divergence (distance t). The coefficients of determination (r2) obtained were high, except for the non-synonymous substitutions, ranging from 0.9823 (ML) to 0.9847 (NG) for dS substitutions and from 0.9823 (ML) to 0.9946 (NG) for total substitutions (Fig. 2). Although the non-synonymous substitutions showed low coefficients of determination, their contribution to the gene divergence was minimal.

Fig. 2. Plots of synonymous (dS), nonsynonymous (dN) and total (dS + dN) substitutions versus the genetic distance (distance t) for Aeromonas mdh sequences obtained in this study. Plots were calculated with two methods, maximum likelihood (ML) and Nei–Gojobori (NG). The corresponding regression equation and coefficient of determination (r2), which is used as a measure of the goodness of each regression model, are indicated in each plot.

Regression equations shown in the plots (x = t):
Panel            ML                                    NG
dS               y = 1.2729x − 0.0164, r2 = 0.9823     y = 1.0841x + 0.0039, r2 = 0.9847
dN               y = 0.0416x + 0.0027, r2 = 0.629      y = 0.0655x − 0.0017, r2 = 0.6287
dS + dN          y = 1.3145x − 0.0137, r2 = 0.9823     y = 1.1495x + 0.0023, r2 = 0.9946

Variability of ω ratio among codon sites

In order to verify the nature of the selective pressure acting on our sequences we used different codon-substitution models developed by Nielsen and Yang [18] and Yang et al. [29]. Standard codon models were applied to the data set as implemented in the codeml program from the PAML4 package. Random-site models M0, M1a, M2a, M7 and M8 were fitted to our data; these models assume variation in ω among sites but not among lineages. The models used, log-likelihood, and parameter estimates of these models are shown in Table 3. Table 4 indicates the results of the likelihood ratio tests (LRT) for these models.

We applied the simplest of the site-based models, M0 [30], which assumes a uniform ω ratio for all codons, to the data. The log-likelihood for our sequences was ℓ = −6092.221, with an estimate of ω = 0.036, which can be interpreted as an average over all sites in the protein and all lineages in the phylogenetic tree. The low ω value obtained suggests a strong action of purifying selection in the evolution of mdh in the Aeromonas species studied.
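The reading of ω used here (ω < 1 purifying selection, ω = 1 neutral evolution, ω > 1 positive selection) can be written as a small Python helper. This is a hedged sketch with hypothetical function names: it classifies a point estimate only, whereas any formal inference in the study relies on the likelihood ratio tests between nested codon models.

```python
def omega_ratio(dn, ds):
    """dN/dS ratio; dS must be positive for the ratio to be defined."""
    if ds <= 0:
        raise ValueError("dS must be positive")
    return dn / ds


def classify_omega(omega, tol=1e-9):
    """Map an omega point estimate to the selection regime it suggests."""
    if omega < 0:
        raise ValueError("omega cannot be negative")
    if abs(omega - 1.0) <= tol:
        return "neutral evolution"
    return "purifying selection" if omega < 1.0 else "positive selection"
```

With the M0 estimate, `classify_omega(0.036)` returns `"purifying selection"`, matching the interpretation given above.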

Model M1a (nearly neutral), which hypothesizes a variable selective pressure among sites but not positive selection, and M2a, which also considers positive selection, fit our data better than the M0 model, with ℓ = −6024.232 in both cases (Table 3). The LRT between the M1a and M2a models allows the positive selection model to be rejected (Table 4). These results also suggest the absence of positive selection among sites in mdh sequences.

The most stringent test we carried out compared the M7 model, which assumes a beta distribution of ω over sites, with the M8 model, which adds an extra site class with a free ω ratio estimated from the data, allowing ω values greater than one (Table 3). Both models fitted the data better than M0, with values of ℓ = −6004.963 and −5996.801, respectively (Table 3). At a significance level of 99% we can reject the null hypothesis (M7 model) and accept the M8 model (Table 4), assuming the existence of ω values higher than 1.

Even accepting M8 as the model that best fits our data, the parameter estimates showed that 99.2% of sites are under purifying selection and only two sites (0.8%), a threonine (position 108) and a serine (position 193), have an ω > 1 with a probability of 95% and 99%, respectively (Table 3). These results confirm that purifying selection is the predominant force acting in the evolution of the mdh gene.

Table 3
Log-likelihood and parameter estimates under random-site models for mdh sequences.

Model                     p (a)   ω (b)            ℓ (c)        Estimates of parameters                                                  Positively selected sites
M0: one ratio             1       0.036 ± 0.000    −6092.221    ω = 0.036                                                                None
M1a: nearly neutral       2       0.049 ± 0.150    −6024.232    p0 = 0.973 (p1 = 0.027), ω0 = 0.023, ω1 = 1                              Not allowed
M2a: positive selection   4       0.072 ± 0.017    −6024.232    p0 = 0.973, p1 = 0.027 (p2 = 0.000), ω0 = 0.023, ω1 = 1, ω2 = 43.075     None
M7: beta                  2       0.039 ± 0.069    −6004.963    p = 0.118, q = 2.156                                                     Not allowed
M8: beta and ω > 1        4       0.074 ± 0.039    −5996.801    p0 = 0.992 (p1 = 0.008), p = 0.149, q = 4.039, ω = 1.033                 108 T (P = 0.961*), 193 S (P = 1.000**)

(a) p, number of free parameters for each model.
(b) ω, averaged ω ± standard error of dN/dS ratio over all sites in the mdh gene alignment.
(c) ℓ, log-likelihood value for each model.
Values of ω > 1 and sites inferred to be under positive selection are in bold.
* Probability of being positively selected with P > 95%.
** Probability of being positively selected with P > 99%.

Table 4
Likelihood ratio test statistics (LRT) for random-site models.

Model 1 (a)   Model 2 (b)   2Δℓ (c)    df (d)   P-value
M1a           M0            135.798    1        2.21 × 10^−31 ***
M2a           M0            135.978    3        2.78 × 10^−29 ***
M2a           M1a           0          2        1
M8            M7            16.324     2        2.85 × 10^−4 ***

(a) Alternative model.
(b) Null model.
(c) 2Δℓ = 2(ℓ1 − ℓ0).
(d) Degrees of freedom (see Table 3).
*** Significant at P < 0.001.
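The likelihood ratio tests in Table 4 can be reproduced from the log-likelihoods in Table 3 with the statistic 2Δℓ = 2(ℓ1 − ℓ0), compared against a chi-square distribution with the stated degrees of freedom. The sketch below is an illustration under that assumption, not the authors' procedure: the function names are hypothetical, and the chi-square upper tail is computed with a closed-form recurrence for integer degrees of freedom so that only the Python standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Base cases: df = 1 uses the complementary error function,
    # df = 2 is a plain exponential.
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def lrt(lnl_null, lnl_alt, df):
    """Return (2 * delta log-likelihood, chi-square P-value) for nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)
```

With the Table 3 values, `lrt(-6004.963, -5996.801, 2)` gives a statistic of 16.324 and a P-value of about 2.85 × 10^−4, in line with the M8 versus M7 row of Table 4, while `lrt(-6024.232, -6024.232, 2)` gives a statistic of 0 and a P-value of 1, as in the M2a versus M1a row.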

ixed-site models

Fig. 3. Distribution of the ω ratio posterior means. Omega value distribution derived from the M8 model is scattered along the codon sites. The two main domains of the MDH protein (NAD-binding domain and catalytic domain) are shown at the top in bold. Axes: mean ω against codon site.

The ω ratios inferred under the model M8 were scattered along the codon sites, with the highest values clustering in the catalytic domain (Fig. 3). We applied the fixed-site models to our data in order to determine if the different sites along the sequences were under different selective constraints. We partitioned codon sites into two sections, 1: NAD-binding domain from codons 1 to 150, and 2: the catalytic domain from codons 151 to 311. The correspondence of these sections with the two regions of the malate dehydrogenase is depicted in Fig. 3.
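The two-section partition described above can be sketched as a simple lookup. This is only an illustration of the codon ranges stated in the text; the names `SECTIONS`, `domain_of` and `split_by_domain` are hypothetical and do not correspond to any PAML input format.

```python
# Section boundaries as stated in the text (1-based, inclusive).
SECTIONS = {
    "NAD-binding": (1, 150),
    "catalytic": (151, 311),
}


def domain_of(codon: int) -> str:
    """Return the section that a 1-based codon position belongs to."""
    for name, (start, end) in SECTIONS.items():
        if start <= codon <= end:
            return name
    raise ValueError(f"codon {codon} is outside the 1-311 range")


def split_by_domain(values):
    """Group per-codon values (values[0] belongs to codon 1) by section."""
    groups = {name: [] for name in SECTIONS}
    for codon, value in enumerate(values, start=1):
        groups[domain_of(codon)].append(value)
    return groups


# The two sites flagged under M8 fall in different sections.
print(108, domain_of(108))
print(193, domain_of(193))
```

A per-codon list of ω posterior means could be passed to `split_by_domain` to compare the two sections, assuming the list is ordered by codon position.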

Three-dimensional structural prediction model for MDH

Fig. 4 shows the predicted structure for MDH of A. hydrophila subsp. hydrophila using human (mitochondrial) MDH as a template, which includes the NAD and malic acid molecules inside the structure. Sites inferred as positively selected in model M8 (positions T 108 and S 193) are mapped onto the model. These amino acids (in orange) are situated away from the functional centers of the protein (NAD and substrate binding sites) and exposed to the surface of the MDH structure. In addition, these amino acids are small molecules that do not interfere with the protein structures nearby. Fig. 4b shows the distances among the NAD molecule and the amino acids in positions I 12, D 34 and M 227.

Fig. 4. A. hydrophila malate dehydrogenase predicted structure. (a) Three-dimensional superposition of malate dehydrogenase of A. hydrophila subsp. hydrophila (yellow) with mitochondrial human malate dehydrogenase (grey; PDB entry 2dfd) interacting with the substrate, malic acid (cyan) and the coenzyme NAD (blue). The two positively selected sites determined with the M8 model (108 T and 193 S) are shown in orange. (b) Detail of the interaction of the NAD and A. hydrophila MDH, and the three amino acid residues involved in this binding (I 12, D 34 and M 227).

Two models of MDH corresponding to A. hydrophila have been generated from the crystalline template (Fig. 5). Both models show similar structures except in the region corresponding to the substrate binding (purple dotted circle). It seems that this particular part of the protein could adopt two different conformations: a closed (yellow) or an open (purple) loop, depending on the incorporation or not of the substrate (malic acid) inside the molecule. Table S1 shows the distances between atoms corresponding to three arginine residues in positions R 81, R 87 and R 153 and the malic acid molecule.

Fig. 5. Superimposition of the two A. hydrophila MDH active-site conformations. (a) MDH model with a section corresponding to the active-site. (b) Detail of the active-site loop showing the two conformations, open (purple) and closed (yellow). (c) Amplification showing the distances from the three arginine residues (R 81, R 87 and R 153) to the substrate (malic acid) considering the two possible conformations.

Discussion

In this study we have optimized the PCR and sequencing conditions to amplify the malate dehydrogenase gene in a collection of Aeromonas strains that includes all the type and reference strains. The sequence length of the mdh gene is the same in all cases (936 bp) and almost coincides with those of E. coli and S. enterica subsp. enterica bv. Typhimurium (939 bp). The G + C content calculated from the sequence of the mdh gene ranged between 58.3 and 62.4 mol%, values that correspond with those previously reported for the genus Aeromonas [10,12].
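For readers who want to reproduce this kind of figure, a G + C content in mol% can be computed directly from a nucleotide sequence. The sketch below is a minimal illustration; `gc_content` is a hypothetical helper, the choice to ignore ambiguous bases is an assumption, and the toy fragment is not an Aeromonas sequence.

```python
def gc_content(seq: str) -> float:
    """Return G + C content in mol%, ignoring anything other than A, C, G, T."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Hypothetical 15-nucleotide fragment used only to exercise the function.
toy_fragment = "ATGGCCGTCAAGGGC"
print(f"G + C = {gc_content(toy_fragment):.1f} mol%")
```

Applied to each full-length 936 bp mdh sequence, a function of this kind would yield one value per strain, from which a range such as the one reported above can be read off.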

The tree generated from the mdh gene sequences shows five main clusters that coincide with the classical species clustering in the genus Aeromonas when phenotypic and genomic data are considered [12,14]. Indeed, cluster 1 includes the different species that constitute the “Aeromonas hydrophila complex” (A. hydrophila, A. bestiarum and A. salmonicida) together with the closely related species that were later described, A. popoffii, the still controversial A. aquariorum (a much later synonym for A. hydrophila subsp. dhakensis) and A. piscicola. Cluster 2 corresponds to A. caviae and other related species and finally cluster 4 groups A. veronii and related species. The mdh phylogenetic tree also includes two additional clusters: cluster 3 (A. molluscorum and A. bivalvium) and cluster 5 (A. schubertii, A. simiae and A. diversa), both grouping species later described.

Although the amino acid sequences of MDHs generally show a low degree of primary structure similarity, in some cases as low as 20% sequence identity [7], the similarity values of the Aeromonas MDHs were higher than 91% (Fig. S1). High values were also obtained when we compared our data with those of E. coli (81.2%) and Salmonella Typhimurium (79.5%) (Fig. S2). In addition, the MDH amino acid sequences from Aeromonas, as originally described for other bacteria, have relatively high sequence identity with the mitochondrial MDHs of eukaryotes. The high similarity between bacterial and mitochondrial MDHs, higher than that observed when comparing mitochondrial and cytoplasmic eukaryotic MDHs, reinforces the endosymbiotic origin of this organelle.
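Percent identity values such as those quoted above depend on how the aligned pair is scored. The following minimal sketch shows one common convention, assumed here for illustration only: identical residues divided by the number of aligned columns in which neither sequence has a gap. The function name and the toy peptides are hypothetical and are not taken from the MDH alignments.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap columns are skipped under this convention
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned peptides, 10 columns, 2 mismatches.
seq_a = "MKVAVLGAAG"
seq_b = "MKVAVIGASG"
print(f"identity = {percent_identity(seq_a, seq_b):.1f}%")
```

Other conventions (for example dividing by the full alignment length) give slightly different numbers, so the denominator should be stated whenever identities are compared between studies.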

The two methods used to calculate nucleotide substitutions and distance (ML and NG) gave us almost identical results when the synonymous, non-synonymous and total substitutions were plotted against the distance (Fig. 2). The figure shows that mdh evolves at a similar rate in all the Aeromonas species, keeping a very good linear relationship between substitutions and gene divergence. A gene with these characteristics behaves as a good molecular clock and as a consequence enables the time of species divergence to be estimated simply by comparing their gene sequences. At the same time, methods of phylogenetic reconstruction are more likely to be accurate if genes evolve at a constant rate. These characteristics reveal that the mdh gene is suitable for constructing a phylogeny in Aeromonas.
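The linear relationship described above can be checked with an ordinary least-squares fit of substitutions against distance. The sketch below is an assumption about how such a check might be done, not the procedure used in the study; `linear_fit` is a hypothetical helper and the data points are invented purely to exercise it.

```python
def linear_fit(x, y):
    """Ordinary least squares: return (slope, intercept, r_squared)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical pairwise values: genetic distance and total substitutions.
distance = [0.02, 0.04, 0.06, 0.08, 0.10]
substitutions = [19, 37, 57, 75, 94]

slope, intercept, r_squared = linear_fit(distance, substitutions)
print(f"slope = {slope:.1f}, intercept = {intercept:.2f}, r^2 = {r_squared:.4f}")
```

An r² close to 1 across all species pairs is what a clock-like gene would be expected to show; strong departures for particular lineages would suggest rate heterogeneity.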

To broaden our study we analyzed the average selective pressure acting on the Aeromonas mdh gene as well as measuring it codon by codon applying the PAML4 software. Most of the models included in this program when applied to our mdh sequences showed that nearly all sites are under purifying selection. The model that best fits our data was M8, which detected only two sites with a ω > 1, a threonine in position 108 and a serine in position 193, confirming that this gene is mainly under purifying selection.

In addition, the omega values along the different codon sites were low for the majority of positions (Fig. 3). The highest values corresponded to those of the catalytic domain, including an omega greater than 1 (ω = 1.033) in position 193. This result is concordant with previous studies that describe this region of the protein as being the most variable compared with the NAD-binding domain [3].

In order to determine if the two amino acid residues positively selected in model M8 could have an influence on the MDH structure, we mapped them in a constructed three-dimensional model (Fig. 4). The fact that these amino acids are allowed to be positively selected in an extremely conserved protein is not surprising since they are placed in the surface exposed domains. Furthermore, they are small amino acid molecules that do not interfere either with the structure of the protein or with the active center of the catalytic activity.

When we constructed the A. hydrophila MDH model from the amino acid sequence, we obtained two structural models that differed in the loop included in the active-site (Fig. 5). This region, as previously reported for other bacterial MDHs [2,13,31], may appear as a closed or open structure depending on whether the substrate is incorporated or not. We also measured the atomic distances between the malic acid and the NAD coenzyme and the corresponding amino acid residues that bond these molecules to the MDH model constructed for Aeromonas. The distances determined (Table S1) were similar to those described for the E. coli MDH [2].
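Atomic distances of the kind reported in Table S1 are Euclidean distances between atom coordinates in the model. The sketch below is a generic illustration under that assumption; the helper names and the coordinates are hypothetical and are not taken from the A. hydrophila model or from any PDB entry.

```python
import math


def atom_distance(a, b) -> float:
    """Euclidean distance between two (x, y, z) coordinates, in the input units."""
    return math.dist(a, b)


def closest_contact(residue_atoms, ligand_atoms) -> float:
    """Shortest distance between any residue atom and any ligand atom."""
    return min(
        atom_distance(r, l) for r in residue_atoms for l in ligand_atoms
    )


# Hypothetical coordinates (in angstroms) standing in for an arginine
# side chain and the carboxylate oxygens of malic acid.
arginine_atoms = [(10.0, 4.0, 2.0), (11.2, 4.5, 2.3)]
malate_atoms = [(13.0, 4.0, 6.0), (14.1, 3.2, 6.4)]

print(f"closest contact = {closest_contact(arginine_atoms, malate_atoms):.2f} A")
```

Running the same calculation on the open and closed loop models would give the two sets of residue-to-substrate distances that can then be compared with published values for other MDHs.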

Finally, we applied the branch-site model to our sequences, attempting to detect positive selection that affects only a few sites along a few lineages. Only three sites with fixed amino acids were identified (V 44, A 154 and T 277) in the branch that splits A. schubertii, A. simiae and A. diversa from the other Aeromonas. These selected changes probably contributed to the separation of this group of species.

In conclusion, the maximum likelihood models applied to our sequences indicated that the mdh gene is highly conserved among the Aeromonas species and the main evolutionary force acting on it is purifying selection. Although ω values vary along the different positions of the gene, all but one are below 1, confirming the aforementioned conclusion. This fact, together with the linear relationship between substitutions and gene divergence, makes the mdh an excellent candidate gene for a phylogeny of Aeromonas. In addition, the tree obtained from the mdh sequences clustered the different species of this genus in complete agreement with the currently accepted species distribution.

As a consequence, considering that the malate dehydrogenase gene fulfills all the desirable conditions of a good phylogenetic marker, we recommend its inclusion in the phylogenetic studies of Aeromonas. It could probably also be used for the phylogeny of other bacterial groups.

Acknowledgments

We thank M.J. Figueras and A. Martínez-Murcia for kindly supplying isolates. We also thank Aintzane Urbizu for her contribution to this study. This research was supported by the project CGL2008-03281/BOS from the Ministerio de Educación y Ciencia, Spain.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.syapm.2010.09.005.

References

[1] Banaszak, L., Bradshaw, R. (1975) Malate dehydrogenase, in: Boyer, P.D. (Ed.), The Enzymes, Volume 11, Third Edition, Academic Press, New York, pp. 369–396.

[2] Bell, J.K., Yennawar, H.P., Wright, S.K., Thompson, J.R., Viola, R.E., Banaszak, L.J. (2001) Structural analyses of a malate dehydrogenase with a variable active site. J. Biol. Chem. 276, 31156–31162.

[3] Boyd, E.F., Nelson, K., Wang, F., Whittam, T.S., Selander, R.K. (1994) Molecular genetic basis of allelic polymorphism in malate dehydrogenase (mdh) in natural populations of Escherichia coli and Salmonella enterica. Proc. Natl. Acad. Sci. U.S.A. 91, 1280–1284.

[4] Eswar, N., John, B., Mirkovic, N., Fiser, A., Ilyin, V.A., Pieper, U., Stuart, A.C., Marti-Renom, M.A., Madhusudhan, M.S., Yerkovich, B., Sali, A. (2003) Tools for comparative protein structure modelling and analysis. Nucleic Acids Res. 31, 3375–3380.

[5] Farfán, M., Miñana-Galbis, D., Fusté, M.C., Lorén, J.G. (2009) Divergent evolution and purifying selection of the flaA gene sequences in Aeromonas. Biol. Direct 4, 23.

[6] Goldman, N., Yang, Z. (1994) A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11, 725–736.


[7] Goward, C.R., Nicholls, D.J. (1994) Malate dehydrogenase: a model for structure, evolution, and catalysis. Protein Sci. 3, 1883–1888.

[8] Ivanetich, K.M., Hsu, P.H., Wunderlich, K.M., Messenger, E., Walkup, W.G., Scott, T.M., Lukasik, J., Davis, J. (2006) Microbial source tracking by DNA sequence analysis of the Escherichia coli malate dehydrogenase gene. J. Microbiol. Methods 67, 507–526.

[9] Küpfer, M., Kuhnert, P., Korczak, B.M., Peduzzi, R., Demarta, A. (2006) Genetic relationships of Aeromonas strains inferred from 16S rRNA, gyrB and rpoB gene sequences. Int. J. Syst. Evol. Microbiol. 56, 2743–2751.

[10] Lorén, J.G., Farfán, M., Miñana-Galbis, D., Fusté, M.C. (2010) Prediction of whole-genome DNA G + C content within the genus Aeromonas based on housekeeping gene sequences. Syst. Appl. Microbiol. 33, 237–242.

[11] Maiden, M.C.J. (2006) Multilocus sequence typing of bacteria. Annu. Rev. Microbiol. 60, 561–588.

[12] Martin-Carnahan, A., Joseph, S.W. (2005) Genus I. Aeromonas Stanier 1943, 213AL. In: Brenner, D.J., Krieg, N.R., Staley, J.T., Garrity, G.M. (Eds.), Bergey’s Manual of Systematic Bacteriology, Volume 2, part B, Springer, New York, pp. 557–578.

[13] Minárik, P., Tomásková, N., Kollárová, M., Antalík, M. (2002) Malate dehydrogenases – structure and function. Gen. Physiol. Biophys. 21, 257–265.

[14] Miñana-Galbis, D., Farfán, M., Lorén, J.G., Fusté, M.C. (2002) Biochemical identification and numerical taxonomy of Aeromonas spp. isolated from environmental and clinical samples in Spain. J. Appl. Microbiol. 93, 420–430.

[15] Miñana-Galbis, D., Urbizu-Serrano, A., Farfán, M., Fusté, M.C., Lorén, J.G. (2009) Phylogenetic analysis and identification of Aeromonas species based on sequencing of the cpn60 universal target. Int. J. Syst. Evol. Microbiol. 59, 1976–1983.

[16] Nei, M., Gojobori, T. (1986) Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3, 418–426.

[17] Nhung, P.H., Hata, H., Ohkusu, K., Noda, M., Shah, M.M., Goto, K., Ezaki, T. (2007) Use of the novel phylogenetic marker dnaJ and DNA-DNA hybridization to clarify interrelationships within the genus Aeromonas. Int. J. Syst. Evol. Microbiol. 57, 1232–1237.

[18] Nielsen, R., Yang, Z. (1998) Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148, 929–936.


[19] PyMOL Molecular Viewer, DeLano Scientific, San Carlos, CA. [http://www.pymol.org].

[20] Rozas, J., Sánchez del Barrio, J.C., Messeguer, X., Rozas, R. (2003) DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19, 2496–2497.

[21] Saavedra, M.J., Figueras, M.J., Martínez-Murcia, A.J. (2006) Updated phylogeny of the genus Aeromonas. Int. J. Syst. Evol. Microbiol. 56, 2481–2487.

[22] Sali, A., Blundell, T.L. (1993) Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779–815.

[23] Soler, L., Yanez, M.A., Chacón, M.R., Aguilera-Arreola, M.G., Catalán, V., Figueras, M.J., Martínez-Murcia, A.J. (2004) Phylogenetic analysis of the genus Aeromonas based on two housekeeping genes. Int. J. Syst. Evol. Microbiol. 54, 1511–1519.

[24] Stackebrandt, E., Frederiksen, W., Garrity, G.M., Grimont, P.A., Kämpfer, P., Maiden, M.C., Nesme, X., Rosselló-Mora, R., Swings, J., Trüper, H.G., Vauterin, L., Ward, A.C., Whitman, W.B. (2002) Report of the ad hoc committee for the re-evaluation of the species definition in bacteriology. Int. J. Syst. Evol. Microbiol. 52, 1043–1047.

[25] Tamura, K., Dudley, J., Nei, M., Kumar, S. (2007) MEGA 4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol. Biol. Evol. 24, 1596–1599.

[26] Thompson, F.L., Gomez-Gil, B., Ribeiro Vasconcelos, A.T., Sawabe, T. (2007) Multilocus sequence analysis reveals that Vibrio harveyi and V. campbellii are distinct species. Appl. Environ. Microbiol. 73, 4279–4285.

[27] Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., Higgins, D.G. (1997) The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876–4882.

[28] Yang, Z. (2007) PAML 4: a program package for phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591, http://abacus.gene.ucl.ac.uk/software/paml.html.

[29] Yang, Z., Nielsen, R., Goldman, N., Pedersen, A.M. (2000) Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155, 431–449.

[30] Yang, Z., Swanson, W.J. (2002) Codon-substitution models to detect adaptive evolution that account for heterogeneous selective pressures among site classes. Mol. Biol. Evol. 19, 49–57.

[31] Zaitseva, J., Meneely, K.M., Lamb, A.L. (2009) Structure of Escherichia coli malate dehydrogenase at 1.45 Å resolution. Acta Cryst. F65, 866–869.