Molecular Evolution of Duplicated Ray Finned Fish HoxA Clusters: Increased synonymous substitution rate and asymmetrical co-divergence of coding and non-coding sequences. Günter P. Wagner* + , Kazuhiko Takahashi*, Vincent Lynch*, Sonja J. Prohaska#, Claudia Fried#, Peter F. Stadler#, and Chris Amemiya% *)Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06520-8106, USA #) Bioinformatik, Institut für Informatik, Universität Leipzig, D-04103 Leipzig, Germany %) Benaroya Research Institute at Virginia Mason, Seattle, WA 96101, USA +) Corresponding author: Gunter P. Wagner Department of Ecology and Evolutionary Biology Yale University New Haven, CT 06520-8106 Tel: 203-432-9998 Fax: 203-432-3870 Email: [email protected]

Molecular Evolution of Duplicated Ray Finned Fish HoxA Clusters · 2004. 8. 10. · Molecular Evolution of Duplicated Ray Finned Fish HoxA Clusters: Increased synonymous substitution


Molecular Evolution of Duplicated Ray Finned Fish HoxA Clusters:

Increased synonymous substitution rate and asymmetrical co-divergence of coding and non-coding sequences.

Günter P. Wagner*+, Kazuhiko Takahashi*, Vincent Lynch*, Sonja J. Prohaska#, Claudia Fried#, Peter F. Stadler#, and Chris Amemiya%

*) Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06520-8106, USA

#) Bioinformatik, Institut für Informatik, Universität Leipzig, D-04103 Leipzig, Germany

%) Benaroya Research Institute at Virginia Mason, Seattle, WA 96101, USA

+) Corresponding author:
Gunter P. Wagner
Department of Ecology and Evolutionary Biology
Yale University
New Haven, CT 06520-8106
Tel: 203-432-9998
Fax: 203-432-3870
Email: [email protected]


Abstract:

In this study the molecular evolution of duplicated HoxA genes in zebrafish and fugu has been investigated. All eighteen duplicated HoxA genes studied have a higher non-synonymous substitution rate than the corresponding genes in either bichir or paddlefish, where these genes are not duplicated. Surprisingly, though, the higher rate of evolution is not due solely to a higher non-synonymous to synonymous rate ratio but to an increase in both the non-synonymous and the synonymous substitution rate. The synonymous rate increase can be explained neither by changes in nucleotide composition nor by codon bias. Thus, we suggest that the HoxA genes may experience an increased mutation rate following cluster duplication. In the non-Hox nuclear gene RAG1 only an increase in non-synonymous substitutions could be detected, suggesting that the increased mutation rate is specific to duplicated Hox clusters and might be related to the structural instability of Hox clusters following duplication. The divergence among paralog genes tends to be asymmetric, with one paralog diverging faster than the other. In fugu, all b-paralogs diverge faster than a-paralogs, while in zebrafish Hoxa-13a diverges faster. This asymmetry corresponds to the asymmetry in the divergence rate of conserved non-coding sequences, i.e. putative cis-regulatory elements. In fugu, the putative cis-regulatory elements of the b-paralog HoxA cluster diverge faster, while in zebrafish those of the a-paralog cluster diverge faster. Hence, we found evidence for a concerted asymmetric divergence of coding sequences on the same cluster as well as between coding and non-coding sequences. These results suggest that the 5’ HoxA genes in the same cluster belong to a co-evolutionary unit in which genes have a tendency to diverge together.


Introduction

Hox cluster duplications play a prominent but poorly understood role in vertebrate evolution (Wagner et al. 2003). Hox genes code for homeodomain containing transcription factors and are homologous to the homeotic genes in the Drosophila Antp and Ubx gene clusters. The Hox gene family is thought to have arisen by tandem duplications, leading to a cluster of three to four homeobox containing genes, which was then duplicated, giving rise to the ancestral Hox and para-Hox gene clusters (Holland and Garcia-Fernandez 1996; Kappen et al. 1989). Additional tandem duplications increased the number of Hox genes in the cluster to up to 14 in chordates (Ferrier et al. 2000; Powers and Amemiya 2004). All well characterized invertebrate taxa have a single cluster, with the exception of the nematode Caenorhabditis elegans and the urochordate Ciona intestinalis, which have lost the integrity of the Hox cluster altogether (Dehal et al. 2002; Ruvkun and Hobert 1998). In contrast, multiple copies of Hox clusters have been found in all vertebrate lineages, varying between at least three in lamprey to seven or eight in teleost species (Holland et al. 1994; Martinez and Amemiya 2002; Meyer and Malaga-Trillo 1999; Prohaska and Stadler 2004; Ruddle et al. 1994b; Taylor et al. 2003). From these data at least three duplication events can be inferred, although the number of duplications might be as high as nine (Fried et al. 2003). The reason for the different pattern of Hox cluster evolution in vertebrate compared to invertebrate lineages is not clear. One possibility is that the duplicated genes are necessary for vertebrate body plan elaboration (Holland et al. 1994; Malaga-Trillo and Meyer 2001; Ruddle et al. 1994a), or to overcome structural constraints specific to vertebrate Hox clusters (Fried et al. 2004; Wagner et al. 2003). In this paper, we examine the evolutionary fate of duplicated HoxA genes in the teleosts in order to gain insight into the role Hox cluster duplication plays in vertebrate evolution.

A variety of teleost species are known to have more than four Hox clusters (Meyer and Schartl 1999; Prince and Pickett 2002; Prohaska and Stadler 2004), including the zebrafish (Amores et al. 1998), Fugu and Spheroides (Amores et al. 2004), and medaka (Naruse et al. 2000; Naruse et al. 2004). Preliminary data also exist for the killifish (Misof and Wagner 1996), striped bass (Pavell and Stellwag 1994; Snell et al. 1999), tilapia (Santini et al. 2003), and the rice field eel (Ji et al. 2002). It is not entirely clear when this duplication happened, but the data suggest that it was some time after the most recent common ancestor of paddlefish and other ray finned fishes and before the most recent common ancestor of Euteleosts (Metscher et al. 2004).

Three sets of taxa are considered in this paper: one for which Hox gene sequences are available, one for which RAG1 sequence information is known, and one comprising the non-coding sequences of the zebrafish, fugu, tilapia, bichir and shark HoxA clusters. The phylogenetic hypotheses for the first two sets of species are given in Figures 1a and 1b.

Materials and Methods

Data: Full Hox cluster sequences were used in this study from the shark, Heterodontus francisci: HfHoxA = AF479755; the bichir, Polypterus senegalus: PsHoxA = AC132195 and AC12632; the two zebrafish paralogs, Danio rerio: DrHoxAa = AC107365 r.c. and DrHoxAb = AC107364; the fugu, Takifugu rubripes: TrHoxAa = Fugu v.3.0 scaffold 47 positions 103001-223000 r.c., contains FRU92573, and TrHoxAb = Fugu v.2.0 scaffold 1874; and tilapia, Oreochromis niloticus: OnHoxA = AF533976.

The Hoxa-13 and Hoxa-11 sequences of paddlefish were PCR amplified, cloned and sequenced, and are described elsewhere (Metscher et al. 2004). The sequences are published in GenBank (Hoxa-13: # AY661749, protein ID AAT75331) (Hoxa-11: genomic sequence # AY661748, protein ID AAT75330).

RAG1 sequences of the following species were retrieved from GenBank: zebrafish, Danio rerio (NM131389); fugu, Takifugu rubripes (AF108420); winter flounder, Pseudopleuronectes americanus (AF369067); Atlantic cod, Gadus morhua (AF369064); Osteoglossomorpha: bronze featherback, Notopterus notopterus (AF369063), Gnathonemus sp. (IMCB-2001) (AF369062), freshwater butterfly fish, Pantodon buchholzi (AF369061), Osteoglossum sp. (AF369060); bowfin, Amia calva (AF369059); Mississippi paddlefish, Polyodon spathula (AF369057); bichir, Polypterus sp. (IMCB-2001) (AF369055); Latimeria menadoensis (AF369069).

Analysis of coding sequence evolution: both the nucleotide sequence and predicted amino acid sequence were aligned with ClustalW. The amino acid sequence was then inspected for regions of ambiguous alignment. These regions and gaps were eliminated from the alignment. Finally, the nucleotide sequence was corrected to conform to the edited amino acid sequence. Sequence alignments were analyzed with three programs as appropriate and indicated in the results section: RRTree 1.1.11 (Robinson-Rechavi and Huchon 2000), HYPHY99.beta for MacOS (Muse and Gaut 1994), and PAML (Yang 1997).
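The alignment editing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name strip_gapped_columns, the dictionary layout and the toy sequences are hypothetical, the nucleotide alignment is assumed to be in frame with the protein alignment, and only gapped columns are removed (regions of ambiguous alignment were identified by inspection in the study and are not modelled here).

```python
def strip_gapped_columns(protein_aln, codon_aln, gap="-"):
    """Remove alignment columns with a gap in any protein sequence,
    together with the matching codons of the nucleotide alignment.

    Both arguments are dicts mapping a sequence name to an aligned string;
    codon_aln is assumed to be in frame and three times as long as protein_aln.
    """
    names = list(protein_aln)
    n_cols = len(protein_aln[names[0]])
    # Columns to keep: no sequence has a gap at this amino acid position.
    keep = [i for i in range(n_cols)
            if all(protein_aln[name][i] != gap for name in names)]
    protein_out = {name: "".join(protein_aln[name][i] for i in keep)
                   for name in names}
    # Each kept amino acid column corresponds to one codon (3 nucleotides).
    codon_out = {name: "".join(codon_aln[name][3 * i:3 * i + 3] for i in keep)
                 for name in names}
    return protein_out, codon_out


# Toy, made-up sequences (not taken from the study).
toy_protein = {"seqA": "MK-LS", "seqB": "MKTLS", "seqC": "M-TLS"}
toy_codons = {
    "seqA": "ATGAAA---CTGTCT",
    "seqB": "ATGAAGACCCTGTCC",
    "seqC": "ATG---ACTTTGTCA",
}
clean_protein, clean_codons = strip_gapped_columns(toy_protein, toy_codons)
print(clean_protein)
print(clean_codons)
```

Editing the protein alignment first and then carrying the decision over to whole codons keeps the nucleotide alignment in frame, which the codon based methods require.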

Relative Rate Test for Conserved Non-Coding Sequences: conserved non-coding sequences were detected using the tracker program (Prohaska et al. 2004). Very briefly, this approach is based on BLAST (Altschul et al. 1990) for the initial search of all pairs of input sequences restricted to homologous intergenic regions. The resulting list of pairwise sequence alignments is then assembled into groups of partially overlapping regions that are subsequently passed through several filtering steps and finally aligned using the segment based multiple alignment tool DIALIGN2 (Morgenstern 1999). The final output of the program is the list of these aligned “footprint cliques” (see supplemental material at http://www.tbi.univie.ac.at/Publications/SUPPLEMENTS/04-014 ). The alignments of all footprint cliques are concatenated and padded with gap characters where a footprint detected between some sequences does not have a counterpart in others. Consequently, all gap characters are treated as unknown nucleotides rather than as deletions. Conserved sequences between two outgroup species, in this case the shark, Heterodontus francisci, and the bichir, Polypterus senegalus, are compared to pairs of ingroup sequences, in these cases all pairs of the zebrafish a/b clusters, the fugu a/b clusters and the Tilapia, Oreochromis niloticus, HoxAa cluster. The fraction of the conserved nucleotide positions between the two ingroup sequences is compared with an exponential decay model (Wagner et al. 2004), assuming that there is a constant rate of loss of conservation along the lineages. Finally these numbers are used for testing for differences in the rate of modification between the two ingroup lineages as described in Wagner et al. (2004).
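The concatenation and gap padding of footprint cliques, and the treatment of gaps as unknown nucleotides, can be illustrated with a small Python sketch. It is not the tracker program and not the statistical test of Wagner et al. (2004): the function names, the data layout and the toy sequences are hypothetical, and retained_fraction is only a simple descriptive summary of how many outgroup-conserved positions an ingroup sequence still shares.

```python
def concatenate_cliques(cliques, species):
    """Concatenate per-clique alignments into one row per species.

    Each clique is a dict {species: aligned_string}. A species that has no
    counterpart in a clique is padded with gap characters of the same width.
    """
    rows = {s: [] for s in species}
    for clique in cliques:
        width = len(next(iter(clique.values())))
        for s in species:
            rows[s].append(clique.get(s, "-" * width))
    return {s: "".join(parts) for s, parts in rows.items()}


def retained_fraction(aln, outgroup_a, outgroup_b, ingroup):
    """Fraction of outgroup-conserved positions that the ingroup retains.

    Gap characters in the ingroup are treated as unknown, so those
    positions are skipped rather than counted as changes.
    """
    conserved = retained = 0
    for a, b, x in zip(aln[outgroup_a], aln[outgroup_b], aln[ingroup]):
        if a == "-" or b == "-" or a != b:
            continue  # not conserved between the two outgroups
        if x == "-":
            continue  # unknown nucleotide in the ingroup
        conserved += 1
        retained += (x == a)
    return retained / conserved if conserved else float("nan")


# Toy, made-up cliques (not taken from the study).
toy_cliques = [
    {"shark": "ACGT", "bichir": "ACGT", "fuguA": "ACGA"},
    {"shark": "GGC", "bichir": "GGC", "fuguA": "GGC", "fuguB": "GAC"},
]
toy_species = ["shark", "bichir", "fuguA", "fuguB"]
toy_aln = concatenate_cliques(toy_cliques, toy_species)
frac_a = retained_fraction(toy_aln, "shark", "bichir", "fuguA")
frac_b = retained_fraction(toy_aln, "shark", "bichir", "fuguB")
print(toy_aln["fuguB"], frac_a, frac_b)
```

Skipping gapped positions instead of scoring them as mismatches is what keeps a missing footprint from being counted as a block of substitutions.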


Background

Hox cluster duplication is followed by loss of some duplicated genes. The gene complement of various ray finned fish HoxA clusters has been described in previous papers (bichir: Chiu et al. 2004; zebrafish: Amores et al. 1998, Chiu et al. 2002; striped bass: Snell et al. 1999; fugu and puffer fish: Amores et al. 2004; Tilapia: Santini et al. 2003). In addition the complete sequence of the shark HoxA cluster has been described (Chiu et al. 2002; Kim et al. 2000). Here we recapitulate the salient features of these results to set the stage for the detailed sequence analysis reported below (Fig. 2).

The HoxA cluster of the ancestral ray finned fish most likely had paralog group members 1 to 7, 9 to 11 and 13 and 14, i.e. 12 genes (Powers and Amemiya 2004). Prior to HoxA cluster duplication in the stem of teleost fishes the HoxA cluster further lost Hoxa-6 and Hoxa-14 (Chiu et al. 2004; Powers and Amemiya 2004), and thus had a total of ten genes. After duplication the b-paralog cluster lost all the anterior and medial genes with the exception of Hoxa-2b. In contrast the HoxAa paralog cluster only lost one or two genes in different lineages. The fugu and pufferfish lineages both lost Hoxa-7a and zebrafish also lost Hoxa-10a, while tilapia and the striped bass retained both of these genes. The fugu has the largest number of first order paralogs, namely all the AbdB related genes Hoxa-9a/b, Hoxa-10a/b, Hoxa-11a/b and Hoxa-13a/b, as well as Hoxa-2a/b. In zebrafish only three 1st order paralog HoxA gene pairs are maintained: Hoxa-9a/b, Hoxa-11a/b and Hoxa-13a/b. The detailed sequence analysis presented here focuses on the Hox genes with first order paralogs, with a particular focus on Hoxa-11 and Hoxa-13, and some results for Hoxa-2, Hoxa-9 and Hoxa-10. The analysis of the divergence of non-coding sequences of fugu and zebrafish will be restricted to the intergenic regions between Hoxa-9 and Hoxa-13, i.e. the segment of the cluster that contains most of the 1st order paralogs.

Inferences on the consequences of Hox gene duplication critically depend on the availability of close outgroup taxa. There is only a limited amount of data from basal actinopterygian HoxA clusters. Recently we described a complete HoxA cluster sequence from the most basal lineage of ray finned fishes, the bichir (Chiu et al. 2004), and two genes, Hoxa-11 and Hoxa-13, from the American paddlefish, Polyodon spathula (Metscher et al. 2004), which is a member of the second most basal ray finned fish lineage (Bemis et al. 1997). Gene tree analysis of either taxon shows that these lineages arose prior to the origin of the paralog HoxA clusters found in the teleosts (Chiu et al. 2004; Metscher et al. 2004).

Results

Increased rate of synonymous and non-synonymous substitutions after Hox gene duplication

The amino acid substitution rates of the 18 teleost Hox genes studied here were compared to the rate in either the paddlefish or the bichir using a maximum likelihood relative rate test as implemented in HYPHY99beta for MacOS. In every instance the estimated amino acid substitution rate in the teleost gene was higher than that in bichir or paddlefish and in 15 cases the difference was statistically significant (Fig. 3, Table 1).

The higher rate of non-synonymous substitutions in duplicated Hox genes is expected assuming that duplicated genes have a certain degree of functional redundancy. Functional redundancy would imply that the non-synonymous rate would differentially increase over the synonymous rate, leading to an increased dN/dS ratio, ω. Given that the duplication may have occurred as long as 320 million years ago (Vandepoele et al. 2004), estimating the synonymous substitution rate is only possible with genes for which relatively close outgroups are available. Table 2 gives the results for Hoxa-11 and Hoxa-13, since only for those genes did we have a paddlefish ortholog at hand. Both dN and dS were estimated with a codon based maximum likelihood method (Codeml of PAML) (Yang 1997) and a distance based method (RRTree) (Robinson-Rechavi and Huchon 2000) and compared to that in paddlefish (Tables 2A and B). Both methods account for phylogenetic structure, which makes these methods more appropriate for the comparison of deeply divided lineages than three taxa relative rate tests. In addition RRTree estimates the transversion rate at fourfold degenerate sites (B4). This measure of the synonymous substitution rate can be estimated over longer phylogenetic distances because it saturates more slowly than dS itself.
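To make the notion of transversions at fourfold degenerate sites concrete, the following Python sketch counts them for a pair of aligned, in-frame coding sequences. It assumes the standard genetic code, in which eight two-base codon prefixes leave the third position free to vary without changing the amino acid. The function name and toy sequences are hypothetical, and the sketch counts raw differences only, with no correction for multiple hits, so it should not be read as RRTree's estimator.

```python
# Codon prefixes whose third position is fourfold degenerate in the
# standard genetic code: Leu (CT), Val (GT), Ser (TC), Pro (CC),
# Thr (AC), Ala (GC), Arg (CG), Gly (GG).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}


def count_fourfold_transversions(seq1, seq2):
    """Return (sites, transversions) over shared fourfold degenerate sites.

    seq1 and seq2 are aligned, in-frame, upper-case coding sequences.
    A site is counted only if both codons carry the same fourfold prefix
    and both third positions are unambiguous nucleotides.
    """
    sites = transversions = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        codon1, codon2 = seq1[i:i + 3], seq2[i:i + 3]
        if codon1[:2] != codon2[:2] or codon1[:2] not in FOURFOLD_PREFIXES:
            continue
        if codon1[2] not in "ACGT" or codon2[2] not in "ACGT":
            continue
        sites += 1
        # A transversion exchanges a purine for a pyrimidine or vice versa.
        if (codon1[2] in PURINES) != (codon2[2] in PURINES):
            transversions += 1
    return sites, transversions


# Toy, made-up sequences: GCT CTA ATG GGA versus GCA CTG ATG GGC.
toy_sites, toy_tv = count_fourfold_transversions("GCTCTAATGGGA", "GCACTGATGGGC")
print(toy_sites, toy_tv)
```

Because only purine to pyrimidine changes are counted, the A/G difference in the second codon is ignored, which is the property that makes this class of change saturate more slowly than all synonymous changes taken together.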

Unexpectedly, PAML-based tests for differences in ω between teleosts and paddlefish did not show significant differences (not shown) in spite of the large differences in amino acid substitution rates summarized in Table 1. The data in Tables 2A and B show that both the synonymous and the non-synonymous substitution rate in the teleost lineage are increased by a factor of 3 to 5, depending on the gene and the method of estimation. For Hoxa-11 both dS and B4 are significantly higher in teleosts than in paddlefish according to RRTree. For Hoxa-13 dS could not be estimated but B4 is significantly increased (differences in dS cannot be tested in PAML). The increased synonymous substitution rate can be the result of changes in the base composition and/or codon bias or an increased mutation rate. Below we show that the first two explanations are not supported by the data.

Accelerated evolution after Hox cluster duplication is not caused by changes in compositional or codon bias

To address the question of why the rate of synonymous substitutions increased after Hox cluster duplications we consider three possibilities: a systematic shift in the base composition of the HoxA clusters, a change in codon bias and a general increase of mutation rate in all nuclear genes.

As shown in Tables 2A and B, the GC content of the teleost HoxA clusters and genes does not differ systematically from outgroups. On average there is a slightly higher GC content of teleost genes but this difference is only due to perciform species, i.e. fugu, tilapia and striped bass, but not zebrafish (data not shown), which also has an increased dS. Hence there is no indication that the change in synonymous substitution rate is due to a shift in GC content. Similarly, the effective number of codons, Nc, a measure of codon bias (Wright 1990), is not systematically different between teleosts and outgroups like bichir, coelacanth, shark or paddlefish. There are a few genes which have changed their codon usage pattern, but they are single genes in single lineages, like fugu Hoxa-2b, and cannot account for the overall increase in synonymous substitution rate. Thus we conclude that the most likely explanation for the increased synonymous substitution rate is an increase in mutation rate.
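As a minimal illustration of the compositional check, GC content can be computed as below. The function names and the toy sequence are hypothetical. The third codon position variant (GC3) is an added assumption that is not reported in the text; it is included only because third positions are where most synonymous changes occur. The effective number of codons, Nc, is not computed in this sketch.

```python
def gc_content(seq):
    """Fraction of G or C among unambiguous nucleotides in seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous nucleotides")
    return sum(b in "GC" for b in bases) / len(bases)


def gc3_content(cds):
    """GC content restricted to third codon positions of an in-frame CDS."""
    return gc_content(cds[2::3])


# Toy, made-up coding sequence: ATG GCC AAG CTG TAA.
toy_cds = "ATGGCCAAGCTGTAA"
toy_gc = gc_content(toy_cds)
toy_gc3 = gc3_content(toy_cds)
print(toy_gc, toy_gc3)
```

Comparing such values between teleost genes and their outgroup orthologs is the kind of check that would reveal a systematic compositional shift if one existed.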

In order to test whether the apparent increase in mutation rate is limited to HoxA genes or is a genome wide pattern we examined another nuclear gene, RAG1. As mentioned above, the main limitation for detecting any rate differences in sequence evolution is the availability of close outgroup taxa. We only found one gene with sufficient coverage of basal ray finned fish lineages in GenBank to estimate the synonymous rate differences between teleosts and basal ray finned fishes, RAG1.

Rate of RAG1 evolution in ray finned fishes

We obtained RAG1 sequences from 12 species to compare the rate of evolution between teleost and basal ray finned fish lineages, in particular paddlefish and Amia. The rate of evolution was estimated and tested for differences with three methods: codon based PAML (Yang 1997), the codon model of Muse and Gaut (1994) as implemented in HYPHY.99beta, and tree based relative rate tests as implemented in RRTree (Robinson-Rechavi and Huchon 2000). RRTree tests for differences in non-synonymous substitution rates are all highly significant between Amia/Paddlefish and teleosts, as well as between Amia/Paddlefish and osteoglossomorphs. Hence the teleost nuclear genes investigated here evolve at a higher non-synonymous rate than those of close outgroups.

The synonymous substitution rate could not be estimated with RRTree due to saturation and even B4, the fourfold degenerate transversion rate, could only be estimated with exceedingly high standard deviations. We then performed an RRTree test with the 3rd codon positions. The nucleotide substitution rate at 3rd positions could be estimated with reasonable error variance and indicated no difference between teleost clades and Amia/Paddlefish. The synonymous substitution rates estimated by PAML are consistently higher in teleosts but no direct test for the significance of these estimates can be given in the PAML framework.

Finally we performed a local clock test based on the Muse and Gaut (1994) codon model for heterogeneity of dS rates. The results show that the teleosts are not different from a clock model, and neither is a group including Amia and Paddlefish. Hence the synonymous substitution rate is not distinguishable from a clock model in a clade which includes the basal ray finned fishes Amia and Paddlefish as well as teleosts. A local clock test for non-synonymous substitutions, however, provides strong evidence for rate heterogeneity within this clade (LR = 48.52, P = 8 × 10^-8) but not within the teleost clade. This is consistent with the results of tree based relative rate tests (see above).
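A likelihood ratio statistic of this kind is conventionally compared with a chi-square distribution whose degrees of freedom equal the number of extra parameters in the less constrained model. The text does not state the degrees of freedom, so the value of 8 used below is purely an assumption for illustration; with it, the upper tail probability of LR = 48.52 comes out at roughly 8 × 10^-8, in line with the P value quoted above. The function name is hypothetical, and the closed form used here only applies to even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper tail probability of a chi-square variable for even df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!,
    which is exact when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# LR is taken from the text; ASSUMED_DF is an assumption, not a reported value.
LR = 48.52
ASSUMED_DF = 8
p_value = chi2_sf_even_df(LR, ASSUMED_DF)
print(p_value)
```

With a different number of free rate parameters the P value would change, so the figure printed here is a consistency check rather than a reconstruction of the original analysis.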

In order to test the power of the local clock model to detect differences in the synonymous substitution rate we performed a local clock test with the Hoxa-11 data. The results indicate significant rate heterogeneity at the level of inclusion predicted by the RRTree tests, i.e. significant heterogeneity in the clade including Paddlefish but not within the teleost gene clade. Hence the clock test for synonymous rates is powerful enough to detect synonymous rate heterogeneity in Hoxa-11 sequences at that level of phylogenetic differentiation and with shorter sequences than the RAG1 alignment. It therefore seems likely that the failure to detect synonymous rate heterogeneity for RAG1 among the ray finned fishes is not due to lack of power. Rather, it seems that the synonymous substitution rate difference between teleosts on the one side and Amia and paddlefish on the other is much smaller than that found in the Hox genes.

In summary, there is strong evidence for an increased non-synonymous substitution rate in teleosts for RAG1, but no evidence for an increase in the synonymous substitution rate in teleosts. While there are still questions about the power to detect differences in the synonymous substitution rate with taxa as diverged as paddlefish and zebrafish, the available results from local clock tests and the rate of 3rd codon positions argue against a lack of power. Hence it is possible that the increased dS rate observed for HoxA genes is Hox gene specific rather than a genome wide phenomenon.

Asymmetry of divergence among 1st order paralogs

Among the zebrafish HoxA genes there are three 1st order paralog gene pairs: Hoxa-13a/b, Hoxa-11a/b and Hoxa-9a/b (Fig. 2). Among those genes only one, Hoxa-13a, has a significantly increased non-synonymous substitution rate compared to its 1st order paralog Hoxa-13b (Table 3). Fugu, on the other hand, has five 1st order HoxA paralog pairs: Hoxa-13a/b, Hoxa-11a/b, Hoxa-10a/b, Hoxa-9a/b and Hoxa-2a/b (Fig. 2). Of these, three have a significantly increased rate of non-synonymous substitutions, namely Hoxa-11b, Hoxa-10b and Hoxa-2b (Table 3). Note that all of the accelerated genes are b-paralogs. Furthermore the estimated rates for all of the b-paralogs are higher than those of the a-paralogs, even for the two cases where the differences are not statistically significant, Hoxa-13a/b and Hoxa-9a/b. The a priori probability that in five instances the paralogs from the same cluster have a higher estimated rate than the other is 0.5^5 = 0.03125 < 5%. Hence it is possible that in fugu the b-paralogs are diverging at a systematically higher rate than the a-paralogs, rather than the faster paralogs being randomly sampled from both paralog clusters.
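The probability quoted above is a simple sign test, which the following Python sketch reproduces. It assumes that the five gene pairs are independent and that, under the null hypothesis, each paralog of a pair is equally likely to be the faster one; the function name is hypothetical. The second value printed is the two-sided version, in which either cluster being faster in all five pairs would count; which version is appropriate depends on whether the direction was specified in advance.

```python
from math import comb


def sign_test_upper_tail(n_pairs, n_observed, p=0.5):
    """Probability of n_observed or more 'successes' out of n_pairs
    under a binomial model with per-pair success probability p."""
    return sum(comb(n_pairs, k) * p ** k * (1 - p) ** (n_pairs - k)
               for k in range(n_observed, n_pairs + 1))


# Five fugu paralog pairs, with the b-paralog faster in all five.
prob_one_sided = sign_test_upper_tail(5, 5)  # a specified cluster is faster
prob_two_sided = 2 * prob_one_sided          # either cluster is faster
print(prob_one_sided, prob_two_sided)
```

The same function can be used for less extreme outcomes, for example sign_test_upper_tail(5, 4) for at least four of five pairs pointing the same way.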

Asymmetry of divergence of conserved non-coding sequences

The majority of 1st order paralog gene pairs are found in the 5’ segment of the HoxA gene cluster. To compare the rate of non-coding sequence evolution after cluster duplication we focused on the intergenic sequences 3’ of Hoxa-13 and 5’ of Hoxa-9. In this analysis we also included the published sequence of the Tilapia HoxA-a cluster (Santini et al. 2003). A global alignment of the 5’ HoxA clusters of shark and bichir identified blocks of conserved nucleotides as described in Prohaska et al. (2004). Then we identified the corresponding blocks of conservation in the five teleost HoxA clusters from zebrafish, fugu and tilapia and subjected them to a relative rate test as described in Wagner et al. (2004). The results indicate that three of the duplicated 5’ HoxA clusters are indistinguishable with respect to the rate of modification of conserved non-coding sequences among the clusters: the zebrafish HoxA-b, Tilapia HoxA-a and Fugu HoxA-a clusters (Table 4). Two clusters show a significant increase in the rate of modification of conserved non-coding sequences: zebrafish HoxA-a and fugu HoxA-b clusters; but these are indistinguishable from each other.

The results of the asymmetric modification of conserved non-coding sequences show an intriguing pattern of congruence with the divergence of the coding sequences. In zebrafish the only gene with asymmetric divergence is a HoxA-a paralog, Hoxa-13a, and the non-coding sequences of the HoxA-a cluster are diverging at a higher rate than those of the zebrafish HoxA-b cluster. There is, however, a possible confounding factor in this comparison, since it is also the HoxA-a cluster that lost its Hoxa-10a paralog and thus may lose non-coding sequence conservation at a higher rate than the HoxA-b cluster because of that.

In fugu, however, the pattern is clearer. All five b-paralogs are estimated to have increased rates of non-synonymous substitutions (although only three of them are statistically significant, Table 3) and the non-coding sequences of the HoxA-b cluster also diverge faster than those of the fugu HoxA-a cluster. Notably the non-coding sequences of the fugu HoxA-b cluster also diverge faster than those of the zebrafish HoxA-b cluster. The latter observation is significant because both the zebrafish and the fugu HoxA-b clusters have the same gene inventory (Fig. 2). Hence the increased rate of non-coding sequence evolution in the fugu HoxA-b cluster cannot be explained by differences in the number of genes retained. Thus in the fugu we observe a concerted asymmetric divergence of both the coding and the non-coding sequences of the 5’ HoxA-b cluster after duplication. Furthermore this pattern is distinctly different from that observed in the zebrafish, where the HoxA-a cluster diverges at a higher rate. Hence the majority of the asymmetric divergence occurred after the split of the zebrafish and the fugu lineages, rather than immediately after Hox cluster duplication.

In order to examine whether the non-coding sequence evolution rate increase is uniform along the Hox cluster we performed relative rate tests on each of the intergenic regions independently (Fig. 4). In the zebrafish HoxAa cluster the higher rate of divergence of the a-paralog is focused on the intergenic regions between Hoxa-13 and Hoxa-11 as well as between Hoxa-11 and Hoxa-10. These two regions are 3’ of Hoxa-13, which has an accelerated rate of non-synonymous divergence, and 5’ of Hoxa-10, which in the HoxAa cluster is a pseudogene. In fugu, the only significantly increased rate of non-coding divergence is between Hoxa-10 and Hoxa-9, where Hoxa-10 has an accelerated rate of evolution. Hence, the rate acceleration differs among intergenic regions and could provide hints at the molecular level of regulatory changes following Hox cluster duplications.

Discussion

Rate of sequence evolution in teleosts

For all genes analyzed in this paper, including the non-Hox gene RAG1, the rate of non-synonymous substitutions is higher in teleosts than in close outgroups, i.e. basal bony fish lineages like paddlefish and bichir. This finding is consistent with at least two previously published reports. Robinson-Rechavi and Laudet (2001) reported that most nuclear genes evolve quicker in teleosts than in mammals, independent of the presence of paralog genes, i.e. teleost genes evolve quicker regardless of whether there is potential for genetic redundancy or not. Krieger and Fuerst (2002) reported a smaller rate of amino acid evolution in sturgeons compared to teleost genes, in accordance with the results reported here. The interpretation of these authors, however, was that the rate in sturgeons is depressed, but this feature is shared with all basal gnathostome lineages, including bichir and Latimeria. Hence it is more parsimonious to assume that the high rate of amino acid sequence evolution in teleosts is derived, rather than the low rate in bichir and paddlefish.

In teleost Hox genes the increased rate of non-synonymous substitutions is not necessarily indicative of relaxed purifying selection or directional selection. Other studies have established that recently duplicated genes have a significantly increased dN/dS rate ratio ω, indicating either relaxed selection or directional selection (Conant and Wagner 2003; Kondrashov et al. 2002; Lynch and Conery 2000). The failure to detect an increased ω in teleost Hox genes may be due to two factors. First, the duplication of teleost genes is likely associated with a genome duplication which may have occurred as long as 320 million years ago (Vandepoele et al. 2004). Lynch and Conery (2000) have shown that older duplicates return to lower ω values than recent duplicates. Second, we report evidence that the synonymous substitution rate, dS, is also substantially increased. An increased rate of dS makes it harder to detect relaxed stabilizing selection.

Not many studies have addressed the question of whether synonymous substitution rates differ between teleosts and other bony fish. Merritt and Quattro studied the evolution of cytosolic malate dehydrogenase genes and reported an increased rate of evolution of teleost genes compared to sturgeon and mammalian genes. The authors report an increased dN/dS ratio and even directional selection in one lineage, i.e. dN/dS > 1. The average number of synonymous changes in the teleost lineages (111.75), however, is similar to that in the sturgeon lineage (99) and not comparable with the roughly threefold increase we found in the HoxA genes in this study. We are not aware of any study that has investigated dS for other teleost nuclear genes and compared it with basal actinopterygians. Our attempt to test whether the non-Hox gene RAG1 also has an increased dS gave ambiguous results, but the balance of evidence suggests that dS for RAG1 is not increased among teleosts, at least not as much as in HoxA genes. Hence, it is possible that the increased rate of synonymous substitutions is a phenomenon restricted to duplicated Hox clusters or maybe even only the HoxA clusters of teleosts.

Of the possible explanations for an increased rate of synonymous substitutions we could eliminate two: changes in base composition and changes in the degree of codon bias. We thus conclude that the most likely explanation for the increased dS is an increased mutation rate, but also note that this conclusion is only supported by the elimination of known alternatives and not by direct evidence. It would be desirable to compare the mutation rates of homologous genes in teleost and non-teleost fishes.

Asymmetrical divergence of paralog genes

There are a total of eight 1st order HoxA paralog gene pairs, three gene pairs in zebrafish and five in fugu. Four of those paralog pairs (50%) diverge at rates that are statistically distinguishable at the 5% level. This fraction is comparable to that reported by Zhang et al. (2003) for recent duplicates in the human genome (60%). G. Conant and A. Wagner (2003) report asymmetry in 20 to 30% of duplicated genes in the yeast, Drosophila and Caenorhabditis elegans genomes. Van de Peer and collaborators (2001) found that 50% of duplicated zebrafish genes have significant asymmetry in divergence. Kellis et al. (2004) report about 16% asymmetrically diverging genes in yeast, although these numbers are not comparable because they did not apply rigorous statistical tests to detect asymmetry. In contrast, Kondrashov et al. (2002) only found significant asymmetry in two out of 49 (4%) recently duplicated mammalian genes and two out of eleven (18%) in Drosophila. While there might be both biological and methodological reasons for these discrepancies, the results show that the HoxA genes in zebrafish and fugu are at the upper end of the frequency of asymmetrically diverging gene families even though they are not recently duplicated like the mammalian genes reported by Zhang et al. (2003). This result is consistent with that of Van de Peer et al. (2001), who also found high levels of asymmetrical divergence in a sample of genes containing a large fraction of transcription factors.

In fugu the asymmetrical divergence of duplicated HoxA genes has two intriguing additional features. In all five 1st order paralog HoxA gene pairs the b-paralog has a higher estimated rate of dN than the a-paralog (three of these differences are statistically significant), an outcome that has a probability of less than 5% by chance. It is thus possible that the asymmetric divergence of Hox genes on a cluster is not independent but is correlated, perhaps because of the strong regulatory integration among the Hox genes. Acquisition of new functions by one Hox gene may increase the chance of functional recruitment of other genes on the same cluster. This conclusion is also supported by the fact that many secondary functions of AbdB Hox genes, like their role in paired appendage development and reproductive tract development, involve multiple Hox genes, and often all the AbdB related genes on a cluster (Haack and Gruss 1993; Kobayashi and Behringer 2003; Taylor et al. 1997). These facts suggest that the set of AbdB related genes acts like a co-evolutionary module (Schlosser 2002).
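The chance probability quoted for the five fugu gene pairs can be illustrated with a simple sign test. The following is a minimal sketch, assuming that under the null hypothesis each pair independently has a probability of 1/2 that the b-paralog shows the higher dN estimate. The function name is hypothetical, and the text does not state which test was used to arrive at the quoted figure.

```python
from math import comb

def sign_test_one_sided(n_pairs: int, n_b_faster: int, p: float = 0.5) -> float:
    """Probability of at least n_b_faster 'b-paralog faster' outcomes
    among n_pairs independent pairs (binomial upper tail)."""
    return sum(
        comb(n_pairs, k) * p**k * (1 - p) ** (n_pairs - k)
        for k in range(n_b_faster, n_pairs + 1)
    )

# Five fugu paralog pairs, all five with the b-paralog evolving faster.
p_one_sided = sign_test_one_sided(5, 5)
print(p_one_sided)  # 0.03125
```

The one-sided value of 0.03125 is below 5%. A two-sided version of the same test would give twice this value, 0.0625, so the statement depends on treating the direction of the asymmetry as specified in advance.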

Furthermore, the conserved non-coding regions in the cluster segment 5’ of Hoxa-9 (the AbdB related set of genes) also diverge asymmetrically, with the fugu 5’ HoxA-b cluster evolving faster. This suggests that in fugu the HoxA-b cluster diverged faster with respect to protein coding sequences as well as putative cis-regulatory elements. In the zebrafish the HoxA-b cluster is accelerated neither in any of its protein coding genes nor with respect to its non-coding sequences. The situation in the zebrafish HoxA-a cluster is complicated by a higher rate of gene loss than in the fugu HoxA-a cluster. In any case the results show that the pattern of asymmetric gene divergence differs between the zebrafish and the fugu lineage.

In both the zebrafish HoxAa cluster and the fugu HoxAb cluster the rate acceleration of non-coding sequence evolution is heterogeneous along the cluster. In either case the intergenic segments that show accelerated evolution are localized in the vicinity of coding regions that diverge asymmetrically (Fig. 4). Thus, it is likely that the regulatory changes that lead to the asymmetrical divergence of Hox genes are localized and thus specific to particular expression domains rather than distributed generally along the whole paralog sequences.

The relationship between the rate of cis-regulatory element evolution and the rate of protein evolution has recently been investigated in nematodes (Castillo-Davis et al. 2004). These authors found no significant correlation between dN and the divergence of promoter proximal transcription factor binding sites if the regression is corrected for differences in dS, i.e. time of duplication and mutation rate. Hence the association reported here between putative cis-regulatory element evolution and Hox protein divergence was not found in a genome wide study. The reason for this discrepancy might be that the mechanistic reason for protein divergence differs between different classes of proteins. Castillo-Davis and collaborators did not report whether the correlation between cis-regulatory element divergence and protein sequence divergence differs between different classes of proteins. For transcription factors, differences in tissue expression are possibly associated with a different set of tissue specific protein interaction partners (Featherstone 2003). Hence it is mechanistically plausible that divergence in gene regulation, inferred from the divergence in putative cis-regulatory elements, and protein sequence evolution are more strongly correlated in transcription factor genes than in other genes.

It is noteworthy that the asymmetric divergence of paralog genes is different between the zebrafish and the fugu lineage. For instance, in zebrafish the Hoxa-13a paralog is diverging faster while in fugu it is the Hoxa-13b paralog. This implies that the faster divergence has happened after the split of the zebrafish and fugu lineages rather than immediately after the duplication event, suggesting that asymmetrical divergence is driven by forces that do not act immediately after the duplication but take some time to mount and are random among lineages. This result could also explain why the frequency of asymmetrically diverging duplicated genes is very low among recently duplicated genes (4 to 18%; Kondrashov et al. 2002). An interpretation of this pattern is that asymmetrical divergence is indicative of a novel function acquired by the faster diverging gene (this possibility is considered further below with the zebrafish Hoxa-13 genes). The acquisition of a novel function is probably only possible after a considerable period of random divergence due to relaxed stabilizing selection. This scenario would explain the delay after duplication in the development of asymmetric divergence.

Conclusions:

Based on the results reported here we propose a two stage model of Hox gene evolution following Hox cluster duplication, consistent with the idea that duplicated Hox genes can contribute to physiological or developmental innovations (Malaga-Trillo and Meyer 2001; Ohno 1970; Wagner et al. 2003).

1. Immediately following the Hox cluster duplication there was a phase of increased random divergence including gene loss and increased mutation rates.

2. Some time after the duplication one paralog of a subset of the duplicated genes acquires novel functions and then diverges faster than the other paralog.

Evidence for a phase of increased random divergence following duplication comes from the increased rate of non-synonymous substitutions in teleosts. The Hox genes share this feature with other nuclear genes. In addition, however, there is also evidence for an increased mutation rate of Hox genes, not seen in other nuclear genes. Hence, in Hox genes at least, the increased coding sequence divergence is not caused solely by relaxed stabilizing selection due to genetic redundancy, but probably also by an enhanced mutation rate.

Strongly asymmetric divergence of first order paralogs has been considered as evidence for the acquisition of new functions by one paralog and the maintenance of the ancestral function by the other paralog. The asymmetry arises to a large degree after the split of zebrafish and fugu lineages and differs between these two lineages. This feature is also suggestive of the acquisition of a new function, since it is likely that recruitment into a new function requires some time.

Acknowledgements: we thank Dr Gavin Conant for reading an earlier version of this manuscript. The financial support by NSF grant INB-0321470 to GPW and the DFG Bioinformatics Initiative BIZ-6/1-2 to PFS is gratefully acknowledged.

Table 1: comparison of the amino acid substitution rate of teleost genes with that of a non-duplicated outgroup. dA stands for the maximum likelihood estimate of the amino acid substitution rate as calculated by HYPHY under the Dayhoff substitution model (Muse and Gaut 1994). The subscript T indicates the rate of the teleost gene lineage and O stands for that of the outgroup lineage. Note that all ratios dAT/dAO are larger than 1, indicating that the rate in the teleost lineage is estimated to be higher than in the outgroup lineage. Fifteen of the eighteen comparisons (83%) are statistically significant.

Gene          dAT     dAT/dAO   P           Sig.   outgroup
Dr-Hoxa-2b    0.060   1.598     0.538       n.s.   bichir
Tr-Hoxa-2a    0.097   1.898     0.289       n.s.   bichir
Tr-Hoxa-2b    0.198   6.821     0.001       **     bichir
Dr-Hoxa-9a    0.546   2.045     0.003       **     bichir
Dr-Hoxa-9b    0.367   1.872     0.029       *      bichir
Tr-Hoxa-9a    0.606   2.304     0.0004      ***    bichir
Tr-Hoxa-9b    0.529   2.726     0.0003      ***    bichir
Dr-Hoxa-10b   0.285   2.500     0.047       *      bichir
Tr-Hoxa-10a   0.308   2.053     0.087       n.s.   bichir
Tr-Hoxa-10b   0.473   6.088     0.0005      ***    bichir
Dr-Hoxa-11a   0.175   4.768     0.0002      ***    paddlefish
Dr-Hoxa-11b   0.147   4.153     0.0009      ***    paddlefish
Tr-Hoxa-11a   0.154   3.775     0.0012      **     paddlefish
Tr-Hoxa-11b   0.235   6.546     9.3*10^-7   ****   paddlefish
Dr-Hoxa-13a   0.202   4.810     2.8*10^-6   ****   paddlefish
Dr-Hoxa-13b   0.157   4.448     6.9*10^-5   ****   paddlefish
Tr-Hoxa-13a   0.131   2.433     0.0086      **     paddlefish
Tr-Hoxa-13b   0.207   2.875     0.0002      ***    paddlefish

Table 2: comparison of nucleotide substitution rates, base composition and codon bias between teleost Hox genes and paddlefish. The PAML rate estimates for the teleosts are averages over the terminal branches, while the RRTree estimates are averages over the whole clade. For Hoxa-13 the synonymous rate could not be estimated with RRTree, most likely because of saturation. Note that both the non-synonymous and the synonymous rates are increased in the duplicated teleost genes. The increase in synonymous substitution rate could be explained neither by a shift in GC content nor by a change in codon bias. dN = non-synonymous substitution rate; dS = synonymous substitution rate; B4 = transversion rate at fourfold degenerate sites; GC = fraction of G and C in the coding sequence; GC3 = fraction of G and C in third codon positions; Nc = effective number of codons (high Nc indicates low codon bias; Wright 1990).

A) Hoxa-11

            RRTree                          PAML
Hoxa-11     dN      dS          B4          dN      dS      GC     GC3    Nc
Teleosts    0.116   1.554       0.892       0.066   1.074   0.53   0.63   57.8
Polyodon    0.032   0.508       0.195       0.012   0.353   0.49   0.59   56.6
Factor      3.6     3.1         4.5         5.5     3.0     1.1    1.07   1.02
P           10^-7   3.6*10^-5   9.6*10^-3

B) Hoxa-13

            RRTree                          PAML
Hoxa-13     dN      dS          B4          dN       dS      GC     GC3    Nc
Teleosts    0.123   ---         0.91        0.0621   0.970   0.52   0.56   57.1
Polyodon    0.037   ---         0.33        0.0103   0.129   0.50   0.48   58.4
Factor      3.3     ---         2.7         6.0      7.5     1.0    1.2    1.00
P           10^-7   sat         4.1*10^-4

C) RAG1

                   RRTree              PAML
RAG1               dN        B4        dN      dS      GC     GC3s   Nc
Teleosts           0.139     5.88#     0.182   1.79    0.52   57.5   54.1
(Polyodon, Amia)   0.093     4.4       0.087   0.548   0.51   56.5   53.6
Factor             1.5       1.2       2.1     3.26    1.0    1.0    1.0
P                  9*10^-5   1.00#

#) Saturation; the test does not work.
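The "Factor" rows in Table 2 appear to be the ratio of the teleost estimate to the outgroup estimate. The following minimal sketch recomputes these ratios for the Hoxa-11 rates in panel A. The dictionary keys and the function name are labels chosen for this illustration, and small differences from the printed factors reflect rounding of the tabulated values.

```python
# Rate estimates copied from Table 2A (Hoxa-11).
teleosts = {"RRTree dN": 0.116, "RRTree dS": 1.554, "RRTree B4": 0.892,
            "PAML dN": 0.066, "PAML dS": 1.074}
polyodon = {"RRTree dN": 0.032, "RRTree dS": 0.508, "RRTree B4": 0.195,
            "PAML dN": 0.012, "PAML dS": 0.353}

def rate_factors(ingroup: dict, outgroup: dict) -> dict:
    """Fold increase of each ingroup rate relative to the outgroup rate."""
    return {name: ingroup[name] / outgroup[name] for name in ingroup}

factors = rate_factors(teleosts, polyodon)
for name, factor in factors.items():
    print(f"{name}: {factor:.2f}")
```

Both dS estimates give a ratio close to three, which is the roughly threefold increase in synonymous rate discussed in the text.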

Table 3: comparison of non-synonymous substitution rates of paralog gene pairs. A maximum likelihood relative rate test was performed using the codon model of Muse and Gaut (1994) as implemented in HYPHY. The outgroup for Hoxa-2, Hoxa-10, and Hoxa-11 was the corresponding human gene; for Hoxa-9 and Hoxa-13 the outgroup was Xenopus tropicalis. LR stands for likelihood ratio.

Gene         dN of paralog-a   dN of paralog-b   LR       P
Dr Hoxa-9    0.1497            0.2082            0.0820   0.775
Dr Hoxa-11   0.0876            0.0837            0.0176   0.894
Dr Hoxa-13   0.1256            0.0613            5.0095   0.025
Tr Hoxa-2    0.0796            0.1189            4.2991   0.038
Tr Hoxa-9    0.1213            0.1899            0.1389   0.709
Tr Hoxa-10   0.1683            0.1935            9.3324   0.002
Tr Hoxa-11   0.0805            0.1544            4.7850   0.028
Tr Hoxa-13   0.0796            0.1098            1.3691   0.242
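The relation between the LR and P columns of Table 3 can be illustrated as follows. This is a minimal sketch, assuming that LR is a likelihood ratio test statistic referred to a chi-square distribution with one degree of freedom (equal versus unequal rates for the two paralogs). The caption does not state this explicitly, and the function name is hypothetical. For one degree of freedom the upper tail probability can be written with the complementary error function.

```python
from math import erfc, sqrt

def lrt_p_value(lr: float) -> float:
    """Upper tail probability of a chi-square variable with 1 df at lr."""
    return erfc(sqrt(lr / 2.0))

# (gene pair, LR, tabulated P) copied from Table 3.
table3 = [
    ("Dr Hoxa-9", 0.0820, 0.775), ("Dr Hoxa-11", 0.0176, 0.894),
    ("Dr Hoxa-13", 5.0095, 0.025), ("Tr Hoxa-2", 4.2991, 0.038),
    ("Tr Hoxa-9", 0.1389, 0.709), ("Tr Hoxa-10", 9.3324, 0.002),
    ("Tr Hoxa-11", 4.7850, 0.028), ("Tr Hoxa-13", 1.3691, 0.242),
]

for gene, lr, p_table in table3:
    print(f"{gene}: computed P = {lrt_p_value(lr):.3f}, tabulated P = {p_table}")

n_significant = sum(lrt_p_value(lr) < 0.05 for _, lr, _ in table3)
print(n_significant, "of", len(table3), "pairs significant at the 5% level")
```

Under this assumption the computed values agree with the tabulated ones to within rounding, and four of the eight pairs fall below the 5% level, as stated in the text.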

Table 4: comparison of the rate of evolution of conserved non-coding regions in the intergenic regions between Hoxa-13 and Hoxa-9 of the zebrafish and fugu HoxA clusters, according to the method described in Wagner et al. (2004). In this method two distant outgroups are compared, in this case shark and bichir, to identify conserved non-coding sequences. Then the orthologous sequence elements are identified in the two ingroup sequences, in this case the two paralog clusters in zebrafish and fugu, and the amount of sequence conservation is compared. The degree of modification is here measured by the rate of retained conserved nucleotide positions r. A modified z’-statistic is calculated to account for stochastic dependence among the nucleotide positions. This test allows one to assess whether the retention rates of conserved nucleotide positions are statistically different. The results show that the zebrafish Hox cluster segment DrAa13-9, i.e. the a-paralog, retains significantly fewer conserved nucleotide positions (about 20%) than its zebrafish paralog DrAb13-9 as well as its orthologs in fugu and tilapia, TrAa13-9 and OnAa13-9. The results also show that the fugu TrAb13-9, the b-paralog, evolves faster in its putative cis-regulatory elements than its paralog TrAa13-9 and its zebrafish ortholog DrAb13-9.

Cluster 1   Cluster 2   r(cluster 1)   r(cluster 2)   z’      P
DrAa13-9    DrAb13-9    0.200          0.260          1.685   0.046
DrAa13-9    TrAa13-9    0.199          0.282          2.838   0.002
DrAa13-9    OnAa13-9    0.197          0.286          3.056   0.001
TrAa13-9    OnAa13-9    0.286          0.292          0.302   0.375
TrAa13-9    TrAb13-9    0.295          0.206          2.589   0.005
DrAb13-9    TrAb13-9    0.264          0.203          1.873   0.031
DrAb13-9    TrAa13-9    0.263          0.289          0.826   0.203
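As a rough guide to reading the z’ and P columns of Table 4, the sketch below converts a z value into a one-sided tail probability of the standard normal distribution. This is an assumption made for illustration: the modified z’-statistic is defined in Wagner et al. (2004), and the tabulated P values need not have been obtained in exactly this way. The function name is hypothetical.

```python
from math import erfc, sqrt

def one_sided_normal_p(z: float) -> float:
    """Upper tail probability P(Z > z) for a standard normal variable Z."""
    return 0.5 * erfc(z / sqrt(2.0))

# (cluster 1, cluster 2, z', tabulated P) copied from Table 4.
table4 = [
    ("DrAa13-9", "DrAb13-9", 1.685, 0.046),
    ("DrAa13-9", "TrAa13-9", 2.838, 0.002),
    ("DrAa13-9", "OnAa13-9", 3.056, 0.001),
    ("TrAa13-9", "TrAb13-9", 2.589, 0.005),
    ("DrAb13-9", "TrAb13-9", 1.873, 0.031),
]

for c1, c2, z, p_table in table4:
    print(f"{c1} vs {c2}: normal tail = {one_sided_normal_p(z):.4f}, tabulated P = {p_table}")
```

For the significant comparisons listed here the normal tail lies close to the tabulated value, so the P column can be read as a one-sided significance level for the difference in retention rates.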

Figure captions

Figure 1: phylogenetic relationships among the gene sequences analyzed in this paper. A) gene tree of the HoxA genes analyzed. Bichir is considered the most basal ray finned fish lineage. Paddlefish together with the sturgeons belongs to the second most basal ray finned fish lineage (Inoue et al. 2003), and the two paralogs of the HoxA clusters in fugu and zebrafish derive from the same duplication event (Amores et al. 2004). B) phylogenetic relationships among the RAG1 genes analyzed in this paper. The bowfin, Amia calva, and paddlefish are considered as belonging to a clade according to the results of Inoue et al. (2003). The relationships of the osteoglossomorph sequences (Osteoglossus, Panthodon, Gnathonemus and Notopterus) are based on Guo-Qing and Wilson (1996).

Figure 2: the gene inventory of the HoxA clusters analyzed in this paper. The shark, Heterodontus francisci, HoxA cluster, HfHoxA, has been described in Kim et al. (2000) and Chiu et al. (2002) as HoxM cluster. The coelacanth, Latimeria menadoensis, HoxA cluster, LmHoxA, is described in Powers et al. (2004), and the bichir, Polypterus senegalus, cluster, PsHoxA, has been described in Chiu et al. (2004). The paddlefish Hox genes are only known from three partial HoxA gene sequences: Hoxa-13 and Hoxa-11 are represented by almost complete coding sequences including the intron (Metscher et al. 2004) and Hoxa-1 by a smaller fragment (J. McClintock, C. Jozefowicz, and V. E. Prince, AY188522). The zebrafish HoxA clusters, DrHoxAa/b, are based on the complete Hox cluster sequence described in Chiu et al. (2002), and the fugu sequences, TrHoxAa/b, are from the fugu genome sequence versions 2.0 and 3.

Figure 3: amino acid sequence evolution of the duplicated Hoxa-11 (A) and Hoxa-13 (B) genes is accelerated compared to the non-duplicated genes of paddlefish, bichir and coelacanth.

Figure 4: spatial distribution of sequence divergence along the 5’ segment of the duplicated teleost HoxA clusters. The open boxes symbolize the coding regions (ignoring introns) and the lines the intergenic sequences (not to scale). The gray boxes in the intergenic regions give the difference in the fraction of conserved non-coding sequence positions modified in one of the two paralogs: (1 - r(HoxAa)) - (1 - r(HoxAb)). If the difference is >0, i.e. if the HoxAa cluster is more modified than the HoxAb cluster, the bar is drawn on top of the line; if the HoxAb cluster is more modified, the bar is drawn below the line. In the coding regions the gray bars symbolize the difference in the rate of non-synonymous substitutions (Table 3). Symbols: * P<5%, ** P<2.5%, *** P<1%; n.s. P>5%.
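The bar heights in the intergenic regions follow directly from the retention rates r. The following minimal sketch evaluates the difference (1 - r(HoxAa)) - (1 - r(HoxAb)) and the resulting bar position. For illustration it uses the retention rates of the whole Hoxa-13 to Hoxa-9 segment from Table 4, whereas the figure is drawn for individual intergenic segments whose values are not tabulated here. The function names are hypothetical.

```python
def modification_difference(r_a: float, r_b: float) -> float:
    """Difference in the fraction of modified conserved positions,
    (1 - r(HoxAa)) - (1 - r(HoxAb)); positive if HoxAa is more modified."""
    return (1.0 - r_a) - (1.0 - r_b)

def bar_position(difference: float) -> str:
    """Side of the line on which the bar is drawn in Figure 4."""
    if difference > 0:
        return "above"
    if difference < 0:
        return "below"
    return "none"

# Segment-wide retention rates from Table 4 (paralog comparisons).
zebrafish = modification_difference(r_a=0.200, r_b=0.260)  # DrAa13-9 vs DrAb13-9
fugu = modification_difference(r_a=0.295, r_b=0.206)       # TrAa13-9 vs TrAb13-9

print(f"zebrafish: {zebrafish:+.3f} ({bar_position(zebrafish)} the line)")
print(f"fugu: {fugu:+.3f} ({bar_position(fugu)} the line)")
```

With these values the zebrafish difference is positive (HoxAa more modified) and the fugu difference is negative (HoxAb more modified), matching the opposite directions of asymmetry in the two lineages.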

Wagner et al., Figure 1

Panel A (HoxA gene tree), terminal taxa: zebrafish A, fugu A, zebrafish B, fugu B, paddlefish, bichir, Latimeria, shark.

Panel B (RAG1 gene tree), terminal taxa: zebrafish, cod, flounder, fugu, Osteoglossus, Panthodon, Gnathonemus, Notopterus, bowfin, paddlefish, bichir, Latimeria, shark.

Wagner et al., Figure 2

Wagner et al., Figure 4

References:

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J. Mol. Biol. 215:403-410

Amores A, Force A, Yan Y-L, Joly L, Amemiya C, Fritz A, Ho RK, Langeland J, Prince V, Wang Y-L, Westerfield M, Ekker M, Postlethwait JH (1998) Zebrafish hox clusters and vertebrate evolution. Science 282:1711-1714

Amores A, Suzuki T, Yan Y-L, Pomroy J, Singer A, Amemiya C, Postlethwait J (2004) Developmental roles of pufferfish Hox clusters and genome evolution in ray-fin fish. Genome Res. 14:1-10

Bemis WE, Findeis EK, Grande L (1997) An overview of Acipenseriformes. Environ. Biol. Fish. 48:25-71

Castillo-Davis CI, Kondrashov FA, Hartl DL, Kulathinal RJ (2004) The functional genomic distribution of protein divergence in two animal phyla: coevolution, genomic conflict, and constraint. Genome Research 14:802-811

Chiu C-h, Amemiya C, Dewar K, Kim C-b, Ruddle F, Wagner GP (2002) Molecular evolution of the HoxA cluster in three major gnathostome lineages. PNAS 99:5492-5497

Chiu C-h, Dewar K, Wagner GP, Takahashi K, Ruddle F, Ledje C, Bartsch P, Scemama J-L, Stellwag E, Fried C, Prohaska SJ, Stadler PF, Amemiya CT (2004) Bichir HoxA cluster sequence reveals surprising trends in ray-finned fish genomic evolution. Genome Research 14:11-17

Conant GC, Wagner A (2003) Asymmetric sequence divergence of duplicate genes. Genome Research 13:2052-2058

Dehal P, et al. (2002) The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science 298:2157-2167

Featherstone M (2003) HOX proteins and their co-factors in transcriptional regulation. In: Lufkin T (ed) Murine homeobox gene control of embryonic patterning and organogenesis. Elsevier, Amsterdam, NL, p 1-42

Ferrier DK, Minguillón C, Holland PWH, Garcia-Fernández J (2000) The amphioxus Hox cluster: deuterostome posterior flexibility and Hox14. Evol. Dev. 2:284-293

Fried C, Prohaska SJ, Stadler PF (2003) Independent Hox-cluster duplications in Lampreys. J. Exp. Zool. (Mol. Dev. Evol.) Part B 299:18-25

Fried C, Prohaska SJ, Stadler PF (2004) Exclusion of repetitive DNA elements from gnathostome Hox clusters. J Exp Zool Part B (Mol Dev Evol) 302B:165-173

Guo-Qing L, Wilson MVH (1996) Phylogeny of Osteoglossomorpha. In: Stiassny M, Parenti L, Johnson GD (eds) Interrelationships of Fishes. Academic Press, San Diego, CA, p 163-174

Haack H, Gruss P (1993) The Establishment of Murine Hox-1 Expression Domains during Patterning of the Limb. Developmental Biology 157:410-422

Holland PW, Garcia-Fernandez J (1996) Hox genes and chordate evolution. Developmental Biology 173:382-395

Holland PWH, Garcia-Fernández J, Williams NA, Sidow A (1994) Gene duplication and the origins of vertebrate development. Development (Suppl.):125-133

Inoue JG, Miya M, Tsukamoto K, Nishida M (2003) Basal actinopterygian relationships: a mitogenomic perspective on the phylogeny of the "ancient fish." Mol Phyl. Evol 26:110-120

Ji FY, Liu JD, Yi MS, Huang L, Zhou F, Yu QX (2002) Chromosomal localization of rice field eel Hox genes by PRINS. Acta Genetica Sinica 29:612-615

Kappen C, Schugart K, Ruddle FH (1989) Two steps in the evolution of Antennapedia-class vertebrate homeobox genes. PNAS 86:5459-5463

Kellis M, Birren BW, Lander ES (2004) Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428:617-624

Kim C-B, Amemiya C, Bailey W, Kawasaki K, Mezey J, Miller W, Minoshima S, Shimizu N, Wagner GP, Ruddle F (2000) Hox cluster genomics in the horn shark, Heterodontus francisci. PNAS 97:1655-1660

Kobayashi A, Behringer RR (2003) Developmental genetics of the female reproductive tract in mammals. Nature Rev Genet. 4:969-980

Kondrashov FA, Rogozin IB, Wolf YI, Koonin EV (2002) Selection in the evolution of gene duplications. Genome Biology 3:RESEARCH0008

Krieger J, Fuerst PA (2002) Evidence for a slowed rate of molecular evolution in the order Acipenseriformes. Mol. Biol. Evol. 19:891-897

Lynch M, Conery JS (2000) The evolutionary fate and consequences of duplicate genes. Science 290:1151-1155

Malaga-Trillo E, Meyer A (2001) Genome duplication and accelerated evolution of Hox genes and cluster architecture in teleost fishes. Amer. Zool. 41:676-686

Martinez P, Amemiya CT (2002) Genomics of the HOX gene cluster. Comp. Biochem. Physiol. Part B 133:571-580

Metscher BD, Takahashi K, Crow K, Amemiya C, Nonaka DF, Wagner GP (2004) Expression of Hoxa-11 and Hoxa-13 in the pectoral fin of a basal ray-finned fish, Polyodon spathula: implications for the origin of tetrapod limbs. Evolution and Development, in revision

Meyer A, Malaga-Trillo E (1999) Vertebrate genomics: more fishy tales about Hox genes. Curr. Biol. 9:210-213

Meyer A, Schartl M (1999) Gene and genome duplications in vertebrates: the one-to-four (-to-eight in fish) rule and the evolution of novel gene functions. Curr Opin Cell Biol 11:699-704

Misof BY, Wagner GP (1996) Evidence for four Hox clusters in the Killifish, Fundulus heteroclitus (Teleostei). Mol. Phylogen. Evol. 5:309-322

Morgenstern B (1999) DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15:211-218

Muse SV, Gaut BS (1994) A likelihood approach for comparing synonymous and non-synonymous nucleotide sequences and its applications. Mol. Biol. Evol. 11:715-724

Naruse K, Fukamachi S, Mitani H, Kondo M, Matsuoka T, Kondo S, Hanmura N, Morita Y, Hasegawa K, Nishigaki R, Shimada A, Wada H, Kusakabe T, Suzuki N, Kinoshita M, Kanamori A, Terado T, Kimura H, Nonaka M, Shima A (2000) A detailed linkage map of medaka, Oryzias latipes: comparative genomics and genome evolution. Genetics 154:1773-1784

Naruse K, Tanaka M, Mita K, Shima A, Postlethwait J, Mitani H (2004) A medaka gene map: the trace of ancestral vertebrate proto-chromosomes revealed by comparative gene mapping. Genome Research 14:820-828

Ohno S (1970) Evolution by Gene Duplication. Springer Verl., New York

Pavell AM, Stellwag EJ (1994) Survey of Hox-like genes in the teleost Morone saxatilis: implications for evolution of the Hox gene family. Mol. Marine Biol. Biotech. 3:149-157

Peer YVd, Taylor JS, Braasch I, Meyer A (2001) The ghost of selection past: rates of evolution and functional divergence of anciently duplicated genes. J. Mol. Evol. 53:436-446

Powers TP, Amemiya CT (2004) Evidence for a Hox14 paralog group in vertebrates. Curr. Biol. 14:R183-184

Prince V, Pickett FB (2002) Splitting pairs: the diverging fates of duplicated genes. Nature Reviews Genetics 3:827-837

Prohaska SJ, Fried C, Flamm C, Wagner GP, Stadler PF (2004) Surveying phylogenetic footprints in large gene clusters: application to Hox cluster duplications. Mol. Phylo. Evol. 31:581-604

Prohaska SJ, Stadler PF (2004) The duplication of the Hox gene clusters in teleost fishes. Theory in Bioscience 123:89-110

Robinson-Rechavi M, Laudet V (2001) Evolutionary rates of duplicated genes in fish and mammals. Mol. Biol. Evol. 18:681-683

Robinson-Rechavi M, Huchon D (2000) RRTree: relative-rate tests between groups of sequences on a phylogenetic tree. Bioinformatics 16:296-297

Ruddle FH, Bartels JL, Bentley KL, Kappen C, Murtha MT, Pendelton JW (1994a) Evolution of Hox genes. Annu. Rev. Genet. 28:423-442

Ruddle FH, Bentley KL, Murtha MT, Risch N (1994b) Gene loss and gain in the evolution of the vertebrates. Development (Suppl):155-161

Ruvkun G, Hobert O (1998) The taxonomy of developmental control in Caenorhabditis elegans. Science 282:2033-2041

Santini S, Boore JL, Meyer A (2003) Evolutionary conservation of regulatory elements in vertebrate Hox gene clusters. Genome Res. 13:1111-1122

Schlosser G (2002) Modularity and the units of evolution. Theory in Bioscience 121:1-80

Snell EA, Scemama JL, Stellwag EJ (1999) Genomic organization of the Hoxa-4-Hoxa10 region from Morone saxatilis: implications for Hox gene evolution among vertebrates. J. Exp. Zool. (Mol. Dev. Evol.) 285:41-49

Taylor HS, Vanden HG, Igarashi PA (1997) A conserved Hox axis in the mouse and human reproductive system: late establishment and persistent expression of the Hoxa cluster genes. Biol Reprod 57:1338-1345

Taylor JS, Braasch I, Frickey T, Meyer A, Peer YVd (2003) Genome duplication, a trait shared by 22,000 species of ray-finned fish. Genome Research 13:382-390

Vandepoele K, Vos WD, Taylor JS, Meyer A, Peer YVd (2004) Major events in the genome evolution of vertebrates: paranome age and size differ considerably between ray-finned fishes and land vertebrates. PNAS 101:1638-1643

Wagner GP, Amemiya C, Ruddle F (2003) Hox cluster duplications and the opportunity for evolutionary novelties. PNAS 100:14603-14606

Wagner GP, Fried C, Prohaska SJ, Stadler PF (2004) Divergence of conserved non-coding sequences: rate estimates and relative rate tests. Mol. Biol. Evol. in press

Wright F (1990) The 'effective number of codons' used in a gene. Gene 87:23-29

Yang Z (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555-556

Zhang P, Gu Z, Li WH (2003) Different evolutionary patterns between young duplicate genes in the human genome. Genome Biology 4:R56