Biol. Cell (2005) 97, 397–414 (Printed in Great Britain) Research article
www.biolcell.org | Volume 97 (6) | Pages 397–414

Phylogeny and evolution of the major intrinsic protein family

Rafael Zardoya1

Departamento de Biodiversidad y Biología Evolutiva, Museo Nacional de Ciencias Naturales, CSIC, José Gutiérrez Abascal, 2, 28006 Madrid, Spain

Background information. MIPs (major intrinsic proteins) form channels across biological membranes that control recruitment of water and small solutes such as glycerol and urea in all living organisms. Because of their widespread occurrence and large number, MIPs are a sound model system to understand evolutionary mechanisms underlying the generation of protein structural and functional diversity. With the recent increase in genomic projects, there is a considerable increase in the quantity and taxonomic range of MIPs in molecular databases.

Results. In the present study, I compiled more than 450 non-redundant amino acid sequences of MIPs from NCBI databases. Phylogenetic analyses using Bayesian inference reconstructed a statistically robust tree that allowed the classification of members of the family into two main evolutionary groups, the GLPs (glycerol-uptake facilitators or aquaglyceroporins) and the water transport channels or AQPs (aquaporins). Separate phylogenetic analyses of each of the MIP subfamilies were performed to determine the main groups of orthology. In addition, comparative sequence analyses were conducted to identify conserved signatures in the MIP molecule.

Conclusions. The earliest and major gene duplication event in the history of the MIP family led to its main functional split into GLPs and AQPs. GLPs typically show one single copy in microbes (eubacteria, archaea and fungi) and up to four paralogues in vertebrates, and they are absent from plants. AQPs are usually single in microbes and show their greatest numbers and diversity in angiosperms and vertebrates. Functional recruitment of NOD26-like intrinsic proteins to glycerol transport due to the absence of GLPs in plants was highly supported. Acquisition of other MIP functions such as permeability to ammonia, arsenite or CO2 is restricted to particular MIP paralogues. Up to eight fairly conserved boxes were inferred in the primary sequence of the MIP molecule. All of them mapped on to one side of the channel except the conserved glycine residues from helices 2 and 5, which were found on the opposite side.

1Email [email protected].
Key words: amino acid variation, aquaglyceroporin, aquaporin, Bayesian inference, multigene family.
Abbreviations used: AQP, aquaporin; GLP, glycerol-uptake facilitator; MIP, major intrinsic protein; NIP, NOD26-like intrinsic protein; PIP, plasma membrane intrinsic protein; SIP, small basic intrinsic protein; TIP, tonoplast intrinsic protein.

Introduction

Water recruitment is essential for life, and living organisms evolved a wide array of membrane integral protein channels to ensure its passive transport into cells (Chrispeels and Agre, 1994; Borgnia et al., 1999). These channels, termed MIPs (major intrinsic proteins), are found in eubacteria, archaea, fungi, plants and animals (Engel and Stahlberg, 2002). MIPs are particularly abundant in plants, where they show ubiquitous locations (Kjellbom et al., 1999; Johansson et al., 2000; Maurel et al., 2002; Wallace and Roberts, 2004), and in vertebrates, where they are mostly restricted to fluid-conducting organs such as kidney and lungs or to secretory (salivary, lacrimal, sweat) glands (Agre, 1997; Echevarria and Ilundáin, 1998; Takata et al., 2004). Despite their abundance and important role in osmoregulation, it was not until the early 1990s that the first MIP was purified from human red blood cells and shown to increase the permeability of water in Xenopus oocytes (Preston et al., 1992). Since then, more than 450 members of the MIP family have been identified. At present, new members of the MIP family are being discovered at a rapid rate and more biological roles are being described for them. By controlling osmolarity, MIPs seem to affect indirectly many physiological and cellular processes, and their potential in biomedicine




(Agre and Kozono, 2003) and agriculture (Maurel et al., 2002) is promising.

According to substrate specificity, MIPs are mainly classified into AQPs (aquaporins) if they are only permeable to water, and GLPs (glycerol-uptake facilitators or aquaglyceroporins) if they in addition facilitate passive diffusion of small solutes such as glycerol or urea (Park and Saier, 1996; Heymann and Engel, 1999; Engel and Stahlberg, 2002; Zardoya et al., 2002). However, this functional classification may be somewhat simplistic, since it was recently shown that a tobacco MIP can transport CO2 (Uehlein et al., 2003), and some MIPs are permeable to ammonia (Jahn et al., 2004). Recently, the three-dimensional structures of two AQPs and a GLP were determined (Fu et al., 2000; Murata et al., 2000; Savage et al., 2003). The quaternary structure of the protein is a homotetramer. Each MIP monomer is organized into six hydrophobic membrane-spanning helices connected by five loops (A–E) that delimit a polar channel with two wide periplasmic vestibules and one central pore (hourglass model; Figure 1). Two of the connecting loops, namely B (cytoplasmic) and E (extracellular), interact with each other from opposite sides through two highly conserved NPA (Asn-Pro-Ala) boxes forming the narrowest region of the pore. Several residues involved in water or glycerol selectivity have been identified based on sequence similarity comparisons, and topological, site-directed mutagenic and phylogenetic analyses (Froger et al., 1998; Lagrée et al., 1999; Fu et al., 2000; Heymann and Engel, 2000; Zardoya and Villalba, 2001; Zardoya et al., 2002). A single N-glycosylation site is present in the extracellular loop C, and a cysteine residue in loop E is known to be responsible for mercurial sensitivity of most MIPs (Takata et al., 2004).
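As a rough illustration of how the two NPA boxes described above can be located in a primary sequence, the following Python sketch scans a protein string for the Asn-Pro-Ala tripeptide. The function name `find_npa_boxes` and the toy sequence are hypothetical and are not taken from the study; real MIP sequences would be read from a database, and some family members may carry variant motifs that this exact-match search would miss.

```python
import re


def find_npa_boxes(seq: str) -> list[int]:
    """Return the 1-based start position of every NPA tripeptide in seq.

    A lookahead is used so that matches are reported by position only
    and no residues are consumed.
    """
    return [m.start() + 1 for m in re.finditer(r"(?=NPA)", seq.upper())]


# Hypothetical toy sequence (not a real MIP): two NPA boxes.
toy_sequence = "AAAANPAGGGGGNPAKK"
print(find_npa_boxes(toy_sequence))  # [5, 13]
```

A sequence that returns exactly two positions is consistent with the two-box layout of the hourglass model, although the motif count alone does not establish membership of the family.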

Early sequence analyses proved that the MIP polypeptide chain can be divided into two closely related halves that may have arisen by gene duplication (Pao et al., 1991; Park and Saier, 1996). According to previous phylogenetic analyses (Zardoya and Villalba, 2001; Zardoya et al., 2002), alternative main substrate selective modes (AQPs and GLPs) were acquired early in the history of the family by gene duplication and functional shift. The greatest diversification of the protein family occurred in vertebrates and higher plants (Zardoya et al., 2002). Up to 11 putative members of the MIP family have been described in human (AQP0 to AQP10), four of which (AQP3, AQP7, AQP9 and AQP10) transport glycerol (Agre, 1997; Echevarria and Ilundáin, 1998; Takata et al., 2004). GLP orthologues are absent from plants (Zardoya et al., 2002) but up to 35 different AQP genes have been identified in Arabidopsis (Johanson et al., 2001) and at least 31 in maize (Chaumont et al., 2001). Plant AQPs are classified into four subfamilies: TIPs (tonoplast intrinsic proteins), PIPs (plasma membrane intrinsic proteins), SIPs (small basic intrinsic proteins) and NIPs (NOD26-like intrinsic proteins) (Johanson and Gustavsson, 2002; Wallace and Roberts, 2004). Of these, only the latter transport glycerol (Weig and Jakob, 2000; Wallace et al., 2002; Biswas, 2004). Recently, it has been shown that NIPs are the AQPs that were probably recruited to transport glycerol in plants because of the absence of GLPs in them (Zardoya et al., 2002). This functional recruitment required convergent or parallel replacements at specific amino acid positions related to water and glycerol transporting specificity. Moreover, in recent phylogenetic analyses, NIPs were recovered with moderately high support as a sister group of bacterial AQPs, and thus it was suggested that NIPs were acquired from a single event of horizontal gene transfer from bacteria at the origin of plants (Zardoya et al., 2002).

Recent genomic projects have boosted the identification of novel members of the MIP family. The taxonomic range of the different MIP subfamilies has been significantly extended, and for instance, new MIP genes were recently described for the first time in protozoans such as Toxoplasma (Pavlovic-Djuranovic et al., 2003), Leishmania (Gourbal et al., 2004), Plasmodium (Beitz et al., 2004), Trypanosoma (Uzcategui et al., 2004) and Dictyostelium (Flick et al., 1997), as well as in bryophytes such as Physcomitrella (Borstlap, 2002). Moreover, the number of new MIPs identified in previously poorly sampled groups such as eubacteria, archaea, fungi and invertebrates has also increased considerably. This wealth of new descriptions has prompted numerous comparative studies on the evolution, structure and physiological roles of MIPs. To correctly understand the mechanisms underlying MIP diversification, and the origin of the specialized functions and tissue distributions of the different members, a robust phylogenetic framework is mandatory (Zardoya and Villalba, 2001). In this regard, distinction between paralogues (i.e. family members found in the same species that have arisen by gene duplication) and orthologues (i.e. family members found in different species that share a common ancestor and have arisen by species divergence) is only possible based on a phylogeny. Furthermore, classification and nomenclature of MIPs should be based only on MIP phylogeny rather than on functional properties. In the present study, I compiled more than 450 amino acid sequences from NCBI databases, and reconstructed a phylogeny of the MIP family using Bayesian methods, a new generation of phylogenetic methods of inference based on posterior probabilities (Huelsenbeck et al., 2001; Lewis, 2001). This new phylogeny tripled previous ones in the number of analysed taxa, and allowed determination of major duplication events (i.e. paralogy) as well as establishment of main groups of orthology within the family.

© Portland Press 2005 | www.biolcell.org

Figure 1 Molecular features of MIPs
(A) The hourglass model. The six transmembrane domains (1–6) are connected by five loops (A–E) and delimit a central pore. Conserved NPA boxes interact with each other at the narrowest region of the channel. An N-glycosylation site is found in the extracellular loop C. An asterisk shows the position of the cysteine residue that is responsible for mercurial sensitivity. (B) Conserved signatures of the MIP molecule that were deduced from comparative analysis of multiple alignments. (C) Views of the human AQP1 (1fqy) crystal structure along and perpendicular to the fourfold symmetry axis. Conserved signatures of helices 1, 3, 4 and 6 are shown in red, cyan, orange and blue respectively. NPA motifs are depicted in yellow. Glycine residues of helices 2 and 5 are shown in green.

Results and discussion

Phylogeny of the main groups of MIPs

There are a total of 896 amino acid MIP sequences in the PF00230 entry of Pfam database v15.0 (http://www.sanger.ac.uk/cgi-bin/Pfam/). This number was decreased to 463 non-redundant, complete or almost-complete MIP sequences after performing successive BLASTP searches at NCBI databases (http://www.ncbi.nlm.nih.gov) with different members of the MIP family as queries. Alignment of selected sequences proved to be difficult because conserved amino acid stretches were mostly found among MIP orthologues but hardly among paralogues. In addition, the presence of some entries with either long N- or C-terminal ends, long indels or highly divergent sequences also complicated aligning efforts, and produced numerous phylogenetic inference artifacts including the well-known long-branch attraction effect (i.e. long branches in the tree are spuriously attracted to each other and to the outgroup). To obtain reliable phylogenetic inferences for the whole MIP family, highly divergent sequences were pruned from the alignment (notably AQP11 and AQP12 were excluded from the analysis because they may be only distantly related to the MIP family). In addition, the number of sequences per paralogue was decreased proportionally because of computational constraints. The final MIP data set included 150 sequences and 470 positions. Of these, 285 (from gap-rich regions) were excluded from the analyses because of uncertainty in positional homology, two were invariant and 181 were considered informative for phylogenetic reconstruction under the criterion of maximum parsimony (i.e. they were potential shared derived characters).
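The bookkeeping of alignment positions reported above (excluded, invariant, parsimony informative) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_columns`, the toy alignment and the 50% gap threshold are hypothetical, whereas in the study positions were excluded on the basis of uncertain positional homology rather than by a fixed cut-off. A column is counted as parsimony informative when at least two residue types each occur in at least two sequences.

```python
from collections import Counter


def classify_columns(alignment, max_gap_fraction=0.5):
    """Count excluded, invariant, informative and other variable columns.

    alignment: list of equal-length aligned sequences, gaps written as '-'.
    max_gap_fraction: assumed threshold above which a column is treated
    as gap-rich and excluded (an illustrative stand-in for manual curation).
    """
    counts = Counter()
    n_seqs = len(alignment)
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if n_seqs - len(residues) > max_gap_fraction * n_seqs:
            counts["excluded"] += 1
            continue
        freq = Counter(residues)
        if len(freq) == 1:
            counts["invariant"] += 1
        elif sum(1 for n in freq.values() if n >= 2) >= 2:
            counts["informative"] += 1
        else:
            counts["variable_uninformative"] += 1
    return dict(counts)


# Hypothetical toy alignment of five sequences and six columns.
toy_alignment = ["MNPAG-", "MNPAS-", "MNPSG-", "MNPSA-", "MNPAGK"]
print(classify_columns(toy_alignment))
```

Columns that vary without meeting the parsimony criterion fall into a fourth class. That class would account for totals such as 285 + 2 + 181 = 468 out of 470 positions, if the remaining two columns were variable but uninformative; this reading is an assumption, not a statement from the text.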

The tree resulting from the Bayesian inference under the JTT (Jones et al., 1992) + I + Γ model is shown in Figure 2. The different MIP paralogues were recovered as distinct groups with high statistical support. However, phylogenetic relationships among them were difficult to resolve (deep nodes connecting different paralogues were rather short because of the general lack of shared derived characters among paralogues). A clear distinction between GLPs and AQPs was recovered, although only with moderate statistical support (Figure 2). Within GLPs, Gram-negative and Gram-positive eubacteria did not group together. Within AQPs, plant NIPs and eubacterial AQPs were recovered as the most basal paralogues. Phylogenetic relationships between both groups could not be confidently resolved. Hence, it was not possible to confirm or reject the previously postulated bacterial origin of plant NIPs (Zardoya et al., 2002). Fungal AQPs and plant SIPs were grouped together and placed in a basal position with respect to animal AQPs, and plant TIPs and PIPs. However, the extremely long branches exhibited by plant SIPs probably hindered the inference of their correct phylogenetic position. The next paralogues that branched off were plant TIPs and animal AQP8. The only nematode AQP included in the phylogenetic analysis was recovered as a sister group of AQP8 with high statistical support. Plant PIPs were recovered with moderate statistical support as the closest relative of insect AQPs and a group including vertebrate AQP0, AQP1, AQP2, AQP4, AQP5 and AQP6. Within vertebrates, AQP1 and AQP4 were recovered as the most basal paralogues. AQP0 was placed in a more derived position as a sister group of AQP2, AQP5 and AQP6.

Alignment within each of the MIP subfamilies proved to be more reliable and could include more phylogenetically informative positions. Therefore separate phylogenetic analyses including all putative members of the different subfamilies were performed.

Figure 2 Evolutionary relationships of MIPs
(A) Phylogenetic tree reconstructed using Bayesian inference and the JTT + I + Γ evolutionary model. (B) Closeup view of the phylogenetic relationships of animal and plant AQPs. Numbers in the nodes are Bayesian posterior probabilities. Branch lengths are proportional to evolutionary distance.

Eubacterial and archaean MIPs

The best-characterized bacterial MIP is Escherichia coli AQPZ (Calamita, 2000). Its crystal structure is known (Savage et al., 2003), and different studies have shown its physiological role in short- and long-term osmoregulation, exponential growth and bacterial virulence (Calamita, 2000). However, biological information on other bacterial MIPs is less thorough. A total of 67 eubacterial and four archaean sequences were retrieved from NCBI databases and included in the phylogenetic analysis. The average length (± S.D.) of bacterial MIPs is 249 ± 21 amino acids. Most eubacterial and archaean MIPs deposited in sequence databases were identified as part of the automated annotation of genomic projects. Interestingly, however, several complete genomes of both eubacteria and archaea seemed to lack MIPs (Calamita, 2000; Hohmann et al., 2000). Some of these genomes belong to microorganisms that are pathogens and might have lost their MIP genes during co-evolution with the host in the absence of the required selective pressure (i.e. osmoregulation). However, it is important to note that an endosymbiont such as Buchnera or intracellular pathogens such as Listeria and Salmonella do have MIP genes. Other organisms may lack MIPs because they live in extreme conditions where water and solute transport might be differently regulated (Calamita, 2000; Hohmann et al., 2000). Alternatively, all these microorganisms may have MIP genes with highly divergent amino acid sequences that are difficult to identify by similarity searches.
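The length summaries quoted in this and the following sections take the form mean ± S.D. A minimal Python sketch of that calculation is given below. The function name `summarize_lengths` and the three toy sequences are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def summarize_lengths(sequences):
    """Return (mean, sample standard deviation) of the sequence lengths."""
    lengths = [len(s) for s in sequences]
    return mean(lengths), stdev(lengths)


# Hypothetical toy sequences of 230, 250 and 270 residues.
toy_sequences = ["A" * 230, "A" * 250, "A" * 270]
avg, sd = summarize_lengths(toy_sequences)
print(f"{avg:.0f} ± {sd:.0f} amino acids")  # 250 ± 20 amino acids
```

A large standard deviation relative to the mean, as reported later for fungal MIPs, is a quick signal that some entries carry long terminal extensions.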

The eubacterial and archaean sequence data set produced an alignment of 374 positions. Of these, 183 were excluded from the analyses, 10 were invariant and 177 were parsimony informative. The reconstructed Bayesian tree is shown in Figure 3. In Figures 3–9, numbers in the nodes are Bayesian posterior probabilities. Phylogenetic analysis confirmed that osmoregulation in eubacteria requires only one AQP and one GLP. Within each paralogue, a clear separation between Gram-positive and Gram-negative bacteria was recovered. Interestingly, most of the AQPs identified were from Gram-negative bacteria, whereas most GLPs correspond to Gram-positive bacteria (Figure 3). This asymmetry might be related to the different structure and diffusion properties of the membranes and cell walls in both bacterial groups (Calamita, 2000) that select for one or another type of MIPs. Despite the fact that 20 complete genomes of archaea have been sequenced, only a few archaean MIPs have been described so far and all of them seem to transport both water and glycerol (Kozono et al., 2003). This indicates that GLPs could take over the function of AQPs in archaea, but more taxa belonging to this group need to be screened to confirm this hypothesis.

Figure 3 Bayesian tree showing evolutionary relationships of eubacterial and archaean MIPs
Here, and also in Figures 4–9, numbers in the nodes are Bayesian posterior probabilities.

Fungal MIPs

Until recently, the only MIPs identified in fungi were those of baker's yeast (Saccharomyces cerevisiae) (Calamita, 2000; Hohmann et al., 2000; Carbrey et al., 2001). This species shows one GLP (GenBank® identification no. gi: 14318465) that doubles the length of a normal MIP due to an extremely long N-terminal end of approx. 300 amino acids (Luyten et al., 1995). Interestingly, this channel is preferentially involved in glycerol export. In addition, yeast has two closely related AQPs of normal length (Carbrey et al., 2001).

Up to 16 complete genomes of fungi have been recently sequenced, and a total of 25 fungal MIPs were retrieved from NCBI databases. These proteins are heterogeneous in length but overall relatively long, with an average length of 438 ± 166 amino acids. GLPs of Schizosaccharomyces (GenBank® accession no. 19113700), Kluyveromyces (GenBank® accession no. 50307951), Candida (GenBank® accession no. 50285779), and Ustilago (GenBank® accession no. 46097151) are above 500 amino acids due to long N-terminal ends. Aspergillus AQP (GenBank® accession no. 40742230) exhibits an extremely long C-terminal end that yields a protein with a total length of 959 amino acids.

Figure 4 Bayesian tree showing evolutionary relationships of fungal MIPs

The fungal sequence data set produced an alignment of 1405 positions due to the long N- and C-terminal ends. After excluding these ends and internal sites of uncertain positional homology, unambiguously aligned sequences included 182 characters (five invariant and 170 parsimony informative). The recovered Bayesian tree is shown in Figure 4. Phylogenetic analysis showed that, in general, each genome has one AQP and one GLP. However, the presence of a second copy of any of the two groups in some fungal genomes is also frequent. GLPs and AQPs recovered significantly different phylogenetic relationships among the studied fungal species. This pattern suggests that evolution of fungal MIP genes may occur through horizontal transfer events. Nevertheless, phylogenetic reconstruction artifacts cannot be ruled out in this case due to (i) the presence of highly divergent sequences in the data set, which can produce long-branch attraction effects and (ii) incomplete taxon sampling.
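The incongruence noted above, where GLP and AQP trees imply different relationships among the same fungal species, can be quantified in a simple way by counting the clades that appear in one rooted tree but not in the other. The sketch below is only an illustration: the nested-tuple tree encoding, the function names `clades` and `clade_distance`, and the four placeholder species labels are assumptions, and the study does not state that any such distance was computed.

```python
def clades(tree):
    """Return (leaf set, set of clades) for a rooted tree.

    A tree is either a leaf label (str) or a tuple of subtrees.
    Each clade is stored as a frozenset of the leaf labels below a node.
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def clade_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two rooted trees."""
    _, clades_a = clades(tree_a)
    _, clades_b = clades(tree_b)
    return len(clades_a ^ clades_b)


# Hypothetical species trees implied by two gene families.
glp_tree = ((("sp1", "sp2"), "sp3"), "sp4")
aqp_tree = ((("sp1", "sp3"), "sp2"), "sp4")
print(clade_distance(glp_tree, aqp_tree))  # 2
```

A distance of zero means the two gene families support the same species relationships. A non-zero value is compatible with horizontal transfer but, as the text cautions, also with long-branch attraction or incomplete taxon sampling.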

Protozoan MIPs

At present, only a few protozoan MIPs have been described and they are not yet sufficient to perform separate phylogenetic analyses. Therefore protozoan MIPs were included in phylogenetic analyses together with animal GLPs and AQPs. In general, protozoan MIP proteins are of normal length (277 ± 28 amino acids), but their sequences are found to be relatively divergent and produced long branches in reconstructed phylogenetic trees that prevented unambiguous recovery of their exact phylogenetic position. It is likely that each protozoan genome has at least one AQP and one GLP, as illustrated by Trypanosoma. However, thus far, no other GLP orthologues have been identified in protozoans, whereas AQP orthologues are found in several protozoans such as Plasmodium and Dictyostelium. Some protozoans may lack GLPs, but it is also possible that they have highly divergent amino acid sequences that are difficult to identify by similarity searches. Phylogenetic analysis indicated that the GLP gene and the AQP gene underwent one duplication event in Trypanosoma and Dictyostelium respectively. Protozoan MIP homologues were generally recovered at the base of animal GLP and AQP trees (Figures 5 and 6).

Figure 5 Bayesian tree showing evolutionary relationships of animal GLPs

Animal GLPs

Up to 42 GLP orthologues were identified in animals. The average length of animal GLPs is 284 ± 39 amino acids. The GLP data set produced an alignment of 523 positions. Of these, 282 were excluded, 15 were invariant and 210 were parsimony informative. The recovered Bayesian tree is shown in Figure 5. Phylogenetic analysis indicated that the GLP gene underwent several duplication events within Caenorhabditis. No GLPs have been identified in insects thus far. The great diversification of GLPs occurred in vertebrates (Takata et al., 2004). Phylogenetic reconstruction delimited four main groups with strong statistical support: AQP3, AQP9, AQP7 and AQP10 (Figure 5). The phylogeny of vertebrates was fairly well recovered within each of the groups. According to the recovered tree, AQP7 and AQP10 are sister groups (Figure 5). However, phylogenetic relationships between AQP3, AQP9 and the cluster AQP7 + AQP10 could not be clearly resolved. AQP3 is a ubiquitous paralogue that is expressed in the epithelial cells of kidney, brain, eye, skin, as well as in the urinary, digestive, and respiratory tracts (Takata et al., 2004). AQP7 and AQP9 are preferentially expressed in adipocytes and leucocytes respectively (Takata et al., 2004). Both paralogues seem to transport arsenite (Liu et al., 2002). Members of the AQP10 group exhibited highly divergent sequences, which explains the fact that this paralogue was only recently described as a member of the family (Ishibashi et al., 2002). In human, AQP10 is exclusively expressed in the intestine, but in mouse AQP10 is a pseudogene (Morinaga et al., 2002). According to phylogenetic analyses, AQP10 is also present in fish.

Animal AQPs

AQPs are highly diversified in animals (Agre, 1997; Echevarria and Ilundáin, 1998; Takata et al., 2004). A total of 105 animal AQPs were retrieved from NCBI databases. The average length of animal AQPs is 281 ± 86 amino acids. However, some AQPs of insects are particularly long. This is the case of one AQP of Drosophila (GenBank® accession no. 171136672), Anopheles (GenBank® accession no. 31211827) and Apis (GenBank® accession no. 48095234). Each of them has a total length over 600 amino acids due to extended C-terminal ends. Interestingly, another AQP of Apis (GenBank® accession no. 40111098) is 1630 amino acids long. Sequence analysis showed that this protein contains two copies in tandem of a normal AQP. Within vertebrates, one chicken AQP (GenBank® accession no. 50755713) exhibits a long N-terminal end that generates a protein with a total length of 479 amino acids.

Figure 6 Bayesian tree showing evolutionary relationships of animal AQPs


The animal AQP data set produced an alignment of 1308 positions due to the long N-terminal (chicken) and C-terminal (insect) ends. Extended ends and internal sites of ambiguous positional homology were excluded from further analysis, and the alignment was decreased to 213 positions (3 invariant and 208 parsimony informative). The recovered Bayesian phylogeny is shown in Figure 6. Phylogenetic analysis showed that animal AQPs could be classified into several derived groups with strong statistical support. However, phylogenetic relationships among these groups could not be confidently resolved. Nematode AQPs exhibited rather divergent sequences that produced long branches and hindered unambiguous recovery of their correct phylogenetic position and relationships (Figure 6). Insect AQPs were recovered in two different clusters. One was placed as a sister group of vertebrate AQPs, although with low statistical support. The other was closely related to AQP11 and AQP12, and included one highly divergent sequence of Apis (GenBank® accession no. 48102715), Anopheles (GenBank® accession no. 31201751) and Drosophila (GenBank® accession no. 17736985).

According to the recovered tree, vertebrate AQPs could be classified into at least nine groups (Takata et al., 2004) that resulted from several rounds of gene duplication and functional divergence. Each of the nine paralogues recovered the vertebrate phylogeny fairly well. The most divergent paralogues were AQP11 and AQP12, which may be only distantly related to the MIP family. In fact, their very recent annotation as members of the MIP family was only tentative, since it was based exclusively on automated sequence similarity searches without further analysis. The next most divergent member was AQP8, which was identified in fish, birds and mammals. This paralogue is ubiquitous in its tissue expression and has a predominantly intracellular location (Ferri et al., 2003). The striking association of AQP8 with mitochondria and the endoplasmic reticulum may explain its significantly divergent evolution (Ferri et al., 2003). The remaining vertebrate groups were AQP0 (formerly known as MIP26), AQP1 (formerly known as CHIP28), AQP2, AQP4, AQP5 and AQP6. The most basal ones, AQP1 and AQP4, are of ubiquitous distribution, and in humans are found in kidney, lungs, brain, stomach, eye and ear (Echevarria and Ilundain, 1998; Takata et al., 2004). Water transport by AQP4 is mercurial-insensitive (Hasegawa et al., 1994). More derived AQPs are tissue-specific. AQP0 is the most abundant membrane protein in the human eye lens. AQP2 is found in renal collecting ducts and its expression is regulated by the antidiuretic hormone. Mutations in human AQP2 result in diabetes insipidus. AQP5 is found in secretory (salivary, lacrimal and sweat) glands and AQP6 is restricted to kidney collecting ducts. The recovered tree showed a close relationship among AQP2, AQP5 and AQP6.

Plant AQPs

MIP abundance in plants is often related to the need for continuous absorption and evaporation of substantial volumes of water during plant growth. MIP diversification in plants includes multiple subcellular localizations and differential expression during plant development (Johansson et al., 2000; Maurel et al., 2002). TIPs and PIPs are generally found in the vacuolar (tonoplast) and plasma membranes respectively (Kjellbom et al., 1999). Hydrostatic pressure is limited in the tonoplast but is very high in the plasma membrane. Hence, it is likely that TIPs and PIPs have evolved significantly different physiological roles (Johanson et al., 2001). NIPs seem to have distinct functions in different plants. In legumes, NIPs are localized in the peribacteroid membrane of symbiotic root nodules, and regulate water and metabolite flux between the plant and nitrogen-fixing bacteria (Rivers et al., 1997; Guenther and Roberts, 2000). In loblolly pine, NIPs are expressed in early embryogenesis (Ciavatta et al., 2001). In pea, NIP transcripts are only detected in the seed coat (Schuurmans et al., 2003). SIPs were identified based on phylogenetic analysis of expressed sequence tags (ESTs) (Johanson and Gustavsson, 2002), and their localization, expression pattern and functional role remain unknown.

A total of 220 MIPs from a great number and diversity of plants, including ferns, gymnosperms, monocots and dicots, were retrieved from NCBI databases. The average length of plant MIPs is 268 ± 25 amino acids. In agreement with previous studies, preliminary phylogenetic analyses of the plant MIP data set showed that all analysed membrane channels were AQPs, and confirmed their classification into four orthologue groups. Therefore PIP, TIP and NIP data sets were analysed independently. Since few SIP sequences have been identified thus far, no separate phylogenetic analysis of this paralogue was performed.


Figure 7 Bayesian tree showing evolutionary relationships of plant PIPs

The PIP data set produced an alignment of 348 positions. Of these, 83 were excluded, 75 were invariant and 136 were parsimony informative. PIPs constitute a rather homogeneous group with comparatively low pairwise sequence divergences. The remarkable conservation of this paralogue could indicate either strong functional constraints or a recent origin. The presence of PIPs in both gymnosperms and angiosperms rejects the latter hypothesis and suggests a slow rate of evolution of these proteins. The recovered Bayesian tree is shown in Figure 7. Phylogenetic analysis supported separation of PIPs into two distinct groups: PIP1 and PIP2 (Kjellbom et al., 1999). Although the exact physiological role of both groups remains unknown, it was recently shown that maize PIP1 and PIP2 have differential functional properties. PIP2 paralogues are able to induce water channel activity in Xenopus oocytes whereas PIP1 paralogues are inactive (Chaumont et al., 2001).

Figure 8 Bayesian tree showing evolutionary relationships of plant TIPs

The TIP data set produced an alignment of 297 positions. Of these, 60 were excluded, 19 were invariant and 197 were parsimony informative. The recovered Bayesian tree is depicted in Figure 8. Phylogenetic analyses confirmed the validity of the three typically recognized groups of TIPs (α, γ, δ), and supported the existence of a fourth group of more divergent sequences (β). According to the recovered tree, α- and γ-TIPs grouped more closely to each other than to β-TIP. δ-TIP was placed as the most basal paralogue, and included, among others, a group of highly divergent sequences from Arabidopsis, Hordeum, Zea and Oryza. These sequences were previously incorrectly suggested to constitute a different TIP group based on phylogenetic analyses that did not correct for saturation (Chaumont et al., 2001). Based on the relative phylogenetic position of the Picea TIP sequences, it can be concluded that the gene duplication events that led to the four main groups of plant TIPs predated the split of gymnosperms and angiosperms. Further gene duplications probably occurred within each group to generate the 7–9 TIP copies found in the well-surveyed genomes of Arabidopsis, Zea and Oryza. Several studies showed that the distinct TIP members are expressed in diverse intracellular localizations, and presumably have different functions (Moriyasu et al., 2003; Takahashi et al., 2004).

Figure 9 Bayesian tree showing evolutionary relationships of plant NIPs

The NIP data set produced an alignment of 386 positions. Of these, 186 were excluded, 24 were invariant and 163 were parsimony informative. The recovered Bayesian tree is shown in Figure 9. Phylogenetic analyses defined three main groups of NIPs (Chaumont et al., 2001). Of these, NIP2 is the only one that has been identified thus far in a fern and a gymnosperm. Most described angiosperm NIPs belong to the NIP1 group. NIP3 showed the most divergent sequences.

Molecular features of MIPs

Several studies have thoroughly analysed the primary structure of the MIP molecule, searching for conserved motifs across the whole family and within each of the main members (Park and Saier, 1996; Froger et al., 1998; Heymann and Engel, 2000; Zardoya and Villalba, 2001; Zardoya et al., 2002; Wallace and Roberts, 2004). In the present analysis, the two most conserved motifs are the NPA boxes in loops B and E (at positions 76–78 and 192–194 of human AQP1 respectively; Figure 1). Only 20 out of the 463 analysed sequences have a non-canonical NPA box in loop B. In most cases, spurious changes in single species occur in the third position of the motif, where the alanine residue is replaced by a valine, threonine, serine or leucine residue. Notably, in all AQP7 sequences, the proline residue in the second position of the motif is changed to an alanine residue. Non-mammal AQP8 sequences show NPV or NPP motifs. SIPs show NPL or NPT motifs. NIP3 members have an NPS signature. The recently identified AQP11 and AQP12 have NPC and NPT motifs respectively. Finally, three highly divergent insect sequences (GenBank® accession nos. 31201751, 17736985 and 48102715) have a CPY motif at the corresponding position.
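The loop-B variants listed above can be summarized by recording, for each motif, which positions depart from the canonical NPA box. The following Python sketch is only an illustration; the helper name `describe_variant` is hypothetical, and the AQP7 motif is written as NAA on the assumption that the proline-to-alanine change is the only difference from NPA.

```python
CANONICAL = "NPA"


def describe_variant(motif, canonical=CANONICAL):
    """Return (position, expected, observed) for each mismatch (1-based)."""
    if len(motif) != len(canonical):
        raise ValueError("motif must have the same length as the canonical box")
    return [
        (i, ref, obs)
        for i, (ref, obs) in enumerate(zip(canonical, motif), start=1)
        if ref != obs
    ]


# Loop-B motifs as described in the text (AQP7 written as NAA by assumption).
loop_b = {
    "AQP7": "NAA",
    "NIP3": "NPS",
    "AQP11": "NPC",
    "AQP12": "NPT",
    "divergent insect sequences": "CPY",
}

for group, motif in loop_b.items():
    print(group, motif, describe_variant(motif))
```

In this representation most groups differ from NPA only at the third position, whereas AQP7 differs at the second and the divergent insect sequences at both the first and third.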

Approximately 30 analysed sequences have changes in the NPA box of loop E. The most common change is in the third position of the box, from an alanine to a valine residue (both are small and hydrophobic amino acids). Members of the AQP7 group show either NPS or NPT motifs at this position. Therefore AQP7 seems to have evolved a distinct pore within the MIP family by compensatory changes. It would be interesting to obtain the crystal structure of an AQP7 to see how such a distinctive conformation affects flux through the pore. Similarly, NIP3 members and the above-mentioned highly divergent insect sequences show an NPV motif at this position that may compensate for changes occurring in the loop-B motif.

Stretches of conserved residues are also found in the nearby upstream and downstream regions of both NPA boxes. A consensus sequence for the first box could be SGXHXNPAVT (Figure 1) (Heymann and Engel, 2000; Zardoya and Villalba, 2001). A consensus sequence for the second box could be GXXXNPAR(S/D)XG (Figure 1). The glycine residue at the beginning of this consensus sequence is remarkably changed to an asparagine residue in bacterial AQPs. Interestingly, one of the positions in the signature seems to be related to substrate selectivity, showing a serine residue in AQPs and an aspartic acid residue in GLPs (Froger et al., 1998; Heymann and Engel, 2000; Zardoya and Villalba, 2001). AQP11 and AQP12 have an alanine residue in that position. Another conserved motif in most members of the MIP family is a glutamic acid residue in transmembrane helix 1 (at position 17 of human AQP1), which forms a rather conserved box AEFXXT (Figure 1) that is absent from SIPs. The next conserved residue is a glycine in helix 2 (Figure 1; at position 57 of human AQP1) that is changed to alanine in TIPs and to asparagine in AQP6 members.
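The two consensus sequences translate directly into regular expressions, with X read as any residue and (S/D) as a choice between serine and aspartic acid. The Python sketch below is illustrative only: the function name `selectivity_hint` and the two sequence fragments are hypothetical, and a match to the consensus is a hint about the AQP or GLP signature rather than a functional prediction.

```python
import re

# X in the consensus means "any residue"; (S/D) is the selectivity position.
LOOP_B = re.compile(r"SG.H.NPAVT")
LOOP_E = re.compile(r"G...NPAR([SD]).G")


def selectivity_hint(sequence):
    """Report which residue occupies the S/D position of the loop-E box.

    Returns None when the sequence does not match the consensus, as would
    be expected for divergent members (e.g. alanine at this position).
    """
    match = LOOP_E.search(sequence)
    if match is None:
        return None
    return "AQP-like (Ser)" if match.group(1) == "S" else "GLP-like (Asp)"


# Hypothetical fragments built to fit the consensus, not database entries.
print(selectivity_hint("AAGCGINPARSFGAA"))
print(selectivity_hint("AAGFAMNPARDFGAA"))
print(selectivity_hint("AAGCGINPARAFGAA"))
```

Sequences that depart from the consensus, such as bacterial AQPs with asparagine in place of the leading glycine, would need a relaxed pattern.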

Three almost contiguous residues in helix 3, tyrosine, glutamine and glycine (at positions 97, 101 and 104 of human AQP1 respectively), are highly conserved (Figure 1). The tyrosine residue is absent from fungal AQPs, SIPs, AQP11 and AQP12. The glycine residue is missing in AQP11 and some NIP3. Neither the tyrosine nor the glutamine residue is found in the highly divergent insect sequences. Helix 4 shows three conserved and contiguous residues, glutamic acid, threonine and leucine (at positions 142, 146 and 149 of human AQP1 respectively; Figure 1). This box might be involved in glycerol discrimination, since the glutamic acid residue is replaced by a glutamine residue in animal GLPs, and the threonine residue is missing in fungal GLPs. The threonine residue is changed to a cysteine residue in AQP11 and AQP12, and is not found in some NIP1, in protist AQPs and in the highly divergent insect sequences. The leucine residue is absent from AQP11 and AQP12, from NIP2 and from the highly divergent insect sequences. Moreover, it is changed to a phenylalanine residue in AQP0, and shifts one position ahead in bacterial AQPs and NIPs (except NIP2).

Only one glycine residue is conserved in helix 5 (at position 173 of human AQP1; Figure 1). This residue is missing in fungal GLPs as well as in AQP11 and AQP12 and in the highly divergent insect sequences. Another conserved box is found in helix 6 (Figure 1) and it includes four residues: tryptophan, proline, glycine and tyrosine (at positions 210, 216, 219 and 227 of human AQP1 respectively). The tryptophan residue is only absent from Gram-negative bacterial GLPs and archaean MIPs. No conserved residues are found in loops A, C and D or in the N- and C-terminal ends.

Deduced conserved boxes were mapped on to the crystal structure of human AQP1 (Murata et al., 2000) (Figure 1). In a view perpendicular to the fourfold symmetry axis of the protein, conserved boxes are localized at the centre of the protein, close to the narrowest part of the pore. All of them face the interior of the protein except a phenylalanine residue of helix 1 and a tryptophan residue of helix 6. These two residues might be involved in the oligomerization of the tetramer. A view from the cytoplasmic side along the fourfold symmetry axis shows that all conserved boxes are on one side of the channel except the conserved glycine residues from helices 2 and 5, which are found on the opposite side (Figure 1).

Several residues have been proposed to be involved in the selectivity to water or glycerol (Froger et al., 1998; Heymann and Engel, 2000; Zardoya and Villalba, 2001; Zardoya et al., 2002). Direct evidence that two contiguous residues in helix 6 (at positions 212 and 213 of human AQP1) are involved in the switching of substrate specificity was obtained using site-directed mutagenesis experiments (Lagree et al., 1999). At these positions, AQPs have tyrosine (or phenylalanine) and tryptophan residues whereas GLPs show proline and valine (or isoleucine) residues. Other selectivity residues were deduced from comparative sequence and topological analyses. For instance, the so-called aromatic/arginine region that is located in the narrowest part of the pore was proposed to be involved in substrate selectivity by direct comparison of the AQP and GLP crystal structures. This region is at the confluence of Phe58 (helix 2), His182 (helix 5), Cys191 (loop E) and Arg197 (loop E) of human AQP1 (Murata et al., 2000). In E. coli GLP (Fu et al., 2000), these positions correspond to Trp48, Gly191, Phe200 and Arg206 respectively. In NIPs, the region has an intermediate combination of residues: tryptophan, valine, alanine and arginine (Weig and Jakob, 2000; Wallace et al., 2002; Biswas, 2004). NIPs are postulated to be AQPs recruited to transport glycerol, and the peculiar composition of the aromatic/arginine region in NIPs was seemingly acquired by convergent replacements (Zardoya et al., 2002).
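The intermediate character of the NIP aromatic/arginine region can be made explicit by comparing the four sites position by position. The Python sketch below simply tabulates the residues given in the text; the dictionary layout, the site labels and the function name `shared_sites` are choices made for this illustration.

```python
# Residues of the aromatic/arginine region, in the order given in the text.
SITES = ("helix 2", "helix 5", "loop E, first site", "loop E, second site")
AR_R = {
    "human AQP1": ("F", "H", "C", "R"),   # Phe58, His182, Cys191, Arg197
    "E. coli GLP": ("W", "G", "F", "R"),  # Trp48, Gly191, Phe200, Arg206
    "NIP": ("W", "V", "A", "R"),          # Trp, Val, Ala, Arg
}


def shared_sites(a, b):
    """List the sites at which two family members carry the same residue."""
    return [site for site, x, y in zip(SITES, AR_R[a], AR_R[b]) if x == y]


print("NIP vs AQP1:", shared_sites("NIP", "human AQP1"))
print("NIP vs GLP: ", shared_sites("NIP", "E. coli GLP"))
```

On these four sites, NIPs share the helix 2 tryptophan with the E. coli GLP and the second loop-E arginine with both proteins, while the valine and alanine are found in neither.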

Finally, there are several residues that characterize each of the main members of the MIP family (see Figure 6 in Zardoya and Villalba, 2001 for more details). Such discriminating residues are mostly found at within-group conserved stretches in the flanking regions of the NPA boxes. These residues are probably responsible for the specific functional properties of each of the paralogues.

Concluding remarks and perspectives

The hypothesis that specific proteins could mediate water transport into cells was quite marginal 20 years ago, and most researchers postulated passive diffusion as the main mechanism for water recruitment. The discovery of the first MIP in humans in the 1990s not only dramatically changed the mainstream view, but also prompted the search for homologues. It was soon realized that MIPs in fact constitute an ancient protein family that can be found throughout living organisms. Recently, genomic projects have allowed identification of the exact number of MIP genes per model system species, and have greatly expanded the taxonomic range of known MIPs. In the present study, I reconstructed an MIP phylogeny taking into account the new genomic information. With more than 450 proteins analysed, and a fair representation of the different MIP paralogues in the main groups of living organisms, the recovered phylogeny confirms and generalizes previous analyses based on smaller and taxonomically more biased data sets. According to the results, the earliest and major gene duplication event in the history of the MIP family led to the main functional split into GLPs and AQPs. The former are absent from plants, and achieved their greatest diversification in vertebrates. The latter have at least nine members (PIP1, PIP2, α-TIP, β-TIP, γ-TIP, δ-TIP, NIP1, NIP2 and NIP3) in plants and at least seven (AQP0, AQP1, AQP2, AQP4, AQP5, AQP6 and AQP8) in vertebrates. Functional recruitment of NIPs to glycerol transport due to the absence of GLPs in plants was highly supported in the new phylogenetic analyses. However, the proposed origin of plant NIPs from bacteria through an ancient event of horizontal gene transfer could not be confirmed due to the general lack of resolution of deeper nodes. Acquisition of other MIP functions, such as permeability to ammonia, arsenite or CO2, is not a general trend of the family but is restricted to particular MIP paralogues.

Up to eight fairly conserved boxes were found in the primary sequence of the MIP molecule throughout the entire alignment of 463 proteins (except in the above-mentioned highly divergent sequences). Topological mapping of these residues showed their intimate association with the narrowest section of the channel. Two putative paralogues in plants (SIP1 and SIP2), three insect AQPs and vertebrate AQP11 and AQP12 showed highly divergent sequences. The lack of several conserved boxes in the insect AQPs, as well as in AQP11 and AQP12, suggests that these proteins may not belong to the family or may be only distantly related.

Because of their abundance, many MIP genes are often identified at early stages of most genomic projects. Hence, the prospect of obtaining more complete MIP sequence data sets for further evolutionary studies is encouraging. A preliminary TBLASTN search of EST databases from ongoing genomic projects showed that members of the MIP family described here could be tentatively identified using similarity searches, and provided a glimpse of the large amount of sequence data on MIPs that is currently being accumulated. For instance, AQPs in dog were found on chromosomes 6 (GenBank® accession no. 50101842), 7 (GenBank® accession no. 50194928), 10 (GenBank® accession no. 50116290), 14 (GenBank® accession no. 50097440) and 27 (GenBank® accession nos. 50118419, 50118421). GLPs in dog were found on chromosomes 7 (GenBank® accession no. 50193816), 11 (GenBank® accession no. 50090285) and 30 (GenBank® accession no. 50202430). In the sea urchin Strongylocentrotus purpuratus, two AQPs (GenBank® accession nos. 34754721 and 34794000) and two GLPs (GenBank® accession nos. 34743653 and 34754721) were identified. Finally, in the moss Physcomitrella patens, up to five PIPs, four TIPs, one NIP and two SIPs were identified (Borstlap, 2002). Preliminary phylogenetic analyses with these fragments (results not shown) indicate a very early diversification of MIPs in plants.

In parallel, the development of new bioinformatic tools such as the MIP relational database (http://idefix.univ-rennes1.fr:8080/Prot/index.html) will facilitate data mining and will also be very useful in discerning the specific signatures that characterize each MIP paralogue. Such features will help future sequence comparison, function prediction and phylogenetic analysis.

Materials and methods

Molecular databases at NCBI (http://www.ncbi.nlm.nih.gov) were screened for MIPs with the BLASTp search tool (Altschul et al., 1997) using mouse AQP4 (GenBank® accession no. 33563244) and AQP3 (GenBank® accession no. 20072731) as initial queries. Further searches to find more divergent paralogues were performed using representatives of the different MIP subfamilies as queries. Redundant entries, those that contained point mutations with respect to sequences already included in the analyses, and short partial sequences were discarded. A total of 463 complete or almost complete MIP proteins were retrieved from the NCBI database and analysed at the amino acid level. Of these, 71 were from eubacteria and archaea, 25 from fungi, 147 from animals and 220 from plants. Additionally, different EST databases corresponding to ongoing genomic projects were screened for new MIPs with the TBLASTN search tool, using mouse AQP4 and AQP3 as queries.
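The filtering of redundant and partial entries can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure actually followed: the thresholds `min_length` and `max_point_diffs` are hypothetical (no cut-offs are given here), the function names are invented for the example, and only sequences of equal length are compared, so entries differing by insertions or deletions are not detected.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def filter_sequences(records, min_length=200, max_point_diffs=2):
    """Keep one representative per group of near-identical sequences.

    `records` is a list of (name, sequence) pairs. Sequences shorter than
    `min_length` are treated as partial and dropped; a sequence is treated
    as redundant when it has the same length as a kept sequence and differs
    from it by at most `max_point_diffs` substitutions.
    Both thresholds are assumptions made for this sketch.
    """
    kept = []
    for name, seq in records:
        if len(seq) < min_length:
            continue
        redundant = any(
            len(seq) == len(other) and hamming(seq, other) <= max_point_diffs
            for _, other in kept
        )
        if not redundant:
            kept.append((name, seq))
    return kept


# Hypothetical short records; thresholds lowered to suit the toy data.
records = [
    ("a", "MKTAYIAKQR"),
    ("b", "MKTAYIAKQK"),  # one substitution relative to "a"
    ("c", "MKT"),         # partial
    ("d", "MWWWYIAPPR"),
]
print([name for name, _ in filter_sequences(records, min_length=5, max_point_diffs=1)])
```

The first sequence encountered in each near-identical group is the one retained, so the input order determines which representative survives.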

Sequences were aligned using CLUSTAL X (Thompson et al., 1997) with default settings, and multiple alignments were refined by eye using MacClade 4.05 (Maddison and Maddison, 1992). Gaps resulting from the alignment were treated as missing data. Ambiguous alignments in highly variable (gap-rich) regions were excluded from the phylogenetic analyses (aligned sequences and the exclusion sets are available from the author on request). Bayesian phylogenetic inferences were conducted using MrBayes v3.0b3 (Huelsenbeck and Ronquist, 2001) by Metropolis coupled Markov Chain Monte Carlo (MCMCMC) sampling for 200000 generations (four simultaneous MC chains; sample frequency 100 generations; chain temperature 0.2) under the JTT (Jones et al., 1992) + I + Γ model for each data set. Robustness of the inferred trees was evaluated using Bayesian posterior probabilities.
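For readers wishing to set up a comparable run, the sampling settings above can be written out as a MrBayes command block. The Python sketch below only renders such a block as text and computes the implied number of samples; it assumes the MrBayes 3.x command names `prset aamodelpr`, `lset rates` and `mcmc`, which should be checked against the manual of the version in use, and it is not the command file used in this study. The function names are hypothetical.

```python
def mrbayes_block(ngen=200000, samplefreq=100, nchains=4, temp=0.2):
    """Render a MrBayes block for a JTT + I + gamma protein analysis.

    Command names follow MrBayes 3.x as an assumption of this sketch;
    text in square brackets is a NEXUS comment.
    """
    lines = [
        "begin mrbayes;",
        "  prset aamodelpr=fixed(jones);  [JTT amino acid matrix]",
        "  lset rates=invgamma;           [invariant sites + gamma]",
        f"  mcmc ngen={ngen} samplefreq={samplefreq} nchains={nchains} temp={temp};",
        "end;",
    ]
    return "\n".join(lines)


def n_samples(ngen=200000, samplefreq=100):
    """Number of sampling intervals completed during the run."""
    return ngen // samplefreq


print(mrbayes_block())
print(n_samples())
```

With 200000 generations sampled every 100 generations, each analysis yields about 2000 sampled trees, not counting the starting state.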

The Jalview applet in the Pfam site (http://www.sanger.ac.uk/cgi-bin/Pfam) was used to view and determine conserved residues in the online MIP alignment of 896 proteins based on percentage identity. Selected residues were further localized visually in the MIP alignment of 463 proteins. RasMol 2.7 (http://www.umass.edu/microbio/rasmol/) was used to map conserved residues on to the three-dimensional structure of human AQP1 (PDB code 1FQY).

Acknowledgements

F. Abascal provided insightful comments on an earlier version of the manuscript and helped with RasMol. This work received partial financial support from the Ministerio de Ciencia y Tecnología (grant no. REN2001-1514/GLO).

References

Agre, P. (1997) Molecular physiology of water transport: aquaporin nomenclature workshop. Mammalian aquaporins. Biol. Cell 89, 255–257

Agre, P. and Kozono, D. (2003) Aquaporin water channels: molecular mechanisms for human diseases. FEBS Lett. 555, 72–78

Altschul, S., Madden, T., Schaffer, A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402

Beitz, E., Pavlovic-Djuranovic, S., Yasui, M., Agre, P. and Schultz, J.E. (2004) Molecular dissection of water and glycerol permeability of the aquaglyceroporin from Plasmodium falciparum by mutational analysis. Proc. Natl. Acad. Sci. U.S.A. 101, 1153–1158

Biswas, S. (2004) Functional properties of soybean nodulin 26 from a comparative three-dimensional model. FEBS Lett. 558, 39–44

Borgnia, M., Nielsen, S., Engel, A. and Agre, P. (1999) Cellular and molecular biology of the aquaporin water channels. Annu. Rev. Biochem. 68, 425–458

Borstlap, A.C. (2002) Early diversification of plant aquaporins. Trends Plant Sci. 7, 529–530

Calamita, G. (2000) The Escherichia coli aquaporin-Z water channel. Mol. Microbiol. 37, 254–262

Carbrey, J.M., Bonhivers, M., Boeke, J.D. and Agre, P. (2001) Aquaporins in Saccharomyces: characterization of a second functional water channel protein. Proc. Natl. Acad. Sci. U.S.A. 98, 1000–1005

Chaumont, F., Barrieu, F., Wojcik, E., Chrispeels, M.J. and Jung, R. (2001) Aquaporins constitute a large and highly divergent protein family in maize. Plant Physiol. 125, 1206–1215

Chrispeels, M.J. and Agre, P. (1994) Aquaporins: water channel proteins of plant and animal cells. Trends Biochem. Sci. 19, 421–425

Ciavatta, V.T., Morillon, R., Pullman, G.S., Chrispeels, M.J. and Cairney, J. (2001) An aquaglyceroporin is abundantly expressed early in the development of the suspensor and the embryo proper of loblolly pine. Plant Physiol. 127, 1556–1567

Echevarria, M. and Ilundain, A.A. (1998) Aquaporins. J. Physiol. Biochem. 54, 107–118

Engel, A. and Stahlberg, H. (2002) Aquaglyceroporins: channel proteins with a conserved core, multiple functions, and variable surfaces. Int. Rev. Cytol. 215, 75–104

Ferri, D., Mazzone, A., Liquori, G.E., Cassano, G., Svelto, M. and Calamita, G. (2003) Ontogeny, distribution, and possible functional implications of an unusual aquaporin, AQP8, in mouse liver. Hepatology 38, 947–957


Flick, K.M., Shaulsky, G. and Loomis, W.F. (1997) The wacA gene of Dictyostelium discoideum is a developmentally regulated member of the MIP family. Gene 195, 127–130

Froger, A., Tallur, B., Thomas, D. and Delamarche, C. (1998) Prediction of functional residues in water channels and related proteins. Protein Sci. 7, 1458–1468

Fu, D., Libson, A., Miercke, L.J.W., Weitzman, C., Nollert, P., Krucinski, J. and Stroud, R.M. (2000) Structure of a glycerol-conducting channel and the basis for its selectivity. Science 290, 481–486

Gourbal, B., Sonuc, N., Bhattacharjee, H., Legare, D., Sundar, S., Ouellette, M., Rosen, B.P. and Mukhopadhyay, R. (2004) Drug uptake and modulation of drug resistance in Leishmania by an aquaglyceroporin. J. Biol. Chem. 279, 31010–31017

Guenther, J.F. and Roberts, D.M. (2000) Water-selective and multifunctional aquaporins from Lotus japonicus nodules. Planta 210, 741–748

Hasegawa, H., Ma, T., Skach, W., Matthay, M. and Verkman, A. (1994) Molecular cloning of a mercurial-insensitive water channel expressed in selected water-transporting tissues. J. Biol. Chem. 269, 5497–5500

Heymann, J.B. and Engel, A. (1999) Aquaporins: phylogeny, structure, and physiology of water channels. News Physiol. Sci. 14, 187–193

Heymann, J.B. and Engel, A. (2000) Structural clues in the sequences of the aquaporins. J. Mol. Biol. 295, 1039–1053

Hohmann, S., Bill, R.M., Kayingo, G. and Prior, B.A. (2000) Microbial MIP channels. Trends Microbiol. 8, 33–38

Huelsenbeck, J.P. and Ronquist, F.R. (2001) MrBayes: Bayesian inference of phylogeny. Bioinformatics 17, 754–755

Huelsenbeck, J.P., Ronquist, F.R., Nielsen, R. and Bollback, J.P. (2001) Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294, 2310–2314

Ishibashi, K., Morinaga, T., Kuwahara, M., Sasaki, S. and Imai, M. (2002) Cloning and identification of a new member of water channel (AQP10) as an aquaglyceroporin. Biochim. Biophys. Acta 1576, 335–340

Jahn, T.P., Møller, A.L.B., Zeuthen, T., Holm, L.M., Klaerke, D.S., Mohsin, B., Kuhlbrandt, W. and Schjoerring, J.K. (2004) Aquaporin homologues in plants and mammals transport ammonia. FEBS Lett. 574, 31–36

Johanson, U. and Gustavsson, S. (2002) A new subfamily of major intrinsic proteins in plants. Mol. Biol. Evol. 19, 456–461

Johanson, U., Karlsson, M., Johansson, I., Gustavsson, S., Sjovall, S., Fraysse, L., Weig, A.R. and Kjellbom, P. (2001) The complete set of genes encoding major intrinsic proteins in Arabidopsis provides a framework for a new nomenclature for major intrinsic proteins in plants. Plant Physiol. 126, 1358–1369

Johansson, I., Karlsson, M., Larsson, C. and Kjellbom, P. (2000) The role of aquaporins in cellular and whole plant water balance. Biochim. Biophys. Acta 1465, 324–342

Jones, D.T., Taylor, W.R. and Thornton, J.M. (1992) The rapid generation of mutation data matrices from protein sequences. Comp. Appl. Biosci. 8, 275–282

Kjellbom, P., Larsson, C., Johansson, I., Karlsson, M. and Johanson, U. (1999) Aquaporins and water homeostasis in plants. Trends Plant Sci. 4, 308–314

Kozono, D., Ding, X., Iwasaki, I., Meng, X., Kamagata, Y., Agre, P. and Kitagawa, Y. (2003) Functional expression and characterization of an archaeal aquaporin. AqpM from Methanothermobacter marburgensis. J. Biol. Chem. 278, 10649–10656

Lagree, V., Froger, A., Deschamps, S., Hubert, J.F., Delamarche, C., Bonnec, G., Thomas, D., Gouranton, J. and Pellerin, I. (1999) Switch from an aquaporin to a glycerol channel by two amino acids substitution. J. Biol. Chem. 274, 6817–6819

Lewis, P.O. (2001) Phylogenetic systematics turns over a new leaf. Trends Ecol. Evol. 16, 30–37

Liu, Z., Shen, J., Carbrey, J.M., Mukhopadhyay, R., Agre, P. and Rosen, B.P. (2002) Arsenite transport by mammalian aquaglyceroporins AQP7 and AQP9. Proc. Natl. Acad. Sci. U.S.A. 99, 6053–6058

Luyten, K., Albertyn, J., Skibbe, W.F., Prior, B.A., Ramos, J., Thevelein, J.M. and Hohmann, S. (1995) Fps1, a yeast member of the MIP family of channel proteins, is a facilitator for glycerol uptake and efflux and is inactive under osmotic stress. EMBO J. 14, 1360–1371

Maddison, W.P. and Maddison, D.R. (1992) MacClade: analysis of phylogeny and character evolution, Sinauer Associates Inc., Sunderland

Maurel, C., Javot, H., Lauvergeat, V., Gerbeau, P., Tournaire, C., Santoni, V. and Heyes, J. (2002) Molecular physiology of aquaporins in plants. Int. Rev. Cytol. 215, 105–148

Morinaga, T., Nakakoshi, M., Hirao, A., Imai, M. and Ishibashi, K. (2002) Mouse aquaporin 10 gene (AQP10) is a pseudogene. Biochem. Biophys. Res. Commun. 294, 630–634

Moriyasu, Y., Hattori, M., Jauh, G.-Y. and Rogers, J.C. (2003) Alpha tonoplast intrinsic protein is specifically associated with vacuole membrane involved in an autophagic process. Plant Cell Physiol. 44, 795–802

Murata, K., Mitsuoka, K., Hirai, T., Walz, T., Agre, P., Heymann, J.B., Engel, A. and Fujiyoshi, Y. (2000) Structural determinants of water permeation through aquaporin-1. Nature (London) 407, 599–605

Pao, G.M., Wu, L.F., Johnson, K.D., Hofte, H., Chrispeels, M.J., Sweet, G., Sandal, N.N. and Saier, M.H. (1991) Evolution of the MIP family of integral membrane transport proteins. Mol. Microbiol. 5, 33–37

Park, J.H. and Saier, M.H. (1996) Phylogenetic characterization of the MIP family of transmembrane channel proteins. J. Membr. Biol. 153, 171–180

Pavlovic-Djuranovic, S., Schultz, J.E. and Beitz, E. (2003) A single aquaporin gene encodes a water/glycerol/urea facilitator in Toxoplasma gondii with similarity to plant tonoplast intrinsic proteins. FEBS Lett. 555, 500–504

Preston, G.M., Carroll, T.P., Guggino, W.B. and Agre, P. (1992) Appearance of water channels in Xenopus oocytes expressing red cell CHIP28 protein. Science 256, 385–387

Rivers, R.L., Dean, R.M., Chandy, G., Hall, J.E., Roberts, D.M. and Zeidel, M.L. (1997) Functional analysis of nodulin 26, an aquaporin in soybean root nodule symbiosomes. J. Biol. Chem. 272, 16256–16261

Savage, D.F., Egea, P.F., Robles-Colmenares, Y., O'Connell, J.D., III and Stroud, R.M. (2003) Architecture and selectivity in aquaporins: 2.5 Å X-ray structure of aquaporin Z. PLoS Biol. 1, E72

Schuurmans, J.A., van Dongen, J.T., Rutjens, B.P., Boonman, A., Pieterse, C.M. and Borstlap, A.C. (2003) Members of the aquaporin family in the developing pea seed coat include representatives of the PIP, TIP, and NIP subfamilies. Plant Mol. Biol. 53, 633–645

Takahashi, H., Rai, M., Kitagawa, T., Morita, S., Masumura, T. and Tanaka, K. (2004) Differential localization of tonoplast intrinsic proteins on the membrane of protein body type II and aleurone grain in rice seeds. Biosci. Biotechnol. Biochem. 68, 1728–1736

Takata, K., Matsuzaki, T. and Tajika, Y. (2004) Aquaporins: water channel proteins of the cell membrane. Prog. Histochem. Cytochem. 39, 1–83

Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, J. and Higgins, D.G. (1997) The Clustal X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876–4882


Uehlein, N., Lovisolo, C., Siefritz, F. and Kaldenhoff, R. (2003) The tobacco aquaporin NtAQP1 is a membrane CO2 pore with physiological functions. Nature (London) 425, 734–737

Uzcategui, N.L., Szallies, A., Pavlovic-Djuranovic, S., Palmada, M., Figarella, K., Boehmer, C., Lang, F., Beitz, E. and Duszenko, M. (2004) Cloning, heterologous expression and characterization of three aquaglyceroporins from Trypanosoma brucei. J. Biol. Chem. 279, 42669–42676

Wallace, I.S. and Roberts, D.M. (2004) Homology modeling of representative subfamilies of Arabidopsis major intrinsic proteins. Classification based on the aromatic/arginine selectivity filter. Plant Physiol. 135, 1059–1068

Wallace, I.S., Wills, D.M., Guenther, J.F. and Roberts, D.M. (2002) Functional selectivity for glycerol of the nodulin 26 subfamily of plant membrane intrinsic proteins. FEBS Lett. 523, 109–112

Weig, A.R. and Jakob, C. (2000) Functional identification of the glycerol permease activity of Arabidopsis thaliana NLM1 and NLM2 proteins by heterologous expression in Saccharomyces cerevisiae. FEBS Lett. 481, 293–298

Zardoya, R. and Villalba, S. (2001) A phylogenetic framework for the aquaporin family in eukaryotes. J. Mol. Evol. 52, 391–404

Zardoya, R., Ding, X., Kitagawa, Y. and Chrispeels, M.J. (2002) Origin of plant glycerol transporters by horizontal gene transfer and functional recruitment. Proc. Natl. Acad. Sci. U.S.A. 99, 14893–14896

Received 13 September 2004; accepted 4 November 2004

Published as Immediate Publication 25 April 2005, DOI 10.1042/BC20040134
