

Molecular Phylogenetics and Evolution 56 (2010) 176–186

Contents lists available at ScienceDirect

Molecular Phylogenetics and Evolution

journal homepage: www.elsevier.com/locate/ympev

Research article

Phylogenomic and structural modeling analyses of the PsbP superfamily reveal multiple small segment additions in the evolution of photosystem II-associated PsbP protein in green plants

Naoki Sato *

Department of Life Sciences, Graduate School of Arts and Sciences, University of Tokyo, Komaba 3-8-1, Meguro-ku, Tokyo 153-8902, Japan

Article info

Article history: Received 1 October 2009; Revised 12 November 2009; Accepted 23 November 2009; Available online 26 November 2009

Keywords: Cyanobacteria; Green plants; Homology modeling; Molecular phylogeny; Oxygen evolution complex; Photosynthesis

1055-7903/$ - see front matter © 2009 Elsevier Inc. All rights reserved. doi:10.1016/j.ympev.2009.11.021

* Tel.: +81 3 5454 6631; Fax: +81 3 5454 6998. E-mail address: [email protected]

Abstract

PsbP is a thylakoid lumen protein involved in oxygen evolution in photosystem II (PSII) in green plants. Genomic analysis identified a number of PsbP homologs in plants, algae, and cyanobacteria. To analyze the transition of the cyanobacterial PsbO/U/V complex to the PsbO/P/Q complex in green plants, the evolutionary history of the PsbP superfamily was reconstructed. Phylogenetic analyses suggested that PsbP homologs be classified into eight major families (A–H), which were also characterized by specific insertion/deletion of short segments, as found by sequence alignment and homology modeling. Family A represented authentic PsbP proteins involved in oxygen evolution. The cyanobacterial PsbP and plant/algal PPL (Family H), having the simplest structure, should be considered as the root of all other families of PsbP, which subsequently gained various short, family-specific structural motifs during diversification of PsbP families. Interestingly, segments specific to Family A proteins were found arranged as a ring surrounding the modeled Arabidopsis PsbP protein. These results suggest that Family A-specific additions of short segments played a decisive role in the transition of the PsbO/U/V to the PsbO/P/Q complex in green plants.

© 2009 Elsevier Inc. All rights reserved.

1. Introduction

Oxygenic photosynthesis is the basis of all life on this planet. The PsbP protein, or 23-kDa oxygen-evolving complex (OEC) protein, is a component of the luminal protein complex associated with photosystem II (PSII). The OEC in green plants contains three proteins, PsbO (33-kDa protein), PsbP (23-kDa protein), and PsbQ (17-kDa protein) (Suorsa and Aro, 2007; Roose et al., 2007; De Las Rivas et al., 2007), whereas cyanobacterial and red algal OEC contain PsbU and PsbV in place of PsbP and PsbQ (Enami et al., 2008). The OEC was, therefore, switched from the PsbO/U/V system to the PsbO/P/Q system in the lineage of green algae and plants. However, comparative analysis of cyanobacterial genomes identified homologs of PsbP and PsbQ (Thornton et al., 2004; De Las Rivas et al., 2004), neither of which is essential (Thornton et al., 2004; Ishikawa et al., 2005). In plants, many additional homologs of PsbP and PsbQ have been found encoded in the completely sequenced genomes of Arabidopsis thaliana and rice (Roose et al., 2007; De Las Rivas et al., 2007). Although many of the proteins involved in eukaryotic photosynthesis in chloroplasts have been imported from the cyanobacterial endosymbiont (Martin et al., 2002), PsbP and PsbQ represent uncommon proteins that have different physiological functions in cyanobacteria and photosynthetic eukaryotes. PsbP and PsbQ are present in the thylakoid lumen in cyanobacteria, but their roles in oxygen evolution are not clear (see Roose et al., 2007 for a review). The PsbP homologs in plants are also non-essential in photosynthesis, but act in some functions that support photosynthesis, such as the NDH complex (Ishihara et al., 2007). RNA interference studies suggest that PsbP is required for assembly and/or normal function of PSII (Ifuku et al., 2005; Yi et al., 2007). Therefore, it is essential to identify the phylogenetic and structural relationships of all PsbP-related proteins to understand the transition from PsbO/U/V to PsbO/P/Q in green plants.

A comprehensive comparative genomic analysis was performed to identify conserved proteins in plants and cyanobacteria (Sato, 2002). The analysis used the novel clustering software, Gclust (Sato et al., 2005; Sato, 2009), to assemble clusters of homologous proteins based on an all-against-all BLASTP (Altschul et al., 1997) search of all proteins encoded by the genomes of plants, algae, cyanobacteria, and non-photosynthetic organisms. An informatic analysis performed in 2003 identified 56 Arabidopsis hypothetical proteins in 37 homologous protein clusters that are shared by all photosynthetic organisms (two eukaryotes and eight cyanobacteria) but are not found in non-photosynthetic organisms (Sato et al., 2005; Ishikawa et al., 2009). Green fluorescent protein analysis indicated that all but two paralogous proteins are targeted to chloroplasts (Ishikawa et al., 2009). However, fluorescence kinetic analysis performed on disruptants of these proteins in Arabidopsis and Synechocystis showed significant differences in fluorescence parameters. This indicated that the function of the chloroplast proteins of endosymbiont origin is not exactly identical to that of the corresponding ortholog in cyanobacteria, except for well-established components of the photosystems, electron transport complexes, and carbon fixation cycles. PsbP and PsbQ are additionally regarded as exceptions to functional conservation. These proteins are also different in the green (green algae and land plants) and red (red algae and heterokonts) lineages after the endosymbiosis (Sato, 2006).
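The selection of clusters "shared by all photosynthetic organisms but not found in non-photosynthetic organisms" is a phylogenetic-profile filter. The following is a minimal sketch of that idea only, not the Gclust implementation; the function name, the cluster IDs, and the organism codes in the toy data (including `ECO` and `SCE`) are hypothetical.

```python
def conserved_clusters(clusters, photosynthetic, non_photosynthetic):
    """Return IDs of clusters found in every photosynthetic organism
    and in no non-photosynthetic organism.

    clusters: dict mapping a cluster ID to the set of organisms
              that contribute at least one protein to that cluster.
    """
    selected = []
    for cluster_id, organisms in clusters.items():
        present_in_all = photosynthetic <= organisms          # subset test
        absent_elsewhere = organisms.isdisjoint(non_photosynthetic)
        if present_in_all and absent_elsewhere:
            selected.append(cluster_id)
    return sorted(selected)


# Toy data (hypothetical, for illustration only).
photosynthetic = {"ATH", "CME", "Syn", "Ana"}
non_photosynthetic = {"ECO", "SCE"}
clusters = {
    "c1": {"ATH", "CME", "Syn", "Ana"},         # kept
    "c2": {"ATH", "CME", "Syn", "Ana", "ECO"},  # also in a non-photosynthetic organism
    "c3": {"ATH", "Syn"},                       # missing from some photosynthetic organisms
}
print(conserved_clusters(clusters, photosynthetic, non_photosynthetic))
```

Only the presence or absence of an organism in a cluster matters here; the number of paralogs per organism is ignored.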

PsbP is especially interesting because many homologs have been found in plants (Roose et al., 2007), such as PsbP-like (PPL) and PsbP-domain (PPD) proteins. Unfortunately, the relationship among these homologs, as well as cyanobacterial homologs, has not been thoroughly analyzed, and different terminologies have been used (De Las Rivas and Roman, 2005; Roose et al., 2007; Ifuku et al., 2008). In particular, the cyanobacterial homolog is called PsbP, PsbP-like, or cyanoP. This is because, first, most of these proteins are not stably associated with PSII (Thornton et al., 2004), although at least a part is bound to PSII (Ishikawa et al., 2005); and second, a clear phenotype has not been found in the disruptant of the cyanobacterial psbP gene, such as sll1418 in Synechocystis (Ishikawa et al., 2005; Summerfield et al., 2005). Another study suggested that cyanobacterial PsbP is involved in stabilization of the donor side of PSII (Sveshnikov et al., 2007). The fact that the X-ray structure of tobacco PsbP has been determined (Ifuku et al., 2004) made possible comparative structural analysis using homology modeling. Knockdown of the PsbP gene resulted in aberrant structure of thylakoids (Ido et al., 2009). Functional analysis of PsbP homologs in tobacco by RNA interference was also reported (Ishihara et al., 2005). PsbP is, therefore, a good target for analysis of phylogenomics based on structural comparison.

In the present study, I attempted comprehensive classification of all PsbP homologs by phylogenetic analysis and sequence alignment, supported by structural modeling to identify major families of PsbP. The cyanobacterial PsbP and the plant PPL proteins are the ancestral family having the simplest structure, whereas all other families are the result of diversification and domain acquisition in the early evolution of algae. A number of simultaneous additions of short domains are suggested as key features in the creation of the plant PsbP functioning in oxygen evolution.

2. Materials and methods

2.1. Sequences

All homologs of PsbP were retrieved from the Gclust database (ALL95_4 dataset at http://gclust.c.u-tokyo.ac.jp/) according to the published list of homologs (De Las Rivas and Roman, 2005; Roose et al., 2007; Ishihara et al., 2007) and the list of the PsbP family in UniProtKB (Leinonen et al., 2006). Gclust is a comparative genomic database of homologous protein clusters suitable for phylogenetic profiling (Sato, 2002, 2009; Sato et al., 2005). The sources of the original databases are described on the website, but the Arabidopsis sequences were retrieved from GenBank corresponding to The Arabidopsis Information Resource (TAIR, http://www.arabidopsis.org/) annotation version 7 as of 2007. The rice sequences originated from the Rice Annotation Project Database (RAP-DB, http://rapdb.dna.affrc.go.jp/) version 3. The sequence of the recently identified longer splicing variant AT3G05410.2 (Family F) was retrieved from GenBank (NM_001161120).

2.2. Phylogenetic analysis

Amino acid alignment was prepared using the Muscle program version 3.6 (Edgar, 2004) and was later manually modified according to the first round of structural modeling. Such refinement was repeated to give reasonable homology models. The revised sequence alignment was used for subsequent phylogenetic analysis and final homology modeling. This repetition of alignment modification and structural modeling improved the quality of alignment and modeling. The sites having gaps in more than 20% of sequences were removed using SISEQ software version 1.59 (Sato, 2000), which was also used for conversion of various sequence formats. ClustalX version 1.83 was also used for profile alignment and to manage aligned sequences (Thompson et al., 1994).
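The gap-filtering step can be illustrated with a short sketch. This is not SISEQ; it is a hypothetical stand-alone function (`drop_gappy_columns`, with an assumed set of gap characters) that removes every alignment column in which more than 20% of the sequences carry a gap.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.20, gap_chars="-."):
    """Remove columns whose gap fraction exceeds max_gap_fraction.

    alignment: list of equal-length aligned sequences (strings).
    Returns a new list of strings with the gappy columns removed.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    n_seqs = len(alignment)
    kept_columns = [
        col
        for col in range(length)
        if sum(seq[col] in gap_chars for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[col] for col in kept_columns) for seq in alignment]


# Toy alignment (hypothetical): column 2 has gaps in 1 of 5 sequences (20%, kept),
# column 4 has gaps in 3 of 5 sequences (60%, removed).
toy = ["MKT-A", "MKTLA", "M-TLA", "MKT-A", "MKT-A"]
filtered = drop_gappy_columns(toy)
print(filtered)
```

A column with exactly 20% gaps is retained, following the "more than 20%" wording of the text.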

The alignment (88 OTU, 149 sites) was used in the phylogenetic analysis with the following methods. A neighbor-joining (NJ) tree was estimated with MEGA version 4 (Tamura et al., 2007) using the JTT model with an equal evolutionary rate; calculation with the maximum likelihood (ML) method was performed with Protml in MOLPHY version 2.3beta (Adachi and Hasegawa, 1996), Paml version 4.2 (Yang, 2007), and TreeFinder March 2008 version (Jobb et al., 2004), using the WAG model (used as a user-defined matrix in Protml). Bayesian inference (BI) was performed using MrBayes software version 3.2 (Ronquist and Huelsenbeck, 2003), with the following options: aamodelpr = fixed(wag), ratepr = variable, ngen = 4000000, samplefreq = 200, burnin = 8000. For 45 OTUs, ngen = 2000000, samplefreq = 100, burnin = 5000. The AU test, which is intended to test the relative likelihood of various forms of trees based on support levels of individual sites, was performed with the CONSEL program (Shimodaira and Hasegawa, 2001), with the output of Protml and Paml, based on the 64 most probable trees selected by Protml using constrained trees according to the results of ML and BI analyses. The WAG model was used. The 'cleandata' option was turned off in Paml. Phylogenetic trees were drawn with NJplot software (Perrière and Gouy, 1996) and Mesquite utility version 2.5 (http://mesquiteproject.org). The alignment files used for phylogenetic analysis are available as supplements.
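As a rough check on the two MrBayes settings quoted above, the number of sampled and retained trees can be worked out from ngen, samplefreq, and burnin. The sketch below assumes that burnin is counted in samples rather than generations and ignores the initial state of the chain; both are assumptions, and the helper name is hypothetical.

```python
def retained_samples(ngen, samplefreq, burnin):
    """Return (total_samples, samples_kept_after_burnin).

    Assumes burnin is expressed in samples and that one sample is
    recorded every `samplefreq` generations (initial state ignored).
    """
    total = ngen // samplefreq
    if burnin > total:
        raise ValueError("burnin exceeds the number of samples")
    return total, total - burnin


# Settings quoted in the text for the 88-OTU and 45-OTU datasets.
runs = {
    "88 OTUs": (4_000_000, 200, 8000),
    "45 OTUs": (2_000_000, 100, 5000),
}
for label, (ngen, samplefreq, burnin) in runs.items():
    total, kept = retained_samples(ngen, samplefreq, burnin)
    print(f"{label}: {total} samples, {kept} kept ({burnin / total:.0%} discarded)")
```

Under these assumptions both runs record 20,000 samples, of which 40% (88 OTUs) and 25% (45 OTUs) are discarded as burn-in.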

2.3. Homology modeling

Homology modeling of all Arabidopsis PsbP homologs, as well as some selected proteins, was performed using Modeller software version 8v1 or 8v2 (Šali and Blundell, 1993) with the automodel script. As described in 2.2, refinement of alignment and modeling was repeated to obtain reasonable results. Both of the structures in the PDB entry 1V2B (Ifuku et al., 2004) were used as templates. This file describes two monomers arranged in a symmetric position, but the two loops and the N-terminus are invisible because of high flexibility. The corresponding parts in the inferred models are an invention of the software and are meaningful only as an indication that a structural part is present. Cartoon models of structure were prepared with MolScript software (Kraulis, 1991). A transparent surface model was prepared with VMD software (Humphrey et al., 1996) and rendered with Raster3D software version 2.7 (Merritt et al., 1997). The coordinate files of homology modeling are available from the author on request.

3. Results

3.1. Comprehensive collection of PsbP homologs

Use of the Gclust database greatly increased the efficiency and accuracy of sequence collection. As shown in Figs. 1 and 2, most major families were collected as single clusters in the database. Exceptions were the PsbP homolog in Gloeobacter violaceus (cyanobacterium) in the singleton cluster 122258, one of the PsbP homologs in Cyanidioschyzon merolae (red alga) in the singleton cluster 121818, and homologs in three marine cyanobacteria (Cluster 31427). These were collected by tracking the "Related Groups" link in the Cluster Display. Not all such links were followed, however, if clearly identifiable annotation was ascribed to members of related groups, such as PTAC3 and Ycf37. Cluster 21254 was not included either, because the sequence was highly divergent from that of other PsbP homologs. I tried to include as many clusters as possible in the current analysis, but some very distant relatives could have been missed, such as MOG1p, which is thought to be structurally similar to PsbP and was not included (see Section 4.2) because sequence homology was very low. Curiously, no sequences of non-photosynthetic organisms or non-oxygenic bacteria were included in the homolog clusters. The use of Gclust has benefits over a simple BLASTP search. The sequence of AT3G05410 in the Gclust database lacked a C-terminal half when compared with other PsbP homologs. Consulting the latest version of data in TAIR identified a longer splicing variant, AT3G05410.2, and this was included in the collection.

Fig. 1. Alignment of amino acid sequences in the mature region of PsbP homologs. The structure motifs (β1–10, α1, α2) are those of tobacco PsbP. A, B, C, D, E, F, G, H1, and H2 indicate families. Various family-specific additional motifs are marked by red boxes. The default coloring scheme of ClustalX was used. The orange arrows on top of the alignment indicate important residues referred to in Section 4.3: K28/11, K30/13, 134/48, K280/143, and K303/160. (The left number refers to the position in the alignment, while the right number refers to the position in the spinach sequence.) Each sequence name is a composite of the abbreviated organism name and gene identifier in the Gclust database. Organism name abbreviations are as follows: ATH, Arabidopsis thaliana; PoTR, poplar (Populus trichocarpa); OSA, rice (Oryza sativa); PPT, moss (Physcomitrella patens); CRE, Chlamydomonas reinhardtii; OTAU, Ostreococcus tauri; CME, Cyanidioschyzon merolae; PTR, Phaeodactylum tricornutum; TPS, Thalassiosira pseudonana; Ana, Anabaena sp. PCC 7120; Ava, Anabaena variabilis ATCC 29413; Glv, Gloeobacter violaceus; Npun, Nostoc punctiforme PCC73102; Pm1, Prochlorococcus marinus MED4; Pm2, P. marinus MIT9313; Pm3, P. marinus SS120; Pm4, P. marinus MIT9312; Pm5, P. marinus NATL2A; Pm6, P. marinus MIT9301; Pm7, P. marinus MIT9303; Pm8, P. marinus MIT9315; Pm9, P. marinus NATL1A; PmA, P. marinus AS9601; S63, Synechococcus sp. PCC 6301; S79, S. sp. PCC 7942; S81, S. sp. WH8102; S93, S. sp. CC9311; S96, S. sp. CC9605; S99, S. sp. CC9902; Syn, Synechocystis sp. PCC 6803; Tel, Thermosynechococcus elongatus; Ter, Trichodesmium erythraeum; YelA, Cyanobacterium Yellowstone A-prime; YelB, Cyanobacterium Yellowstone B-prime.
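The paired numbers in the Fig. 1 caption (alignment position / position in the spinach sequence) relate a column of the gapped alignment to a residue index in one ungapped sequence. A minimal sketch of that conversion follows; the function name and the toy aligned string are hypothetical and do not come from the actual alignment.

```python
def residue_position(aligned_seq, column, gap_chars="-."):
    """Map a 1-based alignment column to a 1-based residue index.

    Returns None when the sequence has a gap at that column.
    """
    if not 1 <= column <= len(aligned_seq):
        raise ValueError("column is outside the alignment")
    if aligned_seq[column - 1] in gap_chars:
        return None
    # Count the non-gap characters up to and including this column.
    return sum(ch not in gap_chars for ch in aligned_seq[:column])


# Toy aligned sequence (hypothetical): two leading gaps and one internal gap.
toy_row = "--AK-KL"
print(residue_position(toy_row, 4))  # K at alignment column 4
print(residue_position(toy_row, 5))  # gap
print(residue_position(toy_row, 6))  # K at alignment column 6
```

In this toy row, alignment column 4 corresponds to residue 2 and column 6 to residue 3, which is the same kind of offset that separates the left and right numbers in the caption.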

3.2. Identification of major PsbP families

Alignment of mature sequences (Fig. 1) was used for phylogenetic inferences. As described in 2.2, the alignment was repeatedly refined by homology modeling, and that shown in Fig. 1 was reasonably refined. Fig. 2 shows the result of phylogenetic analysis with 88 protein sequences. In total, nine groups were identified as strongly supported clades, namely A, B, C, D, E, F, G, H1, and H2; these are referred to as 'families' hereafter. Families H1 and H2 were subgroups of Cluster 2481. H1 included homologs of green algae and plants, whereas H2 included homologs of cyanobacteria, red algae, and diatoms. As tested later, support for the monophyly of H1 and H2 was not always high. However, the sequence alignment (Fig. 1) clearly indicated that H1 and H2 are similar in structural organization, without any extra domains or motifs (this will be examined in detail below). Hence, these two families were considered to be subgroups of Family H, giving eight families in total.

All other clusters included sequences only of eukaryotes, except the homolog of a putative primitive cyanobacterium Gloeobacter in Family D. The problem with Gloeobacter will be discussed below. The homolog in Cyanidioschyzon (red alga) was found in Family C, whereas homologs in the diatoms Thalassiosira and Phaeodactylum were in Families C and F, respectively. Interestingly, each family contained plant and green algal homologs. Family B consisted of two subfamilies, B1 and B2, each of which included a set of plant and green algal homologs. Family H1 contained extra sequences of Arabidopsis and poplar. Family A contained several closely related paralogs of Arabidopsis, poplar, and Physcomitrella. Family E contained two copies each of poplar and rice homologs. Family F is a new group of PsbP homologs that have not been recognized previously. The initial gene model of the Arabidopsis homolog of this family in TAIR was a truncated protein. The gene model of the corresponding rice homolog Os01g0934400 lacks the N-terminus, but whether this is due to an unidentified exon is still not clear in version 4 of RAP-DB. Family F contained two poplar sequences.

These results suggest that the PsbP homologs of eukaryotes originated from the cyanobacterial PsbP family (H2), and that diversification of the eukaryotic families occurred early during the evolution of algae, possibly before diversification of the major green and red lineages. Diatoms are considered as the result of secondary endosymbiosis of a red algal endosymbiont. However, the exact relationship of the eukaryotic families (A–G) was not resolved with confidence in this analysis.

3.3. Relationship of PsbP families

To further analyze the relationship of various families of PsbP, ill-aligned sequences were removed. All sequences in poplar and most sequences in Physcomitrella were also removed because these originated from high-throughput genome sequencing and still need high-quality curation. Most cyanobacterial sequences were also removed to decrease redundancy. The resulting 45 sequences were used for further phylogenetic analysis. Although BI analysis had not given clear-cut resolution of families in the larger dataset (Fig. 2), clustering of Families A and B now became clear (Fig. 3), and Families C, D, E, and F also formed a cluster. The confidence level of these deep branches was acceptably high in BI and ML analyses, but NJ analysis gave different patterns of branching, depending on the parameters used. Family B is known as the PsbP-domain (PPD) protein family, which was recognized after the PsbP-like (PPL) proteins in Family H1 were identified, but the current analysis indicated that this family is most closely related to authentic PsbP involved in oxygen evolution (Family A). The PsbP homolog in Gloeobacter was always associated with Family D.

3.4. Rigorous test of branching

To further evaluate the relationship among various families, an AU test was performed with representative trees, which were selected by the Protml program using a constraint in which all possible inter-family relationships were allowed while fixing most intra-family branching patterns that were taken from the best trees of BI and ML. The selected 64 trees were evaluated by the AU test using the Protml and Paml programs. Both results were similar, with only a minor difference in ln L values (about 0.1). All 64 trees gave AU probabilities higher than 0.1 and, therefore, were not discarded. Fig. 4A shows top-ranking tree topologies. In this analysis, the enigmatic PsbP homolog in Gloeobacter was used as a single clade (indicated as "V" for "violaceus"). The clustering of A and B was always supported. Hence, the trees are drawn using the clade (A and B) as the outgroup, but these trees are unrooted. The associations of C and D, E and F, as well as G, H1, and H2 were also highly supported, although the positions of G and H1 were somewhat unstable. The position of V (Gloeobacter) was also unstable, being associated with C or D in most cases.

Fig. 2. Unrooted maximum likelihood tree of 88 PsbP homologs. This tree was calculated with the TreeFinder program. The three numbers on each branch indicate the confidence level in percentage of NJ, BI, and ML methods. If branching is not consistent with the ML tree shown, a hyphen is given as the confidence level. Branches with >90% confidence in ML analysis are shown in bold lines. Broken lines indicate branches with <70% confidence. The cluster number in the Gclust database is indicated with each family.

Fig. 3. Unrooted Bayesian inference consensus tree of 45 selected PsbP homologs. This tree was calculated with the MrBayes program. The final average standard deviation of split frequencies was 0.006419. The relationship of major groups is not completely resolved. The three numbers on each branch indicate the confidence level in percentage of NJ, ML, and BI methods. Lineage-specific addition of various motifs is shown in boxes.

[Fig. 3 tree image not reproduced. Family labels and Gclust clusters shown in the figure: Family A, Cluster 8390; Family B, Cluster 7974; Family C, Cluster 11774; Family D, Cluster 18055; Family E, Cluster 9715; Family F, Cluster 13083; Family G, Cluster 21170; Families H1 and H2, Cluster 2481; the Gloeobacter homolog is marked "(Family V)". Arabidopsis genes named in the figure: AT3G55330 (PPL1), AT2G39470 (PPL2), AT1G06680 (PsbP1). Boxed motif labels: β1; Loop3A, Loop4A, Loop7A; Loop2; β4; N-terminal extension and extended Loop1; Loop1D, Loop5; Loop3E, Loop4E; Loop3G, Loop4G, C-term; Loop6. Scale bar: 0.2.]

The monophyly of Gloeobacter PsbP and other cyanobacterial PsbP was evaluated using a constraint keeping V and H2 as a single cluster. The results (Fig. 4B) clearly indicate that the best of such trees was undoubtedly inferior to the top-ranking trees, giving a difference in ln L values of more than 12 and AU values no higher than 0.052. Therefore, the monophyly of Gloeobacter PsbP and other cyanobacterial PsbP was rejected.

3.5. Localization of family-specific domains in the 3D structure

The diversity of various families of PsbP also included insertion/deletion of short sequences (Fig. 1). To evaluate the significance ofsuch short sequences, their position was estimated using 3D mod-eling. For this purpose, homology modeling was performed usingthe Modeller program in automatic mode. Although automatic

Page 7: Phylogenomic and structural modeling analyses of the PsbP superfamily reveal multiple small segment additions in the evolution of photosystem II-associated PsbP protein in green plants

[Fig. 4, tree diagrams not reproduced. Leaf labels: A, B, C, D, V, E, F, G, H1, H2. Values are given as Diff lnL / AU test.]

A. Top ranking topologies among 62 significant ones: Tree 5: 0.1/0.772 (highest au); Tree 20: 0.2/0.621; Tree 32: 0.3/0.590; Tree 19: -0.1/0.631 (best lnL); Tree 28: 0.1/0.615; Tree 38: 0.1/0.640; Tree 44: 0.2/0.579; Tree 2: 0.9/0.456.

B. Evaluation of cyanobacterial monophyly hypothesis: Tree 65: 12.4/0.052; Tree 67: 12.4/0.052; Tree 66: 14.5/0.036.

Fig. 4. Approximately unbiased (AU) test of tree topologies. Results of the AU test for the top-ranking trees are shown. Gloeobacter PsbP is shown as Family V. Different trees with different intra-family topologies are shown as a single simplified tree. The complete list of trees (simplified and original) and the complete results are available as Supplementary Document 1. (A), The five top ranking simplified trees with negative difference in log likelihood value and score of the AU test. (B), Two best trees with cyanobacterial monophyly. Note that the negative log-likelihood values are significantly high.

182 N. Sato / Molecular Phylogenetics and Evolution 56 (2010) 176–186

mode was used, the alignment was repeatedly refined using modeling, and therefore regarded as a result of iteration. The basic assumption for this method is that all PsbP homologs should roughly share common basic structures similar to that determined for tobacco PsbP (Ifuku et al., 2004). If a strange structure is modeled for a particular homolog, then this homolog has a different structure, which cannot be modeled using the known structure as a template, or, alternatively, the alignment is not correct. For this reason, I repeatedly refined the alignment manually to reproduce a structure similar to that of tobacco PsbP. The results (Fig. 5) show mostly similar structures with different sizes of loops. Loops 3 and 4, as well as the N-terminal part, were not resolved in the original X-ray structure of tobacco PsbP (Ifuku et al., 2004), which is why further refinement of the models was not attempted. Homology modeling is known to be inaccurate, especially if homology is low, because low homology results in inaccurate alignment. In fact, in the initial stage of the present study, poor alignment resulted in largely deformed and curious structures. After the alignment was refined, as described in Section 2.2, essentially similar structures were reconstructed (Fig. 5). The fact that all proteins could be modeled essentially in similar structures justifies the alignment, and the models help to recognize various family-specific short domains in the essentially common 3D structure. The lack of the small beta strand 1 in many families merits experimental verification.

Comparison of various families in amino acid sequence (Fig. 1) and structure (Fig. 5) indicated that the short segments that are specific to each family were mostly located within the loops or in the N-terminal region (except in Family F). Family H (H1 and H2) was the simplest in sequence and structure, having no extra loops, such as the unstructured Loops 3 and 4 found in other families. Loop 3 is indeed very large in Family A if explicitly shown as a modeled structure, but was not recognized previously because this ‘X-ray invisible’ part is not explicitly shown in the structure of tobacco PsbP (Ifuku et al., 2004, 2008). Loop 3 might not have a loop structure, as shown in the model (Fig. 5), but could be folded into some other structure in the real protein. Previous reviews on the structural features of PsbP (De Las Rivas and Roman, 2005; De Las Rivas et al., 2007; Ifuku et al., 2008) also disregarded the large N-terminal mobile domain and Loop 4A. Loop 7A, although not very obvious, is conserved in Family A and has been designated the GDKRWFKG motif (De Las Rivas et al., 2007). Family B also has a large N-terminal domain. Family F is clearly very different from other PsbP families in its N-terminal half (four beta strands). Family B2 is different from B1 in its large Loop 1. Families C, D, and E are characterized by, respectively, Loop 6, Loops 2 and 5, and Loops 2, 3, and 4. The Gloeobacter PsbP shares Loop 2 with Family D, but other features differ. These results suggest that not only sequence variation but also insertion/deletion of short segments were involved in evolution of the PsbP superfamily.
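The idea of family-specific short segments can be sketched computationally: in a pairwise slice of a multiple alignment, an added segment appears as a run of columns in which the simpler reference sequence (for example, a Family H member) has gaps while the other sequence has residues. The following is a minimal sketch under that assumption; the function name, the minimum run length, and the toy sequences are hypothetical and are not taken from Fig. 1.

```python
def insertion_segments(reference, query, min_len=3):
    """Return (start, end) alignment columns (0-based, end exclusive) where
    `query` has residues but `reference` has only gaps, for runs >= min_len."""
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have equal length")
    segments = []
    start = None
    for i, (r, q) in enumerate(zip(reference, query)):
        inserted = r == "-" and q != "-"
        if inserted and start is None:
            start = i  # a new candidate segment begins here
        elif not inserted and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a segment that runs to the end of the alignment.
    if start is not None and len(query) - start >= min_len:
        segments.append((start, len(query)))
    return segments


# Toy aligned sequences, invented for illustration only.
ref_h = "MKT----LLVAG-----DEFK"
fam_x = "MKTAGSPLLVAGQWERTDEFK"
print(insertion_segments(ref_h, fam_x))
```

Whether such a segment corresponds to a loop in the 3D structure still has to be judged from the model, as done in Fig. 5.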

3.6. Membrane targeting

PsbP is a luminal protein within the thylakoid membrane. After import into the chloroplast, it is targeted to the thylakoid membrane by the Twin-arginine transport (Tat) system (Robinson and


[Fig. 5, structure images not reproduced. Template: 1V2B_A (tobacco PsbP). Models shown: Arabidopsis A1 (AT1G06680, PsbP1); Arabidopsis A2 (AT2G30790, PsbP2); Arabidopsis B1 (AT1G77090, PPD4); Arabidopsis B2 (AT3G56650, PPD6); Arabidopsis C (AT2G28605, PPD2); Arabidopsis D (AT1G76450, PPD3); Arabidopsis E (AT4G15510, PPD1); Arabidopsis F (AT3G05410); Arabidopsis G (AT5G11450, PPD5); Arabidopsis H1 (AT3G55330, PPL1); Arabidopsis H2 (AT2G39470, PPL2); Gloeobacter D (gvip201, psbP); Synechocystis H (sll1418, PsbP); Anabaena H (all3076, PsbP). Labeled features: N-term sheet; main β-sheet (β1 to β10); helix (α1); C-term helix (α2); N-term extension; unstructured N-term; C-term extension; Loop 1; extended Loop 1; Loop 1B2; Loop 2; Loop 2B; Loop 2D; Loop 2E; Loop 3; Loop 3A; Loop 3E; Loop 3G; Loop 4; Loop 4A; Loop 4E; Loop 4G; Loop 5; Loop 6; Loop 7A; "β3 lacking".]

Fig. 5. Homology models of all Arabidopsis PsbP proteins and selected cyanobacterial homologs. The rainbow color gradation from blue to red indicates direction from the N-terminus to C-terminus.


Bolhuis, 2004). There is a characteristic doublet of arginines that precedes a hydrophobic stretch in the targeting sequence (Supplementary Fig. S1). The PsbP homologs in eukaryotes all have at least one similar arginine doublet. Several putative proteins in poplar, Physcomitrella, Chlamydomonas, and Ostreococcus lack the N-terminal portion (chloroplast-targeting and luminal-targeting), but a longer N-terminus could be predicted if a 5′-most exon were identified. The position of ‘RR’ varies in different families and family members, but this variation is caused by poor sequence conservation in the transit sequence. Conservation of the ‘RR’ motif is


consistent with the experimental detection of all PsbP homologs except a Family F member in the thylakoid lumen of Arabidopsis (Zybailov et al., 2008).
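The arginine doublet preceding a hydrophobic stretch can be searched for with a simple pattern. The sketch below is only an illustration: the hydrophobic residue set, the maximum gap between ‘RR’ and the stretch, the minimum stretch length, the function name, and the test sequences are all assumptions introduced here and do not reproduce the procedure used for Supplementary Fig. S1.

```python
import re

HYDROPHOBIC = "AVLIMFWC"  # assumed residue set for the hydrophobic stretch


def find_rr_motifs(seq, max_gap=5, min_run=8):
    """Return 0-based positions of 'RR' doublets that are followed, within
    `max_gap` residues, by at least `min_run` consecutive hydrophobic residues."""
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(
        r"(?=RR.{0,%d}[%s]{%d,})" % (max_gap, HYDROPHOBIC, min_run)
    )
    return [m.start() for m in pattern.finditer(seq)]


# Invented N-terminal sequences for illustration only.
with_rr = "MASTSRRDFLAVLAALLAAGSKA"
without_rr = "MASTSKKDFLAVLAALLAAGSKA"
print(find_rr_motifs(with_rr))
print(find_rr_motifs(without_rr))
```

Because the position of ‘RR’ varies among families, the function reports positions rather than assuming a fixed offset from the N-terminus.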

In contrast, all PsbP homologs in cyanobacteria, except Nostoc and Anabaena, possess a lipobox (Babu and Sankaran, 2002), which is necessary for membrane targeting and diacylglycerylation at the conserved cysteine residue in bacteria and chloroplasts. However, it is still unknown whether cyanobacterial PsbP is linked to diacylglycerol.
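A lipobox search can be sketched in the same way. The pattern [LVI][ASTVI][GAS]C used below is a commonly quoted lipobox consensus and is treated here as an assumption; the search window, the function name, and the example sequences are likewise hypothetical and are not taken from the present analysis.

```python
import re

LIPOBOX = re.compile(r"[LVI][ASTVI][GAS]C")  # assumed consensus pattern


def lipobox_cysteine(seq, window=40):
    """Return the 0-based index of the lipobox cysteine found within the
    first `window` residues, or None if the pattern does not occur."""
    match = LIPOBOX.search(seq[:window])
    return None if match is None else match.end() - 1


# Invented N-terminal sequences for illustration only.
print(lipobox_cysteine("MKRLLSLLAVVLLSLLLTACSGPS"))
print(lipobox_cysteine("MKRLLSLLAVVLLSLLLTASSGPS"))
```

A match only indicates a candidate site; as stated above, whether cyanobacterial PsbP is actually linked to diacylglycerol remains unknown.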

4. Discussion

4.1. Strategy for complete listing of protein families over all photosynthetic organisms

The PsbP homologs are present in all organisms that perform oxygenic photosynthesis. Each cyanobacterium contains a single copy of the psbP gene, whereas photosynthetic eukaryotes have several copies per genome. Lists of PsbP homologs have already been posted in various databases, such as UniProtKB, and have been presented in recent papers (De Las Rivas and Roman, 2005; Roose et al., 2007; Ishihara et al., 2007). However, these lists remain incomplete because various homologs in organisms other than Arabidopsis and Synechocystis have not been exhaustively listed. In addition, Family F remained unknown until the present study. In this respect, use of the Gclust database introduced a radically new way of obtaining a complete listing of homologs in completely sequenced genomes. Gclust performs automatic clustering of homologous proteins using a heuristic called the ‘entropy-optimized organism count (EOOC) method’ (Sato, 2009) for finding orthologs in every genome, just as biologists do by eye and by hand. One of the major interests in the current study was, therefore, to demonstrate the usefulness and power of the Gclust database for comparative analysis of protein families.

4.2. Phylogenetic analysis of PsbP families

The relationship of the numerous PsbP homologs was unclear in recent reviews. Only simple phylogenetic trees of PsbP of plants and cyanobacteria with PPL proteins had been presented (De Las Rivas and Roman, 2005; Ishihara et al., 2007) using Gloeobacter PsbP as a root. These trees were not rigorously tested with various phylogenetic analyses, and no results of statistical tests were presented. As pointed out in Section 3.4 (Fig. 4B), the Gloeobacter PsbP is divergent from other cyanobacterial PsbP, and it is difficult to determine unequivocally its phylogenetic position with respect to other PsbP homologs, although it is associated with Family D in many analyses (Figs. 2, 3, and 4A). The root of all PsbP families should, therefore, be placed near to the root of Family H2, consisting of PsbP of all other cyanobacteria, because the proteins in this family are simplest in sequence (Fig. 1) and in structure (Fig. 5). No other homolog of PsbP has been found in non-photosynthetic organisms, including non-oxygenic photosynthetic bacteria. A putative homolog in Methanothermobacter was used as a root in a previous paper (Thornton et al., 2004), but I detected only spurious similarity of this protein to Synechocystis sll1418 (PsbP) at E = 10^-3. Structural similarity of PsbP was also found with yeast MOG1p protein (RAN guanine nucleotide release factor binding protein) according to the Structural Classification of Proteins (SCOP) database (Superfamily Mog1p/PsbP-like [55724]: Murzin et al., 1995: http://scop.mrc-lmb.cam.ac.uk/scop/), but MOG1p belongs to a distinct cluster of proteins in the Gclust database (Cluster 6800), which included a single ortholog per organism in every eukaryote (photosynthetic and non-photosynthetic). Since no homolog of RAN is present in chloroplasts or cyanobacteria, the structural similarity of PsbP and MOG1p does not extend to functional similarity. There have been some discussions on this point (Roose et al., 2007; Ifuku et al., 2008). A plausible argument might be that MOG1p originated from the PsbP family because there is no prokaryotic ortholog of MOG1p.

All PsbP homologs were classified into eight families, from Family A to Family H. Family F is a new member of the PsbP superfamily. However, Family B consists of two subfamilies, and Family H also consists of subfamilies H1 and H2, which include the green lineage and the red lineage plus cyanobacteria, respectively. Although the monophyletic relationship of H1 and H2 was not always supported by the phylogenetic tests (Fig. 4A), structural and sequence comparison of PsbP families showed that H1 and H2 are similar in structure and the simplest of all PsbP families. This finding suggests that the cyanobacterial PsbP was the origin of the PsbP superfamily. The fact that H2 included PsbP of Cyanidioschyzon (red alga) and Phaeodactylum (diatom) suggests that these homologs in the red lineage represent direct descendants of the PsbP of the endosymbiont, although the exact position of branching between these H2 homologs of the red lineage and cyanobacterial H2 differs in Figs. 2 and 3. Subfamily H1 branched next, and then various families of eukaryotes diverged. Among them, Families A and B share the conserved β1 region, which indicates that the diversification of these two families occurred separately from the diversification of Families C, D, E, and F. Family G is close to Family H in the phylogenetic trees, but contains group-specific loops and a C-terminus (Figs. 1, 3 and 5). A plausible interpretation may be that Family G diverged from Family H1, as supported by many trees among those tested (Fig. 4A).

An interesting fact is that each family contained members of the green lineage consisting of green algae (Ostreococcus and Chlamydomonas), moss, and seed plants (Fig. 2). Families C and F contain members from the red lineage (red algae and heterokonts). The diversification of various PsbP families (A–F) must have occurred before the diversification of the green and red lineages. The lack of a Family F member in Cyanidioschyzon could be explained by specific gene loss during genome reduction in this red alga living in an extreme habitat (Matsuzaki et al., 2004). Although many lateral gene transfers from green algae have been suggested in diatoms (Moustafa et al., 2009), the phylogenetic trees in Figs. 2 and 3 do not suggest that there were such gene transfers in Family C and Family F.

4.3. Conservation of functional residues

There are a number of residues that are known to be important in the function of PsbP protein. Major data were obtained with spinach protein. The N-terminal 9, 15, or 19 residues, including K28/11 and K30/13 (the two residue numbers separated by a slash refer to the position indicated by an arrow in Fig. 1 and to the position in the spinach mature sequence: UniProt P12032) are important in either binding to PSII or in activation of oxygen evolution through modification of Ca²⁺ and Cl⁻ binding (summarized by Ifuku et al., 2008). The basic residues R134/48, K280/143, and K303/160 are also important in binding the acidic protein PsbO. All the important residues cited here are conserved in Family A, with rare exceptions, but not in Family B, which is the closest relative of Family A. Only K280/143 is conserved in all families except Family B. These comparisons suggest that the role of PsbP as a functional component of the OEC dates from the establishment of Family A.
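The slash notation used above pairs a position in the alignment with a position in an individual sequence. How one kind of coordinate is converted to the other can be sketched as follows: the residue number in the ungapped sequence is the count of non-gap characters up to and including the alignment column. The function name and the toy aligned sequence are hypothetical, and the sketch assumes the aligned sequence is already the mature protein (no targeting sequence to subtract).

```python
def column_to_residue_number(aligned_seq, column):
    """Map a 1-based alignment column to the 1-based residue number in the
    ungapped sequence; return None if the column holds a gap."""
    if aligned_seq[column - 1] == "-":
        return None
    # Count the residues (non-gap characters) up to and including the column.
    return len(aligned_seq[:column].replace("-", ""))


# Toy aligned sequence, invented for illustration only.
aligned = "--AY-GEK--KTF"
print(column_to_residue_number(aligned, 8))   # first K
print(column_to_residue_number(aligned, 11))  # second K
print(column_to_residue_number(aligned, 9))   # gap column
```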

4.4. Evolutionary history of PsbP

Evolution of the PsbP superfamily involves not only sequence changes but also insertion/deletion of short segments. A history of evolution of PsbP families (Fig. 6) emerges as a result of


[Fig. 6, diagram not reproduced. Events (bottom to top): Ancestral PsbP (H2); Endosymbiosis; Duplication in eukaryotic ancestor; Red-green branch. Lineages: Green lineage (green algae + land plants) with A, B1, B2, C, D, E, F, G, H1; Red lineage (red algae + heterokonts) with C, F, H2; Cyanobacteria with H2. Branch labels: Loop 2; β4; β1; N-terminal extension; Extended Loop 1; Loop 3,4E; Loop 3,4A,7A; Loop 3,4G; C-term; Loop 6; Loop 1,5; Integrated into OEC.]

Fig. 6. A model of the evolution of PsbP families. The direction in time is from the bottom to the top. The Gloeobacter PsbP is not shown in this diagram. The horizontal lines indicate gene duplication occurring at the events described on the left.

[Fig. 7, structure images not reproduced. Labeled motifs: two conserved Lys in the N-terminal extension; β1; Loop 7A (basic); Loop 4A (acidic); Loop 3 (unstructured). View labels: Top; Back; Front side.]

Fig. 7. Ring-shaped arrangement of Family A-specific motifs in a 3D structure model of Arabidopsis PsbP. The structural motifs specific to Family A are colored over the transparent surface model of Arabidopsis PsbP (AT1G06680). The following color scheme was used: red, acidic; blue, basic; green, polar; ochre, non-polar. Upper panel shows a top view, and lower panel shows a side view from the back with respect to the view in Fig. 5. Colored residues: two conserved Lys residues in the N-terminal extension (K11, K13); beta strand 1 (N15–D24); Loop 3 (Q91–N106); Loop 4A (A136–K143); Loop 7A (G164–R173). Numbers refer to the number in Arabidopsis PsbP, which is identical to that of spinach PsbP.


combined analysis of phylogeny and structure. Family H2 (cyanobacterial PsbP) is considered to be the ancestor of all PsbP homologs because the member proteins in this family are structurally the simplest. After endosymbiosis, repeated gene duplication engendered families (A + B), C, D, E, F, and (G + H). Upon diversification of the red and green lineages, only three of them, C, F, and H2, were retained in the red lineage, whereas in the green lineage, further gene duplication yielded A, B1, and B2 on the one hand, and G and H1 on the other hand. In each family, specific addition of characteristic short segments occurred in addition to sequence changes. Only Family A became an integrated component of the OEC. In this diagram, the enigmatic Gloeobacter PsbP is not included because its sister relationship with other cyanobacterial PsbP was rejected (Fig. 4B). If, as undertaken by other authors (De Las Rivas and Roman, 2005; Ishihara et al., 2007), Gloeobacter PsbP is placed at the root in Fig. 2 or 3, all eukaryotic PsbP homologs diverge successively until the appearance of other cyanobacterial PsbP. This will give rise to an exceptional shape of the tree for a protein of endosymbiotic origin. Normally, eukaryotic members diverge as a single group because endosymbiosis was a single event. The association of H2 of the red lineage with other families is not a direct result of the phylogenetic analysis in Figs. 2 and 3. However, if we assume monophyly of red H2 and cyanobacterial H2, as in phylogenetic trees, we have to assume two parallel evolutions of H2 and other groups in the red lineage, while only a single origin of all PsbP homologs is assumed for the green lineage. The position of red H2 may be reconciled by a low evolutionary rate.

4.5. Functional implications of Family A-specific additions

In the original X-ray structure of tobacco PsbP (Ifuku et al., 2004), three regions were not resolved, namely, the N-terminal extension, Loop 3, and Loop 4A. However, all these parts are specific to Family A. To overcome this uncertainty, these specific regions were mapped on the 3D model of Arabidopsis PsbP protein (also known as OEC 23), which was prepared by homology modeling using the tobacco PsbP as template (Fig. 7). In addition, the two conserved Lys residues, K11 and K13, and Loop 7A, which are also specific to Family A, were mapped in the model. Curiously, all of the motifs specific to Family A members were arranged in a ring shape that surrounds the molecule. This ring could be an interface for interaction with other components of PSII, such as PsbO and PsbQ. Notably, the acidic Loop 4A and the basic Loop 7A are also likely to act as electrostatic binding sites, such that Loop 7A (plus N-terminal two Lys) may be an interface with the acidic PsbO, whereas Loop 4A (plus Loop 3) may be an interface with the basic PsbQ (De Las Rivas and Roman, 2005). This suggests that the Family A-specific addition of short segments played a decisive role in the transition from PsbO/U/V to PsbO/P/Q complex in PSII.
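The description of loops as acidic or basic can be illustrated with a crude residue count. The sketch below assigns +1 to each Lys or Arg and -1 to each Asp or Glu and ignores His, pH, and structural context, so it is only a rough indicator under those assumptions. The function name is hypothetical; the GDKRWFKG motif is the Loop 7A motif quoted in Section 3.5, whereas the second segment is invented for contrast and is not the sequence of Loop 4A.

```python
def net_charge(segment):
    """Crude net charge of a peptide segment: +1 for each K or R,
    -1 for each D or E. Histidine and pH effects are ignored."""
    positive = sum(segment.count(aa) for aa in "KR")
    negative = sum(segment.count(aa) for aa in "DE")
    return positive - negative


print(net_charge("GDKRWFKG"))  # Loop 7A motif quoted in Section 3.5
print(net_charge("AEDDLEK"))   # invented acidic segment, for contrast only
```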

Acknowledgments

This work was supported in part by a Grant-in-Aid for Scientific Research for Priority Areas "Comparative Genomics" (No. 20017006) and by the Global Center-of-Excellence Program from the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.ympev.2009.11.021.


References

Adachi, J., Hasegawa, M., 1996. MOLPHY Version 2.3: Programs for molecular phylogenetics based on maximum likelihood. Computer Science Monographs, vol. 28. Institute of Statistical Mathematics, Tokyo, pp. 1–150.

Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.

Babu, M.M., Sankaran, K., 2002. DOLOP–database of bacterial lipoproteins. Bioinformatics 18, 641–643.

De Las Rivas, J., Balsera, M., Barber, J., 2004. Evolution of oxygenic photosynthesis: genome-wide analysis of the OEC extrinsic proteins. Trends Plant Sci. 9, 18–25.

De Las Rivas, J., Roman, A., 2005. Structure and evolution of the extrinsic proteins that stabilize the oxygen-evolving engine. Photochem. Photobiol. Sci. 4, 1003–1010.

De Las Rivas, J., Heredia, P., Roman, A., 2007. Oxygen-evolving extrinsic proteins (PsbO, P, Q, R): Bioinformatic and functional analysis. Biochim. Biophys. Acta 1767, 575–582.

Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797.

Enami, I., Okumura, A., Nagao, R., Suzuki, T., Iwai, M., Shen, J.R., 2008. Structures and functions of the extrinsic proteins of photosystem II from different species. Photosynth. Res. 98, 349–363.

Humphrey, W., Dalke, A., Schulten, K., 1996. VMD – Visual Molecular Dynamics. J. Mol. Graphics 14, 33–38.

Ido, K., Ifuku, K., Yamamoto, Y., Ishihara, S., Murakami, A., Takabe, K., Miyake, C., Sato, F., 2009. Knockdown of the PsbP protein does not prevent assembly of the dimeric PSII core complex but impairs accumulation of photosystem II supercomplexes in tobacco. Biochim. Biophys. Acta 1787, 873–881.

Ifuku, K., Nakatsu, T., Kato, H., Sato, F., 2004. Crystal structure of the PsbP protein of photosystem II from Nicotiana tabacum. EMBO Rep. 5, 362–367.

Ifuku, K., Yamamoto, J., Ono, T-a, Ishihara, S., Sato, F., 2005. PsbP protein, but not PsbQ protein, is essential for the regulation and stabilization of photosystem II in higher plants. Plant Physiol. 139, 1175–1184.

Ifuku, K., Ishihara, S., Shimamoto, R., Ido, K., Sato, F., 2008. Structure, function, and evolution of the PsbP protein family in higher plants. Photosynth. Res. 98, 427–437.

Ishihara, S., Yamamoto, Y., Ifuku, K., Sato, F., 2005. Functional analysis of four members of the PsbP family in Photosystem II in Nicotiana tabacum using differential RNA interference. Plant Cell Physiol. 46, 1885–1893.

Ishihara, S., Takabayashi, A., Ido, K., Endo, T., Ifuku, K., Sato, F., 2007. Distinct functions for the two PsbP-like proteins PPL1 and PPL2 in the chloroplast thylakoid lumen of Arabidopsis. Plant Physiol. 145, 668–679.

Ishikawa, Y., Schröder, W.P., Funk, C., 2005. Functional analysis of the PsbP-like protein (sll1418) in Synechocystis sp. PCC 6803. Photosynth. Res. 84, 257–262.

Ishikawa, M., Fujiwara, M., Sonoike, K., Sato, N., 2009. Orthogenomics of photosynthetic organisms: Bioinformatic and experimental analysis of chloroplast proteins of endosymbiotic origin in Arabidopsis and their counterparts in Synechocystis. Plant Cell Physiol. 50, 773–788.

Jobb, G., von Haeseler, A., Strimmer, K., 2004. TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evol. Biol. 4, 18.

Kraulis, P.J., 1991. MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Cryst. 24, 946–950.

Leinonen, R., Nardone, F., Zhu, W., Apweiler, R., 2006. UniSave: the UniProtKB Sequence/Annotation Version database. Bioinformatics 22, 1284–1285.

Martin, W., Rujan, T., Richly, E., Hansen, A., Cornelsen, S., Lins, T., Leister, D., Stoebe, B., Hasegawa, M., Penny, D., 2002. Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proc. Nat. Acad. Sci. USA 99, 12246–12251.

Matsuzaki, M., Misumi, O., Shin-i, T., Maruyama, S., Takahara, M., Miyagishima, S., Mori, T., Nishida, K., Yagisawa, F., Nishida, K., Yoshida, Y., Nishimura, Y., Nakao, S., Kobayashi, T., Momoyama, Y., Higashiyama, T., Minoda, A., Sano, M., Nomoto, H., Oishi, K., Hayashi, H., Ohta, F., Nishizaka, S., Haga, S., Miura, S., Morishita, T., Kabeya, Y., Terasawa, K., Suzuki, Y., Ishii, Y., Asakawa, S., Takano, H., Ohta, N., Kuroiwa, H., Tanaka, K., Shimizu, N., Sugano, S., Sato, N., Nozaki, H., Ogasawara, N., Kohara, Y., Kuroiwa, T., 2004. Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D. Nature 428, 653–657.

Merritt, E.A., Bacon, D.J., 1997. Raster3D: Photorealistic Molecular Graphics. Methods Enzymol. 277, 505–524.

Moustafa, A., Beszteri, B., Maier, U.G., Bowler, C., Valentin, K., Bhattacharya, D., 2009. Genomic footprints of a cryptic plastid endosymbiosis in diatoms. Science 324, 1651–1652.

Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C., 1995. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540.

Perrière, G., Gouy, M., 1996. WWW-Query: an on-line retrieval system for biological sequence banks. Biochimie 78, 364–369.

Robinson, C., Bolhuis, A., 2004. Tat-dependent protein targeting in prokaryotes and chloroplasts. Biochim. Biophys. Acta 1694, 135–147.

Ronquist, F., Huelsenbeck, J.P., 2003. MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574.

Roose, J.L., Wegener, K.M., Pakrasi, H.B., 2007. The extrinsic proteins of Photosystem II. Photosynth. Res. 92, 369–387.

Šali, A., Blundell, T.L., 1993. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779–815.

Sato, N., 2000. SISEQ: Manipulation of multiple sequence and large database files for common platforms. Bioinformatics 16, 180–181.

Sato, N., 2002. Comparative analysis of the genomes of cyanobacteria and plants. Genome Inform. 13, 173–182.

Sato, N., 2006. Origin and evolution of plastids: genomic view on the unification and diversity of plastids. In: Wise, R.R., Hoober, J.K. (Eds.), The Structure and Function of Plastids. Springer, Dordrecht, pp. 75–102.

Sato, N., 2009. Gclust: trans-kingdom classification of proteins using automatic individual threshold setting. Bioinformatics 25, 599–605.

Sato, N., Ishikawa, M., Fujiwara, M., Sonoike, K., 2005. Mass identification of chloroplast proteins of endosymbiont origin by phylogenetic profiling based on organism-optimized homologous protein groups. Genome Inform. 16, 56–68.

Shimodaira, H., Hasegawa, M., 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17, 1246–1247.

Sveshnikov, D., Funk, C., Schröder, W., 2007. The PsbP-like protein (sll1418) of Synechocystis sp. PCC 6803 stabilizes the donor side of Photosystem II. Photosynth. Res. 93, 101–109.

Summerfield, T.C., Winter, R.T., Eaton-Rye, J.J., 2005. Investigation of a requirement for the PsbP-like protein in Synechocystis sp. PCC 6803. Photosynth. Res. 84, 263–268.

Suorsa, M., Aro, E., 2007. Expression, assembly and auxiliary functions of photosystem II oxygen-evolving proteins in higher plants. Photosynth. Res. 93, 89–100.

Tamura, K., Dudley, J., Nei, M., Kumar, S., 2007. MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0. Mol. Biol. Evol. 24, 1596–1599.

Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.

Thornton, L.E., Ohkawa, H., Roose, J.L., Kashino, Y., Keren, N., Pakrasi, H.B., 2004. Homologs of plant PsbP and PsbQ proteins are necessary for regulation of photosystem II activity in the cyanobacterium Synechocystis 6803. Plant Cell 16, 2164–2175.

Yang, Z., 2007. PAML 4: a program package for phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591.

Yi, X., Liu, H., Hargett, S.R., Frankel, L.K., Bricker, T.M., 2007. The PsbP protein is required for photosystem II complex assembly/stability and photoautotrophy in Arabidopsis thaliana. J. Biol. Chem. 282, 24833–24841.

Zybailov, B., Rutschow, H., Friso, G., Rudella, A., Emanuelsson, O., Sun, Q., van Wijk, K.J., 2008. Sorting signals, N-terminal modifications and abundance of the chloroplast proteome. PLoS ONE 3, e1994.