
ORIGINAL PAPER

In silico analysis of ubiquitin/ubiquitin-like modifiers and their conjugating enzymes in Entamoeba species

Shweta Arya & Gaurav Sharma & Preeti Gupta & Swati Tiwari

Received: 14 September 2011 / Accepted: 19 December 2011 / Published online: 13 January 2012
© Springer-Verlag 2011

Abstract Covalent modification of proteins by ubiquitin (Ub) and ubiquitin-like modifiers (Ubls) regulates many cellular functions in eukaryotes. These modifications are likely to be associated with pathogenesis, growth, and development of many protozoan parasites, but molecular details about this pathway are unavailable for most protozoa. This study presents an analysis of the Ub pathway in three members of the genus Entamoeba. Using bioinformatics tools, we have identified all Ub and Ubl genes along with their corresponding activating, conjugating, and ligating enzymes (E1, E2s, and E3s) in three Entamoeba species, Entamoeba histolytica, Entamoeba dispar, and Entamoeba invadens. Phylogenetic trees were established for the identified E2s and RING finger E3s using the maximum-likelihood method to infer the relationship among these proteins. In silico co-domain analysis of RING finger E3s implicates these proteins in a variety of functions. Several known and putative regulatory motifs were identified in the upstream regions of RING finger domain-containing E3 genes. All E2 and E3 genes were analyzed in genomic context in E. histolytica and E. dispar. Most E2s and E3s were in syntenic positions in the two genomes. Association of these genes with transposable elements (TEs) was compared between E. histolytica and E. dispar. A closer association was found between RING finger E3s and TEs in E. histolytica. In summary, our analyses suggest that the complexity of the Ub pathway in Entamoeba species is close to that observed in higher eukaryotes. This study provides important data for further understanding the role of the Ub pathway in the biology of these organisms.

Abbreviations
APC/C  Anaphase promoting complex/cyclosome
CRL    Cullin-RING ligases
HECT   Homologous to E6-AP C-terminal
RING   Really interesting new gene
PHD    Plant homeo domain
TE     Transposable element
UBC    Ubiquitin conjugating
Ubl    Ubiquitin-like
UPP    Ubiquitin–proteasome pathway

Introduction

Infections due to protozoan parasites cause major disease burden throughout the developing world. Efforts to develop effective vaccines against protozoan parasites have proven to be difficult due to complex life cycles, antigenic variability, and limited information about the metabolic pathways and molecular mechanisms of pathogenesis. Emergence of multi-drug-resistant strains poses another hurdle for successful treatment. The challenge is to find new ways that can be exploited, and genome sequencing of many protozoan parasites has provided a valuable tool to identify pathways and proteins that can be potential targets. Thus, genome-wide studies of metabolic or signal transduction pathways of protozoan parasites have led to the identification of parasite-specific proteins as possible drug targets (Doerig 2004; Goodman and McFadden 2007; Myler 2008). Availability of genome sequences has also provided an

Shweta Arya and Gaurav Sharma contributed equally to this work.

Electronic supplementary material The online version of this article (doi:10.1007/s00436-011-2799-0) contains supplementary material, which is available to authorized users.

S. Arya : G. Sharma : P. Gupta : S. Tiwari (*)
School of Biotechnology, Jawaharlal Nehru University, New Delhi 110067, India
e-mail: [email protected]

Present Address:
G. Sharma
Institute for Microbial Technology, Chandigarh, Punjab, India

Parasitol Res (2012) 111:37–51
DOI 10.1007/s00436-011-2799-0


opportunity to study the basic biology of these parasites, which is required for developing effective strategies against them.

The ubiquitin–proteasome pathway (UPP) is a eukaryotic pathway that is involved in the post-translational modification of proteins with chains of a highly conserved protein called ubiquitin (Ub), followed by degradation of the ubiquitylated substrate proteins by the proteasomes. Other than degradation of misfolded or aged proteins, UPP is intricately linked to regulation of cell cycle, transcription, signaling, DNA repair, endocytosis, and numerous other aspects of cell function (Ciechanover and Schwartz 2004; Hershko 2005; Raasi and Wolf 2007). Covalent addition of Ub to proteins is achieved by the action of three classes of enzyme (see Fig. 1). Ubiquitin-activating enzyme (E1) catalyzes the activation of Ub by utilization of ATP. Ub is subsequently transferred to a ubiquitin-conjugating enzyme (UBC or E2) via a trans-esterification reaction. E2s form a small family of proteins characterized by a UBC domain of about 150 amino acids. Finally, a ubiquitin ligase (E3) directly or indirectly transfers the Ub to the substrate. There are two main classes of E3s: Homologous to E6-AP C Terminus (HECT) domain and Really Interesting New Gene (RING) finger E3s. Several variants of the RING domain have been described that also show E3 activity (Stone et al. 2005). Structurally similar domains like the U-box and Plant Homeo Domain (PHD) have also been shown to exhibit E3 activity (Coscoy et al. 2001; Hatakeyama et al. 2001). A number of Ub-like (Ubl) proteins and their conjugation systems are now known that show varying degrees of similarity to Ub and to the enzymes involved in the Ub conjugation process. All Ubls are known to have regulatory functions, and there are examples of interplay between modification of a protein with Ub and a Ubl (Kerscher et al. 2006).

A tightly regulated program for protein expression and degradation is likely to play a critical role for developmental changes in protozoan parasites. Accordingly, proteasomal inhibition and ubiquitin gene expression studies have shown that the pathway is important for proper growth and development of protozoa (Gonzalez et al. 1999; Lindenthal et al. 2005; Manning-Cela et al. 2006; Horrocks and Newbold 2000; LaCount et al. 2005). Molecular details of the participating UPP proteins in regulation of important cellular functions can provide a wealth of useful information about the biology of these organisms. However, these remain mostly unexplored.

Entamoeba histolytica is a protozoan parasite that causes amoebic dysentery and is transmitted through contaminated food and water. Due to its simple life cycle, E. histolytica can serve as a model to study the significance of the UPP in protozoan parasites. In addition to E. histolytica, the genomes of the avirulent Entamoeba dispar and a reptilian parasite, Entamoeba invadens, are also being sequenced, thus allowing comparative studies within the genus. In this paper, we have attempted to identify all the proteins involved in the UPP in the parasitic Entamoeba species using a bioinformatics approach. We have described the unique features and differences, if any, from similar proteins of higher eukaryotes. Our analysis shows that Entamoeba species encode all the UPP components and may have almost the same level of complexity of Ub/Ubl modifications as shown by higher eukaryotes. We have also carried out a comparative analysis of E. histolytica and E. dispar E2 and E3 genes in terms of their genomic location, association with transposable elements (TE), and potential regulatory elements in upstream regions. Our analysis shows a close association of clusters of transposable elements with E. histolytica RING finger E3s. The implication of these findings is discussed in terms of speciation and gain of pathogenicity by E. histolytica.

Fig. 1 General mechanism of protein ubiquitylation. Ubiquitin is activated in an ATP-dependent manner by Ub-activating enzyme (E1) by formation of a high-energy thioester bond (shown with ‘∼’) involving the catalytic cysteine (C) with the C-terminal glycine of Ub. Activated Ub is transferred to the catalytic cysteine (C) of Ub-conjugating enzymes (E2s). Ub ligases (E3s) determine the substrate specificity and allow the transfer of Ub to lysine residue(s) (K) in the substrates (S). Substrates modified with chains of four or more Ub are recognized by the proteasome and degraded to small peptides, with the ubiquitin generally being recycled. Ubiquitination and proteasomal degradation are often opposed by deubiquitinating enzymes, which have multiple functions including removing ubiquitin from substrates

Materials and methods

Genomes and proteomes

The genome sequences of E. histolytica, E. invadens, and E. dispar were accessed from the database available at NCBI (http://www.ncbi.nlm.nih.gov/guide/). The proteomes of E. histolytica and E. dispar (Version 5.0) were downloaded from the FTP server at The Sanger Institute (http://www.sanger.ac.uk/).

Query sequences/domains used for similarity searches

Ub, Ubls, E1s, and E2s

Accession numbers of characterized proteins and domains were obtained from the UniProt database (http://www.uniprot.org/) and the NCBI Conserved Domain Database (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml), respectively. Query sequences were all human proteins except where a manually reviewed human sequence was not found in the UniProt database. In such cases, protein sequences of Saccharomyces cerevisiae were used. The signature motif of the UBC domain (SM00212) for E2 searches was taken from the SMART database (http://smart.embl-heidelberg.de/) and is given in Online Resource 1. All query sequences used for database searches are given in Online Resource 2.

Ub ligases (E3s)

The multiple sequence alignment of the HECT domain in SMART was used to create four motifs encompassing the E2-binding domain and the catalytic domain, and these were used for the Fuzzpro search. The canonical RING motif (SM00184) as well as the RING variant motif (SM00744) were used to pull out the putative RING domain proteins. In addition, motifs for several other RING variants not present in the protein domain databases but experimentally shown to have E3 activity in vitro (Stone et al. 2005) were also included. The SMART canonical motif (SM00249) for the PHD domain was used for Fuzzpro searches. For U-box, motifs were generated to cover the entire U-box motif from the multiple sequence alignment of U-box proteins in the SMART database. All motifs and domains used for similarity searches are given in Online Resource 1 and 3.

Similarity searches

BLAST and TBLASTN searches against the E. histolytica proteome and genome were carried out with an e-value cut-off of 0.1 for Ub, Ubls, and E1s. Proteins showing at least 75% query coverage and cut-off values less than e−05 were used as query for reverse BLAST or PSI-BLAST against the non-redundant database.
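The two-stage filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Hit` record and its field names are hypothetical (they are not taken from the paper or from any BLAST output format), and query coverage is assumed to mean the aligned span of the query divided by the query length.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one BLAST hit (field names are illustrative)."""
    query: str
    subject: str
    qstart: int   # 1-based start of the alignment on the query
    qend: int     # 1-based end of the alignment on the query
    qlen: int     # full length of the query sequence
    evalue: float


def query_coverage(hit: Hit) -> float:
    """Percentage of the query covered by the alignment (assumed definition)."""
    return 100.0 * (hit.qend - hit.qstart + 1) / hit.qlen


def select_for_reverse_search(hits, min_coverage=75.0, max_evalue=1e-5):
    """Keep hits with at least 75% query coverage and an e-value below 1e-05."""
    return [
        h for h in hits
        if query_coverage(h) >= min_coverage and h.evalue < max_evalue
    ]
```

Hits that survive this filter would then be the candidates submitted to the reverse BLAST or PSI-BLAST step.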

To search for E2s for Ub/Ubl conjugation, the E. histolytica proteome was searched for proteins containing the UBC motif using ‘Fuzzpro’ from the EMBOSS suite (Rice et al. 2000), allowing up to one mismatch. Output from Fuzzpro was checked for the presence of the UBC domain by HMM analysis and manually to confirm the presence of conserved residues required for catalysis. These proteins were used as query for a TBLASTN search against the E. histolytica genome to identify variants of E2s that may have been missed by Fuzzpro due to sequence divergence. PSI-BLAST and multiple sequence alignments were used to assign E2 families.
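To make the idea of a mismatch-tolerant motif search concrete, the following Python sketch scans a protein sequence with a fixed-length, PROSITE-style pattern and reports windows with at most a given number of mismatches. It is not Fuzzpro and does not reproduce its options; the parser handles only single residues, bracketed residue sets, `x`, and fixed repeats such as `x(2)`, and the pattern used in the usage note is a made-up toy, not the UBC signature from Online Resource 1.

```python
import re

_TOKEN = re.compile(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)\))?")


def parse_pattern(pattern):
    """Expand a fixed-length pattern into one entry per position.

    Each entry is a frozenset of allowed residues, or None for 'x' (any).
    """
    positions = []
    for token in pattern.split("-"):
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unsupported pattern token: {token!r}")
        body, repeat = match.group(1), int(match.group(2) or 1)
        allowed = None if body == "x" else frozenset(body.strip("[]"))
        positions.extend([allowed] * repeat)
    return positions


def scan(sequence, pattern, max_mismatches=1):
    """Yield (start, mismatches) for every window within the mismatch limit."""
    positions = parse_pattern(pattern)
    width = len(positions)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        mismatches = sum(
            1 for residue, allowed in zip(window, positions)
            if allowed is not None and residue not in allowed
        )
        if mismatches <= max_mismatches:
            yield start, mismatches
```

For example, `list(scan("AAHPNLAAGAICAA", "H-P-N-[IV]-x(2)-G-x-[IVL]-C"))` reports the window starting at index 2 with one mismatch, which an exact search would miss.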

HECT domain proteins were searched using Fuzzpro with all four motifs mentioned above. Proteins that showed the presence of both the E2-binding domain and the catalytic domain were selected. These proteins were checked for a phenylalanine residue at the −4 position from the C-end of the domain (Salvat et al. 2004) as well as for the presence of the conserved catalytic cysteine residue to qualify as HECT domain-containing proteins.
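The two residue checks can be expressed as a small Python predicate. This is a sketch under assumptions: the "−4 position from the C-end" is read here as the fourth residue counted from the end of the domain sequence, and the index of the catalytic cysteine is assumed to come from an alignment done elsewhere; the function and parameter names are hypothetical.

```python
def passes_hect_checks(domain_seq: str, catalytic_cys_index: int) -> bool:
    """Return True if the domain has F at position -4 and C at the catalytic site.

    domain_seq: amino-acid sequence of the predicted HECT domain.
    catalytic_cys_index: 0-based index of the expected catalytic cysteine
        within domain_seq (assumed to be supplied from an alignment).
    """
    if len(domain_seq) < 4:
        return False
    if not 0 <= catalytic_cys_index < len(domain_seq):
        return False
    has_phe = domain_seq[-4] == "F"
    has_cys = domain_seq[catalytic_cys_index] == "C"
    return has_phe and has_cys
```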

RING finger and PHD motifs were searched using Fuzzpro software, allowing for up to two mismatches. The output was subjected to increasing levels of stringency according to Kosarev et al. (2002). For PHD domain proteins, the output was analyzed for the presence of an aromatic residue at the −2 position of metal ligand 7, which is a distinguishing feature of the PHD domain.

All hits obtained from TBLASTN searches were checked for the presence of a complete open reading frame. All proteins were subjected to SMART, CDD, and HMM analysis for the presence of the domains. The HMM analysis of the whole protein was done using the online server at The Institute of Genome Research (http://blast.jcvi.org/web-hmm) with an e-value cut-off from 0.1 to 1.0 for the predicted domain(s). Domains showing an e-value of less than 0.5 were considered positive. In most cases, however, the values were much lower than 0.1. In instances where e-values were more than 0.5 or the presence of the domain was not immediately apparent using the various domain search tools, sequences were checked manually and analyzed using JPred software (http://www.compbio.dundee.ac.uk/www-jpred/) to check whether they can form the secondary structure predicted for each domain. Multiple sequence alignment was done using the ClustalW program (http://www.ebi.ac.uk/Tools/clustalw2/). All alignments are shown in Online Resource 4.

UPP components in E. invadens and E. dispar

All proteins identified as components of UPP in E. histolytica were used as query to search the E. dispar proteome by BLAST and the E. invadens genome by TBLASTN to determine the conservation of each protein across the three species. In several instances where the proteome of E. dispar did not give any result, TBLASTN was used to pull out the gene sequence.

The syntenic location of each gene was determined in the 5-Kb region surrounding the gene of interest by using the comparative gradient display at Pathema (http://pathema.jcvi.org/). The locations of LINE and SINE elements in both species in the vicinity of E2 and E3 genes were also mapped. The comparative map of these elements was kindly made available to us by Dr. Sudha Bhattacharya. The distance relative to the translational start site and the types of elements were scored for each E2 and E3 gene in E. histolytica and E. dispar.

Analysis of putative regulatory sequences of RING finger E3 genes

Sequences 500 bases upstream and 20 bases downstream of the translational start sites were extracted from NCBI and analyzed using MEME software (Bailey et al. 2006). The program was set to detect 30 motifs with a width between 6 and 16 bp, with each motif occurring zero or one time in each sequence. A custom background Markov model file was created using the mono- and dinucleotide frequencies of E. histolytica (Ali et al. 2007), as the Markov chain created by MEME is sensitive to nucleotide frequencies and these are highly skewed in the case of E. histolytica. Upstream sequences of 57 RING E3 genes of E. histolytica and 51 genes of E. dispar were analyzed. The genes that were not included in the analysis either had a TE within 1 Kb of the translational start or were at the end of the scaffold so that upstream regions could not be obtained. The output from MEME was subjected to TOMTOM and MAST searches.
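The window extraction and the background frequencies can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the procedure used in the study: the coordinate convention (a 0-based forward-strand index of the base corresponding to the A of the start codon), the function names, and the decision to return `None` for genes too close to a scaffold end are all hypothetical choices. Writing the frequencies into a MEME background file is left out.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def promoter_window(scaffold, start, strand="+", upstream=500, downstream=20):
    """Return `upstream` bases before and `downstream` bases from the start codon.

    start: 0-based forward-strand index of the base corresponding to the A of
        the start codon (assumed convention).
    Returns None when the window would run off the scaffold.
    """
    if strand == "+":
        lo, hi = start - upstream, start + downstream
    else:
        lo, hi = start - downstream + 1, start + upstream + 1
    if lo < 0 or hi > len(scaffold):
        return None
    window = scaffold[lo:hi]
    return window if strand == "+" else reverse_complement(window)


def kmer_frequencies(sequences, k):
    """Relative frequencies of all overlapping k-mers (k=1 mono, k=2 di)."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}
```

Calling `kmer_frequencies(windows, 1)` and `kmer_frequencies(windows, 2)` on genome-derived sequences would give the mono- and dinucleotide frequencies from which a custom background model could be built.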

Results

All Ubls, with the exception of metazoa-specific Ubls, are encoded by Entamoeba species

All three Entamoeba species analyzed (i.e., E. histolytica, E. dispar, and E. invadens) encoded Ub and most of the Ubls except those specific to metazoans. Table 1 shows a comparison of all the Ubls from E. histolytica with apicomplexan, yeast, and human Ubls. Details of Ubls and corresponding E1s in E. histolytica, E. dispar, and E. invadens are given in Online Resource 5. Ub is expressed in three forms in most organisms: monomer, polyUb, or as a fusion protein with ribosomal proteins. We did not find any polyUb or Ub fusion with ribosomal proteins in amoeba. This is unlike the situation in most of the higher eukaryotes and apicomplexans, where both polyUb genes and Ub fusions with ribosomal proteins have been reported (Ponts et al. 2008). Similar to polyUb genes in other organisms, all Entamoeba Ub proteins end with a tyrosine residue after the diglycine motif. The extra residue is removed by a C-terminal hydrolase in other eukaryotes (Jentsch et al. 1991), and this is likely to be the case in Entamoeba as well. The proteome of E. dispar did not show any Ub and Hub1 proteins. However, searching the E. dispar genome using TBLASTN showed the presence of genes encoding both proteins (Table 1 and Online Resource 5).

Four ubiquitin domain proteins (UDPs) were also encoded in all three members (Online Resource 5). Most, but not all, UDPs in higher eukaryotes function in bringing the ubiquitylated substrate proteins to the proteasome (Madsen et al. 2007). In general, UDPs in higher eukaryotes also contain some other domain, like a Ub-associated (UBA) domain or RING domain, along with the Ubl domain. Only one UDP in Entamoeba (XP_653492) showed the presence of a C-terminal UBA domain, and the rest did not contain any other known domain or motif.

An earlier study had reported SUMO, Hub1, and Urm1 in amoeba but had not found Nedd8 and Atg8 (Furukawa et al. 2000). However, we could unambiguously identify both Nedd8 and Atg8 based on e-values in reciprocal best hit as well as CLUSTAL analysis (Online Resource 4 and 5). Whereas ATG8 is predicted in the proteome of E. invadens, it is absent from the predicted proteomes of E. histolytica and E. dispar available in the NCBI database. Nedd8 is not predicted for any Entamoeba species. All Ubls were highly conserved in the avirulent amoeba E. dispar and the reptilian amoeba E. invadens.

Ubl activating enzymes (E1s) for all the Ubls are also encoded and expressed in amoeba, and all were highly conserved in both E. dispar and E. invadens (Online Resource 5). An E1 enzyme for Hub1 has not been reported so far. All E1 genes are present in single copy. Two important motifs, GXGXXGCE, the consensus sequence for the nucleotide binding site, and PZCTXXXXP, where C is the active site cysteine and Z is a hydrophobic residue, are present in all known E1s, and both motifs were conserved in E1 proteins of Entamoeba.
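As an illustration of how the two E1 consensus motifs could be located in a candidate sequence, the Python sketch below translates them into regular expressions. The set of residues accepted for the hydrophobic position Z (A, V, I, L, M, F, W, Y) is an assumption made here for the example, since the text does not enumerate it, and the function name is hypothetical.

```python
import re

# GXGXXGCE: X is any residue.
NUCLEOTIDE_BINDING = re.compile(r"G.G..GCE")
# PZCTXXXXP: Z is taken to be one of AVILMFWY (assumed hydrophobic set).
ACTIVE_SITE = re.compile(r"P[AVILMFWY]CT.{4}P")


def find_e1_motifs(sequence):
    """Return 0-based positions of the two E1 motifs, or None if absent.

    'nucleotide_binding' is the start of the GXGXXGCE match;
    'active_site_cys' is the index of the cysteine in PZCTXXXXP.
    """
    nb = NUCLEOTIDE_BINDING.search(sequence)
    site = ACTIVE_SITE.search(sequence)
    return {
        "nucleotide_binding": nb.start() if nb else None,
        "active_site_cys": site.start() + 2 if site else None,
    }
```

A candidate in which either value is `None` would lack the corresponding consensus motif under these assumptions and would need manual inspection.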

The E1s for Nedd8 and SUMO are heterodimers. We found proteins showing similarity to both subunits of the E1 for


Table 1 Ub and Ubl in Entamoeba and other organisms

Ubl | Identity with Ub (%) | Entamoeba spp. | Apicomplexans(a) | S. cerevisiae | H. sapiens(a) | Function
Ubiquitin (Ub) | 100 | + (Polyubiquitin genes, Ub-fusion genes absent) | + (Polyubiquitin and Ub-fusion genes present in single copy) | + (Polyubiquitin and Ub-fusion genes present in 4 copies) | + (Polyubiquitin and Ub-fusion genes present in 25 copies) | Numerous
Rub1/NEDD8 | 55 | + | + | + | + | Regulation of transcription, degradation, Ub ligases (Parry and Estelle 2004)
Smt3/SUMO1–3 | 18 | + (Only SUMO1) | + (Only SUMO1) | + (Smt3 can form chains) | + (Several isoforms) | Regulation of transcription, nuclear localization, DNA repair (Zhao 2007)
Atg8 | ND | + | + | + | + | Autophagy (Geng and Klionsky 2008)
Atg12 | ND | − | − | + | + | Autophagy (Geng and Klionsky 2008)
HUB1/Ubl5 |  | + | + | + | + | Splicing of pre-mRNA (Wilkinson et al. 2004)
Urm1 | ND | + | + | + | + | tRNA modification, conjugation to proteins, significance of protein modification unknown (Schlieker et al. 2008)
ISG-15 | 32/37 | − | − | − | + | Response to IFNα/β (Ritchie and Zhang 2004)
FUB1/MNSFβ | 38 | − | − | − | + | Activation of T cells (Suzuki et al. 1996)
FAT10 | 32/40 | − | − | − | + | Proteasomal degradation, apoptosis (Raasi et al. 2001)
Ufm1 | ND | − | − | − | + | Unknown

+ represents the presence of a protein, − represents the absence of a protein, ND not defined
(a) Based on Ponts et al. (2008)


Nedd8, APPBP1, and Uba3 in E. histolytica (XP_650317 and XP_655964, respectively), but only one of the subunits, Uba2, for SUMO E1 was found (XP_656129). The other subunit for SUMO E1, Aos1, could not be assigned confidently. Neither Aos1 nor APPBP1 could be confidently predicted in apicomplexan parasites either (Ponts et al. 2008); however, it was suggested that one of the unassigned UBA domain proteins could serve the function of these proteins, and this possibility may also exist in Entamoeba.

The E1 equivalent for Urm1, i.e., the Uba4-like protein in amoeba (XP_656052), is a smaller protein compared to yeast Uba4 (Fig. 2). As opposed to Uba4 of yeast and other eukaryotes, which contains a ThiF domain and a Rhodanese homology domain (RHD), amoeba Uba4 lacks the RHD and a conserved cysteine (C225) in the ThiF domain. It is possible that the RHD function may be provided in trans, as proteins containing an RHD are encoded in the amoeba genome. C225 in the ThiF domain is essential for Urm1 conjugation to proteins and thiolation of tRNAs in vivo (Hochstrasser 2009; Noma et al. 2009). All three amoeba genomes that have been sequenced show an alanine substitution at this position (Online Resource 3), although a closely related organism, Dictyostelium discoideum, showed a cysteine at this position. The residues around C225 show higher similarity to Uba4-like proteins from archaea, where a few members have an alanine substitution at this position. C225 has been suggested to free the cysteine in the RHD to participate in another round of attack on Urm1-adenylate (Michelle et al. 2009). It is possible that the processivity of Urm1 modification is not very high in this organism due to the lack of C225, but the significance of this is not understood at present as little information is available about Urm1 modification of proteins.

Some classical E2 families are absent in Entamoeba

We used the UBC motif given in the SMART database to search for E2s. The motif is shown in Online Resource 1. We found 22 genes encoding Ub-conjugating enzymes and related proteins in E. histolytica, and all were highly conserved in E. dispar and E. invadens (Online Resource 6). E2s for conjugating Nedd8 (UBC12, family 8) and SUMO (UBC9, family 7) were also found. We have classified these E2s according to the families that have been described earlier (Michelle et al. 2009). A phylogenetic tree of all E2s in the three species was created using PHYML (Online Resource 7) and was found to be consistent with the evolutionary relationship shown in an earlier study (Burroughs et al. 2008). We included three E2s of a nucleomorph, Guillardia theta, while creating the phylogenetic tree. One of the G. theta E2s (XP_001713335) is an ortholog of the human UBE2D4 family, belonging to family 4 according to the classification we have adopted. The other two E2s of G. theta (XP_001713395 and XP_001713233) are orthologs of human UBE2U and UBE2S (Families 4 and 11), respectively (Ying et al. 2009). Family 4 members from Entamoeba fall in the same clade as the G. theta ortholog of human UBE2D4 (Online Resource 7), whereas the family 11 E2 of G. theta was on a separate clade, as we could not assign any E2 from Entamoeba to family 11. This supports our classification of E2s from Entamoeba. Specific features of each family (Michelle et al. 2009) were conserved (data not shown). Presence of all families from 1 to 10 has been shown in seven species including Caenorhabditis elegans, S. cerevisiae, and Drosophila melanogaster, but Families 11 and 12 are absent in C. elegans (Michelle et al. 2009). We did not find any member of families 1, 6, 11, and 12 in Entamoeba. These families were also absent in E. dispar and E. invadens. Phylogenetic analysis suggests that the E2 variants APG3 and APG10 branched off earlier from the last common eukaryotic ancestor than families 1, 5, UEV, and RWD families of E2s (Burroughs et al. 2008). Therefore, our finding that APG3, APG10, UEV, and RWD families are present but some classical families of E2s are missing in Entamoeba is interesting and noteworthy.

Family 6 members function in histone modification in humans [27] and catabolite repression in yeast (Schüle et al. 2000). Members of families 1, 11, and 12 are known to function with the Anaphase Promoting Complex/cyclosome (APC/C) in higher eukaryotes (Jin et al. 2008; Rodrigo-Brenni and Morgan 2007). Apicomplexan parasites have been suggested to have a modified APC/C complex based on bioinformatics analysis of the APC/C complex subunits present (Ponts et al. 2008). It is likely that Entamoeba may also have a modified APC/C complex and, therefore, does not encode E2s that are required to give additional complexity to APC/C function.

Presence of both Ubc13 and the E2 variant Mms2-like proteins suggests that Lys63 (K63)-linked Ub modification, which is important for DNA repair, cell cycle checkpoints, and apoptosis, is present in amoeba. In addition, the more divergent members of the E2 family, Atg3 and Atg10, which function in autophagy, were present. Atg3 and Atg10 work as E2 enzymes for the conjugation of Atg8 and Atg12, respectively. The significance of Atg10 is not known at present since an Atg12-like protein was not found in amoeba.

Fig. 2 Uba4-like protein from Entamoeba. A schematic diagram of Uba4 from S. cerevisiae (ScUba4) and E. histolytica (EhUba4)


Ub ligases (E3s)

E3s are the major substrate-determining factors in the pathway leading to ubiquitylation of proteins. We used the motifs shown in Online Resource 1 to search for different classes of E3s in amoeba. All classes of Ub ligases (HECT, RING finger, PHD, and U-box) were found in the amoeba genome. The present annotation of these proteins in the NCBI database and the percent similarity of E. dispar and E. invadens proteins to the E. histolytica protein are shown in Online Resource 8.

HECT domain E3s

We identified six proteins containing the HECT domain in Entamoeba. All of these HECT domain proteins were highly conserved, with 88–90% identity with corresponding proteins in E. dispar and more than 60% similarity in E. invadens (Online Resource 8). The number of HECT domain proteins is similar to that found in apicomplexans and yeast. Two proteins showed 42–57% similarity to a small region in the N-terminus of a known human E3, E6-AP. Other than this, no overall similarity was found with any other known HECT domain E3. One of these proteins showed a calmodulin-binding motif in E. histolytica and E. dispar, suggesting a direct link between Ca2+-calmodulin-mediated functions and the ubiquitin pathway in these organisms. The rest of the HECT domain proteins did not show any other co-occurring domain.

U-box E3s

The U-box is structurally related to the RING finger, and proteins with this domain can function as E3 ligases. One protein (XP_655141) with a U-box was identified (Online Resource 8). TBLASTN with this protein sequence did not show any other hit in the amoeba genome. Alignment of this protein with known U-box proteins showed good conservation of all the features (Online Resource 4). The U-box protein in amoeba did not show the presence of any other co-occurring domains; therefore, its function could not be predicted.

PHD E3s

The PHD domain is structurally and functionally similar to the RING finger domain, and proteins containing this domain have been shown to function as E3s (Capili et al. 2001; Coscoy et al. 2001). We searched for the PHD domain using Fuzzpro and got eight hits. HMM analysis, JPred, and multiple alignments were used as stringency criteria to strengthen the predictions from Fuzzpro. While multiple alignment of domains predicted by Fuzzpro gave very good agreement with the PHD domain (not shown), only two proteins were positive for the domain by HMM analysis and four showed the desired secondary structure. Thus, two proteins can be confidently predicted to have a PHD domain, and the possibility of the other four also having a PHD domain cannot be ruled out in the absence of experimental data. These four proteins have the diacylglycerol (DAG) catalytic and accessory domains as well as a C1-like domain (data not shown). This domain architecture with PHD has not been reported in any other organism so far and needs experimental validation. If validated, it would link signaling due to DAG processing to protein turnover or regulation in this organism.

RING finger E3s

This is the largest class of E3s in all eukaryotic organisms studied so far. We combined multiple approaches to identify RING finger proteins in order to get a comprehensive list. All reported variants of the RING finger domain collected from an extensive literature survey (Online Resource 1), along with the classical RING motifs C3HC4 and C3H2C3 (SM00184), were used for database searches. Output obtained from the Fuzzpro analysis of the proteome with these motifs was subjected to stringency criteria (Online Resource 1) proposed earlier (Kosarev et al. 2002) and to HMM analysis. Some proteins had a RING domain prediction at non-overlapping positions using the classical motif and by HMM analysis. For these proteins, both predicted domains were analyzed for their ability to form the typical RING secondary structure with two β-sheets and an α-helix (Aravind and Koonin 2000). Based on these criteria, all these proteins were classified into various groups (Online Resource 8). All these proteins were used for a TBLASTN search of the amoeba genome to rule out any similar protein having been inadvertently left out. No additional proteins were found in this analysis.
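The pattern-based step of this search can be illustrated with a regular expression. The sketch below is not the authors' Fuzzpro run: the actual patterns and stringency criteria are listed in Online Resource 1, which is not reproduced here. The ligand spacings used are a commonly cited consensus for the classical RING finger and are an assumption, as are the names C3HC4, C3H2C3, and find_ring_candidates.

```python
import re

# Assumed consensus spacing for the classical C3HC4 RING finger:
# C-X2-C-X(9-39)-C-X(1-3)-H-X(2-3)-C-X2-C-X(4-48)-C-X2-C
C3HC4 = re.compile(r"C.{2}C.{9,39}C.{1,3}H.{2,3}C.{2}C.{4,48}C.{2}C")

# C3H2C3 (RING-H2) variant: the fifth metal ligand is His instead of Cys.
C3H2C3 = re.compile(r"C.{2}C.{9,39}C.{1,3}H.{2,3}H.{2}C.{4,48}C.{2}C")


def find_ring_candidates(protein_seq):
    """Return (motif_name, start, end) tuples, 1-based inclusive coordinates.

    Only non-overlapping matches per pattern are reported; a hit is a
    candidate for further checks (HMM, secondary structure), not a call.
    """
    hits = []
    for name, pattern in (("C3HC4", C3HC4), ("C3H2C3", C3H2C3)):
        for match in pattern.finditer(protein_seq.upper()):
            hits.append((name, match.start() + 1, match.end()))
    return hits


# Synthetic sequence built to satisfy the C3HC4 spacing (not a real protein).
toy = ("MK" + "C" + "AA" + "C" + "A" * 10 + "C" + "AA" + "H" + "AA"
       + "C" + "AA" + "C" + "A" * 10 + "C" + "AA" + "C" + "GG")
print(find_ring_candidates(toy))
```

A pattern hit of this kind is permissive by design, which is why the text applies HMM analysis and secondary structure prediction as additional filters.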

Proteins that fulfilled the stringent 2 criteria as well as HMM analysis were considered to have a very high probability of forming a RING structure in vivo and were put in group 1. Proteins that did not fulfill the stringent 2 criteria but were detected by Fuzzpro and were positive for the RING domain in HMM analysis at a similar position were placed in group 2. In some proteins, the RING domains predicted by the classical motif and by HMM analysis were at non-overlapping positions; the domain predicted by the classical motif was predicted not to form the required secondary structure, but the domain predicted by HMM could. Therefore, these were included as RING domain proteins and put in group 3.

Proteins that showed a variant motif were analyzed by HMM and JPred. Six proteins showed good expect values in HMM analysis and a ββα fold, and were placed in group 4. All proteins, whether detected by the classical motif but not fulfilling the stringency 2 criteria or RING variants that failed HMM analysis, were checked manually as well as with JPred software to predict the secondary structure. Seven proteins showed a ββα fold and also showed conservation of the metal ligands. It is possible that they are able to form a RING structure. These proteins have been put in group 5. From this, we conclude that there are 56 proteins in E. histolytica that have a high probability of forming a RING finger structure in vivo. Seven more proteins may also have a RING finger structure. Therefore, a maximum of 63 RING finger proteins are encoded in this organism.
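The grouping rules described above can be summarized as a small decision function. This is only a sketch of one possible reading of the text: the field names, the order in which the rules are tested, and the treatment of borderline cases are assumptions, and the actual stringency criteria are those in Online Resource 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RingEvidence:
    """Hypothetical evidence record for one candidate protein."""
    classical_motif: bool     # hit with a classical C3HC4/C3H2C3 pattern
    variant_motif: bool       # hit with a reported RING variant pattern
    stringency2: bool         # passes the "stringent 2" criteria
    hmm_positive: bool        # RING domain reported by HMM analysis
    hmm_same_position: bool   # HMM domain overlaps the motif-predicted domain
    hmm_domain_fold: bool     # HMM-predicted domain can form the expected fold
    motif_domain_fold: bool   # motif-predicted domain shows a bba fold (JPred)
    ligands_conserved: bool   # metal ligands conserved on manual inspection


def assign_group(e: RingEvidence) -> Optional[int]:
    """Return group 1-5 following the text, or None if no rule applies."""
    if e.classical_motif and e.hmm_positive:
        if e.hmm_same_position:
            # Groups 1 and 2 differ only in the stringent 2 criteria.
            return 1 if e.stringency2 else 2
        if not e.motif_domain_fold and e.hmm_domain_fold:
            # Non-overlapping predictions; only the HMM domain can fold.
            return 3
    if e.variant_motif and e.hmm_positive and e.motif_domain_fold:
        return 4
    if e.motif_domain_fold and e.ligands_conserved:
        # Failed stringency or HMM, but fold and ligands look right.
        return 5
    return None


# Totals reported in the text: groups 1-4 (high probability) and group 5.
HIGH_PROBABILITY, POSSIBLE = 56, 7
print(HIGH_PROBABILITY + POSSIBLE)  # upper bound on RING finger proteins
```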

The present annotation of most of these proteins does not identify them as RING finger proteins. Most of these proteins are annotated as hypothetical proteins or Zn-finger proteins in the NCBI database. With the exception of a few, all of them showed good protein sequence conservation (generally more than 90%) in E. dispar but the level of conservation in E. invadens was about 75%.

In order to define related groups of RING proteins, we first attempted a multiple alignment of all the RING finger proteins in E. histolytica. However, partly due to strong sequence deviation outside the conserved domain, we could not generate a meaningful alignment. Therefore, isolated RING domains were used to create a multiple alignment using MUSCLE, which showed good conservation of metal ligands and other known features of RING finger proteins (Online Resource 9). A phylogenetic analysis indicated that most of the variant E3s were grouped together, whereas most of the RING finger proteins that we placed in groups 1 and 2 fall together in larger clades with significant bootstrap values (Online Resource 10). Moreover, except for two proteins from group 5 that were in a separate clade together, most of the group 5 proteins clustered with group 1 and 2 proteins, suggesting that these are RING finger proteins. The loop sizes between the metal ligands were also analyzed (data not shown). The majority of the proteins showed good correspondence with the consensus spacing, but a number of proteins showed considerable variation in the spacing between various metal ligands. Similar variation has been shown in Arabidopsis thaliana, and these variants have been shown to have E3 activity in vitro (Kosarev et al. 2002). One protein (XP_650859) showed an unusually large loop (69 amino acids) between the second and the third metal ligand. Such a large loop has not been identified in any RING finger protein so far, and it would be interesting to see whether it functions as an E3.
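The loop-size comparison can be sketched as follows. The consensus ranges are an assumed, commonly cited spacing for the classical RING finger rather than the values the authors used, and the ligand positions in the example are hypothetical, chosen only to reproduce a 69-residue loop between the second and third ligands.

```python
# Assumed consensus loop lengths (min, max) between the eight metal ligands.
CONSENSUS_LOOPS = [(2, 2), (9, 39), (1, 3), (2, 3), (2, 2), (4, 48), (2, 2)]


def loop_lengths(ligand_positions):
    """Number of residues between consecutive metal ligands."""
    pos = sorted(ligand_positions)
    return [b - a - 1 for a, b in zip(pos, pos[1:])]


def atypical_loops(ligand_positions, consensus=CONSENSUS_LOOPS):
    """Return (loop_number, length) for loops outside the assumed ranges.

    Loop 1 lies between ligands 1 and 2, loop 2 between ligands 2 and 3, etc.
    """
    flagged = []
    for i, (length, (lo, hi)) in enumerate(
            zip(loop_lengths(ligand_positions), consensus), start=1):
        if not lo <= length <= hi:
            flagged.append((i, length))
    return flagged


# Hypothetical ligand coordinates with a 69-residue second loop.
example = [10, 13, 83, 86, 89, 92, 103, 106]
print(loop_lengths(example))
print(atypical_loops(example))
```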

Few Entamoeba RING finger proteins show similarity to known E3s

Of the 63 RING finger proteins, only a few could be functionally annotated based on similarity with known E3s from other organisms. Two RING finger proteins (XP_651657 and XP_657418) were highly conserved and showed 60–80% similarity to human APC11 and Rbx1. APC11 and Rbx1 are the E3 subunits of the APC/C and SCF (Skp1, Cullin1, and F-box) complexes, respectively. Both of these complexes are important for the regulation of the cell cycle (Skaar and Pagano 2009). One protein (XP_652269) showed similarity to human GP78/AMFR, which is known to be a RING E3 involved in the endoplasmic reticulum-associated degradation (ERAD) pathway of quality control of membrane proteins (Kostova et al. 2007). RING finger proteins showing similarity to Ubr1 (XP_650859) and Rad18 (XP_652594) support the E2 data indicating that the N-end rule pathway and post-replicative DNA repair mechanisms are present in this organism. Other than participating in the N-end rule pathway, Ubr1 is also involved in quality control of cytosolic proteins in yeast (Heck et al. 2010). Based on the conservation of these E3s, it can be suggested that, in addition to cell cycle regulation, the ERAD, N-end rule, and post-replicative DNA repair functions of the UPP are crucial for the organism.

Domain architecture of RING finger E3s in Entamoeba

RING finger-containing proteins were analyzed for the presence of other domains for functional annotation (Fig. 3). Many proteins did not have any co-occurring domain other than the RING domain. Out of 63 RING finger proteins, only 25 had one or more additional domains. The domain architecture was preserved across the three species of Entamoeba that we analyzed. The In-Between-RING (IBR) domain was the most frequently occurring domain. There are two RING-between-RING (RBR) proteins in S. cerevisiae and six in D. melanogaster. We used the term ‘RBR’ to search the proteome of D. discoideum, which is closely related to Entamoeba, and found only one RBR protein. Therefore, the presence of 12 RBR proteins in Entamoeba is interesting and suggestive of important functions. All IBR-domain-containing proteins, except one, showed the IBR flanked by two RING finger domains (Online Resource 11). The presence of such RING-between-RING domain proteins belonging to the Ariadne family of RBR proteins in E. histolytica has been reported earlier (Eisenhaber et al. 2007). However, our analysis further shows that several of these proteins are associated with other domains, notably the UBA, PARP, and RWD domains. RWD is a protein interaction domain that is present in two human E3s, where it is necessary for their ligase function (Alpi et al. 2008; Carbia-Nagashima et al. 2007), and the UBA domain is involved in binding to Ub. PARP is known to occur with domains associated with the ubiquitylation pathway, e.g., U-box, UIM, and UBC domains, but its co-occurrence with RING has not been reported before. In mammals, PARP domain proteins are known to function in DNA repair, epigenetic regulation, and in maintaining genomic integrity. However, other functions are likely to be performed by PARP domain-containing proteins (Citarelli et al. 2010).

Three Ub-binding domains that are associated with proteins of the Ub pathway, the UBA, UBR, and Cue domains, were also present in a number of predicted RING proteins in amoeba. All these domains have low affinity for Ub, but several mechanisms allow them to exhibit appreciable affinity for ubiquitylated substrates, thereby mediating a number of functions apart from bringing ubiquitylated substrates to the 26S proteasome. The UBR domain is found in those E3s that ubiquitylate substrates in accordance with the N-end rule (Varshavsky 1997). Since we found E2s showing similarity to the Rad6 family that participates in the N-end rule pathway, this pathway could be functional in amoeba. A Cue domain-containing membrane protein (Cue1p) was first shown to recruit an E2, UBC7, to the ER in yeast (Biederer et al. 1997). Subsequently, this domain was shown to be present in an E3, GP78/AMFR, involved in ERAD and also in other proteins not functioning in ERAD. The Cue domain, however, does not bind directly to UBC7; its requirement for the function of GP78/AMFR appears to be due to its interaction with Ub (Kostova et al. 2007). As the Cue1 protein in yeast does not show any binding to Ub, it is possible that the Cue domain in GP78/AMFR may function to bring ubiquitylated proteins to GP78/AMFR (Kostova et al. 2007). The Cue domain shares structural homology with the UBA domain, and both these domains promote the ubiquitylation of proteins containing them.

Some RING proteins also had a B-box Zn-finger and an A2L Zn-finger. Although the B-box Zn-finger domain is known to co-occur with RING, the A2L Zn-finger is not reported to co-occur with RING. The A2L Zn-finger is found in some viral transcription factors, and all proteins containing this domain whose functions are known are associated with the TFIIH complex involved in transcription initiation and DNA excision repair. We also found other domains that are not reported to occur with the RING finger, e.g., NPL4, SPRY, and VSP domains. NPL4 is required for degradation of proteins by the ERAD pathway and forms a complex with p97 (also known as Cdc48 or valosin-containing protein, VCP) and Ufd1 (Bays and Hampton 2002; Ye et al. 2001). Since Cdc48 and Ufd1 are encoded in the amoeba genome, the machinery required for extraction of ERAD substrates is conserved in amoeba. SPRY domains are evolutionarily ancient, and the related B30.2 domains have become distinct from SPRY domains in primary structure and predicted functions (Rhodes et al. 2005). These domains are found with the RING domain in some E3s. The SPRY domain is found in diverse proteins with myriad functions. The VSP domain is cysteine rich and its function is not known. Based on this analysis, it appears that Ub-mediated pathways regulate a wide range of functions in Entamoeba.

Fig. 3 Domain architecture of RING finger proteins of E. histolytica. Proteins are drawn to scale. The scale bar is shown on the top of the figure

Expression data for the genes encoding all the proteins described in this paper were found in the GEO datasets. Therefore, all the genes identified as UPP components by us and annotated as hypothetical in the database are transcribed. As expected, we found a number of E2s and E3s to be differentially regulated in a number of conditions. In particular, a large number of UPP genes were differentially expressed in the two developmental stages (GEO dataset GSE6648). The stage-specific expression has been described (Ehrenkaufer et al. 2007) based on a comparison of gene expression between a recent clinical isolate and isolates that have been in axenic culture for a long time, where the recent clinical isolate is taken as representing cysts and forms intermediate between cyst and trophozoite. [For convenience of expression, it was called ‘cyst-specific’ expression by the authors and we are using this expression in that sense only in this paper]. This dataset showed differential expression of many UPP genes, including E2s as well as E3s belonging to the HECT, PHD, and RING families. Two HECT domain E3s showed high expression in the recent clinical isolate (30.7- and 8.8-fold, respectively). Interestingly, Nedd8, a Ubl, was significantly upregulated (3.68-fold) in cysts. Nedd8 modification of cullin proteins in higher eukaryotes results in activation of cullin-RING ligases (CRLs). There are a number of CRLs in higher eukaryotes that regulate cell cycle, signal transduction, transcription, and development. Identification of these CRLs and their substrates would help in understanding the molecular details of stage conversion in amoeba.

Surprisingly, out of 13 developmentally regulated RING finger proteins that were differentially expressed in this dataset, five were RBR family proteins. Since there are only 12 RBR proteins in this organism, this represents a substantial fraction of RBR proteins that are developmentally regulated. Several of these contained a UBA domain and may be interacting with a number of ubiquitylated proteins. A putative Ariadne-like protein has been shown to be involved in the development of D. discoideum (Whitney et al. 2006). The Ariadne family of RBR proteins is an ancient family, and the extreme conservation of these families suggests important housekeeping functions. It is, therefore, likely that some of these differentially expressed UPP genes may function in development of the parasite.
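One way to gauge how unusual this overlap is would be a hypergeometric calculation using the counts given in the text (63 RING finger proteins, 12 of them RBR, 13 differentially expressed, 5 of those RBR). The sketch below is illustrative and was not part of the original analysis; it assumes that all 63 proteins were equally likely to be detected in the expression dataset, which may not hold.

```python
from math import comb


def hypergeom_tail(total, marked, drawn, observed):
    """P(X >= observed) when drawing `drawn` items without replacement
    from `total` items, of which `marked` belong to the class of interest."""
    upper = min(marked, drawn)
    numerator = sum(
        comb(marked, k) * comb(total - marked, drawn - k)
        for k in range(observed, upper + 1)
    )
    return numerator / comb(total, drawn)


# Counts taken from the text.
TOTAL_RING, RBR, REGULATED, REGULATED_RBR = 63, 12, 13, 5

expected = REGULATED * RBR / TOTAL_RING   # RBR proteins expected by chance
p_value = hypergeom_tail(TOTAL_RING, RBR, REGULATED, REGULATED_RBR)
print(f"expected by chance: {expected:.2f}, observed: {REGULATED_RBR}")
print(f"P(X >= {REGULATED_RBR}) = {p_value:.3f}")
```

A one-sided tail probability is used because the question is whether RBR proteins are over-represented among the regulated genes.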

Synteny and distribution of transposable elements in the vicinity of E2s and E3s in E. histolytica and E. dispar

The genomes of the Entamoeba species have not been fully assembled, but synteny at the contig level, based on protein orthologs, is publicly available at JCVI. We used these syntenic maps to further compare the organization of E2 and E3 genes together with the presence or absence of transposable elements in their vicinity (5-kb region on either side), as this aspect is not present in the JCVI synteny map. All of the E2, HECT domain, PHD, and U-box domain E3 genes were present in the same syntenic locations in both species. Six RING finger E3s could not be analyzed as the scaffold they resided on had only one or two genes. Of the remaining RING finger proteins, 50 were in the same syntenic position in the two genomes. Seven RING E3s showed synteny on one side, but on the other side clusters of TEs were present, mostly in E. histolytica, and therefore the synteny was broken. The analysis was carried out for autonomous LINE and non-autonomous SINE retrotransposons only and does not include other TEs. Of the syntenic RING E3s, 46% had no TE in the 5-kb region surrounding them, 33% had a TE in E. histolytica but not in E. dispar, and 10% had a TE in E. dispar but not in E. histolytica (Fig. 4a). About 12% of E3s had TEs in both species but not in the identical syntenic position (one such example is given in Online Resource 12). In the case of E2s, 39% had no TEs, whereas 17% had TEs in either E. histolytica or E. dispar and 26% had an element in both species (Fig. 4b).
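The vicinity test described above amounts to an interval-overlap check on each contig. The sketch below shows one way to do it; the coordinate convention (1-based, inclusive, gene and TEs on the same contig), the function names, and the example coordinates are assumptions for illustration and do not come from the original data.

```python
def te_in_vicinity(gene, tes, flank=5000):
    """True if any TE overlaps the gene or the `flank` bp on either side.

    `gene` is (start, end); `tes` is a list of (start, end) on the same contig.
    """
    lo, hi = gene[0] - flank, gene[1] + flank
    return any(te_start <= hi and te_end >= lo for te_start, te_end in tes)


def classify_syntenic_pair(eh_gene, eh_tes, ed_gene, ed_tes, flank=5000):
    """Assign a syntenic gene pair to one of the four categories in Fig. 4."""
    in_eh = te_in_vicinity(eh_gene, eh_tes, flank)
    in_ed = te_in_vicinity(ed_gene, ed_tes, flank)
    if in_eh and in_ed:
        return "TE in both"
    if in_eh:
        return "TE in E. histolytica only"
    if in_ed:
        return "TE in E. dispar only"
    return "no TE"


# Hypothetical coordinates: a SINE 3 kb upstream in Eh, none nearby in Ed.
print(classify_syntenic_pair(
    eh_gene=(20000, 21500), eh_tes=[(16400, 17000)],
    ed_gene=(8000, 9500), ed_tes=[(30000, 30600)],
))
```

Changing `flank` to 1000 gives the stricter 1-kb comparison used later in the text.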

We next studied the distribution of different LINE and SINE elements in the vicinity of E2 and RING E3 genes. Twenty percent of RING E3s in E. histolytica and 9% in E. dispar had a TE within 1 kb. E2s, on the other hand, showed a TE at more than 5 kb, except for two E2s where an element was within 1 kb (data not shown). Whereas E2s had a very simple distribution, with a majority of them showing only one element and a few showing LINE1/SINE1 and LINE1/LINE2 arrangements, RING E3s exhibited clusters of various elements that were, as expected, more complex in E. histolytica than in E. dispar (Fig. 4c and Online Resource 12).


It has been reported that repetitive elements are not evenly distributed in Entamoeba species and that these elements have a tendency to form clusters (Lorenzi et al. 2008). Distribution of SINE elements at syntenic locations between E. histolytica and E. dispar has shown that only 20% of syntenic sites had a SINE element in both species (Kumari et al. 2011). In view of these reports, it is interesting that TEs are present in closer vicinity of RING E3s than of E2s, and that the number of RING E3 genes in E. histolytica having TEs in the vicinity is appreciably higher than at the same syntenic loci in E. dispar.

Analysis of putative regulatory elements of RING E3 genes in E. histolytica and E. dispar

The intergenic region in amoeba is very short, with a median size of 326 bp and a 5′ UTR of about 21 bp. Three conserved regions in the core promoter of 37 genes of E. histolytica have been shown (Purdy et al. 1996). Each of these elements has significant divergence from core promoter elements of higher eukaryotes and also of other protozoa. We used MEME (Bailey et al. 2006) software to analyze 500-bp upstream sequences of RING E3 genes, as this was the largest data group we had. We did not identify a consensus TATA box in any gene. A variant Inr (AAAAGAAGA+1), as described for the EhPgp1 gene (Gómez et al. 1998), and the GAAC box were found in many genes in both species. Absence of the TATA box and Inr element has been reported for the EhPgp1 and EhrabB gene promoters (Gómez et al. 1998; Romero-Díaz et al. 2010). Another known motif that was identified in a number of genes was the C/EBP element (Gómez et al. 1998). Other than these previously described motifs in characterized promoters of various genes in Entamoeba, some other motifs with significant e-values were obtained in many genes (Online Resource 13). The significance of these motifs in the regulation of gene expression needs to be explored. To check whether any of the motifs are related to known promoter elements, we used the TOMTOM tool of the MEME suite to search the JASPAR, TRANSFAC, and UNIPROBE motif databases. A number of motifs gave hits with significant P values to known promoter elements (Online Resource 13 and 14). We also looked at upstream sequences in other organisms by a MAST search of S. cerevisiae upstream sequences. Several motifs (M3, M5, M11, M14, and M19) were found to be associated with upstream regions of genes that included ubiquitin pathway proteins or substrates of the pathway, although with low e-values, which could be because the codon frequencies for yeast are very different from those of Entamoeba (Online Resource 13).
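Preparing the input for motif discovery requires extracting a fixed-length window upstream of each gene in the orientation of transcription. The sketch below shows one way to do this in plain Python; the coordinate convention (1-based, inclusive), the function names, and the toy contig are assumptions, and the authors' actual extraction procedure is not described in the text.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(contig, start, end, strand, length=500):
    """Return up to `length` bp upstream of a gene, read 5' to 3' on the
    gene's own strand. `start` and `end` are 1-based inclusive coordinates
    on the forward strand; the window is truncated at contig ends.
    """
    if strand == "+":
        lo = max(0, start - 1 - length)
        return contig[lo:start - 1]
    if strand == "-":
        hi = min(len(contig), end + length)
        return reverse_complement(contig[end:hi])
    raise ValueError("strand must be '+' or '-'")


# Toy contig; real input would be contig sequences plus gene annotations.
contig = "AAAACCCCGGGGTTTT"
print(upstream_region(contig, 9, 12, "+", length=4))
print(upstream_region(contig, 5, 8, "-", length=6))
```

Because the median intergenic distance is only 326 bp, a 500-bp window will often extend into the neighboring gene, so motifs found near the distal end of the window may lie in coding sequence.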

Discussion

We have carried out a bioinformatics analysis to identify various components of the UPP pathway in the protozoan parasite Entamoeba species. Since a detailed analysis of the ubiquitin C-terminal hydrolases from various protozoa has been reported earlier (Ponder and Bogyo 2007), we have excluded these proteins from our analysis. Based on the differences in the Ub sequence of Entamoeba from all other characterized Ubs, it has been suggested earlier that the Ub system of Entamoeba may have stalled and that one or more functions of the Ub system may have developed later in the evolutionary process (Wöstmann et al. 1996; Wöstmann et al. 1992). The present genome-scale study suggests that the complexity that is observed in higher eukaryotes due to inter-connections between various Ub–Ubl pathways may be present in this early eukaryote as well. Nedd8 modification of cullins stimulates the ubiquitylation of the substrates of the CRLs (Merlet et al. 2009). Similarly, many proteins are now known to be both SUMOylated and ubiquitylated, and this convergence of the two post-translational pathways can result in a synergistic or antagonistic outcome (Denuc and Marfany 2010). Since both these pathways exist in Entamoeba, a complex network of Ub and Ubl modifications and the resulting crosstalk cannot be ruled out in this organism. Based on the diverse Ub-related domains, E2s, and E3s that were found, it appears that Entamoeba is capable of making diverse kinds of Ub chains, may have functional N-end rule and ERAD pathways, and that its Ub system is possibly interlinked with most cellular functions, as is the case in higher eukaryotes.

Fig. 4 Distribution of TEs associated with RING E3s and E2s of E. histolytica (Eh) and E. dispar (Ed) that are in the same syntenic locations in the two genomes. a, b Percentage of E3s and E2s with or without TEs. c TE clusters associated with RING E3s and E2 genes

Entamoeba encodes most of the E2s belonging to various families (Online Resource 6). However, families 1, 6, 11, and 12 were not found in Entamoeba. A family 1 member, Ubc1, has been shown to promote Lys 48 (K48)-linked chain formation by the APC/C complex. In budding yeast, two different E2s function with APC/C. Ubc4, a family 6 member, promotes monoubiquitylation of substrates at different lysine residues. Ubc1 then polyubiquitylates the substrates on preattached Ubs (Rodrigo-Brenni and Morgan 2007). It is likely that in the absence of Ubc1 in Entamoeba, Ubc4 may form K48-linked chains on APC/C substrates. The family 11 and 12 members Ube2S and UbcH10, respectively, function in the formation of lysine 11 (K11)-linked Ub chains on APC/C substrates in higher eukaryotes (Jin et al. 2008), which serve as an efficient signal for proteasomal degradation. Since the kinetics of ubiquitylation varies depending upon which E2 is involved, these mechanisms ensure proper substrate ordering, which is important because temporal regulation of the degradation of mitotic substrates is crucial for successful mitosis. In higher eukaryotes, K11-linked chains appear mostly during mitosis, and elimination of UBCH10 and UBE2S or of APC/C activity results in a striking decrease in the abundance of these chains, suggesting that most of the K11-linked Ub chains are made by this ligase in association with these E2s (Matsumoto et al. 2010). In yeast, an ER-associated E2, Ubc6, has been shown to catalyze the formation of K11-linked chains on substrates of the ERAD pathway (Xu et al. 2009). Whether Ubc6 also participates in K11-linked chain formation on mitotic substrates of APC/C in lower eukaryotes is not known. We found a Ubc6-like protein in Entamoeba that does not have the hydrophobic C-terminal tail that is required to anchor it to the ER. It is likely that this E2 may participate in the formation of K11-linked chains on mitotic substrates in this organism. However, in higher eukaryotes, UBE2S forms chains only after UBCH10 has initiated ubiquitylation of the substrates. Whether or not the soluble Ubc6-like protein in amoeba works in a similar fashion will determine if there are any mechanistic differences between the substrate-ordering mechanisms used by higher eukaryotes and Entamoeba. Since the structure and topology of K11-linked chains are very different from those of K48-linked chains, these chains may be recognized by specific proteins to support other, as yet unidentified, functions. Any differences between the mechanisms adopted by Entamoeba and higher eukaryotes would open up possibilities that can be usefully exploited.

Entamoeba, like Plasmodium falciparum, has been shown to have an atypical cell cycle. P. falciparum has a modified APC/C complex [16] and based on our observation that E2s that function with the APC/C complex are missing in Entamoeba, it is likely that Entamoeba may also have a modified APC/C complex. Studies on the APC/C complex of these lower eukaryotes may throw some light on the origin and evolution of the complexity of APC/C-mediated substrate modifications. This may also partly explain the atypical cell cycle control in these organisms.

Other than the families described above, all other families of E2s were represented, including the variants of E2s and E2-like proteins involved in autophagy (Online Resource 6). The presence of both Ubc6- and Ubc7-like proteins suggests that the ERAD pathway, which functions in quality control mechanisms of the ER, is active. This is consistent with the highly active secretory pathway in this organism. Additionally, Rad6-mediated post-replicative DNA repair and N-end rule pathways may be operative, since we also found E3s similar to Rad18 and Ubr1 that work in these pathways. The N-end rule pathway governs the stability of proteins based on their N-terminal amino acid and functions in peptide import, chromosomal stability, and meiosis. A cytosolic quality control pathway mediated by Ubr1 has been shown in S. cerevisiae (Heck et al. 2010). This pathway may be important for Entamoeba during its adaptation to the host environment. Ubc13- and Mms2-like proteins in amoeba suggest that the organism can form the K63-linked chains required for DNA repair and cell cycle regulation.

All families of E3s were present in Entamoeba and their relative numbers reflected the abundance of these families found in other organisms (Online Resource 8). Except for a few, we could not identify the functionally equivalent E3 from higher eukaryotes. Several E3s did not match any other protein in the database. This makes it challenging to define the roles of these E3s in the parasite. A similar situation has been described for apicomplexans, where only cell-cycle-related E3s could be identified based on similarity to known E3s. Co-domain analysis, however, showed a number of domains normally associated with the RING finger domain and some unique domain architectures in amoeba, indicating that a wide variety of functions are modulated by these E3s. It is likely that unique E3s in protozoan parasites may be serving parasite-specific functions and may be important for the pathogenicity of the parasite. This is likely, as two out of three unique E3s showed the presence of a kinase domain that may link the extracellular environment with protein turnover. Due to the sequence divergence of Entamoeba E3s from human E3s, they could provide attractive targets for drug design, as these drugs would be unlikely to affect the host proteins.

The GEO dataset of developmentally regulated genes in Entamoeba shows a number of UPP genes, including Nedd8 and some E2s and E3s, to be differentially expressed in cyst and trophozoite. A significant number of RBR proteins showed differential expression in the two developmental stages. Taken together with a report showing that an RBR protein is essential for development of D. discoideum (Whitney et al. 2006), it is possible that UPP genes may be important for the development of the organism. Whether the UPP serves only to eliminate the proteins that are no longer needed for the next stage or plays a more intimate role in shaping the developmental program by regulating the gene expression profile of the parasites remains to be explored.

Our analysis did not reveal any differences in UPP components between pathogenic and nonpathogenic amoeba that can be correlated with pathogenesis. Given that each E2 and E3 participates in regulating potentially a large number of functions, it was expected that a direct comparison of protein sequences would not yield this information. However, differential expression of these proteins, specifically E3s, can potentially alter the concentrations of various substrate proteins thereby affecting a number of functions. In order to probe this further, we studied the potential regulatory elements and the genomic context in which each E2 and E3 gene is present in the pathogenic and nonpathogenic amoeba.

Identification of regulatory elements in upstream sequences of genes has not been carried out extensively in amoeba on a genome scale, except in one study that tried to correlate upstream sequences with high- and low-expressing genes at the whole-genome level (Hackney et al. 2007). We analyzed the upstream sequences of RING E3 genes to identify known and novel motifs that could regulate gene expression. Whereas we found the GAAC box and the alternative Inr element, we did not find the TATA box and Inr element sequence in most of the RING E3s. Absence of these elements has been shown for other Entamoeba genes, and it is also known that AT-rich sequences of six base pairs or more can substitute for TATA activity in association with other control elements (Gómez et al. 1998; Purdy et al. 1996; Romero-Díaz et al. 2010). Other than these known motifs, a number of motifs were identified with significant e-values (Online Resource 13). Some of these motifs show similarities to known promoter elements of higher eukaryotes (Online Resource 13 and 14). Interestingly, some of these motifs are also found in upstream sequences of UPP-related genes or their substrates in S. cerevisiae (Online Resource 13). Thus, there is a possibility that some of the identified motifs are potential regulatory elements. However, experimental verification will be required, as it is very difficult to identify regulatory elements based on computational approaches alone because of the position-specific variability.

We found a close association of RING E3s in E. histolytica with LINE and SINE elements. We have not looked at other known TEs in Entamoeba; thus, our results may be underestimating the linkage between these elements and E2 and E3 genes. However, our results show that RING E3s in E. histolytica have more complex clusters of LINE and SINE elements than E2s and E. dispar RING E3s. These elements were already present in the common ancestor of the three amoeba species (Lorenzi et al. 2008), but it has been proposed that SINEs may have expanded after the divergence of E. histolytica and E. dispar (Kumari et al. 2011). It is also known that E. dispar has a lower representation of LINE and SINE elements than that observed in E. histolytica (Lorenzi et al. 2008). Therefore, it is expected that E. histolytica will have a larger number of genes associated with these elements. However, it has been reported that the repeat density of E. histolytica is the highest (less than 4 every 10 kb) among the three species. Thus, these elements are enriched in relatively small genomic regions in this organism. Our findings therefore indicate that, compared to E2s, E3s are preferentially located in regions that are rich in these elements.

Transposable elements can alter genome architecture and gene amplification and expression by a number of mechanisms, resulting in changes in the number, function, or levels of gene products. A number of studies have speculated upon the role of TEs in speciation and pathogenicity of Entamoeba. A number of gene families are physically linked to TEs in Entamoeba, notably the Hsp70, BspA-like, and AIG families (Lorenzi et al. 2010). A putative membrane insertion domain of BspA-family proteins containing leucine-rich repeats is truncated by a SINE or LINE element at the 5′ end, which may prevent the surface localization of these proteins (Lorenzi et al. 2010). Insertion of multiple TEs in Drosophila is known to reduce the expression of Hsp70 genes (Shilova et al. 2006), and both AIG and Hsp70 proteins have reduced expression in E. dispar compared to E. histolytica (Lorenzi et al. 2010). Based on these observations, it has been speculated that TEs could alter gene expression and function and thus may be important for parasite fitness. Since each E3 can regulate the turnover of many substrates, changes in the expression levels of E3s can potentially alter the concentration of many substrates, thereby having a dramatic effect on cellular function. Changes in the expression levels of E3s have been correlated with dramatic changes in the phenotypes of cells, for example in cancers (Newton and Vucic 2007). Comparative expression profiling in the two species and identification of substrates of E. histolytica RING E3s that have TEs in close proximity can provide clues to the functions regulated by these E3s that may explain the gain of pathogenicity by E. histolytica.

This study provides a useful resource that will hopefully encourage further studies in this important area to better understand the biology of Entamoeba species and to develop effective strategies for control of amoebiasis.

Acknowledgments This work was supported by grants from the Department of Science and Technology to ST. SA was supported by a fellowship from the University Grants Commission. PG was supported by the Defense Research and Development Organization grant to ST. We thank Dr. Sudha Bhattacharya for sharing the data on transposons. We are grateful to Dr. Sudha and Alok Bhattacharya for critical reading of the manuscript and for many stimulating discussions.

References

Ali I, Ehrenkaufer G, Hackney J, Singh U (2007) Growth of the protozoan parasite Entamoeba histolytica in 5-azacytidine has limited effects on parasite gene expression. BMC Genomics 8:7

Alpi A, Pace P, Babu M, Patel K (2008) Mechanistic insight into site-restricted monoubiquitination of FANCD2 by Ube2t, FANCL, and FANCI. Mol Cell 32:767–777

Aravind L, Koonin E (2000) The U box is a modified RING finger - a common domain in ubiquitination. Curr Biol 10:R132–R134

Bailey T, Williams N, Misleh C, Li W (2006) MEME: Discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res 34:W369–W373

Bays N, Hampton R (2002) Cdc48-Ufd1-Npl4: stuck in the middle with Ub. Curr Biol 12:R366–R371

Biederer T, Volkwein C, Sommer T (1997) Role of Cue1p in ubiquitination and degradation at the ER surface. Science 278:1806–1809

Burroughs A, Jaffee M, Iyer L, Aravind L (2008) Anatomy of the E2 ligase fold: implications for enzymology and evolution of ubiquitin/ub-like protein conjugation. J Struct Biol 162:205–218

Capili A, Schultz D, Rauscher III F, Borden K (2001) Solution structure of the PHD domain from the KAP-1 corepressor: structural determinants for PHD, RING and LIM zinc-binding domains. EMBO J 20:165–177

Carbia-Nagashima A, Gerez J, Perez-Castro C, Paez-Pereda M, Silberstein S, Stalla GK, Holsboer F, Arzt E (2007) RSUME, a small RWD-containing protein, enhances SUMO conjugation and stabilizes HIF-1alpha during hypoxia. Cell 131:309–323

Ciechanover A, Schwartz A (2004) The ubiquitin system: pathogenesis of human diseases and drug targeting. Biochim Biophys Acta 1695:3–17

Citarelli M, Teotia S, Lamb R (2010) Evolutionary history of the poly(ADP-ribose) polymerase gene family in eukaryotes. BMC Evol Biol 10:308

Coscoy L, Sanchez D, Ganem D (2001) A novel class of herpesvirus-encoded membrane-bound E3 ubiquitin ligases regulates endocytosis of proteins involved in immune recognition. J Cell Biol 155:1265–1273

Denuc A, Marfany G (2010) SUMO and ubiquitin paths converge. Biochem Soc Trans 38:34–39

Doerig C (2004) Protein kinases as targets for anti-parasitic chemotherapy. Biochim Biophys Acta 1697:155–168

Ehrenkaufer G, Haque R, Hackney J, Eichinger D, Singh U (2007) Identification of developmentally regulated genes in Entamoeba histolytica: insights into mechanisms of stage conversion in a protozoan parasite. Cell Microbiol 9:1426–1444

Eisenhaber B, Chumak N, Eisenhaber F, Hauser M (2007) The ring between ring fingers (RBR) protein family. Genome Biol 8:209–219

Furukawa K, Mizushima N, Noda T, Ohsumi Y (2000) A protein conjugation system in yeast with homology to biosynthetic enzyme reaction of prokaryotes. J Biol Chem 275:7462–7465

Geng J, Klionsky D (2008) The Atg8 and Atg12 ubiquitin-like conjugation systems in macroautophagy. ‘Protein modifications: beyond the usual suspects’ review series. EMBO Rep 9:859–864

Gómez C, Pérez D, López-Bayghen E, Orozco E (1998) Transcriptional analysis of the EhPgp1 promoter of Entamoeba histolytica multidrug-resistant mutant. J Biol Chem 273:7277–7284

Gonzalez J, Bai G, Frevert U, Corey E, Eichinger D (1999) Proteasome-dependent cyst formation and stage-specific ubiquitin mRNA accumulation in Entamoeba invadens. Eur J Biochem 264:897–904

Goodman C, McFadden G (2007) Fatty acid biosynthesis as a drug target in apicomplexan parasites. Curr Drug Targets 8:15–30

Hackney J, Ehrenkaufer G, Singh U (2007) Identification of putative transcriptional regulatory networks in Entamoeba histolytica using Bayesian inference. Nucleic Acids Res 35:2141–2152

Hatakeyama S, Yada M, Matsumoto M, Ishida N, Nakayama K (2001) U Box proteins as a new family of ubiquitin-protein ligases. J Biol Chem 276:33111–33120

Heck J, Cheung S, Hampton R (2010) Cytoplasmic protein quality control degradation mediated by parallel actions of the E3 ubiquitin ligases Ubr1 and San1. Proc Natl Acad Sci U S A 107:1106–1111

Hershko A (2005) The ubiquitin system for protein degradation and some of its roles in the control of the cell division cycle. Cell Death Differ 12:1191–1197

Hochstrasser M (2009) Origin and function of ubiquitin-like proteins. Nature 458:422–429

Horrocks P, Newbold C (2000) Intraerythrocytic polyubiquitin expression in Plasmodium falciparum is subjected to developmental and heat-shock control. Mol Biochem Parasitol 105:115–125

Jentsch S, Seufert W, Hauser H (1991) Genetic analysis of the ubiquitin system. Biochim Biophys Acta 1089:127–139

Jin L, Williamson A, Banerjee S, Philipp I, Rape M (2008) Mechanism of ubiquitin-chain formation by the human anaphase-promoting complex. Cell 133:653–665

Kerscher O, Felberbaum R, Hochstrasser M (2006) Modification of proteins by ubiquitin and ubiquitin-like proteins. Annu Rev Cell Dev Biol 22:159–180

Kosarev P, Mayer K, Hardtke C (2002) Evaluation and classification of RING-finger domains encoded by the Arabidopsis genome. Genome Biol 3:RESEARCH0016.0011–RESEARCH0016.0012

Kostova Z, Tsai Y, Weissman A (2007) Ubiquitin ligases, critical mediators of endoplasmic reticulum-associated degradation. Semin Cell Dev Biol 18:770–779

Kumari V, Sharma R, Yadav V, Gupta A, Bhattacharya A, Bhattacharya S (2011) Differential distribution of a SINE element in the Entamoeba histolytica and Entamoeba dispar genomes: role of the LINE encoded endonuclease. BMC Genomics 12:267

LaCount D, Vignali M, Chettier R, Phansalkar A, Bell R, Hesselberth J, Schoenfeld L, Ota I, Sahasrabudhe S, Kurschner C, Fields S, Hughes R (2005) A protein interaction network of the malaria parasite Plasmodium falciparum. Nature 438:103–107

Lindenthal C, Weich N, Chia Y, Heussler V, Klinkert M (2005) The proteasome inhibitor MLN-273 blocks exoerythrocytic and erythrocytic development of Plasmodium parasites. Parasitology 131:37–44

Lorenzi H, Thiagarajan M, Haas B, Wortman J, Hall N, Caler E (2008) Genome wide survey, discovery and evolution of repetitive elements in three Entamoeba species. BMC Genomics 9:595

Lorenzi H, Puiu D, Miller J, Brinkac L, Amedeo P, Hall N, Caler E (2010) New assembly, reannotation and analysis of the Entamoeba histolytica genome reveal new genomic features and protein content information. PLoS Negl Trop Dis 4:e716

Madsen L, Schulze A, Seeger M, Hartmann-Petersen R (2007) Ubiquitin domain proteins in disease. BMC Biochem 8(Suppl 1):S1

Manning-Cela R, Jaishankar S, Swindle J (2006) Life-cycle and growth-phase-dependent regulation of the ubiquitin genes of Trypanosoma cruzi. Arch Med Res 37:593–601

Matsumoto M, Wickliffe K, Dong K, Yu C, Bosanac I, Bustos D, Phu L, Kirkpatrick D, Hymowitz S, Rape M, Kelly RF, Dixit VM (2010) K11-linked polyubiquitination in cell cycle control revealed by a K11 linkage-specific antibody. Mol Cell 39:477–484

Merlet J, Burger J, Gomes J, Pintard L (2009) Regulation of cullin-RING E3 ubiquitin-ligases by neddylation and dimerization. Cell Mol Life Sci 66:1924–1938

Michelle C, Vourch P, Mignon L, Andres C (2009) What was the set of ubiquitin and ubiquitin-like conjugating enzymes in the eukaryote common ancestor? J Mol Evol 68:616–628

Myler M (2008) Searching the Tritryp genomes for drug targets. Adv Exp Med Biol 625:133–140

Newton K, Vucic D (2007) Ubiquitin ligases in cancer: ushers for degradation. Cancer Invest 25:502–513

Noma A, Sakaguchi Y, Suzuki T (2009) Mechanistic characterization of the sulfur-relay system for eukaryotic 2-thiouridine biogenesis at tRNA wobble positions. Nucleic Acids Res 37:1335–1352

Parry G, Estelle M (2004) Regulation of cullin-based ubiquitin ligases by the Nedd8/RUB ubiquitin-like proteins. Semin Cell Dev Biol 15:221–229

Ponder E, Bogyo M (2007) Ubiquitin-like modifiers and their deconjugating enzymes in medically important parasitic protozoa. Eukaryot Cell 6:1943–1952

Ponts N, Yang J, Chung D, Prudhomme J, Girke T, Horrocks P, Roch KL (2008) Deciphering the ubiquitin-mediated pathway in apicomplexan parasites: a potential strategy to interfere with parasite virulence. PLoS One 3:e2386

Purdy J, Pho L, Mann B, Petri WJ (1996) Upstream regulatory elements controlling expression of the Entamoeba histolytica lectin. Mol Biochem Parasitol 34:7891–7903

Raasi S, Wolf D (2007) Ubiquitin receptors and ERAD: a network of pathways to the proteasome. Semin Cell Dev Biol 18:780–791

Raasi S, Schmidtke G, Groettrup M (2001) The ubiquitin-like protein FAT10 forms covalent conjugates and induces apoptosis. J Biol Chem 276:35334–35343

Rhodes D, Bd B, Trowsdale J (2005) Relationship between SPRY and B30.2 protein domains. Evolution of a component of immune defence? Immunology 116:411–417

Rice P, Longden I, Bleasby A (2000) EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet 16:276–277

Ritchie K, Zhang D (2004) ISG15: the immunological kin of ubiquitin. Semin Cell Dev Biol 15:237–246

Rodrigo-Brenni M, Morgan D (2007) Sequential E2s drive polyubiquitin chain assembly on APC targets. Cell 130:127–139

Romero-Díaz M, Gómez C, López-Reyes I, Martínez M, Orozco E, Rodríguez M (2010) Structural and functional analysis of the Entamoeba histolytica EhrabB gene promoter. Gene 455:32–42

Salvat C, Wang G, Dastur A, Lyon N, Huibregtse J (2004) The -4 phenylalanine is required for substrate ubiquitination catalyzed by HECT ubiquitin ligases. J Biol Chem 279:18935–18943

Schlieker C, Veen AVd, Damon J, Spooner E, Ploegh H (2008) A functional proteomics approach links the ubiquitin-related modifier Urm1 to a tRNA modification pathway. Proc Natl Acad Sci U S A 105:18255–18260

Schüle T, Rose M, Entian K, Thumm M, Wolf D (2000) Ubc8p functions in catabolite degradation of fructose-1,6-bisphosphatase in yeast. EMBO J 19:2161–2167

Shilova V, Garbuz D, Myasyankina E, Chen B, Evgen’ev M, Feder M, Zatsepina O (2006) Remarkable site specificity of local transposition into the Hsp70 promoter of Drosophila melanogaster. Genetics 173:809–820

Skaar J, Pagano M (2009) Control of cell growth by the SCF and APC/C ubiquitin ligases. Curr Opin Cell Biol 21:816–824

Stone S, Hauksdottir H, Troy A, Herschleb J, Kraft E, Callis J (2005) Functional analysis of the RING-type ubiquitin ligase family of Arabidopsis. Plant Physiol 137:13–30

Suzuki K, Nakamura M, Nariai Y, Dekio S, Tanigawa Y (1996) Monoclonal nonspecific suppressor beta (MNSF beta) inhibits the production of TNF-alpha by lipopolysaccharide-activated macrophages. Immunology 195:187–198

Varshavsky A (1997) The N-end rule pathway of protein degradation. Genes Cells 2:13–28

Whitney N, Pearson L, Lunsford R, McGill L, Gomer R, Lindsey D (2006) A putative Ariadne-like ubiquitin ligase is required for Dictyostelium discoideum development. Eukaryot Cell 5:1820–1825

Wilkinson C, Dittmar G, Ohi M, Uetz P, Jones N, Finley D (2004) Ubiquitin-like protein Hub1 is required for pre-mRNA splicing and localization of an essential splicing factor in fission yeast. Curr Biol 14:2283–2288

Wöstmann C, Tannich E, Grunwald T (1992) Ubiquitin of Entamoeba histolytica deviates in six amino acid residues from the consensus of all other known ubiquitins. FEBS Lett 308:54–58

Wöstmann C, Liakopoulous D, Ciechanover A, Grunwald T (1996) Characterization of ubiquitin genes and transcripts and demonstration of an ubiquitin-conjugating system in Entamoeba histolytica. Mol Biochem Parasitol 82:81–92

Xu P, Duong D, Seyfried N, Cheng D, Xie Y, Robert J, Rush J, Hochstrasser M, Finley D, Peng J (2009) Quantitative proteomics reveals the function of unconventional ubiquitin chains in proteasomal degradation. Cell 137:133–145

Ye Y, Meyer H, Rapoport T (2001) The AAA ATPase Cdc48/p97 and its partners transport proteins from the ER into the cytosol. Nature 414:652–656

Ying M, Zhan Z, Wang W, Chen D (2009) Origin and evolution of ubiquitin-conjugating enzymes from Guillardia theta nucleomorph to hominoid. Gene 447:72–85

Zhao J (2007) Sumoylation regulates diverse biological processes. Cell Mol Life Sci 64:3017–3033
