11
Turning gold into ‘junk’: transposable elements utilize central proteins of cellular networks Gyo ¨ rgy Abrusa ´n 1, *, Andra ´ s Szila ´ gyi 2 , Yang Zhang 3 and Bala ´ zs Papp 1 1 Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Center of the Hungarian Academy of Sciences, Temesva ´ ry krt. 62. Szeged H-6701, Hungary, 2 Institute of Enzymology, Hungarian Academy of Sciences, Karolina u ´ t 29.Budapest 1113, Hungary and 3 Center for Computational Medicine and Bioinformatics, Department of Biological Chemistry, Medical School, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109-2218, USA Received July 15, 2012; Revised December 1, 2012; Accepted December 22, 2012 ABSTRACT The numerous discovered cases of domesticated transposable element (TE) proteins led to the recog- nition that TEs are a significant source of evolution- ary innovation. However, much less is known about the reverse process, whether and to what degree the evolution of TEs is influenced by the genome of their hosts. We addressed this issue by searching for cases of incorporation of host genes into the sequence of TEs and examined the systems-level properties of these genes using the Saccharomyces cerevisiae and Drosophila melanogaster genomes. We identified 51 cases where the evolutionary scenario was the incorporation of a host gene fragment into a TE consensus sequence, and we show that both the yeast and fly homologues of the incorporated protein sequences have central positions in the cellular networks. An analysis of se- lective pressure (Ka/Ks ratio) detected significant selection in 37% of the cases. Recent research on retrovirus-host interactions shows that virus proteins preferentially target hubs of the host inter- action networks enabling them to take over the host cell using only a few proteins. We propose that TEs face a similar evolutionary pressure to evolve pro- teins with high interacting capacities and take some of the necessary protein domains directly from their hosts. INTRODUCTION In recent years, the traditional view that transposable elements (TEs) are only a burden to their host organism has shifted. Although their parasitic nature is not ques- tioned, the discoveries that many proteins originate from TEs and that TEs contributed to the invention of several key cellular machineries of multicellular organisms high- lighted their significance in evolutionary innovations (1–3). The best known cases of TE domestication include the RAG protein of the immune system of verte- brates (4), CENP-B protein of centromeres (5–7), light sensing in plants (8), regulation of telomere length (9) or developmental regulation [PAX6 gene, (10)]. Although numerous cases of domestication have already been identified in model organisms, their real number remains unclear, as estimates range from thousands to dozens even in the well-characterized human genome [see (11) versus (12,13)]. Besides providing the raw material for novel genes, TEs also contributed to regulation and generation of allelic diversity: 25% of promoters in human genes contain TE sequences (14), whereas the activity of Helitrons, the eukaryotic rolling circle transposons [re- viewed in (15)] and MuDR (MULE) transposons (16) resulted in the sometimes massive amplification of func- tional genes in their hosts (17–20). The ability of DNA transposons to mobilize fragments of DNA has even resulted in the development of highly efficient vectors (e.g. Sleeping Beauty transposon) for gene transfer (21). Naturally occurring gene capturing has been well studied in the maize and rice genomes, where it occurs at a high rate, involves only particular repeat types like Helitrons or MuDR repeats and is a major force shaping the genome. During gene capturing, fragments or entire genes are incorporated into the transposon, and the sub- sequent amplification of the repeat results in a high copy number of the gene as well. However, Helitrons are unusual among other TEs in their ability to mobilize adjacent DNA [also the prokaryotic relatives of Helitrons are known to mobilize DNA fragments, including antibiotic resistance genes and virulence factors (22)], and even in Helitrons, the captured gene fragments only rarely contribute to the evolution of the transposon sequence itself (17). As the global sequencing effort completed the genomes of most classic model organisms, attention has turned *To whom correspondence should be addressed. Tel: +3662599600; Fax: +3662433506; Email: [email protected] 3190–3200 Nucleic Acids Research, 2013, Vol. 41, No. 5 Published online 21 January 2013 doi:10.1093/nar/gkt011 ß The Author(s) 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/ by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. at University of Michigan on April 16, 2013 http://nar.oxfordjournals.org/ Downloaded from

Turning gold into ‘junk’: transposable elements utilize ... · Turning gold into ‘junk’: transposable elements utilize central proteins of cellular networks Gyo¨rgy Abrusa´n1,*,

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Turning gold into ‘junk’: transposable elements utilize ... · Turning gold into ‘junk’: transposable elements utilize central proteins of cellular networks Gyo¨rgy Abrusa´n1,*,

Turning gold into ‘junk’: transposable elementsutilize central proteins of cellular networksGyorgy Abrusan1,*, Andras Szilagyi2, Yang Zhang3 and Balazs Papp1

1Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Center of the HungarianAcademy of Sciences, Temesvary krt. 62. Szeged H-6701, Hungary, 2Institute of Enzymology, HungarianAcademy of Sciences, Karolina ut 29.Budapest 1113, Hungary and 3Center for Computational Medicine andBioinformatics, Department of Biological Chemistry, Medical School, University of Michigan, 100 WashtenawAvenue, Ann Arbor, MI 48109-2218, USA

Received July 15, 2012; Revised December 1, 2012; Accepted December 22, 2012

ABSTRACT

The numerous discovered cases of domesticatedtransposable element (TE) proteins led to the recog-nition that TEs are a significant source of evolution-ary innovation. However, much less is known aboutthe reverse process, whether and to what degreethe evolution of TEs is influenced by the genomeof their hosts. We addressed this issue by searchingfor cases of incorporation of host genes into thesequence of TEs and examined the systems-levelproperties of these genes using the Saccharomycescerevisiae and Drosophila melanogaster genomes.We identified 51 cases where the evolutionaryscenario was the incorporation of a host genefragment into a TE consensus sequence, and weshow that both the yeast and fly homologues ofthe incorporated protein sequences have centralpositions in the cellular networks. An analysis of se-lective pressure (Ka/Ks ratio) detected significantselection in 37% of the cases. Recent research onretrovirus-host interactions shows that virusproteins preferentially target hubs of the host inter-action networks enabling them to take over the hostcell using only a few proteins. We propose that TEsface a similar evolutionary pressure to evolve pro-teins with high interacting capacities and take someof the necessary protein domains directly from theirhosts.

INTRODUCTION

In recent years, the traditional view that transposableelements (TEs) are only a burden to their host organismhas shifted. Although their parasitic nature is not ques-tioned, the discoveries that many proteins originate fromTEs and that TEs contributed to the invention of several

key cellular machineries of multicellular organisms high-lighted their significance in evolutionary innovations(1–3). The best known cases of TE domesticationinclude the RAG protein of the immune system of verte-brates (4), CENP-B protein of centromeres (5–7), lightsensing in plants (8), regulation of telomere length (9) ordevelopmental regulation [PAX6 gene, (10)]. Althoughnumerous cases of domestication have already beenidentified in model organisms, their real number remainsunclear, as estimates range from thousands to dozens evenin the well-characterized human genome [see (11) versus(12,13)]. Besides providing the raw material for novelgenes, TEs also contributed to regulation and generationof allelic diversity: 25% of promoters in human genescontain TE sequences (14), whereas the activity ofHelitrons, the eukaryotic rolling circle transposons [re-viewed in (15)] and MuDR (MULE) transposons (16)resulted in the sometimes massive amplification of func-tional genes in their hosts (17–20). The ability of DNAtransposons to mobilize fragments of DNA has evenresulted in the development of highly efficient vectors(e.g. Sleeping Beauty transposon) for gene transfer (21).

Naturally occurring gene capturing has been wellstudied in the maize and rice genomes, where it occursat a high rate, involves only particular repeat types likeHelitrons or MuDR repeats and is a major force shapingthe genome. During gene capturing, fragments or entiregenes are incorporated into the transposon, and the sub-sequent amplification of the repeat results in a high copynumber of the gene as well. However, Helitrons areunusual among other TEs in their ability to mobilizeadjacent DNA [also the prokaryotic relatives ofHelitrons are known to mobilize DNA fragments,including antibiotic resistance genes and virulencefactors (22)], and even in Helitrons, the captured genefragments only rarely contribute to the evolution of thetransposon sequence itself (17).

As the global sequencing effort completed the genomesof most classic model organisms, attention has turned

*To whom correspondence should be addressed. Tel: +3662599600; Fax: +3662433506; Email: [email protected]

3190–3200 Nucleic Acids Research, 2013, Vol. 41, No. 5 Published online 21 January 2013doi:10.1093/nar/gkt011

� The Author(s) 2013. Published by Oxford University Press.This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

at University of M

ichigan on April 16, 2013

http://nar.oxfordjournals.org/D

ownloaded from

Page 2: Turning gold into ‘junk’: transposable elements utilize ... · Turning gold into ‘junk’: transposable elements utilize central proteins of cellular networks Gyo¨rgy Abrusa´n1,*,

towards several other eukaryotic genomes, either owing totheir phylogenetic importance [i.e. (23)], or owing to beingimportant for a narrower research community (24).Besides providing key insights into genome organizationand function, these studies have also revealed a large di-versity of TEs that previously had not been appreciated:repeat classes that had been thought to be extinct inmammals were found [like DNA transposons in bats,(25)], entirely novel classes of repeats were discovered,e.g. Polintons (26,27), and the diversity of knownfamilies was greatly expanded (28,29).

Here, we investigate the incorporation of host genesinto TE sequences, however, in a narrower sense than itwas reported for Helitrons or MULEs; we focus only onthose cases that ‘made it’ to the consensus sequence of thetransposon and thus could influence its evolution. Theincorporation of a protein domain into a TE has beendescribed mostly in those cases, where it resulted in theemergence of a novel repeat type; in non-LTR repeats, theacquisition of an endonuclease, RNase H domain and anORF1 protein in the early history of R2–R4-like repeatsresulted in the emergence of their currently most commonfamilies [L1, I, Jockey, CR1; (30)]. Incorporation of geneshas been documented in LTR retrotransposons, where the(multiple) transitions from the transposon state to a viralstate were enabled by the acquisition—usually from otherviruses—of envelope proteins (31). Also, the acquisition ofa small number of proteins with unclear function by LTRelements has been described (32), and among DNA trans-posons, the acquisition of the helicase domain in Helitronshas been reported (15).

The main objective of this article is to search for cases ofprotein incorporation into TEs and test the systems-levelproperties of these proteins, i.e. whether the incorporatedsequences originate from a random selection of hostproteins, or TEs selectively incorporate genes withdistinct properties within the cellular networks.Currently, cellular networks are well characterized onlyin a small number of model organisms; therefore, we usebudding yeast (Saccharomyces cerevisiae) and the fruitfly(Drosophila melanogaster), as yeast is the eukaryote withthe most-understood interactome, whereas among multi-cellular organisms, the Drosophila interactome is espe-cially well characterized. Our results indicate that (i) theacquisition of host proteins by TEs is not a rare excep-tional event but is relatively common; (ii) the incorporatedgenes participate in significantly more protein–proteininteractions (PPIs) than expected by chance; (iii) arecentral within the interaction networks (i.e. have highbetweenness and closeness centrality); and (iv) a consider-able fraction of them are subject to selection, thuscontribute to the evolution of the TEs.

MATERIALS AND METHODS

Data sources and homology search

We searched 4848 verified yeast ORF sequences and13 909 Drosophila genes against RepBase (v.15.12), themain database of eukaryotic TEs. Only consensus se-quences were used from RepBase to exclude dubious

TEs from the analysis. In the case of Drosophila genes,we used only the core region of each gene, i.e. the regionsthat are present in all known alternative splicing products.This was necessary because TEs can occasionally beincorporated into transcripts by alternative splicing, andthe functionality of such splice products is uncertain(33).The sequences of eukaryotic TEs were downloadedfrom RepBase [http://www.girinst.org, v. 15.12, (34)].The sequences of yeast proteins were downloaded fromthe Saccharomyces Genome Database (http://www.yeastgenome.org); Drosophila genes were downloadedfrom FlyBase (http://flybase.org).The DNA sequences of TEs were translated in all six

frames, and the sequence comparisons between yeast andDrosophila proteins and the translated TE sequences weremade with the jackhmmer tool of the hmmer package(35), with a bit score cutoff 27. In the homologoussequence fragments of TE, yeast and Drosophila proteins,we identified conserved domains using the Pfam database(36) v. 24, (http://pfam.sanger.ac.uk) with hmmscan. Todecide whether the homology between a yeast protein anda TE sequence represents transposon domestication or thereverse process, the incorporation of a protein fragmentinto a TE, we implemented the following, automatedprotocol. First, we implemented the taxonomic tree of�180 000 taxa, using known phylogenetic relationshipsfrom the Pfam database (ncbi_taxonomy table). Next,we screened the Uniprot database (http://www.uniprot.org) with the sequence fragment from yeast or fly froma homologous pair of TE - yeast/Drosophila proteins,using jackhmmer (bit score threshold 27), excluding allmatches of viral or TE origin. Using the species of theSwissprot hits, we identified a subtree on the global tree;in the cases where this resulted in no hits, we used Uniprothits. We repeated the same procedure for the repeatfragment, using the 6-frame translated RepBase as thesequence database, and then compared the two trees(Figure 1B). The tree which the broader taxonomic span(i.e. contains the other) points to the source of thesequence (TE domestication versus protein incorporationinto a TE). This method performs well in the case ofprotein incorporation events, where the phylogeneticspread of a protein domain is typically wide, but its homo-logue is present only in a small number of TEs; however, itis less reliable in the case of ancient TE domestications: inthe case of common domains, e.g. Zinc fingers that arewidespread both in TEs and host proteins, the sequenceexchange events happened too long ago to reliably identifyits source taxon, and also several independent domestica-tion events may not be distinguished from each other,leading to a very broad phylogenetic distribution of suchproteins. In the phylogenetic analysis of the activity-regulated cytoskeletal-associated protein (Arc) gene,multiple alignments were made with muscle (37), and themaximum likelihood phylogenetic tree (1000 bootstrapreplications) was built with MEGA5 (38).

Statistical analyses

We used Monte Carlo simulations to determine whetherthe number of genetic and protein interactions (degree),

Nucleic Acids Research, 2013, Vol. 41, No. 5 3191

at University of M

ichigan on April 16, 2013

http://nar.oxfordjournals.org/D

ownloaded from

Page 3: Turning gold into ‘junk’: transposable elements utilize ... · Turning gold into ‘junk’: transposable elements utilize central proteins of cellular networks Gyo¨rgy Abrusa´n1,*,

A

B

Figure 1. (A) Examples of TEs with significant homology to yeast and Drosophila genes. Black bars indicate TE ORFs, gray bars indicate thelocation of the homologous yeast/Drosophila genes. Wherever present, we used the ORF coordinates provided by RepBase; for the repeats whereRepBase provides no gene prediction, we predicted the ORFs with Glimmer3. (B) The workflow used to determine whether the homology between aTE and a gene is a result of domestication or the incorporation of a host protein into a TE.

3192 Nucleic Acids Research, 2013, Vol. 41, No. 5

at University of M

ichigan on April 16, 2013

http://nar.oxfordjournals.org/D

ownloaded from

Page 4: Turning gold into ‘junk’: transposable elements utilize ... · Turning gold into ‘junk’: transposable elements utilize central proteins of cellular networks Gyo¨rgy Abrusa´n1,*,

betweennes centrality and closeness centrality of the genefragments that were incorporated into TEs are signifi-cantly different from the random expectation. First, wetook 100 000 random samples of genes without replace-ment from the yeast and Drosophila data sets anddetermined the parameter of interest, i.e. the mediannode degree of the proteins in the sample. Next, wedetermined the number of random samples with highermedian value than what we observed in the proteins hom-ologous with TEs. Significance (p) was determined asp= (n+1)/(N+1), where n is the number of randomsamples with medians equal or higher than in theobserved sample, and N is the total number of randomsamples. All analyses were carried out with perl scriptsdeveloped in-house. Betweenness centrality and closenesscentrality were calculated with Pajek, a program for theanalysis of large networks (http://pajek.imfm.si); in theanalyses of yeast interactions, only those interactionswere used where at least one of the interacting partnersis a verified ORF.

Analysis of evolutionary rates in the captured proteins

We tested for significant selection in the incorporatedproteins with two methods. Wherever the incorporatedprotein fragment was present in more than one TEfamily, we compared the two closest homologousfamilies to decide whether their Ka/Ks ratio is signifi-cantly different than 1. We identified Ka and Ks valueswith PAML (39) and tested whether their ratio signifi-cantly differs from one with a likelihood ratio test: wefixed the Ka/Ks ratio at 1, and fitted a similarmaximum-likelihood model to the alignment of the twosequences (40). The difference between the log-likelihoodsof the two models was tested with Chi-square tests, totest whether the null model assuming neutral evolution(Ka/Ks=1) in the TE protein performs significantlyworse than the one where Ka and Ks could vary independ-ently. In those cases where additional TE homologs couldnot be found, we applied a different (and less powerful)procedure; we searched the NCBI ref_mrna database withthe captured fragment of the TE protein with tblastn.Using the best match to the TE fragment, we searchedNCBI ref_mrna again, and using the results of the twosearches identified an outgroup sequence, which is atleast as distant (in terms of bit score) both to the TEfragment and its best homologue as they are to eachother. These three sequences were used to construct anunrooted phylogenetic tree, and we identified theseparate evolutionary rates for all of its branches withPAML. We tested whether the Ka/Ks ratio of the TEbranch is significantly different from 1, similarly as fortwo homologous TEs, with a likelihood ratio test: wefixed the Ka/Ks ratio of the TE branch of the tree at 1and fitted a similar maximum-likelihood model to thealignment of the three sequences, and the differencebetween two models was used to test whether the nullmodel with Ka/Ks=1 in the TE branch of the treeperforms significantly worse than the one where the evo-lutionary rate was allowed to vary.

RESULTS

Identification of yeast and fly protein homologues in TEsand determination of their evolutionary origin

We identified 38 yeast proteins and 145 Drosophilaproteins that show significant similarity to mobile elements(Figure 1, Supplementary Table S1, Supplementary UCSCGenome Browser track). By comparing the taxonomic dis-tribution of the homologues of each protein with the taxo-nomic distribution of its corresponding TE sequence (seeMethods and Figure 1B), we determined whether thematches are likely to be the result of domestication orcapturing of a protein fragment by a TE. We identifiedonly six cases that represent a domestication of a TEsequence, whereas the phylogenies of 108 genes supportthe protein incorporation scenario (Supplementary TableS1, see also Supplementary Figure S1A–C for examples).In 67 cases, the homologous sequences are so widespreadboth in the hosts and among TEs that our method couldnot infer whether the sequence was originating from a hostor a TE, and in two cases, the phylogenies within TEs andhosts show no relationship (thus, horizontal transfer maybe involved).A particularly interesting case of domestication is the

Arc gene of Drosophila (FBgn0033926). Arc genes inmammals received considerable attention in recent yearsbecause they are key regulators of synaptic plasticityrequired for normal brain functioning and long-termmemory formation (41,42). In deuterostomes, they arepresent only in tetrapods and contain a domesticatedfragment of a gag protein from a Gypsy retrotransposon(43). In Drosophila, the Arc genes are also expressed inneurons and regulate behavioral responses for stress(44), although unlike in mammals, they do not influencesynaptic plasticity. Drosophila Arc genes also contain adomesticated fragment of a gag protein from an (insect)Gypsy retrotransposon and show homology to mamma-lian Arc genes (Figure 2). However, the absence ofArc-like genes in other protostomes than insects (and inother deuterostomes than tetrapods) together with theirphylogeny (Figure 2) suggests that the gag proteins ofGypsy retrotransposons were recruited twice independ-ently, and in both cases, the resulting ‘Arc’ genes gainedfunctions in the neural system and can be seen as anexample of ‘convergent domestication’.

Proteins incorporated into repeats occupy central positionsin cellular networks

Most proteins do not operate in isolation but interact withother proteins and form multi-protein complexes, whichperform a particular cellular function. Using PPIs fromthe BioGRID (v. 3.1.83) database for yeast (45) andfrom the FlyBase database (46) for Drosophila, wetested with Monte Carlo simulations whether the incorpo-rated genes have distinct positions in the protein inter-action network, i.e. whether the median number ofprotein interactions (degree) of the captured genes, theirbetweenness centrality (the fraction of shortest paths ofthe network that pass through a particular node) andcloseness centrality (the inverse of the mean distance

Nucleic Acids Research, 2013, Vol. 41, No. 5 3193

at University of M

ichigan on April 16, 2013

http://nar.oxfordjournals.org/D

ownloaded from

Page 5: Turning gold into ‘junk’: transposable elements utilize ... · Turning gold into ‘junk’: transposable elements utilize central proteins of cellular networks Gyo¨rgy Abrusa´n1,*,

A

B

Figure 2. Evolution of Arc proteins. (A) Examples of Arc genes in vertebrates (human, mouse, chicken) and invertebrates (Drosophila melanogaster,Stomoxys calictrans, Drosophila silvestris). Black bars indicate the location of the regions that are homologous both to the domesticated Gypsytransposon and other Arc genes and contain a Retrotrans_gag conserved domain (pfam accession PF03732). (B) A maximum likelihood tree of thehomologous regions of the Arc proteins and several Gypsy retrotransposons. Although the bootstrap support (1000 replications) is low for manybranches, the presence of two Gypsy families between the Arc genes and the absence of Arc proteins in deuterostomes other than tetrapods andprotostomes other than insects indicate that Gypsy gag proteins were domesticated twice independently.

3194 Nucleic Acids Research, 2013, Vol. 41, No. 5

at University of M

ichigan on April 16, 2013

http://nar.oxfordjournals.org/D

ownloaded from

Page 6: Turning gold into ‘junk’: transposable elements utilize ... · Turning gold into ‘junk’: transposable elements utilize central proteins of cellular networks Gyo¨rgy Abrusa´n1,*,

between node v and all other nodes reachable from it) aresignificantly higher than expected by chance. We foundthat the incorporated genes that are present in the PPIdatabases have significantly higher degree and centralitymeasures than expected by chance (Figures 3C and 4) bothin yeast and Drosophila. To rule out any detection biasescaused by the phylogenetic distance between Drosophila/yeast and the host species of TEs, we determined thedivergence time between Drosophila/yeast and the TEhosts with the TimeTree application (47) and testedwhether the network characteristics of incorporatedproteins are positively correlated with divergence. Noneof the network parameters showed a significant correl-ation (Supplementary Figure S2).

An important question is whether the incorporatedprotein fragments/domains are themselves highly interact-ing or merely come from highly interacting proteins.Currently, this can be tested only indirectly because dataon direct domain–domain interactions are sill limited andhave been compiled only for proteins present in the PDBdatabase (48), and thus represent only a small subset of all

protein interactions. We used the 3did database (49),which contains 6260 interactions between 4302 Pfamdomains of PDB entries, to test whether the Pfamdomains found in the incorporated sequences aredomains that interact with more domains than it wouldbe expected from randomly selected ones. First, wesearched for the presence of conserved protein domainsin the protein fragments homologous to a TE using thePfam database. Altogether, we identified 149 conserveddomains in the yeast and fly genes homologous to a TE(Supplementary Tables S2 and S3), of which 74 are foundin the protein incorporation cases. These represent only 37different domains though, as frequently similar domainsare incorporated into different TEs. From the 37 differentPfam domains of the incorporated proteins, 27 are presentin the 3did database. Using Monte Carlo simulations, wefound that the mean number of interactions of thesedomains with other Pfam domains is significantly higher(3.44, P=0.023) than expected by chance (2.14+/� 0.56),supporting the hypothesis that TEs pick up domains withhigher number of interacting partners than the average.

Saccharomyces cerevisiae Drosophila melanogasternr of

genes in dataset

mean of random samples SD sample P

nr of genes in dataset

mean of random samples SD sample P

fitness 20 0.977 0.061 0.972 0.748 - - - - -degree, PPI 23 13.891 4.625 30 0.005 35 8.796 3.41 20 0.001degree, GI 23 27.055 9.65 23 0.638 14 3.321 1.526 6.5 0.046betweenness, PPI 23 4.75E-05 2.74E-05 1.11E-04 0.032 35 2.28E-05 2.36E-05 1.26E-04 0.008betweenness, GI 23 4.15E-05 3.68E-05 4.57E-05 0.306 14 1.41E-04 2.41E-04 1.81E-04 0.234closeness, PPI 23 0.382 0.012 0.393 0.145 35 0.325 0.011 0.348 0.021closeness, GI 23 0.366 0.01 0.369 0.361 14 0.249 0.012 0.276 0.011direct PPI (nodes) 23 2.199 2.278 9 0.016 35 4.471 3.135 13 0.012direct PPI (edges) 23 1.323 1.687 6 0.025 35 2.964 2.495 14 0.002

PPI - protein-protein interactionsGI - genetic interactions

1 4 7 10 13 16 19 22 25 28

Degree, protein-protein interactions (PPI)

0

2000

4000

6000

8000

10000

12000

14000

16000

Num

ber

of o

bser

vatio

ns

0.0E-011.5E-05

2.9E-054.4E-05

5.9E-057.3E-05

8.8E-051.0E-04

1.2E-041.3E-04

1.5E-041.6E-04

1.8E-041.9E-04

Betweenness centrality (PPI)

0

2000

4000

6000

8000

10000

12000

14000

16000

18000

20000

22000

Num

ber o

f obs

erva

tions

A

C

B

Figure 3. Characteristics of the genes that were incorporated into TEs. We performed Monte Carlo simulations to test whether fitness and networkcharacteristics like degree (the number of interactions with other nodes), betweenness centrality (the fraction of shortest paths that pass through anode) and closeness centrality (the inverse of the mean distance between node v and all other nodes reachable from it) are significantly different in theincorporated genes than the random expectation. (A) Distribution of the median degree (PPIs) in 100 000 random samples, the arrow represents themedian of the 35 Drosophila genes for which PPI information was available. (B) Distribution of the median betweenness centrality in 100 000random samples, and the observed level in Drosophila. (C) Statistical summary of Monte Carlo simulations. We did not perform tests for the fitnesseffect for Drosophila gene knockouts, as we are not aware of studies that provide such data at a genomic scale.

Nucleic Acids Research, 2013, Vol. 41, No. 5 3195

at University of M

ichigan on April 16, 2013

http://nar.oxfordjournals.org/D

ownloaded from

Page 7: Turning gold into ‘junk’: transposable elements utilize ... · Turning gold into ‘junk’: transposable elements utilize central proteins of cellular networks Gyo¨rgy Abrusa´n1,*,

Besides physical interactions, genetic interactionsprovide an alternative means to depict functional connec-tions between genes. A genetic interaction is defined as thedifference of the fitness effect of a double gene deletionmutant in comparison with the expected multiplicativeeffect of the two individual deletions. For example, anextreme case is a synthetic lethal interaction, a lethaldouble mutant phenotype where the individual deletionproducts of the two genes are both viable phenotypes.Recently, large-scale genetic interaction maps have

become available for yeast [i.e. (50)], that enabled thecharacterization of the entire functional landscape of theyeast cell. Genetic interaction data for Drosophila aremuch less abundant and are available only for a smallfraction (12%) of genes. Similarly to PPIs, we usedgenetic interactions deposited in the BioGRID andFlyBase databases and tested whether the incorporatedgenes have higher node degree, betweenness and closenesscentrality in the genetic interaction network as well. Wefound that node degree and closeness centrality of theincorporated genes is significantly higher that therandom expectation only in Drosophila, whereas between-ness centrality is not significantly different from therandom expectation neither in yeast nor Drosophila(Figure 3B and C).

Finally, using Monte Carlo simulations, we testedwhether the incorporated proteins interact more fre-quently with each other in the host cellular networkthan random proteins and whether they have related func-tions. Although the number of direct PPIs is low inboth species, we found that their number is still muchhigher than expected between randomly chosen proteins(Figure 3), indicating that TEs are under selection to in-corporate genes with particular functions. To test this, weestimated the enrichment of GO terms in them withGOrilla (51); however, the incorporated genes cannot beassigned to a single GO term either in yeast or Drosophila,the most enriched (P< 10�5) molecular function terms aremetal ion binding in Drosophila and cofactor binding andglyceraldehyde-3-phosphate dehydrogenase activity inyeast (Supplementary Table S4).

Testing the functionality of proteins with captured genefragments

Randomly inserting or deleting a sequence into a proteincan lead to the loss of its function, and thus an importantquestion concerning the TE proteins that captured a DNAfragment of their host is whether such proteins remainedfunctional. Although domain rearrangements are rela-tively common in the evolution of genes (52), and recentstudies show that mid-domain breaks can also result infunctional proteins (53), it is not possible to prove insilico that a particular chimeric protein is or was func-tional (i.e. enzymatically active). Nevertheless, severallines of evidence indicate that many of these TE proteinscontribute to the fitness of the TEs.

A functional protein is likely to be subject to purifyingor adaptive selection. We tested for signatures of selectionin the captured gene fragments of the TE proteins thathave acquired such a gene fragment, and the length ofthe captured sequence was at least 50 aa residues(Supplementary Table S5). We used two methods (seeMaterials and Methods for details); whenever it waspossible we compared the two closest homologous TEfamilies sharing the same incorporated protein fragment,and in the remaining cases, we built an unrooted phylo-genetic tree with the TE sequence, the closest RefSeqprotein and an outgroup, and tested for selection on theTE branch of the tree. There is very little or no differencebetween the incorporated gene fragment and its closest

Figure 4. (A) The network of all PPIs of Drosophila genes with hom-ology to a TE, for which PPI data were available in FlyBase (35 genes,highlighted in black). The median number of PPIs is 20. (B) An exampleof a PPI network for a randomly selected set of Drosophila genes (also35 genes, highlighted in black). The median number of PPIs is 9, corres-ponding to the average of the random samples (see Figure 3).

3196 Nucleic Acids Research, 2013, Vol. 41, No. 5

at University of M

ichigan on April 16, 2013

http://nar.oxfordjournals.org/D

ownloaded from

Page 8: Turning gold into ‘junk’: transposable elements utilize ... · Turning gold into ‘junk’: transposable elements utilize central proteins of cellular networks Gyo¨rgy Abrusa´n1,*,

RefSeq homolog (Ks< 0.01) in nine cases; thus, these se-quences could not be tested reliably; additionally in 19cases, the TE sequence and the closest RefSeq orRepBase homologs are so highly diverged (Ks> 10) thatthe saturation of synonymous substitutions also makesany tests of selection unreliable. From the 24 cases ofprotein incorporation where 0.01<Ks< 10, we coulddetect significant selection in nine cases [P< 0.05, likeli-hood ratio test, (40)], with one case indicating adaptiveevolution (hAT-N22_DR) and the remaining eightindicating purifying selection (Supplementary Table S5).

The incorporation of a new domain may result in aprotein that contains conflicting molecular features, forexample, the presence of both extracellular and nucleardomains within the same protein. To exclude those caseswhere the domain composition/structure already indicatesa dysfunctional protein, we tested all TE proteins with acaptured gene fragment with MisPred, a pipeline designedto detect mispredicted and abnormal proteins based onsuch conflicts (54). In the majority of the proteins withincorporated gene fragments, MisPred found no signs ofabnormality, except five cases: BEL1-I_SM and Helitron-N3_ZM contain a truncated Pfam domain owing to mid-domain breaks, whereas the respective proteins inGypsy-1-I_MI, Gypsy-40_Mad-I and SRV_MM-intcontain transmembrane helices (55), which is unexpectedin the case of TEs, as they are not part of membranes. Inthe case of Gypsy-1-I_MI, the transmembrane helix is notin the incorporated fragment; thus, it may be amisannotation or may have an unknown function. (Wefound additional four cases of proteins with transmem-brane helices in TEs where the origin of the domain wasunclear or owing to domestication).

Finally to investigate how the captured gene fragmentsinfluence the function of the TE proteins, we predicted the3D structure of each chimeric protein that carries afragment of a non-TE gene and is shorter than 500amino acids (13 cases) with I-TASSER (56,57). Owing tothe lack of sufficiently good templates in the Protein DataBank (www.pdb.org), only three of the proteins have suf-ficiently high quality models (estimated TM-score� 0.5)that also the positioning of the individual domains islikely to be correct, and therefore their function can bepredicted with high confidence (Figure 5). These modelswere analysed with COFACTOR, a part of theI-TASSER pipeline that predicts the function and catalyticcenters of proteins using their tertiary structure. Althoughin the three structures, the incorporated protein fragmentprovides a different functionality (oxidoreductase activity,methyltransferase activity - RNA binding, DNA binding;see Figure 5), the sequence of the incorporated fragmentoverlaps with the predicted binding sites of the proteins(Figure 5), further supporting the hypothesis that capturinghost genes contributed to the emergence of new functions.

DISCUSSION

Overall, our findings suggest that not only TE proteinscontribute to the evolution of their hosts but the reverseprocess of protein ‘junkification’ might be also significant

in explaining the origin and the diversity of the TE se-quences. The central position of the homologues of theincorporated genes both in the yeast and DrosophilaPPI networks suggests that TEs acquire genes with par-ticular characteristics and function, and that these se-quences are either not picked up randomly by therepeats or are not retained randomly in the repeats.Different TEs show variability in their insertion prefer-ences; non-LTR retrotransposons, which do not movehorizontally or are active in hosts with compact

Figure 5. Examples of chimeric TE protein structures with anincorporated fragment of a host gene. The structures were predictedwith I-TASSER, their function and catalytic centers were predictedwith COFACTOR. In all cases, the estimated TM-score with the truetopology is >0.5; thus, the models have an essentially correct topology.Alpha helices are highlighted with blue, beta sheets with yellow, greenregions indicate the fraction of the sequence that is homologous to anon-TE protein and red highlights the catalytic core predicted byCOFACTOR. (A) DNA8-8B_Mad from the apple genome, estimatedTM score with the correct fold is 0.53. The incorporated fragmentshows 80% sequence similarity to a short chain dehydrogenase(B9RTW7) with oxidoreductase activity (GO:0016491). (B) Helitron_N3_ZM from maize, estimated TM score is 0.64. The incorporatedfragment shows 90% amino acid sequence similarity to maize fibrillarin(B6T4G7). The predicted highest scoring gene ontology terms for themolecular function of the protein are methyltransferase activity(GO:0008168) and RNA binding (GO:0003723). (C) MARINER2_DM transposon from Drosophila, estimated TM score is 0.74. Theincorporated fragment is only 30% similar to the Drosophila geneCG18367-PA. The highest scoring molecular function GO term isDNA binding (GO:0003677).

Nucleic Acids Research, 2013, Vol. 41, No. 5 3197

at University of M

ichigan on April 16, 2013

http://nar.oxfordjournals.org/D

ownloaded from

Page 9: Turning gold into ‘junk’: transposable elements utilize ... · Turning gold into ‘junk’: transposable elements utilize central proteins of cellular networks Gyo¨rgy Abrusa´n1,*,

genomes, typically target gene-poor, AT-rich regions [i.e.L1s or Alus in the human genome, (12)] or heterochroma-tin [i.e. Ty5 retrotransposons in the yeast genome, (58)],most likely to minimize their deleterious effect on hostfitness. In contrast, genomic parasites capable of horizon-tal transfer like DNA transposons, LTR retrotransposonsand retroviruses either show little target selectivity, orpreferentially insert near actively transcribed genes(59,60), which, in consequence, might be more likelyincorporated into a TE. A test of this hypothesis wouldbe if genes with homologues to different repeat classeswould show different patterns, i.e. genes with homologyto an LTR retrotransposon would be hubs of PPInetworks, whereas genes with homology to non-LTRretrotransposons would not; however, the number ofgenes with homology to non-LTR retroelements is toolow to allow a meaningful comparison.An alternative hypothesis is that TEs incorporate genes

fragments randomly, but only a small fraction of se-quences—with high number of interactions and central-ity—remain in the TE consensus owing to selection. Anumber of observations support this hypothesis. First, inthe majority of the TEs, the captured protein fragmentresides within the predicted genes of the TE and not inthe intergenic regions of the repeat, have a similar strandorientation to the nearest TE protein (SupplementaryUCSC Browser track) and in a significant fraction of thecases where there has been enough time to accumulatenucleotide differences between the host gene and theincorporated sequence (Ks> 0.01), significant selectioncould be detected (Supplementary Table S5). Second, allrepeats in our analysis are consensus sequences that arepresent in multiple copies in their host genome; thus, theincorporation of foreign sequences clearly did not makethese repeats dysfunctional. Third, research on virus-hostinteractions indicates that incorporation of proteinswith high degree of interactions and centrality may bebeneficial for the TEs. Recently, Calderwood et al. (61)demonstrated that the proteins of Epstein–Barr virus pref-erentially interact with hubs of human protein interactionnetworks, and this pattern was subsequently confirmed formany other viruses and even parasitic bacteria (62). Themost likely cause of the preferential interaction withhighly connected proteins is that targeting hubs of thehosts’ cellular network is the most efficient way of usingits resources, i.e. diverting its pathways to the use of theparasite, especially if the parasite has only a few proteinsto achieve this task. As autonomous TEs encode only fewproteins (frequently only one), the efficient use of the hostresources may favor the evolution of multifunctionalproteins with abilities to interact with several pathwaysof the host interactome, and the simplest way to achievethis is to acquire such protein domains directly from thehost. In addition, retroviruses and retrotransposons wereshown to interact with overlapping sets of proteins (63),further corroborating this hypothesis.As the identified cases of gene capturing did not happen

in yeast or Drosophila, it raises the question to whatextent the yeast and Drosophila homologues of thecaptured proteins have similar properties to the capturedones. We used these species to investigate the systems-level

characteristics of the incorporated proteins because theyhave much better characterized interactomes than mostother model organisms, and the fact that we observe asimilar pattern in two very distantly related species, andalso in domain–domain interactions, provides strongsupport for the generality of our results. Recent findingsindicate that PPIs are highly conserved and evolve threeorders of magnitude slower than protein sequences them-selves (64), which explains the qualitatively similar resultsin the two species. However, it is unclear how far geneticinteractions are conserved across species. Such compari-sons are challenging owing to technological differencesbetween model organisms (e.g. RNAi is used for geneknockdown in multicellular organisms while in in-framedeletion is used in yeast), and also the number of genes forwhich information is available is very different [see (65) forreview]. The lack of a significant effect in yeast, and thelow number of genes with known genetic interactions in flyindicates that any conclusions on genetic interactionscannot be readily generalized at this point.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online:Supplementary Tables 1–5, Supplementary Figures 1and 2, Supplementary UCSC Genome Browser track.

ACKNOWLEDGEMENTS

The authors thank Csaba Pal for useful comments andsuggestions.

FUNDING

Funding for open access charge: Hungarian ScientificResearch Fund [PD83571 to G.A., NK77978 to A.Sz.and PD75261 to P.B.]; International Human FrontierScience Program Organization (to B.P.); ‘LenduletProgram’ of the Hungarian Academy of Sciences to (toB.P.); NSF Career Award [DBI 0746198 to Y.Z.];National Institute of General Medical Sciences[GM083107 and GM084222 to Y.Z.].

Conflict of interest statement. None declared.

REFERENCES

1. Volff,J. (2006) Turning junk into gold: domestication oftransposable elements and the creation of new genes ineukaryotes. Bioessays, 28, 913–922.

2. Jurka,J., Kapitonov,V.V., Kohany,O. and Jurka,M.V. (2007)Repetitive sequences in complex genomes: structure and evolution.Annu. Rev. Genomics. Hum. Genet., 8, 241–259, 10.1146/annurev.genom.8.080706.092416.

3. Feschotte,C. and Pritham,E.J. (2007) DNA transposons and theevolution of eukaryotic genomes. Annu. Rev. Genet., 41, 331–368,10.1146/annurev.genet.40.110405.090448.

4. Kapitonov,V.V. and Jurka,J. (2005) RAG1 core and V(D)Jrecombination signal sequences were derived from Transibtransposons. PLoS Biol., 3, e18110.1371/journal.pbio.0030181.

5. Smit,A.F. and Riggs,A.D. (1996) Tiggers and DNA transposonfossils in the human genome. Proc. Natl Acad. Sci. USA, 93,1443–1448.

3198 Nucleic Acids Research, 2013, Vol. 41, No. 5

at University of M

ichigan on April 16, 2013

http://nar.oxfordjournals.org/D

ownloaded from

Page 10: Turning gold into ‘junk’: transposable elements utilize ... · Turning gold into ‘junk’: transposable elements utilize central proteins of cellular networks Gyo¨rgy Abrusa´n1,*,

6. Kipling,D. and Warburton,P.E. (1997) Centromeres, CENP-B andTigger too. Trends Genet., 13, 141–145.

7. Casola,C., Hucks,D. and Feschotte,C. (2008) Convergentdomestication of pogo-like transposases into centromere-bindingproteins in fission yeast and mammals. Mol. Biol. Evol, 25, 29–41,10.1093/molbev/msm221.

8. Lin,R., Ding,L., Casola,C., Ripoll,D.R., Feschotte,C. andWang,H. (2007) Transposase-derived transcription factors regulatelight signaling in Arabidopsis. Science, 318, 1302–1305, 10.1126/science.1146281.

9. Eickbush,T.H. (1997) Telomerase and retrotransposons: whichcame first? Science, 277, 911–912.

10. Ivics,Z., Izsvak,Z., Minter,A. and Hackett,P.B. (1996)Identification of functional domains and evolution of Tc1-liketransposable elements. Proc. Natl Acad. Sci. USA, 93, 5008–5013.

11. Britten,R. (2006) Transposable elements have contributed tothousands of human proteins. Proc. Natl Acad. Sci. USA, 103,1798–1803, 10.1073/pnas.0510007103.

12. Lander,E.S., Linton,L.M., Birren,B., Nusbaum,C., Zody,M.C.,Baldwin,J., Devon,K., Dewar,K., Doyle,M., FitzHugh,W. et al.(2001) Initial sequencing and analysis of the human genome.Nature, 409, 860–921, 10.1038/35057062.

13. Zdobnov,E.M., Campillos,M., Harrington,E.D., Torrents,D. andBork,P. (2005) Protein coding potential of retroviruses and othertransposable elements in vertebrate genomes. Nucleic Acids Res.,33, 946–954, 10.1093/nar/gki236.

14. Jordan,I.K., Rogozin,I.B., Glazko,G.V. and Koonin,E.V. (2003)Origin of a substantial fraction of human regulatory sequencesfrom transposable elements. Trends Genet., 19, 68–72.

15. Kapitonov,V.V. and Jurka,J. (2001) Rolling-circle transposons ineukaryotes. Proc. Natl Acad. Sci. USA, 98, 8714–8719, 10.1073/pnas.151269298.

16. Jiang,N., Bao,Z., Zhang,X., Eddy,S.R. and Wessler,S.R. (2004)Pack-MULE transposable elements mediate gene evolution inplants. Nature, 431, 569–573, 10.1038/nature02953.

17. Yang,L. and Bennetzen,J.L. (2009) Distribution, diversity,evolution, and survival of Helitrons in the maize genome. Proc.Natl Acad. Sci. USA, 106, 19922–19927, 10.1073/pnas.0908008106.

18. Pritham,E.J. and Feschotte,C. (2007) Massive amplification ofrolling-circle transposons in the lineage of the bat Myotislucifugus. Proc. Natl Acad. Sci. USA, 104, 1895–1900, 10.1073/pnas.0609601104.

19. Du,C., Fefelova,N., Caronna,J., He,L. and Dooner,H.K. (2009)The polychromatic Helitron landscape of the maize genome. Proc.Natl Acad. Sci. USA, 106, 19916–19921, 10.1073/pnas.0904742106.

20. Hanada,K., Vallejo,V., Nobuta,K., Slotkin,R.K., Lisch,D.,Meyers,B.C., Shiu,S. and Jiang,N. (2009) The functional role ofpack-MULEs in rice inferred from purifying selection andexpression profile. Plant Cell, 21, 25–38, 10.1105/tpc.108.063206.

21. Ivics,Z. and Izsvak,Z. (2010) The expanding universe oftransposon technologies for gene and cell engineering. Mob.DNA, 1, 2510.1186/1759-8753-1-25.

22. Toleman,M.A., Bennett,P.M. and Walsh,T.R. (2006) ISCRelements: novel gene-capturing systems of the 21st century?Microbiol. Mol. Biol. Rev., 70, 296–316, 10.1128/MMBR.00048-05.

23. Warren,W.C., Hillier,L.W., Marshall Graves,J.A., Birney,E.,Ponting,C.P., Grutzner,F., Belov,K., Miller,W., Clarke,L.,Chinwalla,A.T. et al. (2008) Genome analysis of the platypusreveals unique signatures of evolution. Nature, 453, 175–183,10.1038/nature06936.

24. Colbourne,J.K., Pfrender,M.E., Gilbert,D., Thomas,W.K.,Tucker,A., Oakley,T.H., Tokishita,S., Aerts,A., Arnold,G.J.,Basu,M.K. et al. (2011) The ecoresponsive genome of Daphniapulex. Science, 331, 555–561, 10.1126/science.1197761.

25. Ray,D.A., Feschotte,C., Pagan,H.J.T., Smith,J.D., Pritham,E.J.,Arensburger,P., Atkinson,P.W. and Craig,N.L. (2008)Multiple waves of recent DNA transposon activity in the bat,Myotis lucifugus. Genome Res., 18, 717–728, 10.1101/gr.071886.107.

26. Pritham,E.J., Putliwala,T. and Feschotte,C. (2007) Mavericks, anovel class of giant transposable elements widespread ineukaryotes and related to DNA viruses. Gene, 390, 3–17, 10.1016/j.gene.2006.08.008.

27. Kapitonov,V.V. and Jurka,J. (2006) Self-synthesizing DNAtransposons in eukaryotes. Proc. Natl Acad. Sci. USA, 103,4540–4545, 10.1073/pnas.0600833103.

28. de la Chaux,N. and Wagner,A. (2011) BEL/Pao retrotransposonsin metazoan genomes. BMC Evol. Biol., 11, 154, 10.1186/1471-2148-11-154.

29. Kojima,K.K., Kapitonov,V.V. and Jurka,J. (2011) Recentexpansion of a new Ingi-related clade of Vingi non-LTRretrotransposons in hedgehogs. Mol. Biol. Evol., 28, 17–20,10.1093/molbev/msq220.

30. Malik,H.S., Burke,W.D. and Eickbush,T.H. (1999) The age andevolution of non-LTR retrotransposable elements. Mol. Biol.Evol., 16, 793–805.

31. Malik,H.S., Henikoff,S. and Eickbush,T.H. (2000) Poised forcontagion: evolutionary origins of the infectious abilities ofinvertebrate retroviruses. Genome Res., 10, 1307–1318.

32. Havecker,E.R., Gao,X. and Voytas,D.F. (2004) The diversity ofLTR retrotransposons. Genome Biol., 5, 225, 10.1186/gb-2004-5-6-225.

33. Gotea,V. and Makalowski,W. (2006) Do transposable elementsreally contribute to proteomes? Trends Genet., 22, 260–267,10.1016/j.tig.2006.03.006.

34. Jurka,J., Kapitonov,V.V., Pavlicek,A., Klonowski,P., Kohany,O.and Walichiewicz,J. (2005) Repbase Update, a database ofeukaryotic repetitive elements. Cytogenet. Genome Res., 110,462–467, 10.1159/000084979.

35. Eddy,S.R. (2009) A new generation of homology searchtools based on probabilistic inference. Genome Inform., 23,205–211.

36. Finn,R.D., Mistry,J., Tate,J., Coggill,P., Heger,A., Pollington,J.E.,Gavin,O.L., Gunasekaran,P., Ceric,G., Forslund,K. et al. (2010)The Pfam protein families database. Nucleic Acids Res., 38,D211–D222, 10.1093/nar/gkp985.

37. Edgar,R.C. (2004) MUSCLE: a multiple sequence alignmentmethod with reduced time and space complexity. BMCBioinformatics, 5, 113, 10.1186/1471-2105-5-113.

38. Tamura,K., Peterson,D., Peterson,N., Stecher,G., Nei,M. andKumar,S. (2011) MEGA5: molecular evolutionary geneticsanalysis using maximum likelihood, evolutionary distance, andmaximum parsimony methods. Mol. Biol. Evol., 28, 2731–2739,10.1093/molbev/msr121.

39. Yang,Z. (2007) PAML 4: phylogenetic analysis by maximumlikelihood. Mol. Biol. Evol., 24, 1586–1591.

40. Yang,Z. and Bielawski,J.P. (2000) Statistical methods fordetecting molecular adaptation. Trends Ecol. Evol., 15, 496–503.

41. Shepherd,J.D. and Bear,M.F. (2011) New views of Arc, a masterregulator of synaptic plasticity. Nat. Neurosci., 14, 279–284,10.1038/nn.2708.

42. Korb,E. and Finkbeiner,S. (2011) Arc in synaptic plasticity: fromgene to behavior. Trends Neurosci., 34, 591–598, 10.1016/j.tins.2011.08.007.

43. Campillos,M., Doerks,T., Shah,P.K. and Bork,P. (2006)Computational characterization of multiple Gag-like humanproteins. Trends Genet., 22, 585–589, 10.1016/j.tig.2006.09.006.

44. Mattaliano,M.D., Montana,E.S., Parisky,K.M., Littleton,J.T. andGriffith,L.C. (2007) The Drosophila ARC homolog regulatesbehavioral responses to starvation. Mol. Cell. Neurosci., 36,211–221, 10.1016/j.mcn.2007.06.008.

45. Stark,C., Breitkreutz,B., Chatr-Aryamontri,A., Boucher,L.,Oughtred,R., Livstone,M.S., Nixon,J., Van Auken,K., Wang,X.,Shi,X. et al. (2011) The BioGRID Interaction Database: 2011update. Nucleic Acids Res., 39, D698–D704.

46. McQuilton,P., St Pierre,S.E. and Thurmond,J. (2012) FlyBase101–the basics of navigating FlyBase. Nucleic Acids Res., 40,D706–D714, 10.1093/nar/gkr1030.

47. Hedges,S.B., Dudley,J. and Kumar,S. (2006) TimeTree: a publicknowledge-base of divergence times among organisms.Bioinformatics, 22, 2971–2972, 10.1093/bioinformatics/btl505.

48. Berman,H., Henrick,K., Nakamura,H. and Markley,J.L. (2007)The worldwide Protein Data Bank (wwPDB): ensuring a single,uniform archive of PDB data. Nucleic Acids Res., 35,D301–D303, 10.1093/nar/gkl971.

49. Stein,A., Ceol,A. and Aloy,P. (2011) 3did: identification andclassification of domain-based interactions of known

Nucleic Acids Research, 2013, Vol. 41, No. 5 3199

at University of M

ichigan on April 16, 2013

http://nar.oxfordjournals.org/D

ownloaded from

Page 11: Turning gold into ‘junk’: transposable elements utilize ... · Turning gold into ‘junk’: transposable elements utilize central proteins of cellular networks Gyo¨rgy Abrusa´n1,*,

three-dimensional structure. Nucleic Acids Res., 39, D718–D723,10.1093/nar/gkq962.

50. Costanzo,M., Baryshnikova,A., Bellay,J., Kim,Y., Spear,E.D.,Sevier,C.S., Ding,H., Koh,J.L.Y., Toufighi,K., Mostafavi,S. et al.(2010) The genetic landscape of a cell. Science, 327, 425–431,10.1126/science.1180823.

51. Eden,E., Navon,R., Steinfeld,I., Lipson,D. and Yakhini,Z. (2009)Gorilla: a tool for discovery and visualization of enriched GOterms in ranked gene lists. BMC Bioinformatics, 10, 48, 10.1186/1471-2105-10-48.

52. Wu,Y., Rasmussen,M.D. and Kellis,M. (2012) Evolution at thesubgene level: domain rearrangements in the drosophilaphylogeny. Mol. Biol. Evol., 29, 689–705, 10.1093/molbev/msr222.

53. Rogers,R.L. and Hartl,D.L. (2012) Chimeric genes as a source ofrapid evolution in Drosophila melanogaster. Mol. Biol. Evol., 29,517–529, 10.1093/molbev/msr184.

54. Nagy,A., Hegyi,H., Farkas,K., Tordai,H., Kozma,E., Banyai,L.and Patthy,L. (2008) Identification and correction of abnormal,incomplete and mispredicted proteins in public databases. BMCBioinformatics, 9, 353, 10.1186/1471-2105-9-353.

55. Krogh,A., Larsson,B., von Heijne,G. and Sonnhammer,E.L.(2001) Predicting transmembrane protein topology with a hiddenMarkov model: application to complete genomes. J. Mol. Biol.,305, 567–580, 10.1006/jmbi.2000.4315.

56. Roy,A., Kucukural,A. and Zhang,Y. (2010) I-TASSER: a unifiedplatform for automated protein structure and function prediction.Nat. Protoc., 5, 725–738, 10.1038/nprot.2010.5.

57. Zhang,Y. (2008) I-TASSER server for protein 3D structureprediction. BMC Bioinformatics, 9, 40, 10.1186/1471-2105-9-40.

58. Gao,X., Hou,Y., Ebina,H., Levin,H.L. and Voytas,D.F. (2008)Chromodomains direct integration of retrotransposons toheterochromatin. Genome Res., 18, 359–369, 10.1101/gr.7146408.

59. Bushman,F.D. (2003) Targeting survival: integration siteselection by retroviruses and LTR-retrotransposons. Cell, 115,135–138.

60. Liang,Q., Kong,J., Stalker,J. and Bradley,A. (2009) Chromosomalmobilization and reintegration of Sleeping Beauty and PiggyBactransposons. Genesis, 47, 404–408, 10.1002/dvg.20508.

61. Calderwood,M.A., Venkatesan,K., Xing,L., Chase,M.R.,Vazquez,A., Holthaus,A.M., Ewence,A.E., Li,N., Hirozane-Kishikawa,T., Hill,D.E. et al. (2007) Epstein-Barr virus and virushuman protein interaction maps. Proc. Natl Acad. Sci. USA, 104,7606–7611, 10.1073/pnas.0702332104.

62. Dyer,M.D., Murali,T.M. and Sobral,B.W. (2008) The landscapeof human proteins interacting with viruses and other pathogens.PLoS Pathog., 4, e32, 10.1371/journal.ppat.0040032.

63. Irwin,B., Aye,M., Baldi,P., Beliakova-Bethell,N., Cheng,H.,Dou,Y., Liou,W. and Sandmeyer,S. (2005) Retroviruses and yeastretrotransposons use overlapping sets of host genes. Genome Res.,15, 641–654, 10.1101/gr.3739005.

64. Qian,W., He,X., Chan,E., Xu,H. and Zhang,J. (2011) Measuringthe evolutionary rate of protein-protein interaction. Proc. NatlAcad. Sci. USA, 108, 8725–8730, 10.1073/pnas.1104695108.

65. Dixon,S.J., Costanzo,M., Baryshnikova,A., Andrews,B. andBoone,C. (2009) Systematic mapping of genetic interactionnetworks. Annu. Rev. Genet., 43, 601–625, 10.1146/annurev.genet.39.073003.114751.

3200 Nucleic Acids Research, 2013, Vol. 41, No. 5

at University of M

ichigan on April 16, 2013

http://nar.oxfordjournals.org/D

ownloaded from