Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Diversity of Virophages in Metagenomic Data Sets
Jinglie Zhou,a,b,c Weijia Zhang,b Shuling Yan,d Jinzhou Xiao,a,b Yuanyuan Zhang,b,c Bailin Li,a,b Yingjie Pan,a,b Yongjie Wanga,b
Laboratory of Quality and Safety Risk Assessment for Aquatic Products on Storage & Preservation (Shanghai), Ministry of Agriculture, Shanghai, Chinaa; College of FoodScience and Technology, Shanghai Ocean University, Shanghai, Chinab; Department of Biological Science, College of Sciences and Mathematics, Auburn University,Auburn, Alabama, USAc; Institute of Biochemistry and Molecular Cell Biology, University of Goettingen, Goettingen, Germanyd
Virophages, e.g., Sputnik, Mavirus, and Organic Lake virophage (OLV), are unusual parasites of giant double-strandedDNA (dsDNA) viruses, yet little is known about their diversity. Here, we describe the global distribution, abundance, andgenetic diversity of virophages based on analyzing and mapping comprehensive metagenomic databases. The results reveala distinct abundance and worldwide distribution of virophages, involving almost all geographical zones and a variety ofunique environments. These environments ranged from deep ocean to inland, iced to hydrothermal lakes, and human gut-to animal-associated habitats. Four complete virophage genomic sequences (Yellowstone Lake virophages [YSLVs]) wereobtained, as was one nearly complete sequence (Ace Lake Mavirus [ALM]). The genomes obtained were 27,849 bp long with26 predicted open reading frames (ORFs) (YSLV1), 23,184 bp with 21 ORFs (YSLV2), 27,050 bp with 23 ORFs (YSLV3),28,306 bp with 34 ORFs (YSLV4), and 17,767 bp with 22 ORFs (ALM). The homologous counterparts of five genes, includ-ing putative FtsK-HerA family DNA packaging ATPase and genes encoding DNA helicase/primase, cysteine protease, ma-jor capsid protein (MCP), and minor capsid protein (mCP), were present in all virophages studied thus far. They alsoshared a conserved gene cluster comprising the two core genes of MCP and mCP. Comparative genomic and phylogeneticanalyses showed that YSLVs, having a closer relationship to each other than to the other virophages, were more closely re-lated to OLV than to Sputnik but distantly related to Mavirus and ALM. These findings indicate that virophages appear tobe widespread and genetically diverse, with at least 3 major lineages.
Virophages, a group of circular double-stranded DNA (dsDNA)viruses, are icosahedral in shape and approximately 50 to 100
nm in size (1–4). Virophages have three unique features (2). First,the nuclear phase is absent during the infection cycle of vi-rophages. Second, the replication of virophages takes place in aviral factory of the giant host DNA viruses. Third, they depend onenzymes from host viruses instead of host cells. Accordingly, vi-rophages are considered to be parasites of giant DNA viruses, e.g.,mimiviruses and phycodnaviruses (1–3). Giant DNA viruses pos-sess huge genome sizes (up to �1,259 kb), some of which are evenlarger than those of certain bacteria (5–7). The infection andpropagation of virophages lead to a significant decrease in hostvirus particles and, consequently, an increase in host cell survival(1–3). Additionally, exchanges of genes may occur between vi-rophages and giant DNA viruses (1–3, 8, 9). Therefore, virophagesare potential mediators of lateral gene transfer between large DNAviruses (8, 9).
Thus far, four virophages have been identified in distinctlocations (Table 1). The first reported virophage, Sputnik, wasisolated from an Acanthamoeba species infected with the largemamavirus in a water-cooling tower in Paris, France (2). Thesecond virophage, Mavirus, was observed in a marinephagotrophic flagellate (Cafeteria roenbergensis) in the pres-ence of the host virus, Cafeteria roenbergensis virus, originatingfrom the coastal waters of Texas (1, 10). The third virophage,Organic Lake virophage (OLV), discovered in a hypersalinemeromictic lake in Antarctica, is thought to parasitize largeDNA viruses infecting microalgae (3, 11). At the time of thisreport, a fourth virophage, Sputnik 2, together with its hostvirus, Lentille, has been detected in the contact lens solution ofa patient with keratitis in France (12). The fact that virophagesexist in a wide range of virus and eukaryotic hosts, as well as in
a variety of unique habitats, implies the possibility that they aremore widely distributed and diverse than previously thought.
To obtain greater insight into the unusual diversity of theglobal distribution and abundance of virophages, in this study,metagenomic databases on the Community cyberinfrastruc-ture for Advanced Microbial Ecology Research and Analysis(CAMERA) 2.0 Portal (https://portal.camera.calit2.net/) (13)were analyzed comprehensively. Four complete genomic se-quences of virophages and one nearly complete sequence wereassembled based on the metagenomic DNA sequences of Yel-lowstone Lake, Wyoming, and Ace Lake, Antarctica. Compar-ative genomics and phylogenetic analyses were performed inorder to better understand the genomic sequence features,phylogeny, and evolution of virophages.
MATERIALS AND METHODSAnalysis of metagenomic databases. The gene sequences of the threeknown virophages, Sputnik, Mavirus, and OLV (Sputnik 2 was ex-cluded in the analysis since it was a new strain of Sputnik), were down-loaded from the NCBI genome database and blasted against the NCBInr database. The genomic sequence of another Sputnik, strain 3, wasalso available in GenBank; however, because Sputnik 2 and Sputnik 3actually have the same sequence, Sputnik 3 was also not included in theanalysis. Genes showing blastp hits to virophages only or no hits (E-
Received 10 December 2012 Accepted 5 February 2013
Published ahead of print 13 February 2013
Address correspondence to Yongjie Wang, [email protected].
Supplemental material for this article may be found at http://dx.doi.org/10.1128/JVI.03398-12.
Copyright © 2013, American Society for Microbiology. All Rights Reserved.
doi:10.1128/JVI.03398-12
April 2013 Volume 87 Number 8 Journal of Virology p. 4225–4236 jvi.asm.org 4225
on January 6, 2019 by guesthttp://jvi.asm
.org/D
ownloaded from
value�10�5) were considered virophage-specific marker genes andwere used to evaluate the global distribution and abundance of vi-rophages. The genes were searched (tblastx, E-value�10�5) againstdatabases of all metagenomic pyrosequencing reads and all Sangerreads on the CAMERA 2.0 Portal. The screened virophage-related se-quences were further confirmed based on a blast similarity searchagainst the NCBI nr databases. Mapping of the global distributionpattern of virophages was visualized through MapInfo Professional(version 11.0; Pitney Bowes Software, Inc.). The abundance of vi-rophages is presented as the ratio of the number of virophage-likesequences in a given metagenomic data set and the total number ofsequences in that respective data set, normalized to 1,000,000.
Analysis of virophage conserved genes. All gene sequences of vi-rophages Sputnik, Mavirus, and OLV were compared to the NCBI nrdatabase using both blastp and PSI-BLAST searches (14, 15). Homolo-gous genes shared among these three virophages were considered to beconserved. Their sequence similarities were also proofed based on multi-ple sequence alignment using MUSCLE (16) on Geneious Pro (version5.5.7; Biomatters Ltd.).
Assembling of genomic sequences of new virophages. Major capsidprotein (MCP), the homolog of MV18 (Mavirus), V20 (Sputnik), andOLV09 (OLV), was searched (tblastx, E-value�10�5) against all met-agenomic pyrosequencing read databases and all Sanger read data-bases on the CAMERA 2.0 Portal. Sequences significantly similar tothese three MCPs were screened, downloaded, and treated as vi-rophage MCP-related sequences. Subsequently, they were assembledto obtain MCP-related contigs. Each contig served as a reference se-quence to which all reads from the corresponding metagenomic data-base were assembled. Once an extended sequence with a relativelylonger size and higher coverage was obtained after assembly, it wasused as the next reference to assemble all reads from metagenomicdatabases. This procedure was repeated until the assembled sequencestopped extending. If there was a repeat region of approximately 100bp at both ends of the sequence obtained, it was eventually self-assem-bled to a circular DNA sequence. All sequence assemblies were per-formed using Geneious Pro. The sequence assembly parameters usedin this study were a minimum overlap of 25 bp with �90% sequenceidentity, as well as 50% maximum mismatches per read.
Prediction and annotation of ORFs. The prediction and annotationof virophage open reading frames (ORFs) followed the procedures de-scribed in the literature (17, 18). Each predicted ORF encompassed a startcodon of ATG, minimum size of 135 bp, standard genetic code, and a stopcodon. The blastp, tblastx, and PSI-BLAST programs were used for se-quence similarity comparisons of the predicted ORFs to NCBI nr data-bases (14, 15). A local database that contained the translated protein se-
quences of all predicted ORFs in Sputnik, Mavirus, and OLV, as well as thefive new virophages described in this study, was also included in the blastsearch. ORFs were searched for characteristic sequence signatures usingthe InterProScan program (19).
Phylogenetic analysis. Amino acid sequences were aligned usingMUSCLE (16), and the phylogenetic trees were reconstructed by usingPhyML (version 3.0, Méthodes et Algorithmes pour la Bioinforma-tique, LIRMM, CNRS—Université de Montpellier; http://www.atgc-montpellier.fr/phyml/) (20).
Nucleotide sequence accession numbers. The genomic sequences ofthe four Yellowstone Lake virophages (YSLVs) and Ace Lake Mavirus(ALM) have been deposited in GenBank under the accession numbersKC556924 (YSLV1), KC556925 (YSLV2), KC556926 (YSLV3), KC556922(YSLV4), and KC556923 (ALM).
RESULTS AND DISCUSSIONDiversity of global distribution and abundance of virophages.The blast similarity search (E-value�10�5) indicated that atotal of 44 ORFs turned out to be virophage-specific markergenes, comprising 16 ORFs of Sputnik, 13 of Mavirus, and 15 ofOLV (Table 2). These genes were used as query sequencesand searched against all metagenomic data deposited in theCAMERA database. The CAMERA database is a web-basedanalysis portal that allows for depositing, locating, analyzing,visualizing, and sharing microbial data obtained from variousenvironments, such as marine, soil, freshwater, wastewater, hotsprings, animal hosts, and other habitats (13). Therefore, thegeneral tendency of the global distribution and abundance ofvirophages can be predicted according to the virophage-relatedsequence information of blast hits provided by the CAMERA2.0 Portal. The search found 1,766 pyrosequencing reads and
TABLE 1 Features of virophages
Virophage Location
Host
Genome
Virus EukaryoteSize(bp)
No. ofORFs
C�Gcontent (%)
Sputnik A cooling tower in Paris, France Acanthamoeba polyphagamimivirus
A. polyphaga 18,343 21 27.0
Mavirus Coastal waters of Texas Cafeteria roenbergensis virus Marine phagotrophic flagellate(C. roenbergensis)
19,063 20 30.3
OLV Organic Lake, a hypersalinemeromictic lake in Antarctica
Large DNA viruses Prasinophytes (phototrophic algae) 26,421 26 39.1
Sputnik 2 Contact lens fluid of a patientwith keratitis, France
Lentille virus A. polyphaga 18,338 20 28.5
YSLV1 Yellowstone Lake Phycodna- or mimiviruses? Microalgae? 27,849 26 33.4YSLV2 Yellowstone Lake Phycodna- or mimiviruses? Microalgae? 23,184 21 33.6YSLV3 Yellowstone Lake Phycodna- or mimiviruses? Microalgae? 27,050 23 34.9YSLV4 Yellowstone Lake Phycodna- or mimiviruses? Microalgae? 28,306 34 37.2ALM Ace Lake in Antarctica mimiviruses? Phagotrophic protozoan? 17,767 22 26.7
TABLE 2 Virophage-specific genes
Virophage Genes
Sputnik V01, V02, V03, V04, V05, V07, V08, V09, V14, V15, V16, V17,V18, V19, V20, V21
Mavirus MV04, MV05, MV07, MV08, MV09, MV10, MV11, MV12,MV14, MV15, MV16, MV17, MV18
OLV OLV01, OLV02, OLV03, OLV04, OLV05, OLV06, OLV07,OLV08, OLV09, OLV10, OLV11, OLV15, OLV21, OLV24,OLV26
Zhou et al.
4226 jvi.asm.org Journal of Virology
on January 6, 2019 by guesthttp://jvi.asm
.org/D
ownloaded from
204 Sanger reads related to Sputnik, 203 pyrosequencing readsand 253 Sanger reads akin to Mavirus, and more than 50,000pyrosequencing reads and Sanger reads similar to OLV (seeTable S1 in the supplemental material). The redundant readswere incorporated and removed. Finally, 23,599 virophage-re-lated sequences were obtained. Among them, 148 were Mavirushits, 812 were Sputnik hits, and 22,639 were OLV hits, account-ing for 95% of the total sequences associated with virophages(23,599). It appeared that OLV and its relatives were moreabundant than Sputnik and Mavirus virophages in the environ-ments.
As depicted in Figure 1, virophages were distributed widelythroughout the world, including almost all geographical zones.The habitats of virophages were also localized in a variety ofenvironments, ranging from the deep ocean to inland (Fig. 2).The abundance of virophages tended to increase from theocean to land environments, was the highest in freshwater hab-itats, and was relatively greater in ocean sediment than in deepseawater (Fig. 2A). As for vertical distribution, in general, vi-rophage abundance decreased with the increase in ocean depth(Fig. 2B). The epipelagic zone seemed to be enriched with vi-rophages. This was probably because this illuminated zone atthe surface of the sea is colonized by the most living organismsin the sea. Interestingly, although there is a large differencebetween the conditions of the abyssopelagic and the mesope-lagic zones, it seemed that the numbers of virophage-related
sequences observed in these two zones were quite similar(Fig. 2B). Whether real virophage enrichment was present inthe abyssopelagic zone or whether it was a result of the vi-rophage-infected host viruses and/or host cells settling to thedeep sea remains to be studied further. In terms of geographicalzones, the frigid zones turned out to have the greatest abun-dance of virophages, followed by the tropical zones (Fig. 2C).Obvious limitations and biases of the data deposited in CAMERAexist, and caution should be taken during attempts to interpret theglobal distribution and abundance of virophages. However, thesefindings open a new window into further exploration and survey ofthe diversity of unique virophages worldwide.
In addition, unexpectedly, a small number of virophage-related sequences was detected in nonaquatic environments,e.g., 65 sequences from the human gut, 11 from animal-asso-ciated habitats, 7 from soils, 4 from glacier metagenomes, and1 from air in the East Coast of Singapore. Thus far, little isknown with regard to such unusual diversity (21). Taken to-gether, comparative analyses of metagenomic databases re-vealed the global distribution and distinct abundance of vi-rophage-related sequences, which suggested that virophagesare common entities on Earth. Large-scale sampling and anal-yses are necessary to obtain a complete picture of the diversityof virophages.
Four complete genomes of Yellowstone Lake virophages andone nearly complete genome of Ace Lake Mavirus. Major capsid
FIG 1 Geographic distribution and corresponding abundance of virophages. Colored dots indicate distinct abundances of virophages in metagenomic data setsobtained from a specific area of latitude and longitude (see Table S1 in the supplemental material). Abundance was normalized to 1,000,000.
Diversity of Virophages
April 2013 Volume 87 Number 8 jvi.asm.org 4227
on January 6, 2019 by guesthttp://jvi.asm
.org/D
ownloaded from
protein is generally considered to be a conserved proteinamong viruses, and it is widely used to reconstruct phyloge-netic trees. It was also conserved in virophages, based on blastsequence similarity searches and sequence alignment. In ourstudy, four complete virophage genomes and one nearly com-plete virophage genome were obtained from two metagenomicdatabases named Yellowstone Lake: Genetic and Gene Diver-sity in a Freshwater Lake and Antarctica Aquatic MicrobialMetagenome, which were downloaded from the CAMERA 2.0Portal. These virophages were tentatively named YSLV1,YSLV2, YSLV3, YSLV4, and ALM. Detailed results of the met-agenome assembly, i.e., genome coverage, the number of readsrecruited to each genome, and the size of the data sets fromwhich the metagenomes originated, are shown in Table 3; seealso Figures S1 and S2 in the supplemental material.
They were all dsDNA viruses, with G�C contents of 33.4%
(YSLV1), 33.6% (YSLV2), 34.9% (YSLV3), 37.2% (YSLV4), and26.7% (ALM) (Table 1). Their genomes were 27,849 bp in lengthwith 26 predicted ORFs (YSLV1), 23,184 bp with 21 predictedORFs (YSLV2), 27,050 bp with 23 predicted ORFs (YSLV3),28,306 bp with 34 predicted ORFs (YSLV4), and 17,767 bp with 22
FIG 2 Abundance of virophages in different environments (A), ocean depths (B), and latitudes (C). Abundance was normalized to 1,000,000.
TABLE 3 Data on metagenomic assemblies of the five new virophages
Name
No. ofreadsrecruitedto eachgenome
No. ofidenticalsites
Pairwiseidentity(%)
Genome coverage Size ofdataset(Gb)Mean Minimum Maximum
YSLV1 5,544 22,271 98.0 67.9 16 127 11.1YSLV2 834 21,453 97.7 13.1 3 27 11.1YSLV3 1,098 25,529 98.2 15.1 3 35 11.1YSLV4 1,119 25,732 97.2 14.5 4 32 11.1ALM 494 13,654 95.4 14.4 4 26 32.4
Zhou et al.
4228 jvi.asm.org Journal of Virology
on January 6, 2019 by guesthttp://jvi.asm
.org/D
ownloaded from
predicted ORFs (ALM) (Table 1 and Fig. 3 and 4). The YSLVs andOLV were generally alike in genome size, number of ORFs, andG�C content (Table 1).
Among 126 predicted ORFs from these five new virophages,
59 showed significant similarity to 33 of 67 ORFs of threeknown virophages, 11 showed similarity to the nucleocytoplas-mic large DNA viruses (NCLDs) of eukaryotes (including phy-codnaviruses, Marseilleviruses, and mimiviruses), and 3
FIG 3 Circular maps of the complete genomes of Yellowstone Lake virophages. Homologous genes are indicated in the same color, the five conserved genes are labeledwith red asterisks, and the inner circles represent G�C content plots. The dashed-line boxes represent the conserved gene cluster in all eight virophages, the dotted-lineboxes represent the gene cluster shared by YSLVs 2, 3, and 4 and OLV, and the dash-dot-dot–line boxes represent the gene cluster present in YSLVs 3 and 4.
FIG 4 Linear genomic map of ALM and Mavirus. Homologous genes are shown in the same color, while syntenic regions are presented in green, light blue, andorange. The five virophage conserved genes are labeled with red asterisks, and the conserved gene cluster is marked with dashed-line boxes.
Diversity of Virophages
April 2013 Volume 87 Number 8 jvi.asm.org 4229
on January 6, 2019 by guesthttp://jvi.asm
.org/D
ownloaded from
TA
BLE
4O
RFs
and
thei
rh
omol
ogs
pred
icte
din
YSL
Vs
and
ALM
Vir
oph
age,
OR
F
Pos
itio
nL
engt
ha
Bes
tbl
astp
hit
inG
enB
ank
nr
data
base
and/
orvi
rus
data
set
NC
BI
con
serv
eddo
mai
n(i
den
tifi
er,E
-val
ue,
alig
nm
ent
len
gth
inaa
,al
ign
men
tpo
siti
on[s
tart
–en
d])
Inte
rpro
scan
mat
ches
(ide
nti
fier
,E-v
alu
e[s]
,al
ign
men
tpo
siti
on[s
][s
tart
–en
d])
Star
tE
nd
nt
aaO
RF,
prot
ein
enco
ded,
orm
ass
Spec
ies
Acc
essi
onn
o.E
-val
ue
%aa
iden
tity
Alig
nm
ent
len
gth
inaa
(pos
itio
nst
art–
end)
YSL
V1
11
771
771
256
Hyp
oth
etic
alpr
otei
nO
LV
4O
rgan
icLa
kevi
roph
age
AD
X05
765
2.01
E�
7951
246
(1–2
46)
AA
Ado
mai
n(p
fam
1340
1,4.
46E�
04,1
24,5
7–15
8)P
loop
-con
tain
ing
nu
cleo
side
trip
hos
phat
eh
ydro
lase
s(S
SF52
540,
2.0E
�6,
49–2
26)
296
876
820
166
313
8710
4933
911
2
414
7537
752,
301
766
Pu
tati
veD
NA
prim
ase/
poly
mer
ase
Org
anic
Lake
viro
phag
eA
DX
0578
47.
79E�
2623
.962
6(5
0–67
5)P
loop
-con
tain
ing
nu
cleo
side
trip
hos
phat
eh
ydro
lase
s(S
SF52
540,
6.5E
�13
,48
6–66
3)5
3959
3825
135
44SP
(1–1
7),T
M(4
–22)
640
3744
2939
313
07
4563
5129
567
188
VP
11M
icro
mon
aspu
silla
reov
iru
sY
P_6
5455
45.
95E�
1524
146
(4–1
49)
TM
(39–
5969
-91)
852
7589
553,
681
1,22
6A
TH
OO
K(P
R00
929,
6.8E
�6,
6.8E
�6,
6.8E
�6,
345–
355,
380–
391,
417–
427)
990
5197
2867
822
5H
ypot
het
ical
prot
ein
MV
06M
avir
us
YP
_004
3002
840.
004
3484
(48–
131)
N-t
erm
inal
cata
lyti
cdo
mai
nof
GIY
-YIG
intr
onen
don
ucl
ease
I-T
evI,
I-B
moI
,I-B
anI,
I-B
thII
,an
dsi
mila
rpr
otei
ns
(cd1
0437
,3.3
8E�
03,9
0,48
–129
)
Alp
ha/
beta
-hyd
rola
ses
(SSF
5347
4,1.
1E�
8,15
9–27
3)10
9816
1074
292
730
8H
ypot
het
ical
prot
ein
OL
V11
Org
anic
Lake
viro
phag
eA
DX
0577
22.
83E�
5637
.329
9(7
–305
)11
1199
910
743
1,25
741
812
1220
113
709
1,50
950
213
1374
014
087
348
115
1414
199
1435
415
651
1514
586
1433
525
283
1615
090
1475
833
311
017
1526
015
084
177
5818
1556
715
905
339
112
1915
974
1646
248
916
220
1675
516
588
168
55SP
(1–2
5)21
1676
516
935
171
5622
2030
216
940
3,36
311
20U
nn
amed
prot
ein
prod
uct
Pae
niba
cillu
ssp
.JD
R-2
YP
_003
0101
914.
00E�
5625
834
(114
–947
)H
ypot
het
ical
prot
ein
Cya
nop
hag
eN
AT
L2A
-133
8.00
E�
3431
355
(380
–734
)23
2039
520
967
573
190
Hyp
oth
etic
alpr
otei
nO
LV
7O
rgan
icLa
kevi
roph
age
AD
X05
768
8.35
E�
2131
.616
8(2
2–18
9)24
2093
921
115
177
58SP
(1–3
1),T
M(5
–25)
2522
991
2112
01,
872
623
Maj
orca
psid
prot
ein
Org
anic
Lake
viro
phag
eA
DX
0577
01.
00E�
3227
579
(1–5
79)
2625
733
2313
32,
601
866
Pu
tati
vem
inor
caps
idpr
otei
nO
rgan
icLa
kevi
roph
age
AD
X05
769
4.00
E�
1529
205
(643
–847
)P
uta
tive
isom
eras
eY
bhE
(SSF
1019
08,1
.6E�
10,
119–
433)
2727
321
2588
81,
434
477
Pu
tati
vem
inor
caps
idpr
otei
nO
rgan
icLa
kevi
roph
age
AD
X05
769
2.00
E�
1534
125
(332
–456
)28
2774
127
427
315
104
Tlr
6Fp
prot
ein
Tet
rahy
men
ath
erm
ophi
laA
F451
864_
65.
28E�
1032
.580
(25–
104)
Hyp
oth
etic
alpr
otei
nM
AR
_433
Mar
seill
evir
us
YP
_003
4071
577.
00E�
1335
78(9
–86)
YSL
V2
11
765
765
254
Hyp
oth
etic
alpr
otei
nO
LV
4O
rgan
icLa
kevi
roph
age
AD
X05
765
1.30
E�
4840
.824
2(3
–244
)2
807
1322
516
171
Hyp
oth
etic
alpr
otei
nP
RSM
4_06
2P
roch
loro
cocc
usph
age
P-R
SM4
YP
_004
3231
911.
85E�
0840
70(1
2–81
)3
1570
1319
252
834
1691
2725
1,03
534
4H
ypot
het
ical
prot
ein
OL
V11
Org
anic
Lake
viro
phag
eA
DX
0577
22.
32E�
4033
.929
7(1
9–31
5)5
2839
4758
1,92
063
9P
uta
tive
AT
P-d
epen
den
tR
NA
hel
icas
eA
cant
ham
oeba
poly
phag
am
imiv
iru
sY
P_0
0398
7051
2.63
E�
3229
.844
4(1
79–6
22)
DE
AD
-lik
eh
elic
ases
supe
rfam
ily(s
mar
t004
87,
4.48
E�
10,1
77,1
74–3
50)
Hel
icas
e_C
(PF0
0271
,2.
4E�
6,55
1–60
6)
SNF2
fam
ilyN
-ter
min
aldo
mai
n(p
fam
0017
6,1.
19E�
09,2
38,1
82–4
19)
DE
AD
-lik
eh
elic
ases
supe
rfam
ily(S
M00
487,
0.00
13,1
75–3
78)
Hel
icas
esu
perf
amily
C-
term
inal
dom
ain
(cd0
0079
,1.5
2E�
04,1
45,
461–
605)
Plo
op-c
onta
inin
gn
ucl
eosi
detr
iph
osph
ate
hyd
rola
ses
(SSF
5254
0,8.
7E�
23,
2.3E
�20
,137
–378
,38
5–63
6)
Zhou et al.
4230 jvi.asm.org Journal of Virology
on January 6, 2019 by guesthttp://jvi.asm
.org/D
ownloaded from
649
9277
722,
781
926
Cte
rmin
us:
hyp
oth
etic
alpr
otei
n16
2275
902
Org
anic
Lake
phyc
odn
avir
us
2A
DX
0640
56.
07E�
0750
66(8
34–8
99)
Met
hyl
tran
sfer
ase
dom
ain
(pfa
m13
659,
2.52
E�
07,
111,
151–
261)
S-A
den
osyl
-L-m
eth
ion
ine-
depe
nde
nt
met
hyl
tran
sfer
ases
(SSF
5333
5,3.
3E�
10,
156–
307)
778
6280
8622
574
TM
(34–
52)
885
3881
6437
512
49
9182
8628
555
184
1096
5612
484
2,82
994
2H
elic
ase
Aca
ntha
moe
baca
stel
lani
im
amav
iru
sA
EQ
6015
44.
43E�
4430
.546
5(2
95–7
59)
Ori
gin
ofre
plic
atio
nbi
ndi
ng
prot
ein
(pfa
m02
399,
7.47
E�
08,
148,
353–
500)
Plo
op-c
onta
inin
gn
ucl
eosi
detr
iph
osph
ate
hyd
rola
ses
(SSF
5254
0,2.
5E�
8,27
6–52
1)11
1272
414
148
1,42
547
4D
EA
D-l
ike
hel
icas
essu
perf
amily
(sm
art0
0487
,2.
98E�
06,1
79,3
41–5
19)
1214
260
1484
758
819
5H
ypot
het
ical
prot
ein
OL
V7
Org
anic
Lake
viro
phag
eA
DX
0576
81.
07E�
1529
.716
1(3
2–19
2)SP
(1–3
0),T
M(1
5–35
,41–
56,7
7–97
)13
1520
714
911
297
9814
1519
216
394
1,20
340
0P
uta
tive
min
orca
psid
prot
ein
Org
anic
Lake
viro
phag
eA
DX
0576
93.
00E�
1924
.939
3(1
–393
)15
1648
918
243
1,75
558
4P
uta
tive
caps
idpr
otei
nV
20Sp
utn
ikvi
roph
age
YP
_002
1223
812.
15E�
3726
.155
4(1
0–56
3)16
1872
618
424
303
100
SP(1
–19)
,TM
(4–2
2)17
1882
619
386
561
186
1819
764
2037
861
520
4H
ypot
het
ical
prot
ein
MV
08M
avir
us
YP
_004
3002
860.
4632
96(8
8–18
3)19
2046
021
449
990
329
2021
536
2184
130
610
1H
ypot
het
ical
prot
ein
Par
amec
ium
burs
aria
chlo
rella
viru
s1
NP
_048
469
3.43
E�
1133
.880
(7–8
6)
2121
902
2311
612
1540
4N
term
inu
s:h
ypot
het
ical
prot
ein
OL
V5
Org
anic
Lake
viro
phag
eA
DX
0576
61.
24E�
0824
.619
7(2
5–22
1)
YSL
V3
11
765
765
254
Hyp
oth
etic
alpr
otei
nO
LV
4O
rgan
icLa
kevi
roph
age
AD
X05
765
7.58
E�
5440
248
(4–2
51)
295
076
218
962
397
613
0833
311
04
1772
1311
462
153
SP(1
–18)
525
2820
1051
917
2H
ypot
het
ical
prot
ein
OL
V7
Org
anic
Lake
viro
phag
eA
DX
0576
84.
81E�
1430
.516
4(5
–168
)6
3442
2606
837
278
Hyp
oth
etic
alpr
otei
nO
LV
12O
rgan
icLa
kevi
roph
age
AD
X05
773
6.04
E�
2755
.711
3(1
35–2
47)
740
2035
2349
816
5H
ypot
het
ical
prot
ein
HM
PR
EF9
628_
0128
2E
ubac
teri
acea
eba
cter
ium
CM
5Z
P_0
9316
646
2.85
E�
1332
.215
8(7
–164
)Si
te-s
peci
fic
DN
Am
eth
ylas
e(C
OG
0338
,1.2
2E�
16,
160,
5–16
4)P
uta
tive
mod
ifica
tion
met
hyl
ase
Dpn
IIA
Clo
stri
dium
phag
eph
iSM
101
YP
_699
979
3.00
E�
1132
160
(5–1
64)
DN
Aad
enin
em
eth
ylas
e(T
IGR
0057
1,6.
15E�
11,
154,
5–15
8)
S-A
den
osyl
-L-m
eth
ion
ine-
depe
nde
nt
met
hyl
tran
sfer
ases
(SSF
5333
5,5.
6E�
21,
5–16
5)8
6254
3954
2,30
176
6D
12cl
ass
N6
aden
ine-
spec
ific
DN
Am
eth
yltr
ansf
eras
e(p
fam
0208
6,2.
44E�
07,
152,
13–1
64)
972
4963
1793
331
0H
ypot
het
ical
prot
ein
OL
V11
Org
anic
Lake
viro
phag
eA
DX
0577
21.
35E�
4839
.428
5(1
6–30
0)
1077
1573
1140
513
4
1110
417
7820
2,59
886
5D
5-A
TP
ase-
hel
icas
e,pa
rtia
lM
oum
ouvi
rus
och
anA
EY
9929
81.
27E�
3329
.842
6(2
87–7
12)
Ph
age/
plas
mid
prim
ase,
P4
fam
ily,C
-ter
min
aldo
mai
n(T
IGR
0161
3,1.
40E�
22,2
93,4
95–7
87)
D5_
N(P
F087
06,1
.4E�
11,
390–
542)
D5
Nte
rmin
us-
like
(pfa
m08
706,
2.13
E�
06,
89,4
34–5
22)
Pri
CT
_2(P
F087
07,1
.2E�
4,27
9–34
5)
1211
103
1060
050
416
7H
ypot
het
ical
prot
ein
Mav
iru
sY
P_0
0430
0284
5.79
E�
0247
.251
(24–
74)
Ph
age-
asso
ciat
edD
NA
prim
ase
(CO
G33
78,
1.08
E�
22,3
79,3
94–7
72)
1311
975
1130
766
922
214
1203
314
423
2,39
179
615
1445
014
650
201
66SP
(1–2
5),T
M(5
–25)
1614
726
1731
12,
586
861
1717
402
1772
232
110
618
1806
519
318
1,25
441
7P
uta
tive
min
orca
psid
prot
ein
Org
anic
Lake
viro
phag
eA
DX
0576
94.
26E�
1225
.238
6(2
6–41
1)19
1941
221
148
1,73
757
8P
uta
tive
caps
idpr
otei
nV
20Sp
utn
ikvi
roph
age
YP
_002
1223
814.
79E�
3926
.854
4(9
–552
)20
2176
122
132
372
123
(Con
tin
ued
onfo
llow
ing
page
)
Diversity of Virophages
April 2013 Volume 87 Number 8 jvi.asm.org 4231
on January 6, 2019 by guesthttp://jvi.asm
.org/D
ownloaded from
TA
BLE
4(C
onti
nu
ed)
Vir
oph
age,
OR
F
Pos
itio
nL
engt
ha
Bes
tbl
astp
hit
inG
enB
ank
nr
data
base
and/
orvi
rus
data
set
NC
BI
con
serv
eddo
mai
n(i
den
tifi
er,E
-val
ue,
alig
nm
ent
len
gth
inaa
,al
ign
men
tpo
siti
on[s
tart
–en
d])
Inte
rpro
scan
mat
ches
(ide
nti
fier
,E-v
alu
e[s]
,al
ign
men
tpo
siti
on[s
][s
tart
–en
d])
Star
tE
nd
nt
aaO
RF,
prot
ein
enco
ded,
orm
ass
Spec
ies
Acc
essi
onn
o.E
-val
ue
%aa
iden
tity
Alig
nm
ent
len
gth
inaa
(pos
itio
nst
art–
end)
2122
204
2386
81,
665
554
Cte
rmin
us:
hyp
oth
etic
alpr
otei
nO
LV10
Org
anic
Lake
viro
phag
eA
DX
0577
10.
1327
133
(421
–553
)22
2394
226
155
2,21
473
723
2619
627
023
828
275
Hyp
oth
etic
alpr
otei
nO
LV
5O
rgan
icLa
kevi
roph
age
AD
X05
766
1.98
E�
1327
.619
0(2
6–21
5)
YSL
V4
11
768
768
255
Hyp
oth
etic
alpr
otei
nO
LV
4O
rgan
icLa
kevi
roph
age
AD
X05
765
3.10
E�
6744
248
(1–2
48)
283
119
2210
9236
33
1982
2317
336
111
423
4133
0096
031
9R
ibon
ucl
eosi
de-d
iph
osph
ate
redu
ctas
esm
alls
ubu
nit
Pha
eocy
stis
glob
osa
viru
s12
TA
ET
7295
73.
39E�
136
60.5
317
(3–3
19)
Rib
onu
cleo
tide
redu
ctas
e,R
2/be
tasu
bun
it,f
erri
tin
-lik
edi
iron
-bin
din
gdo
mai
n(c
d010
49,
9.46
E�
102,
274,
13–2
86)
Rib
onu
c_re
d_sm
(PF0
0268
,1.
0E�
98,7
–283
)Fe
rrit
in-l
ike
(SSF
4724
0,3.
1E�
105,
1–30
3)
533
3734
8915
350
SP(1
–43)
,TM
(25–
45)
634
9636
8719
263
SP(1
–21)
,TM
(5–2
5)7
3730
4248
519
172
Hyp
oth
etic
alpr
otei
nP
RSM
4_06
2P
roch
loro
cocc
usph
age
P-R
SM4
YP
_004
3231
917.
07E�
0943
.476
(12–
87)
N6_
MT
ASE
(PS0
0092
,�1.
0,76
–82)
846
3843
3930
099
948
2046
8014
146
1051
9655
2533
010
911
5700
8342
2,64
388
0C
term
inu
s:D
5-lik
eh
elic
ase-
prim
ase
Mar
seill
evir
us
YP
_003
4071
839.
04E�
2028
.221
8(4
74–6
91)
Vir
E(P
F052
72,4
.4E�
5,55
4–69
5)P
riC
T_2
(PF0
8707
,1.1
E�
4,30
0–37
5)12
8402
8551
150
49T
M(1
4–32
)13
8602
8802
201
6614
8836
9816
981
326
Hyp
oth
etic
alpr
otei
nO
LV11
Org
anic
Lake
viro
phag
eA
DX
0577
27.
68E�
5942
.129
6(1
1–30
6)15
9862
1030
244
114
616
1087
410
299
576
191
Hyp
oth
etic
alpr
otei
nO
LV7
Org
anic
Lake
viro
phag
eA
DX
0576
88.
06E�
2737
.214
7(3
7–18
3)17
1099
111
410
420
139
1811
412
1185
544
414
719
1188
012
311
432
143
2012
462
1266
220
166
2112
890
1407
41,
185
394
Pu
tati
vem
inor
caps
idpr
otei
nO
rgan
icLa
kevi
roph
age
AD
X05
769
3.20
E�
4031
.235
9(2
7–38
5)22
1417
316
026
1,85
461
7M
ajor
caps
idpr
otei
nO
rgan
icLa
kevi
roph
age
AD
X05
770
6.10
E�
4828
569
(2–5
70)
2316
143
1817
02,
028
675
2418
232
1955
71,
326
441
2521
005
1974
01,
266
421
Cte
rmin
us:
hyp
oth
etic
alpr
otei
nO
LV12
Org
anic
Lake
viro
phag
eA
DX
0577
38.
69E�
3842
.220
1(2
09–4
09)
Alp
ha/
beta
-hyd
rola
ses
(SSF
5347
4,1.
3E�
9,23
8–36
0)N
term
inu
s:h
ypot
het
ical
prot
ein
OLV
12O
rgan
icLa
kevi
roph
age
AD
X05
773
2.19
E�
1936
194
(14–
207)
2621
151
2183
468
422
7C
term
inu
s:h
ypot
het
ical
prot
ein
OL
V2
Org
anic
Lake
viro
phag
eA
DX
0576
39.
46E�
1041
.191
(137
–227
)27
2246
621
915
552
183
2823
048
2268
636
312
029
2406
623
677
390
129
3024
923
2436
056
418
731
2523
025
613
384
127
3225
909
2646
655
818
5N
-Ace
tylt
ran
sfer
ase
GC
N5
Clo
stri
dium
phyt
ofer
men
tans
ISD
gY
P_0
0156
0895
1.03
E�
0427
.782
(67–
148)
Ace
tylt
ran
sfer
ase
fam
ily(p
fam
0058
3,1.
11E�
09,
85,6
2–14
6)
Ace
tylt
ran
sf_1
(PF0
0583
,2.
5E�
10,6
7–14
6)
3326
677
2719
852
217
3A
cyl-
coen
zym
eA
N-
acyl
tran
sfer
ases
(Nat
)(S
SF55
729,
1.2E
�12
,48
–168
)
3427
327
2830
497
832
5H
ypot
het
ical
prot
ein
OL
V5
Org
anic
Lake
viro
phag
eA
DX
0576
63.
04E�
2528
.430
5(1
4–31
8)
Zhou et al.
4232 jvi.asm.org Journal of Virology
on January 6, 2019 by guesthttp://jvi.asm
.org/D
ownloaded from
ALM 1
706
840
135
442
890
2551
1,66
255
3H
ypot
het
ical
prot
ein
Mon
osig
abr
evic
ollis
MX
1X
P_0
0174
3771
5.06
E�
3728
.936
3(8
–370
)P
hag
e/pl
asm
idpr
imas
e,P
4fa
mily
,C-t
erm
inal
dom
ain
(TIG
R01
613,
1.80
E�
05,2
46,1
63–4
08)
Win
ged
hel
ixD
NA
-bin
din
gdo
mai
n(S
SF46
785,
7.9E
�6,
366–
441)
Hig
hly
deri
ved
D5-
like
hel
icas
e-pr
imas
eM
arse
illev
iru
sY
P_0
0340
6787
5.51
E�
1626
.428
3(1
44–4
26)
Plo
op-c
onta
inin
gn
ucl
eosi
detr
iph
osph
ate
hyd
rola
ses
(SSF
5254
0,2.
2E�
6,12
6–29
4)3
2868
2593
276
91P
uta
tive
RV
Esu
perf
amily
inte
gras
eM
avir
us
YP
_004
3002
803.
63E�
0839
.491
(1–9
1)C
hro
mo
(ch
rom
atin
orga
niz
atio
nm
odifi
er)
dom
ain
(pfa
m00
385,
8.54
E�
03,2
1,56
–76)
Ch
rom
atin
orga
niz
atio
nm
odifi
erdo
mai
n(S
M00
298,
8.0E
�10
,35
–90)
436
8930
3365
721
8P
uta
tive
RV
Esu
perf
amily
inte
gras
eM
avir
us
YP
_004
3002
802.
21E�
3641
.119
2(1
–192
)In
tegr
ase
core
dom
ain
(pfa
m00
665,
4.94
E�
07,
114,
46–1
59)
Rve
(PF0
0665
,8.4
E�
16,4
8–15
7)R
Nas
eH
-lik
e(S
SF53
098,
1.6E
�18
,38
–209
)5
3773
5533
1,76
158
6P
uta
tive
prot
ein
-pri
med
B-
fam
ilyD
NA
poly
mer
ase
Mav
iru
sY
P_0
0430
0281
8.21
E�
8835
536
(47–
582)
DN
Apo
lym
eras
ety
peB
,or
gan
ella
ran
dvi
ral
(pfa
m03
175,
1.34
E�
03,
224,
291–
514)
DN
A/R
NA
poly
mer
ases
(SSF
5667
2,1.
2E�
19,
290–
569)
655
7858
5027
390
Hyp
oth
etic
alpr
otei
nM
V04
Mav
iru
sY
P_0
0430
0282
5.9
3145
(2–4
6)7
5884
6315
432
143
880
2763
661,
662
553
Pu
tati
vem
ajor
caps
idpr
otei
nM
V18
Mav
iru
sY
P_0
0430
0296
1.18
E�
7631
.352
2(3
–524
)
989
4780
5789
129
6H
ypot
het
ical
prot
ein
MV
17M
avir
us
YP
_004
3002
951.
85E�
6542
.129
1(1
–291
)10
9511
8984
528
175
Pu
tati
vecy
stei
ne
prot
ease
MV
16M
avir
us
YP
_004
3002
941.
25E�
5452
.217
5(1
–175
)
1110
541
9537
1,00
533
4P
uta
tive
FtsK
-Her
Afa
mily
AT
Pas
eM
V15
Mav
iru
sY
P_0
0430
0293
7.25
E�
8050
266
(69–
334)
AA
A_1
7(P
F132
07,1
.1E�
6,95
–228
)12
1135
410
566
789
262
Hyp
oth
etic
alpr
otei
nM
V14
Mav
iru
sY
P_0
0430
0292
5.20
E�
5543
.625
9(1
–259
)13
1140
612
266
861
286
Nte
rmin
us:
hyp
oth
etic
alpr
otei
nM
V13
Mav
iru
sY
P_0
0430
0291
9.03
E�
0629
.512
9(1
–129
)
1412
160
1293
077
125
6H
ypot
het
ical
prot
ein
MV
12M
avir
us
YP
_004
3002
901.
88E�
6952
.619
6(5
7–25
2)T
M(3
4–52
)15
1295
613
714
759
252
1613
774
1440
663
321
0H
ypot
het
ical
prot
ein
MV
09M
avir
us
YP
_004
3002
870.
2129
97(1
03–1
99)
1714
430
1502
059
119
6H
ypot
het
ical
prot
ein
MV
08M
avir
us
YP
_004
3002
861.
02E�
1248
.384
(112
–195
)18
1579
415
060
735
244
Hyp
oth
etic
alpr
otei
nM
V13
Mav
iru
sY
P_0
0430
0291
7.52
E�
1329
.119
8(2
9–22
6)Li
pase
(cla
ss3)
(cd0
0519
,3.
78E�
08,1
23,9
6–21
8)Li
pase
_3(P
F017
64,2
.1E�
4,13
2–16
2)19
1580
616
093
288
95A
lph
a/be
ta-h
ydro
lase
s(S
SF53
474,
5.1E
�10
,95
–178
)20
1626
117
145
885
294
2117
400
1722
417
758
2217
579
1744
513
544
aaa
,am
ino
acid
s.
Diversity of Virophages
April 2013 Volume 87 Number 8 jvi.asm.org 4233
on January 6, 2019 by guesthttp://jvi.asm
.org/D
ownloaded from
showed similarity to sequences of unicellular eukaryotic organ-isms (marine choanoflagellate Monosiga brevicollis and ciliatedprotozoan Tetrahymena thermophila); 67 ORFs had no se-quence hits to current NCBI databases (Table 4). Given that thevirus and eukaryotic hosts of the virophages obtained in thisstudy may be the NCLDs and the protists mentioned above (ortheir associated relatives), it is conceivable that horizontal genetransfer and/or gene recombination occurred between ancestorvirophages and their viruses, as well as cellular hosts. Such genereplacement traces have been observed in virophages (Sputnik,Mavirus, and OLV) and their hosts (1–3). In addition, signifi-cant sequence similarity (E-value�10�5) was not detected be-tween virophages and any viruses infecting multicellular or-ganisms, which suggested that virophages diverged early andsubsequently underwent a strict and unique evolution withtheir viruses and unicellular eukaryotic hosts.
Conserved genes of virophages. Based on a blastp and PSI-BLAST search against NCBI nr databases and a local databasecomprising all ORFs of eight virophages (five in this study andthree published), five genes were found to be present in all eightvirophages (Table 5). They were putative FtsK-HerA familyDNA packaging ATPase and genes encoding putative DNAhelicase/primase (HEL/PRIM), putative cysteine protease(PRSC), putative MCP, and putative minor capsid protein(mCP). These four genes had blastp hits to virophage genesonly (E-value�10�1), with the exception of HEL/PRIM (Table4). Sequence alignment of these four proteins also revealedunambiguous similarity of amino acids (data not shown).Hence, it is reasonable to define them as virophage conservedcore genes. The HEL/PRIM homolog was predicted accordingto either functional domains or sequence similarity, since sig-nificant sequence similarity was undetectable among some vi-rophage species (Table 4).
Besides these five conserved genes, the four YSLVs shared twoother homologous genes with unknown functions, which werepresent in the OLV as well, but not in Sputnik, Mavirus, or ALM(Table 5 and Fig. 5). Interestingly, in all four YSLVs, homologcounterparts of the conserved genes of ATPase, PRSC, and mCPalways showed the highest sequence similarity to that in OLV (Ta-ble 4); their second and third matches were strictly in the order ofSputnik and Mavirus. In most cases, their blast E-values were�10�5 for Mavirus hits but �10�10 for Sputnik hits. Taken to-gether, these results suggested that the YSLVs were more closelyrelated to OLV than to Sputnik and that they were distantly relatedto Mavirus.
The evolutionary relationship between Mavirus and ALM wasevident, as they shared 13 homologous genes (Table 4 and Fig. 4).Among them, five were virophage conserved genes, three encodedputative GIY-YIG endonuclease, putative rve (integrase coredomain) superfamily integrase, and putative protein-primed B-family DNA polymerase, and five were functionally unknown.Furthermore, three syntenic regions existed between Mavirus and
TABLE 5 Gene homologues present in virophages
Gene product
ORF(s) (size in aaa) in indicated virophage
YSLV1 YSLV2 YSLV3 YSLV4 OLV Sputnik ALM Mavirus
Putative FtsK-HerA family ATPase 01 (256) 01 (254) 01 (254) 01 (255) 04 (256) 03 (245) 11 (334) 15 (310)Putative DNA helicase/primase/polymerase 04 (766) 10 (942) 11 (865) 11 (880) 25 (777) 13 (779) 02 (553) 01 (652)Putative GIY-YIG endonuclease 09 (225) 12 (167) 24 (129) 14 (114) 06 (165)Hypothetical protein 10 (308) 04 (344) 09 (310) 14 (326) 11 (298)Putative cysteine protease 23 (190) 12 (195) 05 (172) 16 (191) 07 (190) 09 (175) 10 (175) 16 (189)Putative major capsid protein 25 (623) 15 (584) 19 (578) 22 (617) 09 (576) 20 (595) 08 (553) 18 (606)Putative minor capsid protein 27 (477), 26 (866) 14 (400) 18 (417) 21 (394) 08 (389) 18 (167), 19 (218) 09 (296) 17 (303)Hypothetical protein 28 (104) 20 (101) 03 (110) 26 (227) 02 (123)Hypothetical protein 02 (171) 07 (172)Hypothetical protein 09 (184) 07 (143)Hypothetical protein 18 (204) 17 (196) 08 (122)Hypothetical protein 21 (404) 23 (275) 34 (325) 05 (290) 21 (438)Hypothetical protein 06 (278) 25 (421) 12 (347)Hypothetical protein 10 (134) 17 (139)Hypothetical protein 21 (554) 10 (236) 12 (262) 14 (271)Putative rve superfamily integrase 03 (91), 04 (218) 02 (358)Putative protein-primed B-family DNA
polymerase05 (586) 03 (617)
Hypothetical protein 06 (90) 04 (112)Hypothetical protein 13 (286), 18 (244) 13 (712)Hypothetical protein 14 (256) 12 (211)Hypothetical protein 16 (210) 09 (190)a nt, nucleotides; aa, amino acids.
FIG 5 Numbers of homologous genes shared among OLV and YSLVs.
Zhou et al.
4234 jvi.asm.org Journal of Virology
on January 6, 2019 by guesthttp://jvi.asm
.org/D
ownloaded from
ALM (Fig. 4); however, two of these regions ran in opposite direc-tions in the two virophages (Fig. 4).
Conserved gene clusters. In this study, a gene cluster (ororder) was considered to be several adjacent genes whose ar-rangement was conserved in some virophages; if present in alleight virophages, it was defined as a conserved gene cluster. Asshown in Figures 3 and 4, a conserved gene cluster, comprisedof the two conserved genes MCP and mCP, was present in alleight virophages. YSLVs 2, 3, and 4 and OLV shared a genecluster consisting of the core gene ATPase and an ORF of un-known function. Furthermore, a gene cluster of the conservedPRIM/HEL gene and an ORF with unknown function was de-tected in YSLVs 3 and 4, and Mavirus and ALM had three geneclusters in common.
Phylogeny and evolution. Three virophage core genes, encod-ing ATPase, PRSC, and MCP, were used to reconstruct the phylo-genetic tree. As shown in Figure 6, three phylogenetic affiliationgroups were observed. YSLVs and OLV seemed to form a group ofclosely related virophages, and Mavirus and ALM were apparently
derived from a common ancestor, whereas Sputnik was an or-phaned group. Such phylogenetic clustering of virophages was inagreement with the findings of the physical features of genomicDNA molecules, conserved genes, and gene orders as mentionedabove. In addition, the phylogenetic trees of MCP and PRSC sug-gested that YSLVs were much closer to each other than to OLV(Fig. 6). This observation was consistent with the local tblastxresults (search against a local database containing all ORFs of theeight virophages) that the best MCP hits of YSLVs were alwaysthemselves. Although it was impossible to shed light on the evo-lutionary relationship between these four YSLVs based on the cur-rent data, YSLVs 3 and 4 appeared to be the closest relatives. Theywere sister lineages on the MCP tree supported by a 70% bootstrapvalue (Fig. 6), shared the largest number of homologous genes(10) (Fig. 5), and had the highest number of gene clusters (three)(Fig. 3).
Habitat diversity of virophages. Though they were moreclosely related to each other than to any other dsDNA virusesknown so far, the habitats of these virophages were extremely
FIG 6 Unrooted phylogenetic trees of DNA packaging ATPases (A), cysteine proteases (B), and major capsid proteins (C) of virophages. The five new virophagesare shown in boldface. The numbers at the branches represent bootstrap values.
Diversity of Virophages
April 2013 Volume 87 Number 8 jvi.asm.org 4235
on January 6, 2019 by guesthttp://jvi.asm
.org/D
ownloaded from
diverse. Mavirus was from the coastal waters of Texas (1). Itsclosest relative ALM, however, was discovered in a hypersalinemeromictic lake, Ace Lake (68°28=49�S, 78°11=19�E), in Ant-arctica. This lake is covered with ice for as long as 11 months toan entire year, with an average temperature of approximately0°C (22). OLV was also found in the neighboring Organic Lakein Antarctica (3). In contrast, YSLVs, close to OLV, were foundin a freshwater lake (Yellowstone Lake) with a temperatureranging from 12 to 73°C in Yellowstone National Park, Wyo-ming (23). Hence, these results indicated that virophages haveadapted to habitats with a wide range of temperature varia-tions.
In conclusion, the distinct abundance and global distribu-tion of virophages, including almost all geographical zones aswell as a variety of environments (ranging from the deep oceanto inland and iced to hydrothermal lakes), indicated that vi-rophages appear to be widespread and genetically diverse, withat least three major lineages. Moreover, the overall low se-quence similarity between the shared homologous genes in vi-rophages and their distant phylogenetic relationships sug-gested that the genetic diversity of virophages is far beyondwhat we know thus far.
ACKNOWLEDGMENTS
This work was supported by The Program for Professor of Special Ap-pointment (Eastern Scholar) grant 20101222 from Shanghai Institutionsof Higher Learning, Shanghai Talent Development Fund grant 2011010from Shanghai Municipal Human Resources and Social Security Bureau,and Science and Technology Development Program grant 10540503000from Shanghai Municipal Science and Technology Commission, China.
We thank two anonymous reviewers for their insightful comments onthe manuscript.
REFERENCES1. Fischer MG, Suttle CA. 2011. A virophage at the origin of large DNA
transposons. Science 332:231–234.2. La Scola B, Desnues C, Pagnier I, Robert C, Barrassi L, Fournous G,
Merchat M, Suzan-Monti M, Forterre P, Koonin E, Raoult D. 2008. Thevirophage as a unique parasite of the giant mimivirus. Nature 455:100 –104.
3. Yau S, Lauro FM, DeMaere MZ, Brown MV, Thomas T, Raftery MJ,Andrews-Pfannkoch C, Lewis M, Hoffman JM, Gibson JA, CavicchioliR. 2011. Virophage control of Antarctic algal host-virus dynamics. Proc.Natl. Acad. Sci. U. S. A. 108:6163– 6168.
4. Sun S, La Scola B, Bowman VD, Ryan CM, Whitelegge JP, Raoult D,Rossmann MG. 2010. Structural studies of the Sputnik virophage. J. Virol.84:894 – 897.
5. Arslan D, Legendre M, Seltzer V, Abergel C, Claverie JM. 2011. DistantMimivirus relative with a larger genome highlights the fundamental fea-tures of Megaviridae. Proc. Natl. Acad. Sci. U. S. A. 108:17486 –17491.
6. Raoult D, Audic S, Robert C, Abergel C, Renesto P, Ogata H, La Scola
B, Suzan M, Claverie JM. 2004. The 1.2-megabase genome sequence ofMimivirus. Science 306:1344 –1350.
7. Van Etten JL, Lane LC, Dunigan DD. 2010. DNA viruses: the really bigones (giruses). Annu. Rev. Microbiol. 64:83–99.
8. Claverie JM, Abergel C. 2009. Mimivirus and its virophage. Annu. Rev.Genet. 43:49 – 66.
9. Raoult D, Boyer M. 2010. Amoebae as genitors and reservoirs of giantviruses. Intervirology 53:321–329.
10. Fischer MG, Allen MJ, Wilson WH, Suttle CA. 2010. Giant virus with aremarkable complement of genes infects marine zooplankton. Proc. Natl.Acad. Sci. U. S. A. 107:19508 –19513.
11. Dunigan DD, Fitzgerald LA, Van Etten JL. 2006. Phycodnaviruses: apeek at genetic diversity. Virus Res. 117:119 –132.
12. Desnues C, La Scola B, Yutin N, Fournous G, Robert C, Azza S, JardotP, Monteil S, Campocasso A, Koonin EV, Raoult D. 2012. Provi-rophages and transpovirons as the diverse mobilome of giant viruses.Proc. Natl. Acad. Sci. U. S. A. 109:18078 –18083.
13. Sun S, Chen J, Li W, Altintas I, Lin A, Peltier S, Stocks K, Allen EE,Ellisman M, Grethe J, Wooley J. 2011. Community cyberinfrastructurefor Advanced Microbial Ecology Research and Analysis: the CAMERAresource. Nucleic Acids Res. 39:D546 –D551.
14. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W,Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation ofprotein database search programs. Nucleic Acids Res. 25:3389 –3402.
15. Altschul SF, Wootton JC, Gertz EM, Agarwala R, Morgulis A, SchafferAA, Yu YK. 2005. Protein database searches using compositionally ad-justed substitution matrices. FEBS J. 272:5101–5109.
16. Edgar RC. 2004. MUSCLE: a multiple sequence alignment method withreduced time and space complexity. BMC Bioinformatics 5:113. doi:10.1186/1471-2105-5-113.
17. Wang Y, Bininda-Emonds OR, van Oers MM, Vlak JM, Jehle JA. 2011.The genome of Oryctes rhinoceros nudivirus provides novel insight intothe evolution of nuclear arthropod-specific large circular double-strandedDNA viruses. Virus Genes 42:444 – 456.
18. Wang Y, Kleespies RG, Huger AM, Jehle JA. 2007. The genome ofGryllus bimaculatus nudivirus indicates an ancient diversification of bac-ulovirus-related nonoccluded nudiviruses of insects. J. Virol. 81:5395–5406.
19. Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N, Apweiler R,Lopez R. 2005. InterProScan: protein domains identifier. Nucleic AcidsRes. 33:W116 –W120.
20. Guindon S, Lethiec F, Duroux P, Gascuel O. 2005. PHYML Online—aweb server for fast maximum likelihood-based phylogenetic inference.Nucleic Acids Res. 33:W557–W559.
21. Parola P, Renvoise A, Botelho-Nevers E, La Scola B, Desnues C, RaoultD. 2012. Acanthamoeba polyphaga mimivirus virophage seroconversionin travelers returning from Laos. Emerg. Infect. Dis. 18:1500 –1502.
22. Coolen MJL, Hopmans EC, Rijpstra WIC, Muyzer G, Schouten S,Volkman JK, Sinninghe Damsté JS. 2004. Evolution of the methane cyclein Ace Lake (Antarctica) during the Holocene: response of methanogensand methanotrophs to environmental change. Org. Geochem. 35:1151–1167.
23. Clingenpeel S, Macur RE, Kan J, Inskeep WP, Lovalvo D, Varley J,Mathur E, Nealson K, Gorby Y, Jiang H, LaFracois T, McDermott TR.2011. Yellowstone Lake: high-energy geochemistry and rich bacterial di-versity. Environ. Microbiol. 13:2172–2185.
Zhou et al.
4236 jvi.asm.org Journal of Virology
on January 6, 2019 by guesthttp://jvi.asm
.org/D
ownloaded from