6
An overlapping essential gene in the Potyviridae Betty Y.-W. Chung* , W. Allen Miller , John F. Atkins* , and Andrew E. Firth* § *BioSciences Institute, University College Cork, Cork, Ireland; Department of Plant Pathology, 351 Bessey Hall, Iowa State University, Ames, IA 50011; and Department of Human Genetics, University of Utah, Salt Lake City, UT 84112 Communicated by Paul Ahlquist, University of Wisconsin, Madison, WI, February 4, 2008 (received for review October 31, 2007) The family Potyviridae includes >30% of known plant virus spe- cies, many of which are of great agricultural significance. These viruses have a positive sense RNA genome that is 10 kb long and contains a single long ORF. The ORF is translated into a large polyprotein, which is cleaved into 10 mature proteins. We report the discovery of a short ORF embedded within the P3 cistron of the polyprotein but translated in the 2 reading-frame. The ORF, termed pipo, is conserved and has a strong bioinformatic coding signature throughout the large and diverse Potyviridae family. Mutations that knock out expression of the PIPO protein in Turnip mosaic potyvirus but leave the polyprotein amino acid sequence unaltered are lethal to the virus. Immunoblotting with antisera raised against two nonoverlapping 14-aa antigens, derived from the PIPO amino acid sequence, reveals the expression of an 25- kDa PIPO fusion product in planta. This is consistent with expres- sion of PIPO as a P3-PIPO fusion product via ribosomal frameshift- ing or transcriptional slippage at a highly conserved G 1-2 A 6-7 motif at the 5 end of pipo. This discovery suggests that other short overlapping genes may remain hidden even in well studied virus genomes (as well as cellular organisms) and demonstrates the utility of the software package MLOGD as a tool for identifying such genes. P3 PIPO Potyvirus Turnip mosaic virus frameshift T he genomes of many viruses are under strong selective pressure to compress maximal coding and regulatory infor- mation into minimal sequence space. Overlapping coding se- quences (CDSs) are particularly common in such viruses, al- though they also occur in more complex genomes, including eukaryotes (1). Such CDSs can be difficult to detect using conventional gene-finding software (2), especially when short. The software package MLOGD, however, was designed specif- ically for locating short overlapping CDSs in sequence align- ments and overcomes many of the difficulties with alternative methods (2, 3). MLOGD includes explicit models for sequence evolution in double-coding regions as well as models for single- coding and noncoding regions. It can be used to predict whether query ORFs are likely to be coding, via a likelihood ratio test, where the null model comprises any known CDSs, and the alternative model comprises the known CDSs plus the query ORF. We have been using MLOGD to search for new virus- encoded genes. The family Potyviridae is the largest plant virus family and includes 30% of known plant viruses (4, 5). The genomic RNA is positive sense and 10 kb long. To date, all experimental and sequencing evidence has supported the existence of only one functional ORF in the six Potyviridae genera, with the exception of the genus Bymovirus, in which the ORF is divided between two genomic RNAs (5). The ORF encodes a large (340- to 395-kDa) polyprotein that is cleaved into 10 mature proteins (Fig. 1; reviewed in ref. 6). The genomic RNA has a covalently linked 5-terminal protein (VPg) and a 3 polyA tail. Cap-independent translation is mediated by the 5UTR (7). Subgenomic tran- scripts are apparently not produced (8). Such a polyprotein expression strategy is common to many virus families, most notably the Picornaviridae. The potential for functional short overlapping ORFs in such viruses has been largely overlooked. However, when we applied MLOGD to the Potyviridae, we found evidence for an overlapping CDS within the P3 cistron. Here, we describe the bioinformatic evidence and experimental verifica- tion for the new CDS. Results Bioinformatics. The new CDS, pipo (Pretty Interesting Potyviridae ORF), was first identified in an alignment of Turnip mosaic virus (TuMV) (9) sequences, using the gene-finding software MLOGD (3). MLOGD has been tested extensively, using thou- sands of known virus CDSs as a test set, and it has been shown that, for overlapping CDSs, a total of just 20 independent base variations are sufficient to detect a new CDS with 90% confidence. The input alignment comprised the GenBank RefSeq NC002509 and 20 of its genome neighbors (i.e., other TuMV genome sequences of RefSeq quality but not identical to the chosen RefSeq). In TuMV (9,835 nt; polyprotein coords 131..9625), pipo has coords 3079..3258 (60 codons; 2 frame relative to polyprotein; Fig. 1), which places it within the P3 cistron (coords 2591..3655). The total number of independent base variations within pipo is 92, thus providing MLOGD with a robust signal. The MLOGD output showed that pipo has a strong coding signature and the ORF is present across the alignment. Further inspection revealed that the ORF (60 codons; 2 frame relative to polyprotein) is present in all 48 Potyvirus RefSeqs in GenBank. MLOGD, applied to the 48-RefSeq alignment, detected a very strong coding signature in the ORF, with 2,000 independent base variations now available to MLOGD. Scores outside of the ORF are negative, whereas, within the ORF, there are at least five nonoverlapping and hence completely independent high-positively scoring windows (Fig. 2). Starting from a highly conserved G 1-2 A 6-7 motif (Fig. 3), there are no pipo-frame termination codons in any of the 48 RefSeqs for at least 60 codons. In 29 of 48 RefSeqs, there is an AUG codon 9 codons 3 of the motif. Of the remaining 19 RefSeqs, 1 has an upstream AUG codon, 15 have downstream AUG codons (but only 3 are within 10 codons), and 4 have no AUG codon at Author contributions: B.Y.-W.C. and A.E.F. contributed equally to this work; B.Y.-W.C., W.A.M., J.F.A., and A.E.F. designed research; B.Y.-W.C. and A.E.F. performed research; B.Y.-W.C., W.A.M., J.F.A., and A.E.F. analyzed data; and B.Y.-W.C., W.A.M., J.F.A., and A.E.F. wrote the paper. The authors declare no conflict of interest. § To whom correspondence should be addressed. E-mail: a.fi[email protected]. This article contains supporting information online at www.pnas.org/cgi/content/full/ 0800468105/DCSupplemental. © 2008 by The National Academy of Sciences of the USA P1 HC-Pro P3 CI NIb coat 6K1 6K2 NIa- NIa- VPg Pro 131 9625 3079 3258 53GFP PIPO GGAAAAAA 0 frame +2 frame Fig. 1. TuMV genome map (9,835 nt). In the plasmid p35STuMV-GFP, GFP is fused between P1 and HC-Pro. The position of the overlapping CDS, pipo, and the G 2 A 6 motif are indicated. www.pnas.orgcgidoi10.1073pnas.0800468105 PNAS April 15, 2008 vol. 105 no. 15 5897–5902 MICROBIOLOGY Downloaded by guest on July 23, 2020

An overlapping essential gene in the Potyviridae · 60 to 115 codons. Of relevance to possible leaky scanning, pipo is preceded by many initiation codons, e.g., 57 AUG codons lie

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: An overlapping essential gene in the Potyviridae · 60 to 115 codons. Of relevance to possible leaky scanning, pipo is preceded by many initiation codons, e.g., 57 AUG codons lie

An overlapping essential gene in the PotyviridaeBetty Y.-W. Chung*†, W. Allen Miller†, John F. Atkins*‡, and Andrew E. Firth*§

*BioSciences Institute, University College Cork, Cork, Ireland; †Department of Plant Pathology, 351 Bessey Hall, Iowa State University, Ames, IA 50011;and ‡Department of Human Genetics, University of Utah, Salt Lake City, UT 84112

Communicated by Paul Ahlquist, University of Wisconsin, Madison, WI, February 4, 2008 (received for review October 31, 2007)

The family Potyviridae includes >30% of known plant virus spe-cies, many of which are of great agricultural significance. Theseviruses have a positive sense RNA genome that is �10 kb long andcontains a single long ORF. The ORF is translated into a largepolyprotein, which is cleaved into �10 mature proteins. We reportthe discovery of a short ORF embedded within the P3 cistron of thepolyprotein but translated in the �2 reading-frame. The ORF,termed pipo, is conserved and has a strong bioinformatic codingsignature throughout the large and diverse Potyviridae family.Mutations that knock out expression of the PIPO protein in Turnipmosaic potyvirus but leave the polyprotein amino acid sequenceunaltered are lethal to the virus. Immunoblotting with antiseraraised against two nonoverlapping 14-aa antigens, derived fromthe PIPO amino acid sequence, reveals the expression of an �25-kDa PIPO fusion product in planta. This is consistent with expres-sion of PIPO as a P3-PIPO fusion product via ribosomal frameshift-ing or transcriptional slippage at a highly conserved G1-2A6-7 motifat the 5� end of pipo. This discovery suggests that other shortoverlapping genes may remain hidden even in well studied virusgenomes (as well as cellular organisms) and demonstrates theutility of the software package MLOGD as a tool for identifyingsuch genes.

P3 � PIPO � Potyvirus � Turnip mosaic virus � frameshift

The genomes of many viruses are under strong selectivepressure to compress maximal coding and regulatory infor-

mation into minimal sequence space. Overlapping coding se-quences (CDSs) are particularly common in such viruses, al-though they also occur in more complex genomes, includingeukaryotes (1). Such CDSs can be difficult to detect usingconventional gene-finding software (2), especially when short.The software package MLOGD, however, was designed specif-ically for locating short overlapping CDSs in sequence align-ments and overcomes many of the difficulties with alternativemethods (2, 3). MLOGD includes explicit models for sequenceevolution in double-coding regions as well as models for single-coding and noncoding regions. It can be used to predict whetherquery ORFs are likely to be coding, via a likelihood ratio test,where the null model comprises any known CDSs, and thealternative model comprises the known CDSs plus the queryORF. We have been using MLOGD to search for new virus-encoded genes.

The family Potyviridae is the largest plant virus family andincludes �30% of known plant viruses (4, 5). The genomic RNAis positive sense and �10 kb long. To date, all experimental andsequencing evidence has supported the existence of only onefunctional ORF in the six Potyviridae genera, with the exceptionof the genus Bymovirus, in which the ORF is divided between twogenomic RNAs (5). The ORF encodes a large (340- to 395-kDa)polyprotein that is cleaved into �10 mature proteins (Fig. 1;reviewed in ref. 6). The genomic RNA has a covalently linked5�-terminal protein (VPg) and a 3� polyA tail. Cap-independenttranslation is mediated by the 5�UTR (7). Subgenomic tran-scripts are apparently not produced (8). Such a polyproteinexpression strategy is common to many virus families, mostnotably the Picornaviridae. The potential for functional shortoverlapping ORFs in such viruses has been largely overlooked.However, when we applied MLOGD to the Potyviridae, we found

evidence for an overlapping CDS within the P3 cistron. Here, wedescribe the bioinformatic evidence and experimental verifica-tion for the new CDS.

ResultsBioinformatics. The new CDS, pipo (Pretty Interesting PotyviridaeORF), was first identified in an alignment of Turnip mosaic virus(TuMV) (9) sequences, using the gene-finding softwareMLOGD (3). MLOGD has been tested extensively, using thou-sands of known virus CDSs as a test set, and it has been shownthat, for overlapping CDSs, a total of just 20 independent basevariations are sufficient to detect a new CDS with �90%confidence. The input alignment comprised the GenBankRefSeq NC�002509 and 20 of its genome neighbors (i.e., otherTuMV genome sequences of RefSeq quality but not identical tothe chosen RefSeq). In TuMV (9,835 nt; polyprotein coords131..9625), pipo has coords 3079..3258 (60 codons; �2 framerelative to polyprotein; Fig. 1), which places it within the P3cistron (coords 2591..3655). The total number of independentbase variations within pipo is �92, thus providing MLOGD witha robust signal. The MLOGD output showed that pipo has astrong coding signature and the ORF is present across thealignment.

Further inspection revealed that the ORF (�60 codons; �2frame relative to polyprotein) is present in all 48 PotyvirusRefSeqs in GenBank. MLOGD, applied to the 48-RefSeqalignment, detected a very strong coding signature in the ORF,with �2,000 independent base variations now available toMLOGD. Scores outside of the ORF are negative, whereas,within the ORF, there are at least five nonoverlapping andhence completely independent high-positively scoring windows(Fig. 2).

Starting from a highly conserved G1-2A6-7 motif (Fig. 3), thereare no pipo-frame termination codons in any of the 48 RefSeqsfor at least 60 codons. In 29 of 48 RefSeqs, there is an AUGcodon 9 codons 3� of the motif. Of the remaining 19 RefSeqs, 1has an upstream AUG codon, 15 have downstream AUG codons(but only 3 are within 10 codons), and 4 have no AUG codon at

Author contributions: B.Y.-W.C. and A.E.F. contributed equally to this work; B.Y.-W.C.,W.A.M., J.F.A., and A.E.F. designed research; B.Y.-W.C. and A.E.F. performed research;B.Y.-W.C., W.A.M., J.F.A., and A.E.F. analyzed data; and B.Y.-W.C., W.A.M., J.F.A., and A.E.F.wrote the paper.

The authors declare no conflict of interest.

§To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/cgi/content/full/0800468105/DCSupplemental.

© 2008 by The National Academy of Sciences of the USA

P1 HC−Pr o P3 CI NI b coat 6K 1 6 K2

NIa− NIa− VPg Pr o

131 9625 3079 3258

5’ 3’

GFP

PIPO GGAAAAAA 0 fram e

+2 fram e

Fig. 1. TuMV genome map (9,835 nt). In the plasmid p35STuMV-GFP, GFP isfused between P1 and HC-Pro. The position of the overlapping CDS, pipo, andthe G2A6 motif are indicated.

www.pnas.org�cgi�doi�10.1073�pnas.0800468105 PNAS � April 15, 2008 � vol. 105 � no. 15 � 5897–5902

MIC

ROBI

OLO

GY

Dow

nloa

ded

by g

uest

on

July

23,

202

0

Page 2: An overlapping essential gene in the Potyviridae · 60 to 115 codons. Of relevance to possible leaky scanning, pipo is preceded by many initiation codons, e.g., 57 AUG codons lie

all. Thus, AUG-initiation is unlikely. In 16 of 48 RefSeqs, theG1-2A6-7 motif begins with a pipo-frame stop codon, UGA,instead of GGA, thus precluding 5� extension of the ORF. Thereis no well conserved termination codon position at the 3�end—with ORF lengths (from the G1–2A6–7 motif) ranging from60 to 115 codons. Of relevance to possible leaky scanning, pipois preceded by many initiation codons, e.g., 57 AUG codons liebetween the polyprotein AUG and pipo in the TuMV genome,of which 24 are in the polyprotein frame.

Application of National Center for Biotechnology Informa-tion blastp (word size 2; virus RefSeq db) to PIPO revealed nosimilar amino acid sequences in GenBank, whereas tblastnidentified only the pipo region in other Potyvirus sequences (asexpected). Hydrophobicity plots are broadly similar for theN-terminal �60 aa: a hydrophobic peak followed by a hydro-philic peak and then a narrower hydrophobic peak. PIPO is richin the basic residues Arg and Lys: Averaged over the 48 RefSeqs,their relative abundances are, respectively, 2.1 and 1.5 timeshigher in PIPO than in the polyprotein. Although predictedRNA secondary structures were found in individual RefSeqs, nowidely conserved secondary structures were found in the vicinityof pipo [using alidot (11)].

There are four other Potyviridae genera with genomic RefSeqsin GenBank: Bymovirus (four RefSeqs � two RNA segments);

Ipomovirus (one RefSeq), Rymovirus (three RefSeqs) and Tri-timovirus (three RefSeqs). With the exception of Rymovirus andPotyvirus, the genera are highly divergent [typically 33–45%nucleotide identity between genera within the P3 cistron (12)].In the genus Bymovirus, the longer genomic RNA (RNA1;�7,500 nt) contains a single polyprotein CDS that codes for theC-terminal eight proteins of the Potyvirus polyprotein—thus P3becomes the N-terminal protein. In the P3 cistron, there is aG(G/C)A6 motif followed by a 74- to 88-codon �2 frame ORF,for which MLOGD detects a strong coding signature. In Triti-movirus, there is an A2G2A6-7 motif within the P3 cistronfollowed by a 134- to 137-codon �2 frame ORF, for whichMLOGD again detects a strong coding signature. Similar resultshold for an alignment of the Tritimovirus Wheat streak mosaicvirus (WSMV) RefSeq with its four genome neighbors. InRymovirus, there is a GA6GA motif within the P3 cistronfollowed by a 104- to 116-codon �2 frame ORF for whichMLOGD detects a good coding signature. In Ipomovirus, thereis a GA7 motif in the P3 cistron followed by a 99-codon �2 frameORF. It is noteworthy that the frame of the G1-2A6-7 motif is notfully conserved (Fig. 3). In Potyvirus, the frame is generally (G)GAA AAA A(A) (spaces separate the polyprotein codons). InBymovirus, the frame is GGA AAA AA. In Tritimovirus, twoRefSeqs have G GAA AAA A, whereas the other has GGA

Fig. 2. (Upper) MLOGD sliding-window plot for an alignment of the 48 GenBank Potyvirus RefSeqs (polyprotein region only; CLUSTALW (10) amino acidalignment back-translated to nucleotide sequence). Reference sequence, NC�002509 (TuMV); window size, 40 codons; step size, 20 codons. Each window isrepresented by a small circle (showing the likelihood ratio score for that window), and gray bars show the width (ends) of the window. The null model, in eachwindow, is that only the polyprotein frame is coding, whereas the alternative model is that both the polyprotein frame and the window frame are coding. Positivescores (i.e., above the horizontal dashed line) favor the alternative model. The positions of stop codons in each frame are shown below the likelihood scores andthe positions of alignment gaps (green) in all 48 RefSeqs are shown at the top. Only the �1 (red) and �2 (purple) frames are shown because the �0 frame is thepolyprotein frame that is included in the null model. Scores are generally negative with occasional random scatter into low positive scores, except for the piporegion, which has consecutive high-positively scoring windows and a corresponding ORF present across the alignment. Note that other apparent ‘‘gaps’’ in thestop codon annotation mainly correspond to alignment gaps. See ref. 3 for details of MLOGD software. (Lower) Zoom-in showing pipo and the flanking �400nt. Window size, 10 codons; step size, 5 codons. Note that pipo contains at least five nonoverlapping and hence completely independent high-positively scoringwindows. Formally, and within the MLOGD model, P � 10–100.

5898 � www.pnas.org�cgi�doi�10.1073�pnas.0800468105 Chung et al.

Dow

nloa

ded

by g

uest

on

July

23,

202

0

Page 3: An overlapping essential gene in the Potyviridae · 60 to 115 codons. Of relevance to possible leaky scanning, pipo is preceded by many initiation codons, e.g., 57 AUG codons lie

AAA AAA. Further MLOGD results, GenBank accessionnumbers for all sequences used, and comparisons of the sizes andsequence divergences of P3 and PIPO are included in supportinginformation (SI) Figs. S1–S7 and Tables S1 and S2.

PIPO Knockouts. To investigate whether pipo is an essential gene,we introduced mutations into an infectious clone of TuMV(p35STuMV-GFP) that expresses GFP for easy detection ofvirus-infected tissue. Two PIPO knockout mutants were gener-ated. Each differs from WT GFP-expressing TuMV by just asingle point mutation: GAC 3 GAU and UCG 3 UCU atNC�002509 coordinates 3103 and 3130, respectively (polypro-tein-frame codons shown). These mutations are synonymouswith respect to the polyprotein frame, but introduce prematuretermination codons (UGA) into pipo. The altered codon usagein the polyprotein ORF is not a factor because the new codonsoccur in the TuMV genome at frequencies similar (within 20%)to those they replace.

Nicotiana benthamiana plants were inoculated by bombard-ment with gold particles coated with p35STuMV-GFP DNA orthe mutant derivatives or with water as a control. Each constructwas bombarded into four plants. None of the eight plantsbombarded with DNA containing mutations in PIPO showedany sign of infection up to 12 days postinoculation (dpi) (Fig. 4).There was no green fluorescence, and plants were as tall andhealthy as uninoculated controls. In contrast, three of fourWT-infected plants showed extensive infection (GFP fluores-cence, stunted growth, and wilting) over the same time period.(The sole uninfected plant was bombarded with a smalleramount of DNA than the three infected ones and those bom-barded with mutant TuMV DNAs.) RT-PCR with primersspecific to p35STuMV-GFP, using lysate from infected plants at12 dpi, confirmed that TuMV RNA was present only in WT-infected plants (Fig. 5).

Immunoblotting. Separate antisera were raised against two pre-dicted 14-aa antigen sites within PIPO (AS 2–15: amino acids

Fig. 3. Pipo sequence. (A) Extract of 9 RefSeqs from an alignment of the 48 Potyvirus GenBank RefSeqs, showing the region around the 5� end of pipo, whichcoincides with the annotated highly conserved G1-2A6-7 motif (see Fig. S8 for the full 48-RefSeq alignment). Viruses: NC�001445, plum pox; NC�001555, tobaccoetch; NC�001616, potato Y; NC�001671, pea seed-borne mosaic; NC�002509, turnip mosaic; NC�002634, soybean mosaic; NC�003397, bean common mosaic;NC �004039, potato A; NC�003398, sugarcane mosaic. (B) Similar [GA]-rich tracts from the 5� end of pipo in representative RefSeqs from other Potyviridae genera(see Fig. S8 for other RefSeqs). Viruses: NC�001814, ryegrass mosaic; NC�002350, wheat yellow mosaic; NC�001886, wheat streak mosaic; NC�003797, sweet potatomild mottle. (C) The TuMV/p35STuMV-GFP PIPO peptide and nucleotide sequences, with the two antigen sites underlined in the peptide sequence and the twoknockout point mutation sites underlined in the nucleotide sequence.

Fig. 4. N. benthamiana 12 days postinoculation (dpi). (Upper) Natural light. (Lower) UV light. From left: plants bombarded with (1) no DNA; (2) WTp35STuMVGFP virus (i.e., TuMV with GFP fused between P1 and HC-Pro); (3) and (4) p35STuMV-GFP PIPO knockout mutants p41 and p68; and (5) nonbombardedplants. In each of (1–5), only one of four replicates is depicted. Only WT-infected plants showed any sign of infection (three of four replicates) with extensiveGFP fluorescence, stunted growth, and wilting by 12 dpi. Note that the small yellowish-green patches visible on some leaves under UV—e.g., Lower (1) and (4),are necrosis caused by bombardment (cf. natural light photos). Some patches on the soil also appear green under UV. However, GFP fluorescence is clearlydistinguishable from both of these. High resolution photos are in Fig. S9 and Fig. S10.

Chung et al. PNAS � April 15, 2008 � vol. 105 � no. 15 � 5899

MIC

ROBI

OLO

GY

Dow

nloa

ded

by g

uest

on

July

23,

202

0

Page 4: An overlapping essential gene in the Potyviridae · 60 to 115 codons. Of relevance to possible leaky scanning, pipo is preceded by many initiation codons, e.g., 57 AUG codons lie

2–15; AS 39–52: amino acids 39–52; Fig. 3C). Both antiseradetected �25-kDa products in lysates from WT-infected plants(Fig. 6A). Neither antiserum detected an �6- to 7-kDa product,which is the predicted size of PIPO. Nonetheless, AS 39–52clearly revealed an �7-kDa product generated by in vitro trans-lation of an artificial transcript encoding only pipo (Fig. 6B) (piponucleotide sequence preceded by T7 promoter and followed byFLAG epitope; presumably initiates at the AUG codon 9 codons3� of the G2A6 motif, resulting in a 7-kDa product lacking the AS2–15 antigen) demonstrating that AS 39–52, at least, woulddetect such a product if it were expressed in vivo.

DiscussionPossible Translational Mechanisms. The detection with both anti-sera of an �25-kDa product and the absence of an �6- to 7-kDaproduct, appears to rule out independent ribosome entry intopipo (e.g., via an IRES or shunting). We propose that theG1-2A6-7 motif facilitates ribosomal frameshifting or transcrip-tional slippage (13), allowing expression of PIPO as a fusionproduct with the N-terminal region of P3 (hereafter P3N). In factthe predicted size of PIPO fused with P3N is 25.3 kDa, in goodagreement with the observed �25-kDa band.

Many viruses harbor sequences that induce a portion ofribosomes to shift �1 nt and continue translating in the new(equivalent to �2) reading-frame (13). The �1 frameshift site

consists of a slippery heptanucleotide typically fitting the con-sensus motif N NNW WWH. This is followed by a highlystructured region, usually a pseudoknot, beginning 5–9 nt down-stream (14). The G1-2A6-7 motif may serve as a slippery hep-tanucleotide (G GAA AAA, A AAA AAH or G AAA AAA,allowing for G:U repairings) in 47 of the 60 RefSeqs. However,although we could find potential downstream stimulatory RNAsecondary structures in some RefSeqs, we could not find widelyconserved structures. It remains to be seen whether some otherprocess (e.g., long-range base pairing or the nascent P3 peptidesequence acting within the ribosomal exit tunnel) may, in thiscase, stimulate ribosomal frameshifting. We were unable toobserve any frameshift product from in vitro translation of aconstruct comprising (in order) the T7 promoter, the TuMV5�UTR, the TuMV P3 cistron with artificial in-frame initiation(AUG) and termination (UAA) codons, and the TuMV 3�UTR(Fig. S11 and data not shown) suggesting that, if ribosomalframeshifting occurs in TuMV, it may require distal sequenceelements (cf. ref. 15).

Alternatively, the G1–2A6–7 motif may allow transcriptionalslippage, whereby the viral polymerase ‘‘slips’’ on a region ofrepeated identical nucleotides, resulting in the insertion (ordeletion) of extra nucleotides in a portion of transcripts. Tran-scriptional slippage has been documented in the Paramyxoviridae(e.g., Sendai virus) and the Filoviridae (e.g., Ebola virus) (16). Incontrast to the Paramyxoviridae slippage site (generally A5–6G2–5or A2GAG4–7), in the Potyviridae the Gs are 5� of the As, whichwould appear to favor nucleotide deletions rather than insertions(because of the possibility of G:U repairings but not A:Crepairings)—although insertional slippage on A7 is also possible[as in Ebola virus (17)]. To produce transcripts that allow PIPOexpression, a 2-nt deletion or 1-nt insertion would be required.Mass spectrometry of cDNAs (nominally 62 nt, covering theG1-2A6-7 motif) derived from RNA extracted from an N.benthamiana plant infected with p35STuMV-GFP (Fig. 4) failedto detect transcripts with deletions or insertions (data notshown). Nonetheless, it is possible that some Potyviridae speciesor genera use transcriptional slippage, whereas others use ribo-somal frameshifting [cf. DnaX (18)]. If so, then presumablyeither the sequence difference somehow prevents encapsidationof the modified RNAs or the slippage frequency is sufficientlylow, so modified RNAs are not detected or have been disre-garded as sequencing errors.

In any case, it is clear that the G1-2A6-7 motif is highlyconstrained. The polyprotein-frame codons—GAA and AAA(Fig. 3)—both have twofold degeneracy (GAA and GAG codefor Glu; AAA and AAG code for Lys), yet AAG is never usedat this location, and GAG is only used in two RefSeqs with motifsGAGA5 and GAGA6. Similarly, a WSMV mutant with a singlepoint mutation G GAA AAA A 3 G GAG AAA A (synony-mous in the polyprotein frame) is unable to infect plantssystemically (19). Three other point mutations in the vicinity (at�6, �9, and �18 nt, respectively) did not affect infectivity (19).

Potential Protein Function. Data published by Choi et al. (19) (seealso ref. 20) provide interesting evidence for the function ofPIPO. The authors constructed six WSMV mutants, each con-taining 10–16 synonymous (polyprotein frame) point mutationsspaced along the 3� half of the P3 cistron. All mutants had thesame phenotype: (i) the ability to replicate was retained at22–80% of WT; and (ii) the virus was able to establish infectionfoci limited to small clusters of cells that increased in size onlyslightly by 5 dpi, whereas infection foci produced by WT viruswere much larger at 3 dpi and had coalesced by 5 dpi, as indicatedin histochemical GUS assays. The authors postulated that themutations disrupted an important RNA secondary structureinvolved in movement. However, we could find no evidence fora more widely conserved RNA secondary structure. In fact, all

Fig. 5. Detection of TuMV accumulation in plants by RT-PCR. Primers specificto p35STuMV-GFP were used to amplify cDNAs from RNA isolated from plantsinfected with WT TuMV (p35STuMV-GFP; lane 2), the two PIPO knockoutmutants, p41 and p68 (lanes 2 and 3 respectively), and both negative (water;lane 5) and nonbombarded (lane 6) controls, all at 12 dpi. The PCR controlswere plasmid p35STuMV-GFP for the positive control (lane 7) and water forthe negative control (lane 8). Marker DNAs are in lane 1. TuMV RNA waspresent only in WT-infected plants.

Fig. 6. Immunoblot detection of PIPO products. (A) An approximately25-kDa PIPO fusion product. AS 2–15, antiserum against PIPO N-terminalpeptide (amino acids 2–15). AS 39–52, antiserum against PIPO C-terminalpeptide (amino acids 39–52). �, lysate from noninfected plants. �, lysate fromplants infected with WT TuMV (p35STuMV-GFP). Lanes were loaded withequal amounts of tissue; similar results were obtained when lanes wereloaded with equal total protein (data not shown). (B) �, detection of an�7-kDa product with AS 39–52 from in vitro translation of pipo nucleotidesequence preceded by T7 promoter and followed by FLAG epitope, demon-strating that the antiserum would detect such a product if it were expressedin vivo. �, negative control (no RNA).

5900 � www.pnas.org�cgi�doi�10.1073�pnas.0800468105 Chung et al.

Dow

nloa

ded

by g

uest

on

July

23,

202

0

Page 5: An overlapping essential gene in the Potyviridae · 60 to 115 codons. Of relevance to possible leaky scanning, pipo is preceded by many initiation codons, e.g., 57 AUG codons lie

six mutants contain many nonsynonymous mutations in the piporeading-frame. Importantly, a seventh mutant with synonymousmutations in the downstream CI cistron (3� of the pipo termi-nation codon) had WT phenotype. In contrast to the mutantsdescribed in ref. 19, no infection at all was detected for ourTuMV PIPO knockout mutants (although local infection limitedto just small clusters of cells may not have been apparent).Possible roles for PIPO include replication, movement, suppres-sion of systemic silencing, or a combination of functions.

The P3 protein, whose cistron pipo overlaps (Fig. 1), is a�42-kDa nonstructural protein and is one of the least wellcharacterized Potyvirus proteins. It may be involved in virusreplication, pathogenicity, resistance breaking, and systemicinfection (6, 9, 21–23). It is interesting that previous investiga-tions into the function and localization of P3 have producedconflicting results (6). Such discrepancies may in fact be a resultof antibodies raised against P3 being sensitive also to P3N�PIPOin some studies but not in others. A survey of published materialusing antibodies against P3 produced no conclusive evidenceeither for or against PIPO (see Table S4). Furthermore, somephenotypes linked to mutations in the P3 cistron may result fromalterations to P3N�PIPO or its translational mechanism ratherthan P3 itself. For example, a key virulence determinant ofZucchini yellow mosaic potyvirus has been mapped to thepolyprotein-frame codon immediately 5� of the GAA AAA Amotif (24), mutations of which could affect the level ofP3N�PIPO production.

ConclusionsWe have demonstrated the existence of a new coding sequence,pipo, in the Potyviridae—a vast and extremely diverse family ofplant viruses, many of which are of great agricultural significance(5). Evidence includes: (i) the ubiquitous presence of an ORF(�60 codons) in all 60 Potyviridae GenBank RefSeqs; (ii) theubiquitous presence of a strong MLOGD coding signature forthis ORF; (iii) WSMV mutants in ref. 19 that disrupted PIPO butdid not change the polyprotein amino acid sequence were unableto establish systemic infection, whereas mutants that did notdisrupt PIPO had WT phenotype; (iv) our two TuMV PIPOknockout mutants, each differing from WT by a single pointmutation synonymous in the polyprotein frame, were noninfec-tious; and (v) an �25-kDa product was detected in lysate fromTuMV-infected plants by two separate antisera raised againstdifferent 14-aa domains in PIPO. The combination of our results(Potyvirus) with reinterpreted results from ref. 19 (Tritimovirus)gives broad phylogenetic support for these observations. Futurework will address the mechanism by which pipo is translated andthe function(s) of the PIPO fusion protein.

The Potyviridae have been grouped into the picorna-like virussuperfamily, a loose grouping of single-stranded positive-senseRNA viruses that includes Picornaviridae, Caliciviridae, Como-viridae, Dicistroviridae, Potyviridae, and Sequiviridae (8). Over-lapping and frameshift CDSs are almost unknown in the super-family. Notable exceptions include Theilovirus (Picornaviridae)(25), Acyrthosiphon pisum virus (unclassified picorna-like virus)(26), and the Caliciviridae (27, 28) (see SI Text for furtherdetails). Identification of these CDSs was no doubt aided by theirsize (all are � 100 codons). The discovery of pipo, a short CDSembedded deep within the polyprotein cistron, raises the ques-tion of how many other such CDSs await discovery. We havedemonstrated the utility of MLOGD as a tool to search for such‘‘difficult’’ CDSs in all virus genomes.

Materials and MethodsPlasmids. For in vivo translation, three plasmids were used: p35STuMV-GFP,p41, and p68. p35STuMV-GFP (29) contains the full CDS of TuMV with GFPfused between P1 and HC-Pro (Fig. 1). GFP is cleaved from the polyprotein,allowing in situ visualization of viral infection. Plasmids p41 and p68 are

modified versions of p35STuMV-GFP, each containing a point mutation that issynonymous with respect to the polyprotein but introduces a stop codon in thepipo reading-frame (20 nt and 47 nt 3� of the G2A6 motif for p41 and p68respectively; Fig. 3). The mutations were achieved by PCR, using the GenTailorsite directed mutagenesis system (Invitrogen) with primers 41F, 41R, 68F, and68R (all primer sequences are given in Table S3) on the vector pTORFX, whichis a PCR fragment containing pipo, generated from p35STuMV-GFP usingprimers F and R, cloned into pCR 2.1-TOPO (Invitrogen). The inserts weresubcloned via KpnI and MfeI sites into pUC19X, resulting in pUC41 and pUC68.pUC19X is a pUC19-based vector containing 8 kb of the TuMV cDNA (fromKpnI, which is in HC-Pro, to SalI, the end of the viral cDNA). Finally, the 8-kbfragments in pUC41 and pUC68, containing KpnI and SalI, were subclonedback into p35STuMV-GFP, resulting in p41 and p68.

The plasmid pTT7pipoFLAG was used for in vitro translation of pipo, todetermine gel migration of isolated PIPO product and test the antiserum AS39–52. The plasmid was derived from the TA cloning vector pGemT-easy(Promega) with an insert consisting of the TuMV pipo sequence with the T7promoter fused to its 5� end, the FLAG epitope fused to its 3� end before thetermination codon, and 99 nt of downstream sequence. The insert was re-leased by digestion with EcoRI, resulting in T7pipo-FLAG, followed by gelpurification for in vitro transcription and/or translation.

Inoculation of N. benthamiana via Gene Gun-Delivery of TuMV cDNA. Beforebombardment, 15 �g of 1-�m gold particles were washed and coated with 3�g of plasmid DNA according to ref. 30 with variations as follows: The samplewas continuously mixed on the vortexer while 50 �l of 50% glycerol, 25 �l of2.5 M CaCl2, and 10 �l of 0.1 M spermidine were added. After 5–10 min ofcontinuous mixing, the DNA-gold was washed with 70 �l of 70% isopropanoland once again with 100% isopropanol, followed by resuspension in 30 �l of100% isopropanol. Ten microliters of the DNA-coated gold was pipetted ontoeach macrocarrier while the suspension was continuously shaken. Up to fourplants were bombarded from each 1� tube of DNA-coated gold particles.

N. benthamiana plants used for bombardment were 5–7 weeks old andgrown in a 22°C growth chamber with 14 h of daylight. Transformation wascarried out with a PDS 1000/He biolistic gun (Bio-Rad), using the followingparameters: 650 psi rupture disk pressure; a 6-cm target distance (from middleof launch assembly to target plate); a 6-mm gap; 1.2 cm from macrocarrier tostopping plate; and a 28-torr vacuum at rupture. All hardware and disposablesfor the biolistic gun were obtained from Bio-Rad.

Plant Tissue Preparation. Fresh plant tissues were ground in liquid nitrogenfollowed by addition of TBS [10 mM Tris�HCl (pH 7.4), 300 mM NaCl, 5 mMEDTA, 1 mM PMSF, and 1 mM DTT) up to a tissue concentration of 1 mg/�l. Themixtures were then centrifuged at 10,000 � g for 10 min at 4°C and thesupernatant was respun until clear. The clear supernatant was used forimmunoblot analysis. In addition, quantitation of total protein in the lysatewas done by using the Bradford assay (Bio-Rad).

Immunoblotting. NuPAGE sample buffer (Invitrogen) was added to plant lysates,heated for 2 min at 100°C, and subjected to NuPAGE electrophoresis on 4–12%NuPAGE Bis-Tris gels with MES-SDS running buffer (Invitrogen) for 40 min.SamplesweretransferredtoHybond-ECLnitrocellulosemembranes (Amersham),followed by blocking with 3% nonfat dry milk in 1� PBS-T for 30 min at roomtemperature. The blocked membranes were incubated overnight at 4°C withprimary antiserum, washed, and incubated with horseradish peroxidase-conjugatedgoat secondaryAb(Bio-Rad) for30min.Thebandswerevisualizedbydeveloping with enhanced chemiluminescence (Pierce). The following rabbitantisera (Genscript) were used: AS 2–15, raised against peptide sequence KKLST-NLGRSMERVC (TuMV PIPO amino acids 2–15 plus ‘‘C’’; Fig. 3) and AS 39–52, raisedagainst peptide sequence ANEKRSRFRRQIQRC (TuMV PIPO amino acids 39–52plus ‘‘C’’). In addition, 3.125 �l/ml wheat germ extract (Promega) was addedduring binding of AS 39–52 for higher specificity.

RT-PCR. Plants were ground in liquid nitrogen and RNA was extracted by usingTriazol (Invitrogen). Three �g of RNA were subjected to RT-PCR, using Super-Script II (Invitrogen) with primer R, which is specific to p35STuMV-GFP, 1,813nt 3� of the G2A6 motif. This was followed by PCR, using platinum Taq Hi-Fi(Invitrogen) with primers F and X99R. The PCR controls were p35STUMV-GFPfor the positive controls and water for the negative control.

In Vitro Transcription and Translation. In vitro translation of T7PIPO-FLAG wasdone first by transcribing the RNA, using the T7 Megascript kit (Ambion), fol-lowed by capping, using the ScriptCap m7G capping system (Epicentre). This wasfollowed by in vitro translation where 0.2 pmol of RNA transcript were added to

Chung et al. PNAS � April 15, 2008 � vol. 105 � no. 15 � 5901

MIC

ROBI

OLO

GY

Dow

nloa

ded

by g

uest

on

July

23,

202

0

Page 6: An overlapping essential gene in the Potyviridae · 60 to 115 codons. Of relevance to possible leaky scanning, pipo is preceded by many initiation codons, e.g., 57 AUG codons lie

wheatgermextract (Promega) inafinal reactionvolumeof12.5 �l andtranslatedfor 2 h at 25°C. Total protein from the translation mix was separated on a precast4–12% PAGE (Invitrogen), which was used for Western blot analysis.

ACKNOWLEDGMENTS. We thank Valerie Torney for assistance with bom-bardment, Jaime Holdridge for assistance with photography, and Krzysztof

Treder for assistance with protein work. We thank Steve Whitham (IowaState University) for providing the plasmid p35STuMV-GFP and valuableadvice; we also thank the anonymous reviewers for their helpful com-ments. This work was supported by an award from Science FoundationIreland and National Institutes of Health Grants R01 GM067104 (to W.A.M.)and R01 GM079523 (to J.F.A.).

1. Chung WY, Wadhawan S, Szklarczyk R, Pond SK, Nekrutenko A (2007) A first look atARFome: Dual-coding genes in mammalian genomes. PLoS Comput Biol 3:91.

2. Firth AE, Brown CM (2005) Detecting overlapping coding sequences with pairwisealignments. Bioinformatics 21:282–292.

3. Firth AE, Brown CM (2006) Detecting overlapping coding sequences in virus genomes.BMC Bioinformatics 7:75.

4. Riechmann JL, Laın S, Garcıa JA (1992) Highlights and prospects of potyvirus molecularbiology. J Gen Virol 73:1–16.

5. Berger PH, et al. (2005) in Virus Taxonomy: Eighth Report of the InternationalCommittee on the Taxonomy of Viruses, eds Fauquet CM, Mayo MA, Maniloff J,Desselberger U, Ball LA (Elsevier Academic, San Diego), pp 819–841.

6. Urcuqui-Inchima S, Haenni AL, Bernardi F (2001) Potyvirus proteins: A wealth offunctions. Virus Res 74:157–175.

7. Kneller EL, Rakotondrafara AM, Miller WA (2006) Cap-independent translation ofplant viral RNAs. Virus Res 119:63–75.

8. Liljas L, Tate J, Lin T, Christian P, Johnson JE (2002) Evolutionary and taxonomicimplications of conserved structural motifs between picornaviruses and insect picorna-like viruses. Arch Virol 147:59–84.

9. Walsh JA, Jenner CE (2002) Turnip mosaic virus and the quest for durable resistance.Mol Plant Pathol 3:289–300.

10. Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: Improving the sensitivity ofprogressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680.

11. Hofacker IL, et al. (1998) Automatic detection of conserved RNA structure elements incomplete RNA virus genomes. Nucleic Acids Res 26:3825–3836.

12. Adams MJ, Antoniw JF, Fauquet CM (2004) Molecular criteria for genus and speciesdiscrimination within the family Potyviridae. Arch Virol 150:459–479.

13. Baranov PV, Gesteland RF, Atkins JF (2002) Recoding: Translational bifurcations in geneexpression. Gene 286:187–201.

14. Giedroc DP, Theimer CA, Nixon PL (2000) Structure, stability and function of RNApseudoknots involved in stimulating ribosomal frameshifting. J Mol Biol 298:167–185.

15. Barry JK, Miller WA (2002) A �1 ribosomal frameshift element that requires basepairing across four kilobases suggests a mechanism of regulating ribosome and rep-licase traffic on a viral RNA. Proc Natl Acad Sci USA 99:11133–11138.

16. Kolakofsky D, Roux L, Garcin D, Ruigrok RW (2005) Paramyxovirus mRNA editing, the‘‘rule of six’’ and error catastrophe: A hypothesis. J Gen Virol 86:1869–1877.

17. Volchkov VE, et al. (1995) GP mRNA of Ebola virus is edited by the Ebola viruspolymerase and by T7 and vaccinia virus polymerases. Virology 214:421–430.

18. Larsen B, Wills NM, Nelson C, Atkins JF, Gesteland RF (2000) Nonlinearity in geneticdecoding: Homologous DNA replicase genes use alternatives of transcriptional slip-page or translational frameshifting. Proc Natl Acad Sci USA 97:1683–1688.

19. Choi IR, Horken KM, Stenger DC, French R (2005) An internal RNA element in the P3cistron of Wheat streak mosaic virus revealed by synonymous mutations that affectboth movement and replication. J Gen Virol 86:2605–2614.

20. Choi IR, et al. (2001) Contributions of genetic drift and negative selection on theevolution of three strains of wheat streak mosaic tritimovirus. Arch Virol 146:619–628.

21. Dallot S, et al. (2001) Identification of Plum pox virus determinants implicated inspecific interactions with different Prunus spp. Phytopathology 91:159–164.

22. Johansen IE, Lund OS, Hjulsager CK, Laursen J (2001) Recessive resistance in Pisumsativum and potyvirus pathotype resolved in a gene-for-cistron correspondence be-tween host and virus. J Virol 75:6609–6614.

23. Jenner CE, et al. (2003) The dual role of the potyvirus P3 protein of Turnip mosaic virusas a symptom and avirulence determinant in brassicas. Mol Plant Microbe Interact16:777–784.

24. Glasa M, Svoboda J, Novakova S (2007) Analysis of the molecular and biologicalvariability of zucchini yellow mosaic virus isolates from Slovakia and Czech Republic.Virus Genes 35:415–421.

25. Kong WP, Roos RP (1991) Alternative translation initiation site in the DA strain ofTheiler’s murine encephalomyelitis virus. J Virol 65:3395–3399.

26. van der Wilk F, Dullemans AM, Verbeek M, van den Heuvel JF (1997) Nucleotidesequence and genomic organization of Acyrthosiphon pisum virus. Virology 238:353–362.

27. Green KY, et al. (2000) Taxonomy of the caliciviruses. J Infect Dis 181:S322–330.28. Clarke IN, Lambden PR (2000) Organization and expression of calicivirus genes. J Infect

Dis 181:S309–316.29. Lellis AD, Kasschau KD, Whitham SA, Carrington JC (2002) Loss-of-susceptibility mu-

tants of Arabidopsis thaliana reveal an essential role for eIF(iso)4E during potyvirusinfection. Curr Biol 12:1046–1051.

30. Frame BR et al. (2000) Production of transgenic maize from bombarded type II callus:Effect of gold particle size and callus morphology on transformation efficiency. In VitroCell Dev Biol 36:21–29.

5902 � www.pnas.org�cgi�doi�10.1073�pnas.0800468105 Chung et al.

Dow

nloa

ded

by g

uest

on

July

23,

202

0