Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
A peer-reviewed version of this preprint was published in PeerJ on 18December 2018.
View the peer-reviewed version (peerj.com/articles/6136), which is thepreferred citable publication unless you specifically need to cite this preprint.
Andrews RJ, Roche J, Moss WN. 2018. ScanFold: an approach for genome-widediscovery of local RNA structural elements—applications to Zika virus andHIV. PeerJ 6:e6136 https://doi.org/10.7717/peerj.6136
Genome-wide discovery of local RNA structural elements in
Zika virus
Ryan J Andrews 1 , Julien Roche 1 , Walter N Moss Corresp. 1
1 Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States
Corresponding Author: Walter N Moss
Email address: [email protected]
In addition to encoding RNA primary structures, genomes also encode RNA secondary and tertiary
structures that play roles in gene regulation and, in the case of RNA viruses, genome replication.
Methods for the identification of functional RNA structures in genomes typically rely on scanning analysis
windows, where multiple partially-overlapping windows are used to predict RNA structures and folding
metrics to deduce regions likely to form functional structure. Separate structural models are produced for
each window, where the step size can greatly affect the returned model. This makes deducing unique
local structures challenging, as the same nucleotides in each window can be alternatively base paired. In
the presented approach, all base pairs from all analysis windows are considered and weighted by
favorable folding metrics throughout all windows. This results in unique base pairing throughout the
genome and the generation of local regions/structures that can be ranked by their propensity to form
unusually thermodynamically stable folds.
This approach was applied to the Zika virus (ZIKV) genome. ZIKV is linked to a variety of neurological
ailments including microcephaly and Guillain-Barré syndrome and its (+)-sense RNA genome encodes
two, previously described, functionally essential structured RNA regions. Our approach is able to
successfully identify and model the structures of these regions, while also finding additional regions likely
to form functional RNA structures throughout the viral polyprotein coding region. All data for the ZIKV
genome have been archived at the RNAStructuromeDB, a repository of RNA folding data for humans and
their pathogens.
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27101v1 | CC BY 4.0 Open Access | rec: 9 Aug 2018, publ: 9 Aug 2018
1 Genome-wide discovery of local RNA structural elements in Zika virus.
2 Ryan J. Andrews1, Julien Roche1 and Walter N. Moss1,*
3 1Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State
4 University, Ames, IA 50011, United States of America
5 *Corresponding Author: e-mail: [email protected], Address: Roy J. Carver Department of
6 Biochemistry, Biophysics and Molecular Biology, Molecular Biology Building, 2437 Pammel
7 Drive, Ames, IA 50011-1079
8 Corresponding Author:
9 Walter Moss1
10 Email address: [email protected]
11
12
13
14
15
16
17
18
19
20
21
22
23 Abstract
24 In addition to encoding RNA primary structures, genomes also encode RNA secondary and tertiary
25 structures that play roles in gene regulation and, in the case of RNA viruses, genome replication.
26 Methods for the identification of functional RNA structures in genomes typically rely on scanning
27 analysis windows, where multiple partially-overlapping windows are used to predict RNA
28 structures and folding metrics to deduce regions likely to form functional structure. Separate
29 structural models are produced for each window, where the step size can greatly affect the returned
30 model. This makes deducing unique local structures challenging, as the same nucleotides in each
31 window can be alternatively base paired. In the presented approach, all base pairs from all analysis
32 windows are considered and weighted by favorable folding metrics throughout all windows. This
33 results in unique base pairing throughout the genome and the generation of local regions/structures
34 that can be ranked by their propensity to form unusually thermodynamically stable folds.
35 This approach was applied to the Zika virus (ZIKV) genome. ZIKV is linked to a variety of
36 neurological ailments including microcephaly and Guillain-Barré syndrome and its (+)-sense RNA
37 genome encodes two, previously described, functionally essential structured RNA regions. Our
38 approach is able to successfully identify and model the structures of these regions, while also
39 finding additional regions likely to form functional RNA structures throughout the viral
40 polyprotein coding region. All data for the ZIKV genome have been archived at the
41 RNAStructuromeDB, a repository of RNA folding data for humans and their pathogens.
42 Introduction
43 Zika virus (ZIKV) was first isolated from African rhesus monkeys in 1947 (Dick 1952; Dick et al.
44 1952). The natural transmission cycle of ZIKV involves mosquitos as vectors and monkeys as
45 reservoirs, where humans are only occasional hosts (Saiz et al. 2016). Recent outbreaks, such as
46 the 2007 Micronesian epidemic and the 2013 outbreak in French Polynesia (Lanciotti et al. 2016),
47 has seen ZIKV spread beyond Africa to reach a global population (48 countries as of 2016; (2017)).
48 Initial infection is fairly benign—joint pain, fever, rash, as well as head and muscle aches—
49 however, when contracted by pregnant women, ZIKV can cause harm to the developing baby.
50 ZIKV infection is definitively linked to neonatal microcephaly and Guillain-Barré syndrome
51 (Miner & Diamond 2017), eye/hearing problems and impaired growth (Broutet et al. 2016); it is
52 also implicated in other neurological disorders (Manangeeswaran et al. 2016).
53 ZIKV is a positive-sense (+)RNA virus, whose ~11 kb genome encodes a single long open reading
54 frame (ORF) used to generate the viral polyprotein. This long, unbroken protein coding region is
55 translated as a single polypeptide, which is later processed into the essential ZIKV proteins. At
56 either end of the genome are untranslated regions (UTRs): a short (107 nt) 5 UTR and a longer
57 (465 nt) 3 UTR. Both UTRs form conserved RNA structures with important functions (Goertz et
58 al. 2017). The 5 UTR, plus a stretch of the downstream coding region, contains several functional
59 structured domains (Figure 1A). The 5 UTR has a Y-shaped stem-loop A (SLA) motif, which acts
60 as the promoter of viral genomic (v)RNA replication (Filomatori et al. 2006; Lodeiro et al. 2009;
61 Thurner et al. 2004). Directly upstream, the start codon appears in a smaller stem loop, stem-loop
62 B (SLB), which facilitates RNA interactions with the 3 end (Alvarez et al. 2005). The cHP
63 domain, which overlaps the capsid protein coding region, is required for efficient vRNA synthesis;
64 additionally, cHP enhances start codon selection (Ye et al. 2016). The DCS-PK domain enhances
65 vRNA replication by promoting vRNA circularization (Liu et al. 2013). The 3 UTR contains six
66 recognized structured motifs (Goertz et al. 2017) (Figure 1B). The 3 UTR contains a large stem-
67 loop structure (SL) which is required for viral replication and is highly conserved throughout
68 flavivirus genomes (Davis et al. 2013; Elghonemy et al. 2005; Villordo et al. 2016; Yu & Markoff
69 2005; Zeng et al. 1998). Directly adjacent is a short, well conserved hairpin (sHP), thought to be
70 involved in genome circularization (Villordo et al. 2010). Upstream are two structures (DB-1 and
71 Ѱ-DB), which have been shown to be conserved and duplicated, though their specific function
72 remains unknown (Villordo et al. 2016). The two remaining structures, SLI and SLII, which are
73 resistant to host XRN1 exonucleases (Donald et al. 2016; Goertz et al. 2016; Pijlman et al. 2008),
74 lead to an abundance of a highly structured noncoding (nc)RNA: the subgenomic flavivirus
75 (sf)RNA, which is proposed to play roles in inhibition of the RIG-I host antiviral response
76 (Akiyama et al. 2016; Chapman et al. 2014; Kieft et al. 2015).
77 As structured RNA motifs play such key roles in ZIKV biology, we undertook a bioinformatics
78 and comparative sequence/structure analysis of the ZIKV genome; we analyzed the genome
79 sequence used in reverse genetics systems in the hopes of stimulating additional experimentation
80 to better understand RNA structures’ roles in ZIKV (Atieh et al. 2016). In this analysis, we adapted
81 approaches previously used to discover structured regions in other viral genomes, influenza A
82 virus (Moss et al. 2011) and Epstein–Barr virus (Moss & Steitz 2013), as well as the XIST long
83 ncRNA (Fang et al. 2015). These methods (as well as most others) rely on scanning analysis
84 windows, where multiple partially-overlapping windows are used to predict RNA structures and
85 folding metrics to deduce regions likely to form functional structure (Andrews et al. 2017; Gruber
86 et al. 2010; Washietl & Hofacker 2007). Separate structural models are produced for each window
87 and the window and step size can greatly affect the returned models. This makes deducing unique
88 local structures challenging, as the same nucleotides in each window can be alternatively base
89 paired. We have developed the ScanFold algorithm to overcome these challenges. ScanFold
90 (available from https://github.com/moss-lab/ScanFold) considers all base pairs from all analysis
91 windows and weights them based on their cumulative folding metrics. This yields a list of base
92 pairing partners most likely to generate unusually thermodynamically stable structures, effectively
93 compressing thousands of scanning window models into a single result. The results easily integrate
94 with well-established tools such as the Integrative Genomics Viewer (Busan & Weeks 2017;
95 Thorvaldsdottir et al. 2013), RNAstructure package (Reuter & Mathews 2010),
96 RNAStructuromeDB (Andrews et al. 2017), or VARNA (Darty et al. 2009), to allow for immediate
97 viewing of predicted RNA structures and/or metrics.
98 Materials and Methods
99
100 ScanFold
101 The analyzed genome was sequenced from the outbreak-lineage-derived reverse genetics system
102 (Atieh et al. 2016) (NCBI accession KJ776791.2). The scanning window analysis was performed
103 by the ScanFold-Scan program (https://github.com/moss-lab/ScanFold). Using a 120 nt window
104 and 1 nt step size, 10688 overlapping window sequences were analyzed, via the following
105 ScanFold-Scan process.
106 Each sequence is folded via RNAfold (Lorenz et al. 2011) (version 2.2.10) to calculate its native
107 Gibb’s free energy of folding (Gnative; measures how energetically favorable the predicted
108 structure is) and associated base pairing structure at 37ºC (human body temperature). Each
109 sequence is then scrambled to produce, in this case, 50 random sequences with the same
110 nucleotide content as the native sequence (randomization number was optimized to give stable z-
111 scores). Each randomized sequence is then folded in silico to calculate an average Grandom value
112 for use in the calculation of the thermodynamic z-score (Eq. 1; as adapted from (Clote et al.
113 2005)).
114 𝑧 ‒ 𝑠𝑐𝑜𝑟𝑒= 𝐺𝑛𝑎𝑡𝑖𝑣𝑒 ‒ 𝐺𝑟𝑎𝑛𝑑𝑜𝑚𝜎 #(𝑒𝑞1)115
116 The results of this scan (Table S1) are the input for the ScanFold-Fold program.
117 For each nucleotide (i) in the genome, the ScanFold-Fold program generates a list of all pairing
118 partners (j) (unpaired nucleotides are considered self-partnered) and records the metrics from all
119 windows where that partner appeared. For each i in the sequence, a selection algorithm is deployed
120 to determine the most likely i-j arrangement. First, a coverage-normalized z-score is calculated for
121 each i-j pair; calculated as the sum of the z-scores observed for i-j, divided by the total number of
122 windows i appears in. The i-j pair with the lowest coverage-normalized z-score is reported as the
123 most favorable i-j pair. This coverage-normalized z-score (as opposed to a simple z-score average)
124 gives more weight to i-j pairs which consistently appear in low z-score windows and provides a
125 normalized metric for comparison of regions with lower window coverage (near the ends, where
126 nucleotides are covered by only a few windows). Selection based on the coverage-normalized z-
127 score results in a list of the most favorable i-j pairs for each nucleotide of the input sequence. In
128 cases where different i’s compete for the same j, the i-j pair is “awarded” to the i-j pair with the
129 lower coverage-normalized z-score. Results of all possible partners per i-nucleotide of the genome,
130 as well as the final list of i-j partners, are reported (shown for ZIKV in Table S2).
131 Filtering for cumulatively low-z-score i-j pairs occurs after this initial selection process. In this
132 case, average z-scores for each i-j pair are considered as opposed to the coverage-normalized
133 score; only i-j pairs with average z-scores below a filter value are written to a CT file. By default,
134 the ScanFold-Fold program will write four CT files, each using a different filter value: -2, -1, 10
135 (which captures all values; referred to as no-filter), and a user input filter value.
136 Alignment
137 A total of 37 ZIKV genomes (curated in the ZikaVR database (Gupta et al. 2016)) were aligned
138 to KJ776791.2 using the MAFFT web server (Katoh et al. 2017; Kuraku et al. 2013) with default
139 settings. Aligned sequences (Table S3) were compared to ScanFold-Fold predicted bps (with z-
140 score < -1) to tabulate the types of base pairs that are found throughout the alignment (Table S4).
141 Results
142 The ZIKV genome was scanned using a 120 nt window with a 1 nt step: resulting in 10688
143 windows. For each window, several metrics of RNA folding were predicted (described in the
144 Results and Discussion of (Andrews et al. 2017)); two metrics are plotted vs. the ZIKV genome in
145 Figure 2: the Gibb’s free-energy of folding (G) and the thermodynamic energy z-score. The G
146 measures the thermodynamic stability of the optimum predicted structure; here, optimum is
147 defined as the structure with the lowest, most favorable, free energy: the minimum free energy
148 (MFE) structure. A low G indicates an RNA can stably fold, but does not necessarily indicate if
149 it is a functional structure. To gain this information, the thermodynamic z-score compares the
150 native sequence G to randomized sequence (in this case 50X randomizations of the ZIKV
151 nucleotides). Negative z-scores identify regions where the native ZIKV sequence have lower than
152 random free-energy, which can suggest that the sequence order (presumably due to evolution) is
153 set in such a way as to form favorable base pairing (Qu & Adelson 2012). Across the ZIKV genome
154 are regions where low G regions overlap low z-score regions, but also places where they do not
155 (Fig. 2). Low z-scores are a suggestion that the RNA may have a function mediated by structure
156 (Clote et al. 2005); indeed, this metric is at the core of many prediction methods. Negative z-scores
157 can highlight structural motifs with the best likelihood of being functional; however, even for a
158 relatively small genome such as ZIKV, 3349 windows had z-scores less than -1 (signifying the
159 window MFE prediction was one standard deviation more stable than random sequence) and 994
160 windows with z-scores less than -2 (Table S1). With so many windows of interest, and so many
161 competing models, it is a challenge to hone in on the most functionally significant base pairs.
162 A total of 22180 unique base pairs (bps) were predicted throughout all scanning windows (Table
163 S2); certain nucleotides (nt) were predicted to form bps with as many as 16 different partners,
164 highlighting the challenge of finding a single model (see nt-1099 in Table S2). For each bp,
165 ScanFold records the metrics from each window where the bp appears, generating a set of
166 cumulative metrics. These metrics are then used to select the bp with the lowest cumulative z-
167 score for each nt in the input sequence. This yielded a much smaller group of 2259 bps (Table S5),
168 which have the highest propensity to have been ordered (evolved) to fold—and thus be likely to
169 have functional significance. To focus on the most significant hits, a cumulative z-score filter is
170 applied to identify the bps which were consistently found in low z-score windows; by default, a
171 cumulative z-score filter of -1 and -2 (one and two standard deviations more stable than random,
172 respectively) are applied and results are reported in connectivity table (CT) files—a non-filtered
173 CT file and a user-defined filter value CT are generated as well.
174 With a cumulative z-score filter of -1, 1114 bps were identified (Table S6). Consistent with the
175 presence of structured functional motifs, bps with highly negative cumulative z-scores were found
176 within the known ZIKV structured regions; a total of 194 bps were found within the 5' and 3' end
177 structured domains. ScanFold was able to identify 86 of the 114 known bps (Goertz et al. 2017) in
178 the 3' UTR (Fig. S1) and 75 of the 81 known bps (Ye et al. 2016) within the 5' structured domain
179 (Fig. S2).
180 Further filtering the results to z-scores < -2 reduced the number of bps to 233 (Table S7). Of these,
181 207 were found either in the known structural regions, or immediately adjacent (within 240 nt; two
182 window’s length), suggesting that the regions of functional structure at either end may be larger
183 than previously thought (full structural models of the extended 5' and 3' ends are shown in Fig.
184 S3). The remaining 26 bps contribute to four structures found within the core coding region (Fig.
185 3 c-f).
186 To determine the structural conservation of these ScanFold identified bps, an alignment was
187 performed of 37 ZIKV genomes curated in the ZikaVR database (Gupta et al. 2016); aligned
188 sequences are reported in Table S3. The ScanFold bps (e.g., those contained in windows averaging
189 z-score < -1) were mapped to the alignment to determine the conservation of base pairing across
190 ZIKV genomes. When mutations occurred in predicted paired regions they generally preserved
191 base pairing: e.g. ScanFold predicted bps were over 96% conserved (Table S4). Multiple structure-
192 conserving mutations occur throughout novel predicted motifs (Fig. 3) as well as in previously
193 described ZIKV structures (Fig. 1), indicating that ZIKV may be evolving to preserve these novel
194 structured regions in addition to known functional RNA structures.
195 Discussion
196 This bioinformatics analysis of the ZIKV genome reiterates the prominence of RNA structure in
197 the 5 and 3 regions of the ZIKV genome, indicates that these (previously described) structured
198 regions could be larger, and reveals several potentially functional structures within the core coding
199 region. This was achieved by an approach which condenses thousands of scanning window models
200 into a single result; greatly reducing the dependence on subjective manual curation of results.
201 The ScanFold algorithm was able to successfully identify known bps within the 5 and 3 regions
202 (RNAstructure scorer (Bellaousov et al. 2013; Mathews 2004; Mathews et al. 1999) PPV and
203 sensitivity results in Table S8): in the 5 UTR region, ScanFold positively identified all known
204 structures (with slight variations in the stem of SLB; Fig. S2), while the 3 UTR model was accurate
205 in identifying the structures for DB-1, Ѱ-DB, sHP and SL, and only partially identifying the
206 exonuclease resistant SLI and SLII structures (Fig. S1); likely due to the presence of a complex
207 RNA pseudoknot structure (Akiyama et al. 2016), which can be difficult to predict computationally
208 due to non-nested base pairing that complicates the use of recursive algorithms (Schlick & Pyle
209 2017).
210 ScanFold identified structures directly upstream and downstream of the 5 and 3 UTR regions (nt
211 270-528 and nt 10237-10370, respectively), which have metrics that are equally as favorable as
212 those in the known structural regions. These structures (Fig. 3 a, b, and g) have unusually stable
213 thermodynamic stability (compared to random) and are well conserved throughout ZIKV genomes
214 (Table S4) and supported by structure-preserving mutations (Fig. 3). Their close proximity to
215 functional UTR regions (Fig. S3) suggests that they may play important roles in conjunction with
216 these other elements: e.g. in genome replication, processing, etc. The structures predicted in the
217 core coding region (Fig. 3 c-f), also have metrics that indicate they may have evolved to form
218 functional conformations, are conserved and have support from sequence variations (Fig. 3). There
219 are numerous potential functions for these core region RNA structural motifs: e.g. serving as sites
220 of post-transcriptional modifications (Coutard et al. 2017; Dong et al. 2014; Gokhale et al. 2016;
221 Lichinchi et al. 2016), facilitating packaging of the genome (Stockley et al. 2013), acting as
222 localization signals (Pratt & Mowry 2013), modulating the rate of translation to affect protein
223 folding (Khrustalev et al. 2017), or to affect transcript (genome) stability (Wu & Brewer 2012).
224 The map of the RNA structural landscape of the ZIKV genome presented in this report serves as a
225 guide for future analyses. The functional importance of these motifs can be tested in virio by
226 designing mutations to disrupt/compensate structure (e.g. using a tool such as RNA2DMut; (Moss
227 2018))—while maintaining (or minimally disrupting) amino acid coding (and codon use)—then
228 introducing them into the genome via a ZIKV reverse genetics system (Atieh et al. 2016) to assess
229 effects on viability and replication; For example, a similar strategy was used to test RNA structural
230 motifs in influenza A virus (Jiang et al. 2016). Furthermore, the presence of conserved base pairing
231 within ZIKV coding regions would be expected to impact its evolution (to maintain both functional
232 protein and RNA structure); thus, these data can also potentially be helpful in understanding
233 potential constraints placed on the evolution of this virus.
234 Conclusions
235 In conclusion, this report presents a bioinformatics scan of the ZIKV genome and a novel analysis
236 pipeline/method for functional RNA motif discovery that (1) recapitulates known functional motifs
237 toward the ends of the ZIKV genome, (2) suggests that these regions of RNA structure may be
238 larger than previously reported, and (3) finds novel motifs within the core coding region of the
239 genome. These results suggest additional downstream experiments to assign function to these
240 elements—e.g. by disrupting structure via genetic mutations introduced into the ZIKV reverse
241 genetics system.
242 Acknowledgements
243 We would like to thank the Roy J. Carver Charitable Trust for their continuing support.
244
245 References
246 2017. Zika virus: an epidemiological update. Wkly Epidemiol Rec 92:188-192.
247 Akiyama BM, Laurence HM, Massey AR, Costantino DA, Xie X, Yang Y, Shi PY, Nix JC, Beckham
248 JD, and Kieft JS. 2016. Zika virus produces noncoding RNAs using a multi-pseudoknot
249 structure that confounds a cellular exonuclease. Science 354:1148-1152.
250 10.1126/science.aah3963
251 Alvarez DE, Lodeiro MF, Luduena SJ, Pietrasanta LI, and Gamarnik AV. 2005. Long-range
252 RNA-RNA interactions circularize the dengue virus genome. J Virol 79:6631-6643.
253 10.1128/JVI.79.11.6631-6643.2005
254 Andrews RJ, Baber L, and Moss WN. 2017. RNAStructuromeDB: A genome-wide database for
255 RNA structural inference. Sci Rep 7:17269. 10.1038/s41598-017-17510-y
256 Atieh T, Baronti C, de Lamballerie X, and Nougairede A. 2016. Simple reverse genetics
257 systems for Asian and African Zika viruses. Sci Rep 6:39384. 10.1038/srep39384
258 Bellaousov S, Reuter JS, Seetin MG, and Mathews DH. 2013. RNAstructure: Web servers for
259 RNA secondary structure prediction and analysis. Nucleic Acids Res 41:W471-474.
260 10.1093/nar/gkt290
261 Broutet N, Krauer F, Riesen M, Khalakdina A, Almiron M, Aldighieri S, Espinal M, Low N, and
262 Dye C. 2016. Zika Virus as a Cause of Neurologic Disorders. New England Journal of
263 Medicine 374:1506-1509. 10.1056/NEJMp1602708
264 Busan S, and Weeks KM. 2017. Visualization of RNA structure models within the Integrative
265 Genomics Viewer. RNA 23:1012-1018. 10.1261/rna.060194.116
266 Chapman EG, Moon SL, Wilusz J, and Kieft JS. 2014. RNA structures that resist degradation
267 by Xrn1 produce a pathogenic Dengue virus RNA. Elife 3. ARTN e01892
268 10.7554/eLife.01892
269 Clote P, Ferre F, Kranakis E, and Krizanc D. 2005. Structural RNA has lower folding energy
270 than random RNA of the same dinucleotide frequency. RNA 11:578-591.
271 10.1261/rna.7220505
272 Coutard B, Barral K, Lichiere J, Selisko B, Martin B, Aouadi W, Lombardia MO, Debart F,
273 Vasseur JJ, Guillemot JC, Canard B, and Decroly E. 2017. Zika Virus Methyltransferase:
274 Structure and Functions for Drug Design Perspectives. J Virol 91. 10.1128/JVI.02202-
275 16
276 Darty K, Denise A, and Ponty Y. 2009. VARNA: Interactive drawing and editing of the RNA
277 secondary structure. Bioinformatics 25:1974-1975. 10.1093/bioinformatics/btp250
278 Davis WG, Basu M, Elrod EJ, Germann MW, and Brinton MA. 2013. Identification of cis-Acting
279 Nucleotides and a Structural Feature in West Nile Virus 3 '-Terminus RNA That
280 Facilitate Viral Minus Strand RNA Synthesis. Journal of Virology 87:7622-7636.
281 10.1128/Jvi.00212-13
282 Dick GW. 1952. Zika virus. II. Pathogenicity and physical properties. Trans R Soc Trop Med
283 Hyg 46:521-534.
284 Dick GW, Kitchen SF, and Haddow AJ. 1952. Zika virus. I. Isolations and serological specificity.
285 Trans R Soc Trop Med Hyg 46:509-520.
286 Donald CL, Brennan B, Cumberworth SL, Rezelj VV, Clark JJ, Cordeiro MT, Freitas de Oliveira
287 Franca R, Pena LJ, Wilkie GS, Da Silva Filipe A, Davis C, Hughes J, Varjak M, Selinger M,
288 Zuvanov L, Owsianka AM, Patel AH, McLauchlan J, Lindenbach BD, Fall G, Sall AA, Biek
289 R, Rehwinkel J, Schnettler E, and Kohl A. 2016. Full Genome Sequence and sfRNA
290 Interferon Antagonist Activity of Zika Virus from Recife, Brazil. PLoS Negl Trop Dis
291 10:e0005048. 10.1371/journal.pntd.0005048
292 Dong H, Fink K, Zust R, Lim SP, Qin CF, and Shi PY. 2014. Flavivirus RNA methylation. J Gen
293 Virol 95:763-778. 10.1099/vir.0.062208-0
294 Elghonemy S, Davis WG, and Brinton MA. 2005. The majority of the nucleotides in the top
295 loop of the genomic 3 ' terminal stem loop structure are cis-acting in a West Nile virus
296 infectious clone. Virology 331:238-246. 10.1016/j.virol.2004.11.008
297 Fang R, Moss WN, Rutenberg-Schoenberg M, and Simon MD. 2015. Probing Xist RNA
298 Structure in Cells Using Targeted Structure-Seq. PLoS Genet 11:e1005668.
299 10.1371/journal.pgen.1005668
300 Filomatori CV, Lodeiro MF, Alvarez DE, Samsa MM, Pietrasanta L, and Gamarnik AV. 2006. A
301 5' RNA element promotes dengue virus RNA synthesis on a circular genome. Genes
302 Dev 20:2238-2249. 10.1101/gad.1444206
303 Goertz GP, Abbo SR, Fros JJ, and Pijlman GP. 2017. Functional RNA during Zika virus infection.
304 Virus Res. 10.1016/j.virusres.2017.08.015
305 Goertz GP, Fros JJ, Miesen P, Vogels CBF, van der Bent ML, Geertsema C, Koenraadt CJM, van
306 Rij RP, van Oers MM, and Pijlman GP. 2016. Noncoding Subgenomic Flavivirus RNA Is
307 Processed by the Mosquito RNA Interference Machinery and Determines West Nile
308 Virus Transmission by Culex pipiens Mosquitoes. Journal of Virology 90:10145-
309 10159. 10.1128/Jvi.00930-16
310 Gokhale NS, McIntyre ABR, McFadden MJ, Roder AE, Kennedy EM, Gandara JA, Hopcraft SE,
311 Quicke KM, Vazquez C, Willer J, Ilkayeva OR, Law BA, Holley CL, Garcia-Blanco MA,
312 Evans MJ, Suthar MS, Bradrick SS, Mason CE, and Horner SM. 2016. N6-
313 Methyladenosine in Flaviviridae Viral RNA Genomes Regulates Infection. Cell Host
314 Microbe 20:654-665. 10.1016/j.chom.2016.09.015
315 Gruber AR, Findeiss S, Washietl S, Hofacker IL, and Stadler PF. 2010. RNAz 2.0: improved
316 noncoding RNA detection. Pac Symp Biocomput:69-79.
317 Gupta AK, Kaur K, Rajput A, Dhanda SK, Sehgal M, Khan MS, Monga I, Dar SA, Singh S, Nagpal
318 G, Usmani SS, Thakur A, Kaur G, Sharma S, Bhardwaj A, Qureshi A, Raghava GP, and
319 Kumar M. 2016. ZikaVR: An Integrated Zika Virus Resource for Genomics, Proteomics,
320 Phylogenetic and Therapeutic Analysis. Sci Rep 6:32713. 10.1038/srep32713
321 Jiang T, Nogales A, Baker SF, Martinez-Sobrido L, and Turner DH. 2016. Mutations Designed
322 by Ensemble Defect to Misfold Conserved RNA Structures of Influenza A Segments 7
323 and 8 Affect Splicing and Attenuate Viral Replication in Cell Culture. PLoS One
324 11:e0156906. 10.1371/journal.pone.0156906
325 Katoh K, Rozewicki J, and Yamada KD. 2017. MAFFT online service: multiple sequence
326 alignment, interactive sequence choice and visualization. Brief Bioinform.
327 10.1093/bib/bbx108
328 Khrustalev VV, Khrustaleva TA, Sharma N, and Giri R. 2017. Mutational Pressure in Zika
329 Virus: Local ADAR-Editing Areas Associated with Pauses in Translation and
330 Replication. Front Cell Infect Microbiol 7:44. 10.3389/fcimb.2017.00044
331 Kieft JS, Rabe JL, and Chapman EG. 2015. New hypotheses derived from the structure of a
332 flaviviral Xrn1-resistant RNA: Conservation, folding, and host adaptation. RNA Biol
333 12:1169-1177. 10.1080/15476286.2015.1094599
334 Kuraku S, Zmasek CM, Nishimura O, and Katoh K. 2013. aLeaves facilitates on-demand
335 exploration of metazoan gene family trees on MAFFT sequence alignment server with
336 enhanced interactivity. Nucleic Acids Res 41:W22-28. 10.1093/nar/gkt389
337 Lanciotti RS, Lambert AJ, Holodniy M, Saavedra S, and Signor Ldel C. 2016. Phylogeny of Zika
338 Virus in Western Hemisphere, 2015. Emerg Infect Dis 22:933-935.
339 10.3201/eid2205.160065
340 Lichinchi G, Zhao BS, Wu Y, Lu Z, Qin Y, He C, and Rana TM. 2016. Dynamics of Human and
341 Viral RNA Methylation during Zika Virus Infection. Cell Host Microbe 20:666-673.
342 10.1016/j.chom.2016.10.002
343 Liu ZY, Li XF, Jiang T, Deng YQ, Zhao H, Wang HJ, Ye Q, Zhu SY, Qiu Y, Zhou X, Qin ED, and Qin
344 CF. 2013. Novel cis-Acting Element within the Capsid-Coding Region Enhances
345 Flavivirus Viral-RNA Replication by Regulating Genome Cyclization. Journal of
346 Virology 87:6804-6818. 10.1128/Jvi.00243-13
347 Lodeiro MF, Filomatori CV, and Gamarnik AV. 2009. Structural and functional studies of the
348 promoter element for dengue virus RNA replication. J Virol 83:993-1008.
349 10.1128/JVI.01647-08
350 Lorenz R, Bernhart SH, Honer Zu Siederdissen C, Tafer H, Flamm C, Stadler PF, and Hofacker
351 IL. 2011. ViennaRNA Package 2.0. Algorithms Mol Biol 6:26. 10.1186/1748-7188-6-
352 26
353 Manangeeswaran M, Ireland DDC, and Verthelyi D. 2016. Zika (PRVABC59) Infection Is
354 Associated with T cell Infiltration and Neurodegeneration in CNS of
355 Immunocompetent Neonatal C57Bl/6 Mice. Plos Pathogens 12. ARTN e1006004
356 10.1371/journal.ppat.1006004
357 Mathews DH. 2004. Using an RNA secondary structure partition function to determine
358 confidence in base pairs predicted by free energy minimization. RNA 10:1178-1190.
359 10.1261/rna.7650904
360 Mathews DH, Sabina J, Zuker M, and Turner DH. 1999. Expanded sequence dependence of
361 thermodynamic parameters improves prediction of RNA secondary structure. J Mol
362 Biol 288:911-940. 10.1006/jmbi.1999.2700
363 Miner JJ, and Diamond MS. 2017. Zika Virus Pathogenesis and Tissue Tropism. Cell Host
364 Microbe 21:134-142. 10.1016/j.chom.2017.01.004
365 Moss WN. 2018. RNA2DMut: a web tool for the design and analysis of RNA structure
366 mutations. RNA 24:273-286. 10.1261/rna.063933.117
367 Moss WN, Priore SF, and Turner DH. 2011. Identification of potential conserved RNA
368 secondary structure throughout influenza A coding regions. RNA 17:991-1011.
369 10.1261/rna.2619511
370 Moss WN, and Steitz JA. 2013. Genome-wide analyses of Epstein-Barr virus reveal conserved
371 RNA structures and a novel stable intronic sequence RNA. BMC Genomics 14:543.
372 10.1186/1471-2164-14-543
373 Pijlman GP, Funk A, Kondratieva N, Leung J, Torres S, van der Aa L, Liu WJ, Palmenberg AC,
374 Shi PY, Hall RA, and Khromykh AA. 2008. A Highly Structured, Nuclease-Resistant,
375 Noncoding RNA Produced by Flaviviruses Is Required for Pathogenicity. Cell Host &
376 Microbe 4:579-591. 10.1016/j.chom.2008.10.007
377 Pratt CA, and Mowry KL. 2013. Taking a cellular road-trip: mRNA transport and anchoring.
378 Curr Opin Cell Biol 25:99-106. 10.1016/j.ceb.2012.08.015
379 Qu Z, and Adelson DL. 2012. Evolutionary conservation and functional roles of ncRNA. Front
380 Genet 3:205. 10.3389/fgene.2012.00205
381 Reuter JS, and Mathews DH. 2010. RNAstructure: software for RNA secondary structure
382 prediction and analysis. BMC Bioinformatics 11:129. 10.1186/1471-2105-11-129
383 Saiz JC, Vazquez-Calvo A, Blazquez AB, Merino-Ramos T, Escribano-Romero E, and Martin-
384 Acebes MA. 2016. Zika Virus: the Latest Newcomer. Front Microbiol 7:496.
385 10.3389/fmicb.2016.00496
386 Schlick T, and Pyle AM. 2017. Opportunities and Challenges in RNA Structural Modeling and
387 Design. Biophys J 113:225-234. 10.1016/j.bpj.2016.12.037
388 Stockley PG, Twarock R, Bakker SE, Barker AM, Borodavka A, Dykeman E, Ford RJ, Pearson
389 AR, Phillips SE, Ranson NA, and Tuma R. 2013. Packaging signals in single-stranded
390 RNA viruses: nature's alternative to a purely electrostatic assembly mechanism. J Biol
391 Phys 39:277-287. 10.1007/s10867-013-9313-0
392 Thorvaldsdottir H, Robinson JT, and Mesirov JP. 2013. Integrative Genomics Viewer (IGV):
393 high-performance genomics data visualization and exploration. Brief Bioinform
394 14:178-192. 10.1093/bib/bbs017
395 Thurner C, Witwer C, Hofacker IL, and Stadler PF. 2004. Conserved RNA secondary
396 structures in Flaviviridae genomes. J Gen Virol 85:1113-1124. 10.1099/vir.0.19462-0
397 Villordo SM, Alvarez DE, and Gamarnik AV. 2010. A balance between circular and linear
398 forms of the dengue virus genome is crucial for viral replication. Rna-a Publication of
399 the Rna Society 16:2325-2335. 10.1261/rna.2120410
400 Villordo SM, Carballeda JM, Filomatori CV, and Gamarnik AV. 2016. RNA Structure
401 Duplications and Flavivirus Host Adaptation. Trends Microbiol 24:270-283.
402 10.1016/j.tim.2016.01.002
403 Washietl S, and Hofacker IL. 2007. Identifying structural noncoding RNAs using RNAz. Curr
404 Protoc Bioinformatics Chapter 12:Unit 12 17. 10.1002/0471250953.bi1207s19
405 Wu X, and Brewer G. 2012. The regulation of mRNA stability in mammalian cells: 2.0. Gene
406 500:10-21. 10.1016/j.gene.2012.03.021
407 Ye Q, Liu ZY, Han JF, Jiang T, Li XF, and Qin CF. 2016. Genomic characterization and
408 phylogenetic analysis of Zika virus circulating in the Americas. Infect Genet Evol
409 43:43-49. 10.1016/j.meegid.2016.05.004
410 Yu L, and Markoff L. 2005. The topology of bulges in the long stem of the flavivirus 3 ' stem-
411 loop is a major determinant of RNA replication competence. Journal of Virology
412 79:2309-2324. 10.1128/Jvi.79.4.2309-2324.2005
413 Zeng LL, Falgout B, and Markoff L. 1998. Identification of specific nucleotide sequences
414 within the conserved 3 '-SL in the dengue type 2 virus genome required for
415 replication. Journal of Virology 72:7510-7522.
416
417
418 Figure Legends
419
420 Figure 1. Models of the known functional RNA structural motifs found in the 5' and 3' end regions
421 of the ZIKV genome (KJ776791.2). (a) Structure model of the 5' UTR region as shown in (Ye et
422 al. 2016) mapped to the KJ776791.2 sequence. (b) Structure model of the 3' UTR region as shown
423 in (Goertz et al. 2017) mapped to the KJ776791.2 sequence. Base pairs are colored based on their
424 appearance in ScanFold filtered results; blue lines depict bps that were predicted in the z-score <
425 -2 results (Table S7), green lines refer to bps that were predicted in the z-score < -1 results (Table
426 S6), and grey lines were not predicted. Structure preserving mutations (from the alignment of
427 curated ZIKV genomes compared to the structures of known motifs) are highlighted with green
428 circles.
429 Figure 2. Bioinformatics scans of the ZIKV genome. (a) The predicted minimum folding free
430 energy (MFE) and (b) z-score for all RNA window segments. (c) Genome feature annotations are
431 shown; the polyprotein region has been broken down for visualization of individual coding
432 sequence regions. All data was visualized, archived, and is available for browsing/download on
433 the RNAStructuromeDB https://structurome.bb.iastate.edu/.
434 Figure 3. Structure models of coding region motifs that contain ScanFold (z-score < -2) bp hits.
435 Structures are shown in the order they appear throughout the ZIKV genome (KJ776791.2). Figures
436 (a) and (b) depict the structural models of the ScanFold predicted structures closely upstream from
437 the previously annotated 5' structured region, located from nt 274-326 and 433-529 respectively.
438 Figures (c), (d), (e), and (f) are structures which appeared within the core coding region located
439 from nt 4463-4520, 5587-5702, 7096-7205, and 7651-7755 respectively. Structure (g) is directly
440 upstream of the 3' structured region, located at nt 10237-10368, and has been annotated to show
441 the location of the stop codon near the terminal loop (circled in green). Base pairs are colored
442 based on their z-score cutoff; green lines show bps which were predicted in the z-score < -1 results
443 (Table S6) and blue lines show bps which were predicted in z-score < -2 results (Table S7). Sites
444 with structure-preserving mutations are highlighted with filled green circles.
445 Supplementary information
446 Figure S1. Comparative arc diagram depicting the previously described 5' RNA structural motifs
447 vs. the ScanFold predicted bps. (a) Arc diagram of 5' end region as predicted via ScanFold; base
448 pairs are colored based on their z-score cutoff where blue lines depict bps which were predicted in
449 the z-score < -2 results (Table S7) and green lines refer to bps which were predicted in the z-score
450 < -1 results (Table S6). (b) Arc diagram of the accepted secondary structure model for the 5' end
451 of ZIKV as shown in (Ye et al. 2016) and mapped to the KJ776791.2 sequence.
452 Figure S2. Comparative arc diagram depicting the known RNA structural motifs vs. the ScanFold
453 predicted bps. (a) Arc diagram of 3' end region as predicted via ScanFold; base pairs are colored
454 based on their z-score cutoff where blue lines depict bps which were predicted in the z-score < -2
455 results (Table S7), green lines refer to bps which were predicted in the z-score < -1 results (Table
456 S6), and yellow lines were predicted in the no filter results (Table S5). (b) Arc diagram of the
457 accepted secondary structure model for the 5’ end of ZIKV as shown in (Goertz et al. 2017)
458 mapped to the KJ776791.2 sequence. The start codon nucleotide locations have been highlighted
459 with a light blue bar.
460 Figure S3. Secondary structure model depicting the ScanFold proposed structures within and
461 directly adjacent to known 5' and 3' structured regions. Base pairs are colored based on their z-
462 score cutoff: blue lines depict bps which were predicted in the z-score < -2 results (Table S7),
463 green lines refer to bps which were predicted in the z-score < -1 results (Table S6), and yellow
464 lines were predicted in the no filter results (Table S5). The start and stop codon nucleotides have
465 been circled and labeled in blue and green respectively. Nucleotides which established ScanFold
466 bp preserving mutations within the alignment are highlighted with filled green circles.
467 Table S1. Results of the scanning window analysis of the the ZIKV genome (NCBI accession
468 KJ776791.2) as ouput from the ScanFold-Scan program. Each row contains the data calculated for
469 each window. Columns A and B are the starting (i) and ending (j) coordinates of the window
470 fragment. Column C is the temperature used for all RNAFold calculations. Column D-H refer to
471 the ∆Gnative, thermodynamic z-score, stability ratio p-value, ensemble diversity, and frequency-of-
472 MFE (fMFE) values respectively (detailed descriptions of all metrics can be found at the
473 RNAStructuromeDB https://structurome.bb.iastate.edu or the corresponding manuscript
474 (Andrews et al. 2017)). Column I contains the sequence of the window; the ∆Gnative and centroid
475 structure of this sequence are shown in Column J and K. Column L-O report nucleotide counts for
476 the window sequence.
477 Table S2. ScanFold log file produced during the ScanFold-Fold portion of the program. The log
478 file is separated into two portions. The first half (row 1 to 87,448) contains a table for each
479 nucleotide in the sequence. These tables contain the cumulative base pairing information for that
480 nucleotide as predicted throughout the scan. Column A refers to the i-nucleotide of the sequence.
481 Column B refers to the coordinate of the j base pair. The total number of windows the i-j pair
482 appears, as well as the total number of windows the i-nucleotide appears are reported in column
483 D. The average window minimum free energy, z-score, and ensemble diversity of each i-j pair are
484 reported in columns E-G respectively. Column H reports the sum of z-scores for each i-j pair,
485 which is used to calculate the coverage-normalized z-score (calculated as the sum of z-score over
486 total windows in which i-nucleotide appeared) as reported in Column I. Column J reports a
487 summary of the bps predicted for each i-nucleotide. The second half of the log file, starting at row
488 87,449, is a list of the most favorable i-j pairs (column B and C) associated with the i-nucleotide
489 listed in column A. In places where this nucleotide competed with other i-nucleotides for the same
490 j-nucleotide, the “winning” i-j pair is reported and denoted with an asterisk (in some cases the
491 winning i-j pair does not contain the original i-nucleotide or may be unpaired). Columns D, E, and
492 F, contain the average window minimum free energy, z-score and ensemble diversity for the
493 corresponding i-j pair.
494 Table S3. Results of 37 ZIKV genomes curated in the ZikaVR database (Gupta et al. 2016) aligned
495 to KJ776791.2. Genomes were aligned using the MAFFT web server (Katoh et al. 2017; Kuraku
496 et al. 2013) with default settings. Headings for each result contain the NCBI accession numbers
497 and name of the aligned sequence name.
498 Table S4. Base pair counts tabulating the number and type of base pair which appears in the
499 ScanFold < -1 predicted structure when compared to 37 aligned ZIKV genome. A total of 37
500 ZIKV genomes were aligned to KJ776791.2 using the MAFFT web server (Katoh et al. 2017;
501 Kuraku et al. 2013) using default settings. Aligned sequences were compared to ScanFold-Fold
502 predicted bps (with z-score < -1) to tabulate the types of base pairs which are found throughout
503 the alignment (Table S3). Column S reports the percent of canonical bps which were found to be
504 allowed throughout the alignment for that base pair and column T reports the different number of
505 canonical base pair types. Results for the previously reported 5' and 3' UTR structural regions
506 appear as separate worksheets.
507 Table S5. CT file of the default no-filter results output from ScanFold-Fold.
508 Table S6. CT file of the default z-score < -1 results output from ScanFold-Fold.
509 Table S7. CT file of the default z-score < -2 results output from ScanFold-Fold.
510 Table S8. RNAstructure webserver scorer results of the ScanFold predicted structures compared
511 to accepted structures. The structures and sequences were uploaded to the server as shown, and
512 scorer was run with default settings.
Figure 1
Figure 1. Models of the known functional RNA structural motifs found in the 5' and 3'
end regions of the ZIKV genome (KJ776791.2).
(a) Structure model of the 5' UTR region as shown in (Ye et al. 2016) mapped to the
KJ776791.2 sequence. (b) Structure model of the 3' UTR region as shown in (Goertz et al.
2017) mapped to the KJ776791.2 sequence. Base pairs are colored based on their
appearance in ScanFold filtered results; blue lines depict bps that were predicted in the z-
score < -2 results (Table S7), green lines refer to bps that were predicted in the z-score < -1
results (Table S6), and grey lines were not predicted. Structure preserving mutations (from
the alignment of curated ZIKV genomes compared to the structures of known motifs) are
highlighted with green circles.
Figure 2(on next page)
Figure 2. Bioinformatics scans of the ZIKV genome.
(a) The predicted minimum folding free energy (MFE) and (b) z-score for all RNA window
segments. (c) Genome feature annotations are shown; the polyprotein region has been
broken down for visualization of individual coding sequence regions. All data was visualized,
archived, and is available for browsing/download on the RNAStructuromeDB
https://structurome.bb.iastate.edu/.
-10
-20
-30
-40
-50
-60
ZIKV MFE
3
2
1
0
-1
-2
-3
-4
-5
ZIKV z-score
ZIKV features
10,50010,0009,5009,0008,5008,0007,5007,0006,5006,0005,5005,0004,5004,0003,5003,0002,5002,0001,5001,0005000
Genome Track View Help Share
KJ776791.2 GoKJ776791.2:1..10807 (10.81 Kb)
0 1,250 2,500 3,750 5,000 6,250 7,500 8,750 10,000
Select tracks
a
b
c
Figure 3
Figure 3. Structure models of coding region motifs that contain ScanFold (z-score < -2)
bp hits.
Structures are shown in the order they appear throughout the ZIKV genome (KJ776791.2).
Figures (a) and (b) depict the structural models of the ScanFold predicted structures closely
upstream from the previously annotated 5' structured region, located from nt 274-326 and
433-529 respectively. Figures (c), (d), (e), and (f) are structures which appeared within the
core coding region located from nt 4463-4520, 5587-5702, 7096-7205, and 7651-7755
respectively. Structure (g) is directly upstream of the 3' structured region, located at nt
10237-10368, and has been annotated to show the location of the stop codon near the
terminal loop (circled in green). Base pairs are colored based on their z-score cutoff; green
lines show bps which were predicted in the z-score < -1 results (Table S6) and blue lines
show bps which were predicted in z-score < -2 results (Table S7). Sites with structure-
preserving mutations are highlighted with filled green circles.