A peer-reviewed version of this preprint was published in PeerJ on 18 December 2018. View the peer-reviewed version (peerj.com/articles/6136), which is the preferred citable publication unless you specifically need to cite this preprint. Andrews RJ, Roche J, Moss WN. 2018. ScanFold: an approach for genome-wide discovery of local RNA structural elements—applications to Zika virus and HIV. PeerJ 6:e6136 https://doi.org/10.7717/peerj.6136

Genome-wide discovery of local RNA structural elements in Zika virus · 9 August 2018




Genome-wide discovery of local RNA structural elements in Zika virus

Ryan J Andrews 1 , Julien Roche 1 , Walter N Moss Corresp. 1

1 Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States

Corresponding Author: Walter N Moss

Email address: [email protected]


PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27101v1 | CC BY 4.0 Open Access | rec: 9 Aug 2018, publ: 9 Aug 2018


Corresponding author: Walter N. Moss, Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Molecular Biology Building, 2437 Pammel Drive, Ames, IA 50011-1079. E-mail: [email protected]


Abstract

In addition to encoding RNA primary structures, genomes also encode RNA secondary and tertiary structures that play roles in gene regulation and, in the case of RNA viruses, genome replication. Methods for the identification of functional RNA structures in genomes typically rely on scanning analysis windows, where multiple partially overlapping windows are used to predict RNA structures and folding metrics to deduce regions likely to form functional structure. Separate structural models are produced for each window, where the step size can greatly affect the returned model. This makes deducing unique local structures challenging, as the same nucleotides in each window can be alternatively base paired. In the presented approach, all base pairs from all analysis windows are considered and weighted by favorable folding metrics throughout all windows. This results in unique base pairing throughout the genome and the generation of local regions/structures that can be ranked by their propensity to form unusually thermodynamically stable folds.

This approach was applied to the Zika virus (ZIKV) genome. ZIKV is linked to a variety of neurological ailments, including microcephaly and Guillain-Barré syndrome, and its (+)-sense RNA genome encodes two previously described, functionally essential structured RNA regions. Our approach is able to successfully identify and model the structures of these regions, while also finding additional regions likely to form functional RNA structures throughout the viral polyprotein coding region. All data for the ZIKV genome have been archived at the RNAStructuromeDB, a repository of RNA folding data for humans and their pathogens.

Introduction

Zika virus (ZIKV) was first isolated from African rhesus monkeys in 1947 (Dick 1952; Dick et al. 1952). The natural transmission cycle of ZIKV involves mosquitoes as vectors and monkeys as reservoirs, where humans are only occasional hosts (Saiz et al. 2016). Recent outbreaks, such as the 2007 Micronesian epidemic and the 2013 outbreak in French Polynesia (Lanciotti et al. 2016), have seen ZIKV spread beyond Africa to reach a global population (48 countries as of 2016; (2017)). Initial infection is fairly benign (joint pain, fever, rash, and head and muscle aches); however, when contracted by pregnant women, ZIKV can cause harm to the developing baby. ZIKV infection is definitively linked to neonatal microcephaly and Guillain-Barré syndrome (Miner & Diamond 2017), as well as eye/hearing problems and impaired growth (Broutet et al. 2016); it is also implicated in other neurological disorders (Manangeeswaran et al. 2016).

ZIKV is a positive-sense (+)RNA virus whose ~11 kb genome encodes a single long open reading frame (ORF) used to generate the viral polyprotein. This long, unbroken protein coding region is translated as a single polypeptide, which is later processed into the essential ZIKV proteins. At either end of the genome are untranslated regions (UTRs): a short (107 nt) 5' UTR and a longer (465 nt) 3' UTR. Both UTRs form conserved RNA structures with important functions (Goertz et al. 2017). The 5' UTR, plus a stretch of the downstream coding region, contains several functional structured domains (Figure 1A). The 5' UTR has a Y-shaped stem-loop A (SLA) motif, which acts as the promoter of viral genomic (v)RNA replication (Filomatori et al. 2006; Lodeiro et al. 2009; Thurner et al. 2004). Directly downstream, the start codon appears in a smaller stem loop, stem-loop B (SLB), which facilitates RNA interactions with the 3' end (Alvarez et al. 2005). The cHP domain, which overlaps the capsid protein coding region, is required for efficient vRNA synthesis; additionally, cHP enhances start codon selection (Ye et al. 2016). The DCS-PK domain enhances vRNA replication by promoting vRNA circularization (Liu et al. 2013). The 3' UTR contains six recognized structured motifs (Goertz et al. 2017) (Figure 1B). These include a large stem-loop structure (SL), which is required for viral replication and is highly conserved throughout flavivirus genomes (Davis et al. 2013; Elghonemy et al. 2005; Villordo et al. 2016; Yu & Markoff 2005; Zeng et al. 1998). Directly adjacent is a short, well-conserved hairpin (sHP), thought to be involved in genome circularization (Villordo et al. 2010). Upstream are two structures (DB-1 and Ψ-DB), which have been shown to be conserved and duplicated, though their specific function remains unknown (Villordo et al. 2016). The two remaining structures, SLI and SLII, are resistant to host XRN1 exonucleases (Donald et al. 2016; Goertz et al. 2016; Pijlman et al. 2008) and lead to an abundance of a highly structured noncoding (nc)RNA: the subgenomic flavivirus (sf)RNA, which is proposed to play roles in inhibition of the RIG-I host antiviral response (Akiyama et al. 2016; Chapman et al. 2014; Kieft et al. 2015).

As structured RNA motifs play such key roles in ZIKV biology, we undertook a bioinformatics and comparative sequence/structure analysis of the ZIKV genome; we analyzed the genome sequence used in reverse genetics systems (Atieh et al. 2016) in the hope of stimulating additional experimentation to better understand the roles of RNA structures in ZIKV. In this analysis, we adapted approaches previously used to discover structured regions in other viral genomes, influenza A virus (Moss et al. 2011) and Epstein–Barr virus (Moss & Steitz 2013), as well as the XIST long ncRNA (Fang et al. 2015). These methods (as well as most others) rely on scanning analysis windows, where multiple partially overlapping windows are used to predict RNA structures and folding metrics to deduce regions likely to form functional structure (Andrews et al. 2017; Gruber et al. 2010; Washietl & Hofacker 2007). Separate structural models are produced for each window, and the window and step size can greatly affect the returned models. This makes deducing unique local structures challenging, as the same nucleotides in each window can be alternatively base paired. We have developed the ScanFold algorithm to overcome these challenges. ScanFold (available from https://github.com/moss-lab/ScanFold) considers all base pairs from all analysis windows and weights them based on their cumulative folding metrics. This yields a list of base pairing partners most likely to generate unusually thermodynamically stable structures, effectively compressing thousands of scanning window models into a single result. The results easily integrate with well-established tools such as the Integrative Genomics Viewer (Busan & Weeks 2017; Thorvaldsdottir et al. 2013), the RNAstructure package (Reuter & Mathews 2010), RNAStructuromeDB (Andrews et al. 2017), or VARNA (Darty et al. 2009), to allow for immediate viewing of predicted RNA structures and/or metrics.

Materials and Methods

ScanFold

The analyzed genome was sequenced from the outbreak-lineage-derived reverse genetics system (Atieh et al. 2016) (NCBI accession KJ776791.2). The scanning window analysis was performed by the ScanFold-Scan program (https://github.com/moss-lab/ScanFold). Using a 120 nt window and 1 nt step size, 10688 overlapping window sequences were analyzed via the following ScanFold-Scan process.
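
The window count follows directly from the window and step sizes. The sketch below is illustrative only and is not taken from the ScanFold-Scan source; the function name `scan_windows` is hypothetical, and the genome length of 10807 nt is an assumption inferred from the reported 10688 windows (length - 120 + 1), on the further assumption that only full-length windows are analyzed.

```python
def scan_windows(sequence, window=120, step=1):
    """Yield (start, subsequence) for every full-length window.

    `start` is 1-based, matching the nucleotide numbering used in the text.
    """
    for offset in range(0, len(sequence) - window + 1, step):
        yield offset + 1, sequence[offset:offset + window]


# Hypothetical stand-in for the genome: only its length matters here.
# Assumption: 10807 nt, inferred from 10688 windows of 120 nt at a 1 nt step.
genome_length = 10807
toy_genome = "A" * genome_length

n_windows = sum(1 for _ in scan_windows(toy_genome))
print(n_windows)
```

With a larger step the number of windows drops roughly in proportion, which is one reason the returned models depend on the step size.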

Each sequence is folded via RNAfold (Lorenz et al. 2011) (version 2.2.10) to calculate its native Gibbs free energy of folding (ΔGnative; measures how energetically favorable the predicted structure is) and associated base pairing structure at 37 °C (human body temperature). Each sequence is then scrambled to produce, in this case, 50 random sequences with the same nucleotide content as the native sequence (randomization number was optimized to give stable z-scores). Each randomized sequence is then folded in silico to calculate an average ΔGrandom value for use in the calculation of the thermodynamic z-score (Eq. 1; as adapted from (Clote et al. 2005)).

$$ z\text{-score} = \frac{\Delta G_{native} - \overline{\Delta G_{random}}}{\sigma} \tag{1} $$

Here σ denotes the standard deviation, so a z-score of -1 corresponds to a native fold that is one standard deviation more stable than random.
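
As a rough illustration of Eq. 1, the sketch below computes a z-score from a native sequence and 50 mononucleotide shuffles. It is not the ScanFold-Scan implementation. Two points are assumptions: σ is taken to be the standard deviation of the shuffled-sequence energies, and `toy_energy` is a hypothetical stand-in for a folding program (it is not a thermodynamic model). In practice the `fold_energy` argument would wrap a real predictor such as RNAfold.

```python
import random
import statistics

PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def toy_energy(seq):
    """Hypothetical energy: -1 per complementary pair when the sequence
    is folded back on itself as a single hairpin (illustration only)."""
    half = len(seq) // 2
    return -float(sum((seq[k], seq[-1 - k]) in PAIRS for k in range(half)))


def thermodynamic_z_score(seq, fold_energy=toy_energy, n_random=50, seed=0):
    """Eq. 1: (native energy - mean shuffled energy) / sigma."""
    rng = random.Random(seed)
    native = fold_energy(seq)
    shuffled_energies = []
    for _ in range(n_random):
        letters = list(seq)
        rng.shuffle(letters)  # preserves nucleotide content
        shuffled_energies.append(fold_energy("".join(letters)))
    # Assumption: sigma is the standard deviation of the shuffled energies.
    sigma = statistics.pstdev(shuffled_energies)
    if sigma == 0:
        return 0.0  # no spread among shuffles, so no z-score can be formed
    return (native - statistics.mean(shuffled_energies)) / sigma


z = thermodynamic_z_score("GGGGGGAAAACCCCCC")
print(f"z-score: {z:.2f}")
```

A sequence whose order favors pairing, like the toy hairpin above, scores below its own shuffles and therefore receives a negative z-score.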


The results of this scan (Table S1) are the input for the ScanFold-Fold program.

For each nucleotide (i) in the genome, the ScanFold-Fold program generates a list of all pairing partners (j) (unpaired nucleotides are considered self-partnered) and records the metrics from all windows where that partner appeared. For each i in the sequence, a selection algorithm is deployed to determine the most likely i-j arrangement. First, a coverage-normalized z-score is calculated for each i-j pair as the sum of the z-scores observed for i-j, divided by the total number of windows in which i appears. The i-j pair with the lowest coverage-normalized z-score is reported as the most favorable i-j pair. This coverage-normalized z-score (as opposed to a simple z-score average) gives more weight to i-j pairs that consistently appear in low z-score windows and provides a normalized metric for comparison of regions with lower window coverage (near the ends, where nucleotides are covered by only a few windows). Selection based on the coverage-normalized z-score results in a list of the most favorable i-j pairs for each nucleotide of the input sequence. In cases where different i's compete for the same j, the j is awarded to the i whose i-j pair has the lower coverage-normalized z-score. Results of all possible partners per i-nucleotide of the genome, as well as the final list of i-j partners, are reported (shown for ZIKV in Table S2).
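
The selection step can be sketched as follows. This is a simplified reading of the description above, not the ScanFold-Fold code. The input layout (`start`, `end`, `z`, `pairs`) and the function name `select_pairs` are hypothetical. Competition for a shared partner is resolved greedily in order of increasing score, and a nucleotide that loses the competition is simply left out of the final pairing, which is an assumption.

```python
from collections import defaultdict


def select_pairs(windows):
    """windows: iterable of (start, end, z, pairs), 1-based inclusive coordinates;
    pairs maps i -> j for paired nucleotides (both directions listed)."""
    coverage = defaultdict(int)                     # windows containing i
    z_sum = defaultdict(lambda: defaultdict(float))  # summed z per i-j pair
    for start, end, z, pairs in windows:
        for i in range(start, end + 1):
            coverage[i] += 1
            j = pairs.get(i, i)  # unpaired nucleotides are self-partnered
            z_sum[i][j] += z

    # Most favorable partner per nucleotide: lowest coverage-normalized z-score.
    best = {}
    for i, partners in z_sum.items():
        j, total = min(partners.items(), key=lambda item: item[1] / coverage[i])
        best[i] = (j, total / coverage[i])

    # Assumption: resolve competition greedily, lowest score first.
    final = {}
    for i, (j, _score) in sorted(best.items(), key=lambda item: item[1][1]):
        if i == j or i in final or j in final:
            continue
        final[i], final[j] = j, i
    return best, final


toy_windows = [
    (1, 5, -1.0, {1: 5, 5: 1}),  # window 1 pairs nt 1 with nt 5
    (2, 6, -3.0, {2: 5, 5: 2}),  # window 2 pairs nt 2 with nt 5
]
best, final = select_pairs(toy_windows)
print(final)
```

In this toy input, nt 1 and nt 2 both prefer nt 5; nt 2 wins because its pair has the lower coverage-normalized z-score (-1.5 versus -1.0).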

Filtering for cumulatively low-z-score i-j pairs occurs after this initial selection process. In this case, average z-scores for each i-j pair are considered as opposed to the coverage-normalized score; only i-j pairs with average z-scores below a filter value are written to a CT file. By default, the ScanFold-Fold program will write four CT files, each using a different filter value: -2, -1, 10 (which captures all values; referred to as no-filter), and a user input filter value.
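
A minimal sketch of the filtering step, under stated assumptions: the names `filter_pairs` and `to_ct` are hypothetical, "below a filter value" is read as a strict inequality, and the CT layout used here is the commonly seen six-column form (index, base, previous index, next index, partner or 0, index). The exact layout expected by a given viewer should be checked against that tool's documentation.

```python
def filter_pairs(pair_stats, threshold):
    """pair_stats: {(i, j): [z, z, ...]}; keep pairs whose average z is below threshold."""
    return {pair for pair, zs in pair_stats.items() if sum(zs) / len(zs) < threshold}


def to_ct(sequence, pairs, title="ScanFold-Fold sketch"):
    """Render pairs as connectivity-table text (assumed six-column layout)."""
    partner = {}
    for i, j in pairs:
        partner[i], partner[j] = j, i
    n = len(sequence)
    lines = [f"{n}\t{title}"]
    for idx, base in enumerate(sequence, start=1):
        nxt = idx + 1 if idx < n else 0
        lines.append(f"{idx}\t{base}\t{idx - 1}\t{nxt}\t{partner.get(idx, 0)}\t{idx}")
    return "\n".join(lines)


# Hypothetical per-pair z-scores collected from the windows containing each pair.
pair_stats = {(1, 8): [-2.5, -2.1], (2, 7): [-1.5, -0.9], (3, 6): [0.2]}

# The three default filter values named in the text (a user value would be a fourth).
filtered = {cutoff: filter_pairs(pair_stats, cutoff) for cutoff in (-2, -1, 10)}
ct_text = to_ct("GGCAAGCC", filtered[-2])
print(ct_text)
```

The filter value of 10 keeps every pair here, mirroring its role as the no-filter setting.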

Alignment

A total of 37 ZIKV genomes (curated in the ZikaVR database (Gupta et al. 2016)) were aligned to KJ776791.2 using the MAFFT web server (Katoh et al. 2017; Kuraku et al. 2013) with default settings. Aligned sequences (Table S3) were compared to ScanFold-Fold predicted base pairs (bps; with z-score < -1) to tabulate the types of base pairs that are found throughout the alignment (Table S4).
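
The tabulation of base pair types across an alignment might look like the sketch below. It is an illustration rather than the procedure used to build Table S4. The function name `tally_pair_types` is hypothetical, pair positions are assumed to be given as alignment columns, and "conserved" is assumed to mean that the two columns form a canonical (AU, GC) or wobble (GU) pair.

```python
from collections import Counter

CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def tally_pair_types(aligned_seqs, pairs):
    """aligned_seqs: equal-length aligned strings; pairs: (i, j) columns, 1-based.

    Returns a Counter of observed pair types and the fraction that can pair.
    """
    counts = Counter()
    for seq in aligned_seqs:
        rna = seq.upper().replace("T", "U")  # accept DNA-alphabet input
        for i, j in pairs:
            counts[rna[i - 1] + rna[j - 1]] += 1
    total = sum(counts.values())
    conserved = sum(n for pair, n in counts.items() if pair in CANONICAL)
    return counts, conserved / total


# Toy alignment (hypothetical): a 3 bp hairpin in three aligned sequences.
toy_alignment = ["GGGAAACCC", "GAGAAACUC", "GGGAAACAC"]
toy_pairs = [(1, 9), (2, 8), (3, 7)]
counts, fraction = tally_pair_types(toy_alignment, toy_pairs)
print(counts, round(fraction, 3))
```

In the toy alignment the second sequence carries a compensatory change (GC to AU) that preserves pairing, while the third carries a single change (GC to GA) that disrupts it.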

Results

The ZIKV genome was scanned using a 120 nt window with a 1 nt step, resulting in 10688 windows. For each window, several metrics of RNA folding were predicted (described in the Results and Discussion of (Andrews et al. 2017)); two metrics are plotted vs. the ZIKV genome in Figure 2: the Gibbs free energy of folding (ΔG) and the thermodynamic z-score. The ΔG measures the thermodynamic stability of the optimum predicted structure; here, optimum is defined as the structure with the lowest, most favorable, free energy: the minimum free energy (MFE) structure. A low ΔG indicates that an RNA can fold stably, but does not necessarily indicate whether the structure is functional. To gain this information, the thermodynamic z-score compares the native sequence ΔG to that of randomized sequences (in this case 50 randomizations of the ZIKV nucleotides). Negative z-scores identify regions where the native ZIKV sequence has lower than random free energy, which can suggest that the sequence order (presumably due to evolution) is set in such a way as to form favorable base pairing (Qu & Adelson 2012). Across the ZIKV genome are regions where low ΔG overlaps low z-score, but also places where it does not (Fig. 2). Low z-scores suggest that the RNA may have a function mediated by structure (Clote et al. 2005); indeed, this metric is at the core of many prediction methods. Negative z-scores can highlight structural motifs with the best likelihood of being functional; however, even for a relatively small genome such as ZIKV, 3349 windows had z-scores less than -1 (signifying that the window MFE prediction was one standard deviation more stable than random sequence) and 994 windows had z-scores less than -2 (Table S1). With so many windows of interest, and so many competing models, it is a challenge to home in on the most functionally significant base pairs.


A total of 22180 unique base pairs (bps) were predicted throughout all scanning windows (Table S2); certain nucleotides (nt) were predicted to form bps with as many as 16 different partners, highlighting the challenge of finding a single model (see nt-1099 in Table S2). For each bp, ScanFold records the metrics from each window where the bp appears, generating a set of cumulative metrics. These metrics are then used to select the bp with the lowest cumulative z-score for each nt in the input sequence. This yielded a much smaller group of 2259 bps (Table S5), which have the highest propensity to have been ordered (evolved) to fold and are thus likely to have functional significance. To focus on the most significant hits, a cumulative z-score filter is applied to identify the bps that were consistently found in low z-score windows; by default, cumulative z-score filters of -1 and -2 (one and two standard deviations more stable than random, respectively) are applied and results are reported in connectivity table (CT) files. A non-filtered CT file and a CT file for a user-defined filter value are generated as well.

With a cumulative z-score filter of -1, 1114 bps were identified (Table S6). Consistent with the presence of structured functional motifs, bps with highly negative cumulative z-scores were found within the known ZIKV structured regions; a total of 194 bps were found within the 5' and 3' end structured domains. ScanFold was able to identify 86 of the 114 known bps (Goertz et al. 2017) in the 3' UTR (Fig. S1) and 75 of the 81 known bps (Ye et al. 2016) within the 5' structured domain (Fig. S2).

Further filtering the results to z-scores < -2 reduced the number of bps to 233 (Table S7). Of these, 207 were found either in the known structural regions or immediately adjacent (within 240 nt; two window lengths), suggesting that the regions of functional structure at either end may be larger than previously thought (full structural models of the extended 5' and 3' ends are shown in Fig. S3). The remaining 26 bps contribute to four structures found within the core coding region (Fig. 3 c-f).

To determine the structural conservation of these ScanFold-identified bps, an alignment was performed of 37 ZIKV genomes curated in the ZikaVR database (Gupta et al. 2016); aligned sequences are reported in Table S3. The ScanFold bps (those contained in windows averaging z-score < -1) were mapped to the alignment to determine the conservation of base pairing across ZIKV genomes. When mutations occurred in predicted paired regions they generally preserved base pairing: ScanFold-predicted bps were over 96% conserved (Table S4). Multiple structure-conserving mutations occur throughout novel predicted motifs (Fig. 3) as well as in previously described ZIKV structures (Fig. 1), indicating that ZIKV may be evolving to preserve these novel structured regions in addition to known functional RNA structures.

Discussion

This bioinformatics analysis of the ZIKV genome reiterates the prominence of RNA structure in the 5' and 3' regions of the ZIKV genome, indicates that these (previously described) structured regions could be larger, and reveals several potentially functional structures within the core coding region. This was achieved by an approach that condenses thousands of scanning window models into a single result, greatly reducing the dependence on subjective manual curation of results.

The ScanFold algorithm successfully identified known bps within the 5' and 3' regions (RNAstructure scorer (Bellaousov et al. 2013; Mathews 2004; Mathews et al. 1999) PPV and sensitivity results in Table S8). In the 5' UTR region, ScanFold positively identified all known structures (with slight variations in the stem of SLB; Fig. S2), while the 3' UTR model was accurate in identifying the structures for DB-1, Ψ-DB, sHP and SL, but only partially identified the exonuclease-resistant SLI and SLII structures (Fig. S1). This is likely due to the presence of a complex RNA pseudoknot structure (Akiyama et al. 2016), which can be difficult to predict computationally because non-nested base pairing complicates the use of recursive algorithms (Schlick & Pyle 2017).
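
For readers unfamiliar with the two accuracy measures, the sketch below shows how PPV and sensitivity can be computed from sets of predicted and known base pairs. It is a generic illustration and does not reproduce the RNAstructure scorer; the function name `ppv_and_sensitivity` is hypothetical, and pairs are compared by exact match, which is an assumption.

```python
def ppv_and_sensitivity(predicted, known):
    """PPV = correct predictions / all predictions;
    sensitivity = correct predictions / all known pairs."""
    predicted = {tuple(sorted(pair)) for pair in predicted}  # (i, j) == (j, i)
    known = {tuple(sorted(pair)) for pair in known}
    true_positives = len(predicted & known)
    ppv = true_positives / len(predicted) if predicted else 0.0
    sensitivity = true_positives / len(known) if known else 0.0
    return ppv, sensitivity


# Toy pair sets (hypothetical), not the ZIKV data.
toy_predicted = {(1, 10), (2, 9), (3, 8), (20, 30)}
toy_known = {(10, 1), (2, 9), (4, 7)}
ppv, sensitivity = ppv_and_sensitivity(toy_predicted, toy_known)
print(ppv, sensitivity)
```

Under the same exact-match reading, the counts reported in the Results would correspond to sensitivities of about 0.75 in the 3' UTR (86 of 114 known bps) and about 0.93 in the 5' structured domain (75 of 81 known bps); the values in Table S8 may differ if the scorer was run with different settings.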

ScanFold identified structures directly downstream of the 5' UTR region and directly upstream of the 3' UTR region (nt 270-528 and nt 10237-10370, respectively), which have metrics that are as favorable as those in the known structural regions. These structures (Fig. 3 a, b, and g) have unusual thermodynamic stability (compared to random), are well conserved throughout ZIKV genomes (Table S4), and are supported by structure-preserving mutations (Fig. 3). Their close proximity to functional UTR regions (Fig. S3) suggests that they may play important roles in conjunction with these other elements: e.g. in genome replication, processing, etc. The structures predicted in the core coding region (Fig. 3 c-f) also have metrics that indicate they may have evolved to form functional conformations, are conserved, and have support from sequence variations (Fig. 3). There are numerous potential functions for these core region RNA structural motifs: e.g. serving as sites of post-transcriptional modifications (Coutard et al. 2017; Dong et al. 2014; Gokhale et al. 2016; Lichinchi et al. 2016), facilitating packaging of the genome (Stockley et al. 2013), acting as localization signals (Pratt & Mowry 2013), modulating the rate of translation to affect protein folding (Khrustalev et al. 2017), or affecting transcript (genome) stability (Wu & Brewer 2012).

The map of the RNA structural landscape of the ZIKV genome presented in this report serves as a guide for future analyses. The functional importance of these motifs can be tested in virio by designing mutations to disrupt/compensate structure (e.g. using a tool such as RNA2DMut; (Moss 2018)), while maintaining (or minimally disrupting) amino acid coding (and codon use), then introducing them into the genome via a ZIKV reverse genetics system (Atieh et al. 2016) to assess effects on viability and replication. For example, a similar strategy was used to test RNA structural motifs in influenza A virus (Jiang et al. 2016). Furthermore, the presence of conserved base pairing within ZIKV coding regions would be expected to impact its evolution (to maintain both functional protein and RNA structure); thus, these data can also be helpful in understanding potential constraints placed on the evolution of this virus.

Conclusions

In conclusion, this report presents a bioinformatics scan of the ZIKV genome and a novel analysis pipeline/method for functional RNA motif discovery that (1) recapitulates known functional motifs toward the ends of the ZIKV genome, (2) suggests that these regions of RNA structure may be larger than previously reported, and (3) finds novel motifs within the core coding region of the genome. These results suggest additional downstream experiments to assign function to these elements, e.g. by disrupting structure via genetic mutations introduced into the ZIKV reverse genetics system.

Acknowledgements

We would like to thank the Roy J. Carver Charitable Trust for their continuing support.

References

2017. Zika virus: an epidemiological update. Wkly Epidemiol Rec 92:188-192.

Akiyama BM, Laurence HM, Massey AR, Costantino DA, Xie X, Yang Y, Shi PY, Nix JC, Beckham JD, and Kieft JS. 2016. Zika virus produces noncoding RNAs using a multi-pseudoknot structure that confounds a cellular exonuclease. Science 354:1148-1152. 10.1126/science.aah3963

Alvarez DE, Lodeiro MF, Luduena SJ, Pietrasanta LI, and Gamarnik AV. 2005. Long-range RNA-RNA interactions circularize the dengue virus genome. J Virol 79:6631-6643. 10.1128/JVI.79.11.6631-6643.2005

Andrews RJ, Baber L, and Moss WN. 2017. RNAStructuromeDB: A genome-wide database for RNA structural inference. Sci Rep 7:17269. 10.1038/s41598-017-17510-y


Atieh T, Baronti C, de Lamballerie X, and Nougairede A. 2016. Simple reverse genetics systems for Asian and African Zika viruses. Sci Rep 6:39384. 10.1038/srep39384

Bellaousov S, Reuter JS, Seetin MG, and Mathews DH. 2013. RNAstructure: Web servers for RNA secondary structure prediction and analysis. Nucleic Acids Res 41:W471-474. 10.1093/nar/gkt290

Broutet N, Krauer F, Riesen M, Khalakdina A, Almiron M, Aldighieri S, Espinal M, Low N, and Dye C. 2016. Zika Virus as a Cause of Neurologic Disorders. New England Journal of Medicine 374:1506-1509. 10.1056/NEJMp1602708

Busan S, and Weeks KM. 2017. Visualization of RNA structure models within the Integrative Genomics Viewer. RNA 23:1012-1018. 10.1261/rna.060194.116

Chapman EG, Moon SL, Wilusz J, and Kieft JS. 2014. RNA structures that resist degradation by Xrn1 produce a pathogenic Dengue virus RNA. Elife 3. ARTN e01892 10.7554/eLife.01892

Clote P, Ferre F, Kranakis E, and Krizanc D. 2005. Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency. RNA 11:578-591. 10.1261/rna.7220505

Coutard B, Barral K, Lichiere J, Selisko B, Martin B, Aouadi W, Lombardia MO, Debart F, Vasseur JJ, Guillemot JC, Canard B, and Decroly E. 2017. Zika Virus Methyltransferase: Structure and Functions for Drug Design Perspectives. J Virol 91. 10.1128/JVI.02202-16

Darty K, Denise A, and Ponty Y. 2009. VARNA: Interactive drawing and editing of the RNA secondary structure. Bioinformatics 25:1974-1975. 10.1093/bioinformatics/btp250

Davis WG, Basu M, Elrod EJ, Germann MW, and Brinton MA. 2013. Identification of cis-Acting Nucleotides and a Structural Feature in West Nile Virus 3'-Terminus RNA That Facilitate Viral Minus Strand RNA Synthesis. Journal of Virology 87:7622-7636. 10.1128/Jvi.00212-13

Dick GW. 1952. Zika virus. II. Pathogenicity and physical properties. Trans R Soc Trop Med Hyg 46:521-534.

Dick GW, Kitchen SF, and Haddow AJ. 1952. Zika virus. I. Isolations and serological specificity. Trans R Soc Trop Med Hyg 46:509-520.

Donald CL, Brennan B, Cumberworth SL, Rezelj VV, Clark JJ, Cordeiro MT, Freitas de Oliveira Franca R, Pena LJ, Wilkie GS, Da Silva Filipe A, Davis C, Hughes J, Varjak M, Selinger M, Zuvanov L, Owsianka AM, Patel AH, McLauchlan J, Lindenbach BD, Fall G, Sall AA, Biek R, Rehwinkel J, Schnettler E, and Kohl A. 2016. Full Genome Sequence and sfRNA Interferon Antagonist Activity of Zika Virus from Recife, Brazil. PLoS Negl Trop Dis 10:e0005048. 10.1371/journal.pntd.0005048

Dong H, Fink K, Zust R, Lim SP, Qin CF, and Shi PY. 2014. Flavivirus RNA methylation. J Gen Virol 95:763-778. 10.1099/vir.0.062208-0

Elghonemy S, Davis WG, and Brinton MA. 2005. The majority of the nucleotides in the top loop of the genomic 3' terminal stem loop structure are cis-acting in a West Nile virus infectious clone. Virology 331:238-246. 10.1016/j.virol.2004.11.008

Fang R, Moss WN, Rutenberg-Schoenberg M, and Simon MD. 2015. Probing Xist RNA Structure in Cells Using Targeted Structure-Seq. PLoS Genet 11:e1005668. 10.1371/journal.pgen.1005668


300 Filomatori CV, Lodeiro MF, Alvarez DE, Samsa MM, Pietrasanta L, and Gamarnik AV. 2006. A

301 5' RNA element promotes dengue virus RNA synthesis on a circular genome. Genes

302 Dev 20:2238-2249. 10.1101/gad.1444206

303 Goertz GP, Abbo SR, Fros JJ, and Pijlman GP. 2017. Functional RNA during Zika virus infection.

304 Virus Res. 10.1016/j.virusres.2017.08.015

Goertz GP, Fros JJ, Miesen P, Vogels CBF, van der Bent ML, Geertsema C, Koenraadt CJM, van Rij RP, van Oers MM, and Pijlman GP. 2016. Noncoding Subgenomic Flavivirus RNA Is Processed by the Mosquito RNA Interference Machinery and Determines West Nile Virus Transmission by Culex pipiens Mosquitoes. J Virol 90:10145-10159. 10.1128/Jvi.00930-16

Gokhale NS, McIntyre ABR, McFadden MJ, Roder AE, Kennedy EM, Gandara JA, Hopcraft SE, Quicke KM, Vazquez C, Willer J, Ilkayeva OR, Law BA, Holley CL, Garcia-Blanco MA, Evans MJ, Suthar MS, Bradrick SS, Mason CE, and Horner SM. 2016. N6-Methyladenosine in Flaviviridae Viral RNA Genomes Regulates Infection. Cell Host Microbe 20:654-665. 10.1016/j.chom.2016.09.015

Gruber AR, Findeiss S, Washietl S, Hofacker IL, and Stadler PF. 2010. RNAz 2.0: improved noncoding RNA detection. Pac Symp Biocomput:69-79.

Gupta AK, Kaur K, Rajput A, Dhanda SK, Sehgal M, Khan MS, Monga I, Dar SA, Singh S, Nagpal G, Usmani SS, Thakur A, Kaur G, Sharma S, Bhardwaj A, Qureshi A, Raghava GP, and Kumar M. 2016. ZikaVR: An Integrated Zika Virus Resource for Genomics, Proteomics, Phylogenetic and Therapeutic Analysis. Sci Rep 6:32713. 10.1038/srep32713

Jiang T, Nogales A, Baker SF, Martinez-Sobrido L, and Turner DH. 2016. Mutations Designed by Ensemble Defect to Misfold Conserved RNA Structures of Influenza A Segments 7 and 8 Affect Splicing and Attenuate Viral Replication in Cell Culture. PLoS One 11:e0156906. 10.1371/journal.pone.0156906

Katoh K, Rozewicki J, and Yamada KD. 2017. MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Brief Bioinform. 10.1093/bib/bbx108

Khrustalev VV, Khrustaleva TA, Sharma N, and Giri R. 2017. Mutational Pressure in Zika Virus: Local ADAR-Editing Areas Associated with Pauses in Translation and Replication. Front Cell Infect Microbiol 7:44. 10.3389/fcimb.2017.00044

Kieft JS, Rabe JL, and Chapman EG. 2015. New hypotheses derived from the structure of a flaviviral Xrn1-resistant RNA: Conservation, folding, and host adaptation. RNA Biol 12:1169-1177. 10.1080/15476286.2015.1094599

Kuraku S, Zmasek CM, Nishimura O, and Katoh K. 2013. aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity. Nucleic Acids Res 41:W22-28. 10.1093/nar/gkt389

Lanciotti RS, Lambert AJ, Holodniy M, Saavedra S, and Signor Ldel C. 2016. Phylogeny of Zika Virus in Western Hemisphere, 2015. Emerg Infect Dis 22:933-935. 10.3201/eid2205.160065

Lichinchi G, Zhao BS, Wu Y, Lu Z, Qin Y, He C, and Rana TM. 2016. Dynamics of Human and Viral RNA Methylation during Zika Virus Infection. Cell Host Microbe 20:666-673. 10.1016/j.chom.2016.10.002

Liu ZY, Li XF, Jiang T, Deng YQ, Zhao H, Wang HJ, Ye Q, Zhu SY, Qiu Y, Zhou X, Qin ED, and Qin CF. 2013. Novel cis-Acting Element within the Capsid-Coding Region Enhances Flavivirus Viral-RNA Replication by Regulating Genome Cyclization. J Virol 87:6804-6818. 10.1128/Jvi.00243-13

Lodeiro MF, Filomatori CV, and Gamarnik AV. 2009. Structural and functional studies of the promoter element for dengue virus RNA replication. J Virol 83:993-1008. 10.1128/JVI.01647-08

Lorenz R, Bernhart SH, Honer Zu Siederdissen C, Tafer H, Flamm C, Stadler PF, and Hofacker IL. 2011. ViennaRNA Package 2.0. Algorithms Mol Biol 6:26. 10.1186/1748-7188-6-26

Manangeeswaran M, Ireland DDC, and Verthelyi D. 2016. Zika (PRVABC59) Infection Is Associated with T cell Infiltration and Neurodegeneration in CNS of Immunocompetent Neonatal C57Bl/6 Mice. PLoS Pathog 12:e1006004. 10.1371/journal.ppat.1006004

Mathews DH. 2004. Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization. RNA 10:1178-1190. 10.1261/rna.7650904

Mathews DH, Sabina J, Zuker M, and Turner DH. 1999. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol 288:911-940. 10.1006/jmbi.1999.2700

Miner JJ, and Diamond MS. 2017. Zika Virus Pathogenesis and Tissue Tropism. Cell Host Microbe 21:134-142. 10.1016/j.chom.2017.01.004

Moss WN. 2018. RNA2DMut: a web tool for the design and analysis of RNA structure mutations. RNA 24:273-286. 10.1261/rna.063933.117

Moss WN, Priore SF, and Turner DH. 2011. Identification of potential conserved RNA secondary structure throughout influenza A coding regions. RNA 17:991-1011. 10.1261/rna.2619511

Moss WN, and Steitz JA. 2013. Genome-wide analyses of Epstein-Barr virus reveal conserved RNA structures and a novel stable intronic sequence RNA. BMC Genomics 14:543. 10.1186/1471-2164-14-543

Pijlman GP, Funk A, Kondratieva N, Leung J, Torres S, van der Aa L, Liu WJ, Palmenberg AC, Shi PY, Hall RA, and Khromykh AA. 2008. A Highly Structured, Nuclease-Resistant, Noncoding RNA Produced by Flaviviruses Is Required for Pathogenicity. Cell Host Microbe 4:579-591. 10.1016/j.chom.2008.10.007

Pratt CA, and Mowry KL. 2013. Taking a cellular road-trip: mRNA transport and anchoring. Curr Opin Cell Biol 25:99-106. 10.1016/j.ceb.2012.08.015

Qu Z, and Adelson DL. 2012. Evolutionary conservation and functional roles of ncRNA. Front Genet 3:205. 10.3389/fgene.2012.00205

Reuter JS, and Mathews DH. 2010. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics 11:129. 10.1186/1471-2105-11-129

Saiz JC, Vazquez-Calvo A, Blazquez AB, Merino-Ramos T, Escribano-Romero E, and Martin-Acebes MA. 2016. Zika Virus: the Latest Newcomer. Front Microbiol 7:496. 10.3389/fmicb.2016.00496

Schlick T, and Pyle AM. 2017. Opportunities and Challenges in RNA Structural Modeling and Design. Biophys J 113:225-234. 10.1016/j.bpj.2016.12.037

Stockley PG, Twarock R, Bakker SE, Barker AM, Borodavka A, Dykeman E, Ford RJ, Pearson AR, Phillips SE, Ranson NA, and Tuma R. 2013. Packaging signals in single-stranded RNA viruses: nature's alternative to a purely electrostatic assembly mechanism. J Biol Phys 39:277-287. 10.1007/s10867-013-9313-0

Thorvaldsdottir H, Robinson JT, and Mesirov JP. 2013. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform 14:178-192. 10.1093/bib/bbs017

Thurner C, Witwer C, Hofacker IL, and Stadler PF. 2004. Conserved RNA secondary structures in Flaviviridae genomes. J Gen Virol 85:1113-1124. 10.1099/vir.0.19462-0

Villordo SM, Alvarez DE, and Gamarnik AV. 2010. A balance between circular and linear forms of the dengue virus genome is crucial for viral replication. RNA 16:2325-2335. 10.1261/rna.2120410

Villordo SM, Carballeda JM, Filomatori CV, and Gamarnik AV. 2016. RNA Structure Duplications and Flavivirus Host Adaptation. Trends Microbiol 24:270-283. 10.1016/j.tim.2016.01.002

Washietl S, and Hofacker IL. 2007. Identifying structural noncoding RNAs using RNAz. Curr Protoc Bioinformatics Chapter 12:Unit 12 17. 10.1002/0471250953.bi1207s19

Wu X, and Brewer G. 2012. The regulation of mRNA stability in mammalian cells: 2.0. Gene 500:10-21. 10.1016/j.gene.2012.03.021

Ye Q, Liu ZY, Han JF, Jiang T, Li XF, and Qin CF. 2016. Genomic characterization and phylogenetic analysis of Zika virus circulating in the Americas. Infect Genet Evol 43:43-49. 10.1016/j.meegid.2016.05.004

Yu L, and Markoff L. 2005. The topology of bulges in the long stem of the flavivirus 3' stem-loop is a major determinant of RNA replication competence. J Virol 79:2309-2324. 10.1128/Jvi.79.4.2309-2324.2005

Zeng LL, Falgout B, and Markoff L. 1998. Identification of specific nucleotide sequences within the conserved 3'-SL in the dengue type 2 virus genome required for replication. J Virol 72:7510-7522.

Figure Legends

Figure 1. Models of the known functional RNA structural motifs found in the 5' and 3' end regions of the ZIKV genome (KJ776791.2). (a) Structure model of the 5' UTR as shown in Ye et al. (2016), mapped to the KJ776791.2 sequence. (b) Structure model of the 3' UTR as shown in Goertz et al. (2017), mapped to the KJ776791.2 sequence. Base pairs (bps) are colored according to their appearance in the ScanFold filtered results: blue lines depict bps predicted in the z-score < -2 results (Table S7), green lines depict bps predicted in the z-score < -1 results (Table S6), and grey lines depict bps that were not predicted. Structure-preserving mutations (identified by comparing the alignment of curated ZIKV genomes to the structures of the known motifs) are highlighted with green circles.

Figure 2. Bioinformatics scans of the ZIKV genome. (a) The predicted minimum folding free energy (MFE) and (b) the z-score for all RNA window segments. (c) Genome feature annotations; the polyprotein region has been broken down to visualize the individual coding sequence regions. All data were visualized and archived, and are available for browsing and download at the RNAStructuromeDB (https://structurome.bb.iastate.edu/).

Figure 3. Structure models of coding region motifs that contain ScanFold (z-score < -2) bp hits. Structures are shown in the order in which they appear in the ZIKV genome (KJ776791.2). Panels (a) and (b) depict the ScanFold predicted structures located shortly downstream of the previously annotated 5' structured region, at nt 274-326 and 433-529, respectively. Panels (c), (d), (e), and (f) depict structures within the core coding region, located at nt 4463-4520, 5587-5702, 7096-7205, and 7651-7755, respectively. Panel (g) depicts the structure directly upstream of the 3' structured region, located at nt 10237-10368; it has been annotated to show the location of the stop codon near the terminal loop (circled in green). Base pairs are colored according to their z-score cutoff: green lines show bps predicted in the z-score < -1 results (Table S6) and blue lines show bps predicted in the z-score < -2 results (Table S7). Sites with structure-preserving mutations are highlighted with filled green circles.

Supplementary information

Figure S1. Comparative arc diagram depicting the previously described 5' RNA structural motifs vs. the ScanFold predicted bps. (a) Arc diagram of the 5' end region as predicted via ScanFold; base pairs are colored according to their z-score cutoff, where blue lines depict bps predicted in the z-score < -2 results (Table S7) and green lines depict bps predicted in the z-score < -1 results (Table S6). (b) Arc diagram of the accepted secondary structure model for the 5' end of ZIKV as shown in Ye et al. (2016), mapped to the KJ776791.2 sequence.

Figure S2. Comparative arc diagram depicting the known RNA structural motifs vs. the ScanFold predicted bps. (a) Arc diagram of the 3' end region as predicted via ScanFold; base pairs are colored according to their z-score cutoff, where blue lines depict bps predicted in the z-score < -2 results (Table S7), green lines depict bps predicted in the z-score < -1 results (Table S6), and yellow lines depict bps predicted in the no-filter results (Table S5). (b) Arc diagram of the accepted secondary structure model for the 3' end of ZIKV as shown in Goertz et al. (2017), mapped to the KJ776791.2 sequence. The start codon nucleotide locations have been highlighted with a light blue bar.

Figure S3. Secondary structure model depicting the ScanFold proposed structures within and directly adjacent to the known 5' and 3' structured regions. Base pairs are colored according to their z-score cutoff: blue lines depict bps predicted in the z-score < -2 results (Table S7), green lines depict bps predicted in the z-score < -1 results (Table S6), and yellow lines depict bps predicted in the no-filter results (Table S5). The start and stop codon nucleotides have been circled and labeled in blue and green, respectively. Nucleotides carrying mutations in the alignment that preserve ScanFold predicted bps are highlighted with filled green circles.

Table S1. Results of the scanning window analysis of the ZIKV genome (NCBI accession KJ776791.2) as output by the ScanFold-Scan program. Each row contains the data calculated for one window. Columns A and B are the starting (i) and ending (j) coordinates of the window fragment. Column C is the temperature used for all RNAfold calculations. Columns D-H report the ∆Gnative, thermodynamic z-score, stability ratio p-value, ensemble diversity, and frequency-of-MFE (fMFE) values, respectively (detailed descriptions of all metrics can be found at the RNAStructuromeDB, https://structurome.bb.iastate.edu, or in the corresponding manuscript (Andrews et al. 2017)). Column I contains the sequence of the window; the ∆Gnative and centroid structures of this sequence are shown in Columns J and K. Columns L-O report nucleotide counts for the window sequence.
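The window bookkeeping described for Table S1 can be illustrated with a minimal Python sketch. It is not the ScanFold-Scan code: the function names, the window size, and the step size are hypothetical (this section does not state the values used), and the z-score follows the common definition of a thermodynamic z-score, (MFE of the native sequence minus the mean MFE of randomized sequences) divided by their standard deviation, here assuming a population standard deviation. The MFE values themselves would come from a folding program such as RNAfold and are simply passed in.

```python
from statistics import mean, pstdev
from typing import Dict, Iterator, NamedTuple, Sequence


class Window(NamedTuple):
    start: int               # 1-based start coordinate (i)
    end: int                 # 1-based, inclusive end coordinate (j)
    sequence: str            # window sequence in the RNA alphabet
    counts: Dict[str, int]   # nucleotide counts for the window


def iter_windows(genome: str, size: int, step: int) -> Iterator[Window]:
    """Yield fixed-size windows across a sequence (size and step are hypothetical)."""
    rna = genome.upper().replace("T", "U")  # accept DNA-alphabet input
    for offset in range(0, len(rna) - size + 1, step):
        seq = rna[offset:offset + size]
        counts = {nt: seq.count(nt) for nt in "ACGU"}
        yield Window(offset + 1, offset + size, seq, counts)


def thermodynamic_z_score(native_mfe: float, shuffled_mfes: Sequence[float]) -> float:
    """z = (MFE_native - mean(MFE_shuffled)) / sd(MFE_shuffled), population sd assumed."""
    sd = pstdev(shuffled_mfes)
    if sd == 0:
        raise ValueError("shuffled MFEs have zero spread")
    return (native_mfe - mean(shuffled_mfes)) / sd


if __name__ == "__main__":
    for window in iter_windows("GGGAAACCCT", size=4, step=3):
        print(window.start, window.end, window.sequence, window.counts)
    # Toy MFE values, not taken from the ZIKV scan.
    print(thermodynamic_z_score(-30.0, [-20.0, -22.0, -18.0, -20.0]))
```

A negative z-score in this sketch means the native sequence folds more stably than its randomized counterparts, which is the sense in which the z-score < -1 and z-score < -2 filters are used in the legends.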

Table S2. ScanFold log file produced during the ScanFold-Fold portion of the program. The log file is separated into two portions. The first portion (rows 1 to 87,448) contains a table for each nucleotide in the sequence; each table holds the cumulative base-pairing information for that nucleotide as predicted throughout the scan. Column A gives the i-nucleotide of the sequence, and Column B gives the coordinate of its pairing partner, the j-nucleotide. The total number of windows in which the i-j pair appears, as well as the total number of windows in which the i-nucleotide appears, are reported in Column D. The average window minimum free energy, z-score, and ensemble diversity of each i-j pair are reported in Columns E-G, respectively. Column H reports the sum of z-scores for each i-j pair, which is used to calculate the coverage-normalized z-score (the sum of z-scores divided by the total number of windows in which the i-nucleotide appeared) reported in Column I. Column J reports a summary of the bps predicted for each i-nucleotide. The second portion of the log file, starting at row 87,449, lists the most favorable i-j pair (Columns B and C) associated with the i-nucleotide listed in Column A. Where this nucleotide competed with other i-nucleotides for the same j-nucleotide, the "winning" i-j pair is reported and denoted with an asterisk (in some cases the winning i-j pair does not contain the original i-nucleotide, or the nucleotide may be left unpaired). Columns D, E, and F contain the average window minimum free energy, z-score, and ensemble diversity for the corresponding i-j pair.
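The coverage-normalized z-score described for Table S2 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the ScanFold-Fold implementation: the `WindowResult` layout and both function names are hypothetical, unpaired nucleotides are not modeled, and the partner selection simply takes the lowest normalized z-score per nucleotide without reproducing the program's full competition step.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One scanned window: (start, end, z_score, base pairs in genome coordinates).
# This layout is an assumption made for the example.
WindowResult = Tuple[int, int, float, List[Tuple[int, int]]]


def coverage_normalized_z(windows: Iterable[WindowResult]) -> Dict[Tuple[int, int], float]:
    """Return {(i, j): sum of z-scores of windows predicting i-j / windows covering i}."""
    coverage: Dict[int, int] = defaultdict(int)
    z_sum: Dict[Tuple[int, int], float] = defaultdict(float)
    for start, end, z, pairs in windows:
        for nt in range(start, end + 1):
            coverage[nt] += 1
        for i, j in pairs:
            # Record the pair from the point of view of both nucleotides.
            z_sum[(i, j)] += z
            z_sum[(j, i)] += z
    return {(i, j): total / coverage[i] for (i, j), total in z_sum.items()}


def most_favorable_partner(norm_z: Dict[Tuple[int, int], float]) -> Dict[int, int]:
    """For each nucleotide i, pick the partner j with the lowest normalized z-score."""
    best: Dict[int, Tuple[float, int]] = {}
    for (i, j), z in norm_z.items():
        if i not in best or z < best[i][0]:
            best[i] = (z, j)
    return {i: j for i, (_, j) in best.items()}


if __name__ == "__main__":
    # Toy scan: nucleotides 2 and 3 both pair with nucleotide 9 in different windows.
    scan = [
        (1, 10, -2.0, [(2, 9)]),
        (2, 11, -3.0, [(2, 9)]),
        (3, 12, -1.0, [(3, 9)]),
    ]
    norm = coverage_normalized_z(scan)
    print(norm[(2, 9)], most_favorable_partner(norm)[9])
```

In the toy scan, nucleotide 9 is claimed by both nucleotide 2 and nucleotide 3; because the 2-9 pair recurs in more windows with lower z-scores, it is the pair that nucleotide 9 itself favors, which mirrors the kind of competition the log file flags with an asterisk.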

Table S3. Alignment of 37 ZIKV genomes curated in the ZikaVR database (Gupta et al. 2016) to KJ776791.2. Genomes were aligned using the MAFFT web server (Katoh et al. 2017; Kuraku et al. 2013) with default settings. The heading of each result contains the NCBI accession number and name of the aligned sequence.

Table S4. Counts of the number and type of base pairs that appear at the positions of the ScanFold z-score < -1 predicted structure across 37 aligned ZIKV genomes. A total of 37 ZIKV genomes were aligned to KJ776791.2 using the MAFFT web server (Katoh et al. 2017; Kuraku et al. 2013) with default settings. The aligned sequences were compared to the ScanFold-Fold predicted bps (z-score < -1) to tabulate the types of base pairs found throughout the alignment (Table S3). Column S reports the percentage of canonical bps found throughout the alignment for each base pair, and Column T reports the number of different canonical base pair types. Results for the previously reported 5' and 3' UTR structural regions appear as separate worksheets.
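The tabulation described for Table S4 can be sketched as follows. This is an illustrative Python example, not the authors' script: the function name is hypothetical, canonical pairs are assumed to be the Watson-Crick pairs plus G-U wobble pairs, and the columns i and j are assumed to be alignment columns that already correspond to the reference coordinates (gap handling is omitted).

```python
from collections import Counter
from typing import List, Tuple

# Assumed definition of canonical pairs: Watson-Crick plus G-U wobble.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def tabulate_pair(alignment: List[str], i: int, j: int) -> Tuple[Counter, float, int]:
    """Count the nucleotide pairs observed at aligned columns i and j (1-based).

    Returns the raw pair counts, the percentage of sequences with a canonical
    pair, and the number of distinct canonical pair types observed.
    """
    counts: Counter = Counter()
    for seq in alignment:
        rna = seq.upper().replace("T", "U")  # accept DNA-alphabet input
        counts[rna[i - 1] + rna[j - 1]] += 1
    canonical = {pair: n for pair, n in counts.items() if pair in CANONICAL}
    percent = 100.0 * sum(canonical.values()) / len(alignment)
    return counts, percent, len(canonical)


if __name__ == "__main__":
    # Toy alignment of four sequences; columns 1 and 5 form the predicted pair.
    toy = ["GAAAC", "AAAAU", "GAAAU", "GAAAA"]
    counts, percent, n_types = tabulate_pair(toy, 1, 5)
    print(dict(counts), percent, n_types)
```

A pair that stays canonical while changing type across the alignment (for example G-C in one genome and A-U in another) is the kind of structure-preserving mutation highlighted in the figures.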

Table S5. CT file of the default no-filter results output from ScanFold-Fold.

Table S6. CT file of the default z-score < -1 results output from ScanFold-Fold.

Table S7. CT file of the default z-score < -2 results output from ScanFold-Fold.
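For readers who want to work with the CT files in Tables S5-S7, the following minimal Python sketch reads the widely used connectivity-table layout: a header line giving the sequence length and a title, then one line per nucleotide with its index, base, previous index, next index, pairing partner (0 if unpaired), and natural numbering. The function name and the toy hairpin are hypothetical, and the sketch assumes a single structure per file with whitespace-separated fields.

```python
from typing import Dict, List, Tuple


def parse_ct(text: str) -> Tuple[str, Dict[int, int]]:
    """Parse connectivity-table text into (sequence, {i: j}) with i < j."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    length = int(lines[0][0])  # header: sequence length, then a free-text title
    bases: List[str] = []
    pairs: Dict[int, int] = {}
    for fields in lines[1:length + 1]:
        index, base, partner = int(fields[0]), fields[1], int(fields[4])
        bases.append(base)
        if partner > index:  # 0 means unpaired; store each pair only once
            pairs[index] = partner
    return "".join(bases), pairs


if __name__ == "__main__":
    # Toy 7-nt hairpin, not taken from the ZIKV results.
    example = """\
7 toy hairpin
1 G 0 2 7 1
2 C 1 3 6 2
3 A 2 4 0 3
4 A 3 5 0 4
5 A 4 6 0 5
6 G 5 7 2 6
7 C 6 0 1 7
"""
    sequence, pairs = parse_ct(example)
    print(sequence, pairs)
```

Parsing the three files this way and comparing the resulting pair dictionaries would show which bps survive each z-score filter.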

Table S8. RNAstructure web server scorer results comparing the ScanFold predicted structures to the accepted structures. The structures and sequences were uploaded to the server as shown, and scorer was run with default settings.


Figure 1. Models of the known functional RNA structural motifs found in the 5' and 3' end regions of the ZIKV genome (KJ776791.2).



Figure 2. Bioinformatics scans of the ZIKV genome. Panels: (a) ZIKV MFE (axis ticks from -10 to -60), (b) ZIKV z-score (axis ticks from 3 to -5), (c) ZIKV features. Horizontal axis: position along KJ776791.2, 1..10807 (10.81 Kb).


Figure 3. Structure models of coding region motifs that contain ScanFold (z-score < -2) bp hits.

