

LARGE-SCALE BIOLOGY

A Comprehensive Map of Intron Branchpoints and Lariat RNAs in Plants

Xiaotuo Zhang a,e, Yong Zhang a,e, Taiyun Wang a, Ziwei Li a, Jinping Cheng a, Haoran Ge a, Qi Tang a, Kun Chen d, Li Liu d, Chenyu Lu c, Junqiang Guo b,c, Binglian Zheng a,f, and Yun Zheng b,c,f

a State Key Laboratory of Genetic Engineering, Ministry of Education Key Laboratory of Biodiversity Sciences and Ecological Engineering, Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai 200438, China
b Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
c Yunnan Key Laboratory of Primate Biomedical Research; Institute of Primate Translational Medicine, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
d Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
e These authors contributed equally to the work.
f Correspondence: YZ ([email protected]) and BZ ([email protected]).

Short title: Splicing branchpoints and lariats in plants

One-sentence summary: Analysis of 948 RNA sequencing datasets produced a comprehensive map of intron branchpoints and lariat RNAs in Arabidopsis thaliana, tomato, rice, and maize.

The author(s) responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantcell.org) are: Yun Zheng ([email protected]) and Binglian Zheng ([email protected]).

ABSTRACT

Lariats are formed by excised introns, when the 5’ splice site joins with the branchpoint (BP) during splicing. Although lariat RNAs are usually degraded by RNA debranching enzyme 1 (DBR1), recent findings in animals detected many lariat RNAs under physiological conditions. By contrast, the features of BPs and to what extent lariat RNAs accumulate naturally are largely unexplored in plants. Here, we analyzed 948 RNA sequencing datasets to document plant BPs and lariat RNAs on a genome-wide scale. In total, we identified 13872, 5199, 29582, and 13478 BPs in Arabidopsis thaliana, tomato (Solanum lycopersicum), rice (Oryza sativa), and maize (Zea mays), respectively. Features of plant BPs are highly similar to those in yeast and human, in that BPs are adenine-preferred and flanked by uracil-enriched sequences. Intriguingly, ~20% of introns harbor multiple BPs, and BP usage is tissue-specific. Furthermore, 10,580 lariat RNAs accumulate in wild-type Arabidopsis plants, and most of these lariat RNAs originate from longer or retroelement-depleted introns. Moreover, the expression of these lariat RNAs is accompanied by the incidence of back-splicing of parent exons. Collectively, our results provide a comprehensive map of intron BPs and lariat RNAs in four plant species, and uncover a link between lariat turnover and splicing.

Plant Cell Advance Publication. Published on March 20, 2019, doi:10.1105/tpc.18.00711
©2019 American Society of Plant Biologists. All Rights Reserved

INTRODUCTION

In eukaryotes, splicing of mRNA precursors (pre-mRNAs), a highly conserved critical step for gene expression, comprises two catalytic steps (Ruskin et al., 1984). In the first step, the 5’ splice site (5’ss, usually a GU dinucleotide) is attacked and concurrently the 5’ end of the intron is joined to the branchpoint (BP) by forming a 2’-5’ phosphodiester bond. This results in the production of a 5’ exon and a lariat intermediate RNA that consists of a lariat form-intron and a 3’ exon. These intermediates are then subjected to the second step of the reaction, in which the 3’ splice site (3’ss, usually an AG dinucleotide) is attacked and the two exons are ligated to produce the mRNA (Ruskin et al., 1984). The excised lariat introns, termed lariat RNAs, are traditionally thought to be degraded quickly, when a dedicated debranching enzyme 1 (DBR1) recognizes the BP and linearizes the lariat, promoting its rapid degradation (Ruskin and Green, 1985; Nam et al., 1997; Kim et al., 2000; Kim et al., 2001; Wang et al., 2004). As an obligate signal, the BP must be properly selected to ensure efficient splicing (Jacquier and Rosbash, 1986). Recent studies showed that features of the BP are highly conserved from yeast to human, that BP selection is indeed regulated, and that BP mutation occurs in various diseases (Taggart et al., 2012; Bitton et al., 2014; Mercer et al., 2015; Taggart et al., 2017; Pineda and Bradley, 2018).

In general, lariat RNAs derived from excised introns are usually destined for intron recycling, although in animals some debranched lariat RNAs can be further processed into mirtron microRNAs (Ruby et al., 2007; Okamura et al., 2007), or into small interfering RNAs in yeast (Dumesic et al., 2013), or into small nucleolar RNAs (Ooi et al., 1998). However, some lariat RNAs accumulate under physiological conditions in animals (Zhang et al., 2013; Talhouarne and Gall, 2014; Tay and Pek, 2017; Talhouarne and Gall, 2018). Moreover, two recent studies showed that intron RNAs promote cell survival in yeast (Parenteau et al., 2019; Morgan et al., 2019), further indicating that intronic RNAs are not useless by-products of splicing, but rather that these intronic RNAs play essential roles in eukaryotes.

Loss-of-function mutants of DBR1 are embryonic lethal in both plants and animals, and are accompanied by over-accumulation of lariat RNAs (Wang et al., 2004; Zheng et al., 2015; Li et al., 2016), indicating that DBR1 is essential for viability in both plants and animals. We showed that lariat RNAs act as decoys to inhibit genome-wide miRNA biogenesis by sequestering the Dicer complex (Li et al., 2016). Together, findings that lariat RNAs act as decoys to sequester the toxicity of TDP-43 in Amyotrophic Lateral Sclerosis (ALS) disease (Armakola et al., 2012) and that loss of DBR1 caused compromised retrovirus replication (Ye et al., 2005; Galvis et al., 2014; Galvis et al., 2017; Zhang et al., 2018), indicate that a strategy to control lariat RNA abundance is a potential therapeutic approach.

In earlier studies, the detection of lariat RNAs was usually based on RT-PCR, which exploits the ability of the reverse transcriptase to read through the BP (Suzuki et al., 2006). With breakthroughs in high-throughput sequencing technologies and bioinformatics analyses, several studies from animals, such as Xenopus tropicalis, Drosophila melanogaster, mouse, chicken, zebrafish, and human, showed genome-wide accumulation of lariat RNAs in stable circular forms (Zhang et al., 2013; Tay and Pek, 2017; Talhouarne and Gall, 2018), implying that the phenomenon of lariat RNAs accumulating naturally is evolutionarily conserved in animals. Although the formation of lariat RNAs is highly conserved in eukaryotes, the features of lariat RNAs in plants were largely unexplored. Importantly, the features of BPs and whether flanking sequences of BPs play a role in lariat RNA turnover were also unclear in plants. In contrast to the increasing understanding of BPs and lariat RNAs in yeast (Bitton et al., 2014) and animals (Taggart et al., 2012; Mercer et al., 2015; Pineda and Bradley, 2018), a genome-wide analysis of plant intronic BPs and lariat RNAs had not yet been reported.

Here, we performed large-scale analyses to systematically identify BPs across four plant species, dicots and monocots, to provide a comprehensive map of intron BPs on a genome-wide scale in plants. Our results indicate that plant introns prefer adenines as their BPs, that many introns have multiple BPs, and that BP usage is tissue-specific. Furthermore, using circular RNA-seq analyses from wild type and a weak viable dbr1 mutant (dbr1-2), we showed that over 10,000 lariat RNAs accumulate with at least five FPKM (Fragments Per Kilo basepairs per Million sequencing tags) in wild-type Arabidopsis. The expression of these lariat RNAs is anti-correlated with the insertion frequency of retroelements in the introns, but is positively correlated with the incidence of back-splicing of flanking exons. Our data provide insights into the characteristics of plant lariat RNAs and intron BPs, and reveal an unexpected complexity of BP selection, lariat RNA turnover and splicing.

RESULTS

Transcriptomic analyses of Col-0 and dbr1-2 by circular RNA sequencing

Because recent studies showed that accumulation of lariat RNAs occurs widely in animals (Zhang et al., 2013; Tay and Pek, 2017; Talhouarne and Gall, 2018), we investigated whether this was also the case in plants. As it is known that lariat RNAs exist in a circular form by escaping linearization in vivo, we performed circular RNA sequencing to globally identify lariat RNAs under physiological conditions. Briefly, by taking advantage of dbr1-2, a weak viable allele of dbr1 (Li et al., 2016), we enriched sequencing tags spanning the junction between the 5’ss and the BP from the transcriptomes of Col-0 and dbr1-2, for genome-wide identification of BPs and lariat RNAs (Supplemental Figure 1). The RNA-seq profiles were consistent in multiple samples for both Col-0 and dbr1-2, with and without RNase R treatments (Supplemental Figure 2A).

As most linear mRNAs were digested by RNase R, the RNase R-treated samples had lower gene expression levels than samples without RNase R treatments (Supplemental Figure 3A). The global gene expression patterns of Col-0 and dbr1-2 were very similar, as shown by the high correlation coefficient values between the Col-0 and dbr1-2 samples without RNase R treatments (Supplemental Figure 3B). The hierarchical clustering showed that the Col-0 and dbr1-2 samples without RNase R treatments grouped together, with very little difference within the cluster, much less than the difference from the RNase R-treated samples of Col-0 and dbr1-2 (Supplemental Figure 3B). Principal Component Analysis (PCA) also showed that samples without RNase R treatments were clustered (Supplemental Figure 3C). Indeed, relative to their expression levels in Col-0, only 60 genes were de-regulated in dbr1-2 (with multiple test corrected P-value < 0.05) (Supplemental Data Set 1). These results suggest that the two genotypes have similar gene expression profiles.

However, the accumulation levels of introns were significantly increased in dbr1-2, in samples both with and without RNase R treatments (P < 10^-100 in both cases, Mann-Whitney U-test, Figure 1A). The intron expression was consistent for Col-0 and dbr1-2 profiles with and without RNase R treatments (Supplemental Figure 2B). Interestingly, samples were clustered based on the intron level profiles by both hierarchical clustering (Figure 1B) and PCA (Figure 1C). Furthermore, the differences between Col-0 and dbr1-2 samples detected using intron levels were larger than those detected using gene expression levels (Supplemental Figure 3B). These results suggest that the intronic expression underlies the main differences in the transcriptomes of Col-0 and dbr1-2. By selecting intronic transcripts that had average abundances of ≥ 5 FPKM in Col-0 samples with RNase R treatments, 10,580 transcripts (619 transcripts were detected only in Col-0 and 9961 transcripts were detected in both Col-0 and dbr1-2) were identified as lariat RNAs from 6585 genes (548 genes from Col-0 only and 6037 genes from both Col-0 and dbr1-2) (Figure 1D and Supplemental Data Set 1), indicating that these lariat RNAs accumulated under physiological conditions. Of note, the number of total annotated introns is 128,271 from 22,524 genes in the Arabidopsis thaliana genome, but more than 50% of the introns (64,213/128,271 = 50.4%) are ≤ 100 nucleotides (nt), which were underrepresented in our study, presumably due to their intentional depletion during library preparation.
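The abundance filter described above can be illustrated with a minimal sketch. It assumes the standard FPKM formula (fragments per kilobase of feature per million sequenced fragments) and a simple mean across replicate samples; the function names and the input layout are hypothetical and are not taken from the authors' pipeline.

```python
def fpkm(fragments, length_nt, total_fragments):
    """Fragments per kilobase of feature per million sequenced fragments."""
    return fragments * 1e9 / (length_nt * total_fragments)


def select_lariat_candidates(intron_fpkm, threshold=5.0):
    """Keep introns whose mean FPKM across replicates meets the threshold.

    intron_fpkm maps a (hypothetical) intron ID to a list of per-replicate
    FPKM values from RNase R-treated samples.
    """
    selected = {}
    for intron_id, values in intron_fpkm.items():
        mean_value = sum(values) / len(values)
        if mean_value >= threshold:
            selected[intron_id] = mean_value
    return selected
```

For instance, 50 fragments on a 500-nt intron in a library of 20 million fragments would sit exactly at the 5 FPKM cutoff under this formula.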

In contrast to those in Col-0, 15,602 intronic transcripts (5641 transcripts were detected only in dbr1-2 and 9961 transcripts were detected in both Col-0 and dbr1-2) from 10242 genes (4205 genes from dbr1-2 only and 6037 genes from both Col-0 and dbr1-2) showed average abundances of ≥ 5 FPKM in dbr1-2 samples with RNase R treatments (Figure 1D), and 6720 unique intronic transcripts from 4672 genes had significantly higher expression levels in dbr1-2 than in Col-0 (Figure 1E and Supplemental Data Set 1). Notably, this higher intronic expression was heavily biased for long introns (Figure 1F). This bias might be due to the depletion of small introns during the commercial library construction protocols. The increased intronic accumulation in dbr1-2 was due to accumulation of lariat RNAs because linear RNAs were removed in the RNase R treatments.

To exclude the possibility that increased intronic accumulation in dbr1-2 was caused by defective splicing efficiency, we compared the splicing efficiency (SE) in Col-0 and dbr1-2 using RNA-seq profiles without RNase R treatments. The overall SE showed no significant difference between Col-0 and dbr1-2 (Supplemental Figure 3D). Therefore, because dbr1-2 showed minor effects on gene expression, but major effects on intron expression, it was reasonable to use these transcriptomes to further investigate BPs and lariat RNAs in Arabidopsis.

BP features are highly conserved from dicots to monocots

Reverse transcriptase can traverse the BP to copy the intronic region upstream of the BP, and thus this product contains two juxtaposed intronic segments that align in an inverted order, defining the 5’ss and the BP (Suzuki et al., 2006). Considering that this same read-through phenomenon also occurs during the construction of RNA sequencing libraries, we developed a computational pipeline using RNA-seq datasets to systematically map the BPs in plants (Figure 2A). In brief, we first aligned all sequenced reads to the genome with TopHat2 (Kim et al., 2013) or HISAT2 (Kim et al., 2015). Then, we aligned unmapped reads to introns with BLASTN or Bowtie 2 (Langmead and Salzberg, 2012). For those reads that could be partially mapped to introns, we examined whether the unmapped regions of the same reads could be aligned to the same introns (Figure 2A). The reads that could span the 5’ss and another region close to the 3’ss with at least 6 nucleotides on each segment were used to infer the BP. The last nucleotide of the mapped region close to the 3’ss is predicted as the BP. The reads that cover the same BP were grouped. By employing this approach to analyze 948 RNA-seq profiles in total (Supplemental Data Set 2, including 167 RNA-seq profiles for Arabidopsis (Arabidopsis thaliana), 264 RNA-seq profiles for tomato (Solanum lycopersicum), 207 RNA-seq profiles for rice (Oryza sativa), and 310 RNA-seq profiles for maize (Zea mays), respectively), we obtained ~300,000 informative sequenced reads in total, and identified 13,872 BPs from 6414 introns in Arabidopsis (Supplemental Data Set 3), 5199 BPs from 2566 introns in tomato (Supplemental Data Set 3), 29582 BPs from 11026 introns in rice (Supplemental Data Set 3), and 13487 BPs from 5986 introns in maize (Supplemental Data Set 3), respectively.
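The split-read logic described above can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it uses exact string matching instead of BLASTN or Bowtie 2 alignments, ignores the mismatches that reverse transcriptase often introduces at the BP, and returns only the first valid split. The function name and the toy intron sequence are hypothetical.

```python
def infer_branchpoint(intron, read, min_segment=6):
    """Return the 0-based BP index in `intron`, or None if no valid split exists.

    A lariat read is modelled as two juxtaposed segments in inverted order:
    [intron segment ending at the BP] followed by [the 5' end of the intron].
    Each segment must be at least `min_segment` nucleotides long.
    """
    for split in range(min_segment, len(read) - min_segment + 1):
        upstream, five_prime = read[:split], read[split:]
        # The second segment must start exactly at the 5'ss.
        if not intron.startswith(five_prime):
            continue
        # The first segment must map downstream of the 5' segment.
        start = intron.find(upstream, len(five_prime))
        if start != -1:
            # The last nucleotide of this segment is the predicted BP.
            return start + len(upstream) - 1
    return None


# Toy intron (hypothetical) with a branch adenine at index 19.
EXAMPLE_INTRON = "GTATGTTCGACCGTTACTAACATTTCTTTGCAG"
EXAMPLE_READ = EXAMPLE_INTRON[12:20] + EXAMPLE_INTRON[0:8]
EXAMPLE_BP = infer_branchpoint(EXAMPLE_INTRON, EXAMPLE_READ)
```

Reads that support the same predicted position could then be grouped by the returned index, mirroring the grouping step in the text.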

In both dicot species (Arabidopsis and tomato) and monocot species (rice and maize), BPs within constitutive introns were most frequently adenines (>50%), followed by thymines/uracils (15-20%), guanines (~8-20%), and cytosines (~2-10%) (Figure 2B), as reported in yeast and human (Taggart et al., 2012; Bitton et al., 2014; Mercer et al., 2015; Taggart et al., 2017; Pineda and Bradley, 2018). By randomly selecting 16 lariat RNAs from Arabidopsis for Sanger sequencing, we confirmed that these BPs were adenines (Supplemental Figure 3E and 3F). In addition, previous studies showed that the distance from the BP to the 3’ss is tightly constrained (Taggart et al., 2012; Bitton et al., 2014; Mercer et al., 2015). We found that BPs were preferentially positioned within 50 nucleotides upstream of the 3’ss in Arabidopsis, tomato, and rice (Figure 2C), highly similar to those in yeast and humans (Taggart et al., 2012; Bitton et al., 2014; Mercer et al., 2015). However, only 51.2% (6903/13487) of BPs were located within 50 nucleotides upstream of the 3’ss in maize (Figure 2C), and around half of BPs were positioned between 100 and 1000 nucleotides from the 3’ss (Figure 2C). The heterogeneity of the distance of BP from the 3’ss in maize indicates that the mechanism of BP selection in maize is more complicated.
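The two summary statistics used in this paragraph (the nucleotide identity of BPs and the share of BPs lying within a window upstream of the 3’ss) can be computed as in the sketch below. The function names and input formats are hypothetical; only the maize figures in the last line come from the text.

```python
from collections import Counter


def bp_nucleotide_fractions(bp_bases):
    """Fraction of BPs at each nucleotide, given one base (A/C/G/U) per BP."""
    counts = Counter(bp_bases)
    total = sum(counts.values())
    return {base: counts[base] / total for base in "ACGU"}


def fraction_within(distances_to_3ss, window=50):
    """Fraction of BPs located at most `window` nt upstream of the 3'ss."""
    near = sum(1 for d in distances_to_3ss if d <= window)
    return near / len(distances_to_3ss)


# Maize figure quoted in the text: 6903 of 13487 BPs within 50 nt.
MAIZE_PERCENT = round(6903 / 13487 * 100, 1)
```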

Although >50% of constitutive introns in plants use adenine as the BP, a substantial portion of introns used other nucleotides as their BPs (Figure 2B and Supplemental Figure 4). To exclude the possibility that non-adenine BPs were caused by lower fidelity during the conversion of BPs by reverse transcriptase, we analyzed the mutation events of BP during conversion by calculating the indicated nucleotide of the referred genome relative to total numbers of identified BPs in each species. We showed that adenine in the annotated sequences of pre-mRNAs was much more easily converted to uracil (Supplemental Figure 5A-D). In contrast, guanines retained high fidelity during library construction (Supplemental Figure 5A-D). This phenomenon has been reported in a previous study (Taggart et al., 2012). Therefore, the identified BPs with guanines are most likely non-canonical BPs in plants.

Since the flanking regions of BPs bind U2 snRNA, we surmised that nucleotides flanking the BPs might be important for BP recognition. To identify potential cis-elements, we analyzed nucleotides around the BP. We identified a consensus motif containing a 10 nt uracil-rich element downstream of the BP (Figure 2D), and the second position upstream of the BP exhibited a strong preference for the U nucleotide (4.0-fold enrichment) in all four plant species (Figure 2D), which is consistent with a recent finding in human cell lines (Mercer et al., 2015). Moreover, multiple nucleotides downstream of the BP showed preferences for uracils (Figure 2D). These observations indicate that BP selection is highly conserved from plants to animals.

Multiple branchpoints in plants

In general, a default BP is set for each intron. However, by calculating the number of BPs in a single intron in all four plant species, we found that although most introns only had one identified BP, ~20% of the introns used two or more BPs (Figure 3A). For example, At4g39260.1 I1 (the first intron of At4g39260.1) has two BPs identified from our RNA sequencing analyses, and the interval between two BPs is very short (Supplemental Figure 6A). We ranked the BPs for each intron by the number of mapped lariat reads and defined the one supported by the highest number of reads as the major BP. Consistent with this definition, we found that the major BP in At4g39260.1 I1 is supported by more informative sequencing reads (Supplemental Figure 6B), in which the major BP (the 258th nt) is adenine supported by 15 reads and the second BP (the 256th nt) is uracil supported by 4 reads (Supplemental Figure 6B).
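The ranking rule stated above (the BP with the most supporting lariat reads is the major BP) amounts to a sort on read counts. The sketch below uses the read counts quoted for At4g39260.1 I1; the function name and the tie-breaking rule (lower position first) are assumptions made for illustration.

```python
def rank_branchpoints(read_counts):
    """Sort BP positions by supporting reads, most supported first.

    Ties are broken by position, which is an arbitrary choice made here.
    """
    return sorted(read_counts, key=lambda pos: (-read_counts[pos], pos))


# Read support reported in the text for At4g39260.1 I1.
AT4G39260_I1 = {258: 15, 256: 4}
RANKED = rank_branchpoints(AT4G39260_I1)
MAJOR_BP = RANKED[0]
```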

We then used Sanger sequencing to further validate the two BPs in At4g39260.1 I1. The score peaks before the 258th nt of this intron were distinct, which indicates a single possible nucleotide. However, on and after the 258th nt, there were multiple peaks for each nucleotide. We carefully examined the sequence that corresponds to the major and minor peaks by Sanger sequencing. The two sequences (Supplemental Figure 6C) actually resulted from the two BPs at the 258th and 256th nt of At4g39260.1 I1, respectively.

Next, we investigated whether the distance of the BP from the 3’ss affected BP usage. Quantitatively plotting the numbers of lariat reads as a function of BP position showed that the majority of the most frequently used BPs resided within a narrow window in all four plant species, consistent with the restricted genome-wide distribution of BPs (Figure 3B). Together with the finding that multiple BPs occur widely in human cell lines (Pineda and Bradley, 2018), these findings indicate that the phenomenon of multiple BPs is conserved from plants to mammals.

Tissue-specific branchpoints in Arabidopsis and rice

We surmised that the existence of multiple BPs might play a regulatory role in pre-mRNA splicing. In other words, BP usage might be developmentally regulated, as recently reported in human cell lines (Pineda and Bradley, 2018). To test this hypothesis, we investigated whether certain introns exhibit tissue-specific BP usage in Arabidopsis and rice. By grouping 167 RNA-seq profiles in Arabidopsis according to the tissue used for RNA extraction, we selected five tissues (callus, roots, seedlings, leaves, and inflorescences) that had the largest numbers of supporting reads flanking the BP to identify tissue-specific BPs using multinomial proportion tests and estimated the False Discovery Rate (FDR) according to a previous study (Benjamini and Hochberg, 1995). Due to the transient nature of lariat RNAs and the specific selection of poly A-plus RNA in traditional RNA-seq library construction protocols, informative reads that traverse the lariat junction between the 5’ss and the BP are rare. However, we still detected 136 tissue-specific BPs in Arabidopsis (Supplemental Data Set 4). By using the same method, we grouped 207 rice RNA-seq profiles according to the tissue used for RNA extraction, and selected five tissues (nematode-induced giant cells, panicle, roots, shoots, and vascular cells) that had the largest numbers of supporting reads flanking the BP to identify tissue-specific BPs using the multinomial proportion tests. Consequently, we identified 565 tissue-specific BPs in rice (Supplemental Data Set 4).
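The FDR step cited above (Benjamini and Hochberg, 1995) can be sketched in a few lines. The code below implements only the standard Benjamini-Hochberg adjustment of a list of P-values; it does not reproduce the multinomial proportion test itself, whose exact implementation is not given in the text. The function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A BP would then be called tissue-specific when its adjusted P-value falls below the chosen FDR cutoff, which is a threshold the reader must supply.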

Given the above-mentioned positional effects on BP usage, we expected to observe preferential usage of BPs that were within ~50 bp proximal to the 3’ss. Unexpectedly, we found that BP usage was instead highly tissue-specific (Supplemental Data Set 4). For example, three BPs were identified for the ninth intron of At3g01500.1, in which the distal BP (the 154th nt upstream of the 3’ss) had a significantly higher preference in leaves and seedlings but not in inflorescences (P = 1.7 × 10^-61, multinomial proportion test, Figure 4A), while the proximal BPs (the 33rd and 34th nt upstream of the 3’ss) were frequently used in inflorescences but not in leaves and seedlings (Figure 4A). Similarly, three BPs were identified for the first intron of Os04g16748, in which the most distal BP (the 700th nt upstream of the 3’ss) was only detected in panicles and the distal BP (the 87th nt upstream of the 3’ss) was mainly used in giant cells and vascular cells, while the closest BP (the 7th nt upstream of the 3’ss) was frequently used in panicles, shoots, and roots (Figure 4B).

To further validate this tissue-specific BP usage, we amplified the RT-PCR products of the 7th intron of At3g23590.1 from five different tissues (roots, seedlings, leaves, inflorescences, and siliques) using indicated divergent primers, and performed Sanger sequencing. We obtained seven different isoforms of the 7th intron of At3g23590 by using the same pair of primers, and showed that seven unique BPs existed in the lariat RNAs (Figure 4C). To examine whether the usage of these seven BPs exhibited a tissue-specific pattern, we sequenced more than 10 independent clones of RT-PCR products for each tissue, and counted the frequency of different BPs in tested tissues (Supplemental Figure 7). We showed that different BPs exhibited significant preferences in specific tissues (Figure 4D). For example, the 216th BP was mainly selected in leaves, inflorescences, and siliques (Figure 4D). In contrast, the 224th BP was preferentially used in roots and seedlings (Figure 4D). Unexpectedly, the 285th and 287th BPs, two BPs within 50 bp upstream of the 3’ss, were seldom used in any tested tissues. Although the regulatory mechanism of specific BP selection remains unknown, these results suggest that the multiple BP usage is indeed regulated in a tissue-specific manner, which is consistent with results in human introns (Pineda and Bradley, 2018).

A subset of introns self-circularize with the 5’ss and the 3’ss in plants

Several studies show that BP selection determines the 3’ss recognition, in which the first AG downstream of the BP is usually used as the 3’ss (Smith et al., 1989; Gooding et al., 2006). Although other criteria, including secondary structure, context flanking the AG, distance to neighboring AGs, and an optimal distance between the BP and AG shown above (Figure 2D), have been used (Chen et al., 2000; Chua and Reed, 2001; Meyer et al., 2011), the “AG exclusion zone” has been widely accepted to predict the 3’ss. To validate whether the “AG exclusion zone” is applied to the 3’ss recognition of plant introns, we scanned the context of the 3’ss of all introns with BPs identified in our study. As expected, 58.2% in Arabidopsis, 54.2% in tomato, 54.8% in rice, and 48.2% in maize of the BPs selected the first AG downstream of the BP as the 3’ss (P < 0.001, by permutation test) (Figure 5A and Supplemental Data Set 3). However, a substantial portion (~30-40% in four species) of the 3’ss skipped the first AG downstream of the BP (Figure 5A and Supplemental Data Set 3). More interestingly, some introns appeared to avoid AG downstream of the BP but instead selected a non-AG as the 3’ss (Figure 5A and Supplemental Data Set 3). These observations suggest that the determination of the 3’ss in plants is not tightly regulated by the “AG exclusion zone”.
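The scan described above can be sketched as a small classifier that compares the annotated 3’ss with the first AG downstream of the BP. The category labels, the function name, and the rule that a BP falling inside the terminal dinucleotide counts as overlapping the 3’ss are assumptions made for illustration; the categories in Figure 5A may be defined differently.

```python
def classify_three_prime_ss(intron, bp_index):
    """Classify the 3'ss relative to the first AG downstream of the BP.

    `intron` is the full intron sequence (5' to 3', DNA alphabet), so its last
    two nucleotides are the 3'ss dinucleotide. `bp_index` is 0-based.
    """
    # Assumption: a BP inside the terminal dinucleotide overlaps the 3'ss.
    if bp_index >= len(intron) - 2:
        return "bp_at_3ss"
    if not intron.endswith("AG"):
        return "non_AG_3ss"
    first_ag = intron.find("AG", bp_index + 1)
    # The intron ends in AG, so first_ag is never -1 at this point.
    if first_ag == len(intron) - 2:
        return "first_AG"
    return "skipped_AG"
```

Applying this to every intron with an identified BP and tallying the labels would give proportions comparable in spirit to those reported above, under the stated assumptions.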

Unexpectedly, we found that 107 introns in Arabidopsis, 82 introns in tomato, 429 introns in rice, and 269 introns in maize showed an overlap between the BP and the 3’ss (Figure 5A), indicating that these intronic RNAs self-circularized with the 5’ss and the 3’ss, as also reported in human cells (Taggart et al., 2012; Gardner et al., 2012; Tay and Pek, 2017; Talhouarne and Gall, 2018). Several examples showed that these intron transcripts were indeed circularized from the 5’ss to the 3’ss (Figure 5B and 5C and Supplemental Figure 8), and each circularized intronic RNA was detected in at least two independent RNA-seq profiles with more than 30 unique supporting reads (Figure 5B and 5C and Supplemental Figure 8). Moreover, the average lengths of these stably accumulated introns are longer than the average lengths of all introns in the four species, especially in rice and maize (Supplemental Figure 9). These observations indicate that some intronic RNAs are not degraded in the conventional manner; instead, they can accumulate in a circular form in vivo.

356

Identification of lariat-derived circular RNAs in Arabidopsis

Lariat RNA formation during splicing is highly conserved in eukaryotes, and we previously showed that some lariat RNAs accumulate naturally in plants (Li et al., 2016), as also reported in animals (Gardner et al., 2012; Zhang et al., 2013; Tay and Pek, 2017; Talhouarne and Gall, 2018). To identify lariat RNAs in plants on a genome-wide scale, we performed circular RNA sequencing using total RNAs from inflorescences of wild type plants and focused on the reads mapped to intronic regions only. Since RNase R degrades most linear RNAs, introns with significant accumulation of sequencing reads in RNase R-treated Col-0 samples were regarded as lariat-derived circular RNAs. We identified 10580 lariat-derived circular RNAs with ≥5 FPKM generated from 6585 genes in Col-0 (Figure 1D and Supplemental Data Set 1). Among these lariat-derived circular RNAs, 1489 lariat RNAs with ≥20 FPKM were detected in wild type plants (Figure 6A). The average length of the 64058 introns >100 bp in the Arabidopsis thaliana genome is 253 bp, but the average length of the 10580 introns with lariat accumulation is 378 bp, which is significantly longer than that of all introns (P < 10⁻¹⁵, Welch’s t-test, Figure 6B). However, among these 10580 introns with lariat accumulation in Col-0, the intron length was anti-correlated with the expression of lariat-derived circular RNAs (Figure 6C), suggesting that the expression of larger introns is limited. Consistent with this point, we found that introns up-regulated in dbr1-2 were significantly shorter than those introns with lariat accumulation in Col-0 (Figure 6B).

By examining the number of lariat RNAs per gene, we showed that most genes allowed only one lariat RNA to accumulate (Supplemental Figure 10A), and that more than 2,000 lariat RNAs originated from the first intron (Supplemental Figure 10B). The potential coding capacity analysis showed that most lariat-derived circular RNAs are non-coding transcripts (Supplemental Figure 10C). Moreover, the expression of the lariat-derived circular RNAs is moderately correlated with expression of the parent gene in both Col-0 and dbr1-2 (Supplemental Figure 10D and 10E), which is consistent with the finding that lariat-derived circular RNAs (ciRNAs) promote expression of the parent gene in human cell lines (Zhang et al., 2013). However, the correlation coefficient between the expression of genes and introns was significantly decreased in dbr1-2 (Supplemental Figure 10D and 10E), consistent with the disturbed processing of lariat RNAs in dbr1-2.

To validate the identified lariat-derived circular RNAs in vivo, we randomly chose four loci (Figure 6D) for detection by RNA gel blotting. These four lariat-derived circular RNAs represent two types of loci. One type is present in wild type (Col-0) plants, i.e., At4g17390.1 I2 and At3g52590.1 I3 (Figure 6D), while the other only accumulates in the dbr1-2 mutant, i.e., At1g60995.1 I8 and At5g23050.1 I8 (Figure 6D). We first analyzed these lariat-derived circular RNAs in denaturing agarose gels by RNA gel blotting using the antisense transcript of the intron as the probe (Figure 6E). As shown in Figure 6F, these four previously unreported intronic RNAs were detected with the expected sizes. At4g17390.1 I2 and At3g52590.1 I3 were detected in Col-0, and At1g60995.1 I8 and At5g23050.1 I8 were only detected in dbr1-2 (Figure 6F). Although the levels of mature mRNA of At4g17390.1 and At3g52590.1 were comparable between Col-0 and dbr1-2, the intronic RNA levels of At4g17390.1 I2 and At3g52590.1 I3 were significantly higher in dbr1-2 (Figure 6F). Notably, the intronic RNAs are much smaller than the mRNAs of the parent genes (Figure 6F), indicating that these intronic RNAs are individual transcripts.

To further exclude the possibility that the bands detected in Figure 6F are alternative linear precursor mRNAs or linear individual intronic RNAs, we loaded all RNA samples together with three linear RNA standards of different sizes (Figure 6G, leftmost panel) in the same denaturing PAGE gels, and then performed RNA gel blotting using the same probes as in Figure 6E for At4g17390.1 I2 and At3g52590.1 I3, respectively. It is well known that circular RNAs usually migrate much more slowly than linear RNAs of equivalent size in denaturing PAGE gels. Consistent with the nature of lariat-derived circular RNAs observed in RNA sequencing, the individual RNAs from At4g17390.1 I2 (290 nt) and At3g52590.1 I3 (343 nt) both migrated much more slowly than the linear RNA standards, although their predicted sizes are much smaller than those of the RNA standards (Figure 6G). Moreover, the RNA from At4g17390.1 I2 migrated slightly slower than the one from At3g52590.1 I3 (Figure 6G), consistent with their size difference.

To conclusively demonstrate that these intronic RNAs are circular RNAs, we first treated total RNA with RNase R to degrade linear RNAs (Supplemental Figure 10F), and then examined by RNA gel blotting whether the transcripts of At4g17390.1 I2 were retained. As expected, there was a distinct band of At4g17390.1 I2 at exactly the same position in the RNase R-treated samples (Figure 6G), suggesting that this intronic RNA is present in vivo in a circular form. Indeed, we observed that sequencing reads from At4g17390.1 I2, At3g52590.1 I3, At1g60995.1 I8, and At5g23050.1 I8 only covered their respective intronic regions between the 5’ss and the BP, and there were no sequencing reads corresponding to the region between the BP and the 3’ss (the dashed region) (Figure 6D). We thus systematically identified hundreds of previously unidentified but highly abundant lariat-derived circular RNAs in Arabidopsis.


Lariat accumulated introns accompany increased incidences of exonic back-splicing events

Since lariat-derived circular RNAs are formed simultaneously with the maturation of pre-mRNAs, we tested whether the accumulation of lariat RNA affects linear mRNA maturation in plants. It is known that specific sequences in the introns, such as Alu elements in mammalian introns, promote back-splicing of adjacent exons, thus inhibiting the production of linear mature mRNA (Liang and Wilusz, 2014; Zhang et al., 2014; Kramer et al., 2015). However, this mechanism might not be applicable in species that lack noticeable flanking intronic secondary structure, and a subsequent study showed that the formation of double lariats contributes to the occurrence of exonic back-splicing events in yeast (Barrett et al., 2015), indicating that there might be a connection between lariat structure and exon circularization. By analyzing the ratio of back-splicing events of two flanking exons, we showed that the incidence of back-splicing events was significantly correlated with the accumulation of lariat-derived circular RNAs (Figure 7A). Moreover, the correlation between exonic circularization and intronic accumulation was independent of the position of flanking exons, i.e., both upstream and downstream adjacent exons exhibited increased incidence of back-splicing (Figure 7A). In addition to the correlation with two adjacent exons, the incidence of back-splicing events of the parent gene was also significantly increased with the accumulation of lariat-derived circular RNAs (Supplemental Figure 11A). These results indicate that the rapid elimination of lariat RNAs favors the production of linear mature mRNAs.

Exclusion of retroelements in lariat RNA accumulated introns

To understand sequence features of lariat RNA accumulated introns, we investigated whether the insertion of transposable elements (TEs) in the intronic regions affected the turnover of lariat RNAs. We used RepeatMasker (Tempel, 2012) to analyze the distributions of TEs in three types of introns, i.e., all 64,058 introns >100 bp, 10,580 introns with lariat-derived circular RNA accumulation in Col-0, and 6720 introns with higher accumulation of lariat-derived circular RNAs in dbr1-2. As shown in Figure 7B, 6510 introns harbored various types of TEs or repeated sequences, mainly including retroelements (Long Terminal Repeat (LTR) elements, SINE, LINE, ~1% of total length), DNA transposons (~1.5% of total length), and simple repeats (~1.2% of total length). In contrast, introns with accumulation of lariat-derived circular RNAs were significantly depleted of LTR retroelements and DNA transposons (Figure 7B) but retained the simple repeated sequences (Figure 7B). In particular, those introns with the most abundant lariat-derived circular RNAs (≥50 FPKM in Col-0) were specifically enriched in satellite sequences (Figure 7B). In total, there are retroelements in 486 of all introns ≥100 bp in the Arabidopsis genome (Supplemental Figure 11B and Supplemental Data Set 6). The ratio of introns with retroelements was significantly reduced in introns with lariat RNA accumulation in Col-0 and in introns with higher expression in dbr1-2 (P = 3.8×10⁻¹⁶ and P = 1.6×10⁻⁶, respectively, Fisher’s exact tests, Supplemental Figure 11B). Compared to naturally accumulated introns in Col-0, the ratio of introns with repeat elements was slightly increased in introns with higher expression in dbr1-2 (P = 0.04, Fisher’s exact test, Supplemental Figure 11B). These results indicate that the insertion of different classes of TEs might play a role in the turnover of lariat RNAs.

Given that introns are longer in more complex eukaryotes, and the insertion of TEs into intronic regions most likely contributes to the increase in intron length, we wanted to know whether these introns depleted of retroelements are longer than other introns. We named the 486 introns that are >100 nt in length and contain retroelement sequences RE-introns, and the other introns non-RE introns. Indeed, RE-introns are significantly longer than other introns (P < 10⁻¹⁰⁰, Student's t-test) (Figure 7C). Moreover, because the insertion of TEs usually leads to heterochromatin formation at parent genes, which generally inhibits gene expression, we compared the expression levels of RE-introns and non-RE introns. RE-introns themselves had significantly lower expression levels than non-RE introns (P < 10⁻¹⁰, Mann-Whitney U-test) in all 8 RNA-seq profiles (Figure 7D), further suggesting that the presence of retroelements is anti-correlated with lariat RNA accumulation. Furthermore, the expression levels of parent genes with RE-introns were also significantly lower than those of non-RE parent genes (P < 10⁻¹⁰, Mann-Whitney U-test) in all 8 RNA-seq profiles (Supplemental Figure 11C). As shown in Figure 7E, there are 3 retroelements in At2g34880.1 I5, which might contribute to its extremely low expression levels. In contrast, At2g14080.1 I1 contains only one LINE element, and the expression of both the parent gene and the intron was much higher than for At2g34880 (Figure 7E). Furthermore, At4g39260.1 I1 contained no retroelements and had much higher expression levels than either At2g34880.1 I5 or At2g14080.1 I1 (Figure 7E), and the parent gene At4g39260.1 also had much higher expression levels than At2g34880.1 and At2g14080.1, suggesting that retroelements affect the expression levels of both parent genes and their introns.

In the Col-0 and dbr1-2 transcriptomes without RNase R treatment, we found that the expression level of At2g14080.1 I1 was very low (Figure 7E), but At2g14080.1 I1 was abundant in RNase R (+) libraries, especially in dbr1-2 RNase R (+) libraries (Figure 7E), further indicating that DBR1 is responsible for the degradation of At2g14080.1 I1. Therefore, we systematically examined the types of TEs in introns with higher expression in dbr1-2. We found that, unlike the exclusion of retroelements and DNA transposons in naturally accumulated introns in Col-0 (Figure 7B), a substantial portion of introns with higher expression in dbr1-2 harbored retroelements and DNA repeats (Supplemental Figure 11D). Collectively, these analyses indicate that retroelement insertion is anti-correlated with the accumulation of lariat RNAs.

DISCUSSION

Although both BP selection and lariat RNA formation are essential during pre-mRNA splicing, the features of BPs and lariat RNA detection have been mostly reported case by case. The first large-scale analysis of BPs was performed in Fairbrother’s lab (Taggart et al., 2012; Taggart et al., 2017), in which high-throughput RNA sequencing data from human cell lines were used to find the BP location and to map the distribution of splicing factors around BPs. A subsequent study developed a data-driven algorithm, LaSSO (Lariat Sequence Site Origin), to precisely map the location of BPs on a genome-wide scale in yeast (Bitton et al., 2014). With the improvement of circular RNA sequencing, Mercer et al. used RNase R digestion followed by RNA sequencing to enrich sequences that traverse the lariat junction, and provided a first comprehensive map of human BPs (Mercer et al., 2015; Taggart et al., 2017). All these studies provide comprehensive knowledge about BPs and lariat RNAs in yeast and humans. However, BPs and lariat RNAs in plants remained largely unexplored. In this study, we utilized a large number of published RNA-seq datasets from four plant species to extract BPs, and we took advantage of the viability of a weak allele of dbr1 to enrich for lariat RNAs, thus providing a comprehensive view of BPs in both monocots and dicots. Moreover, the systematic identification of lariat-derived circular RNAs in wild type plants opens a research avenue that will allow examination of the unexpected role of intron transcripts.

The basic principles of BP selection are highly conserved from plants to humans (Taggart et al., 2012; Bitton et al., 2014; Mercer et al., 2015; Taggart et al., 2017; Pineda and Bradley, 2018). First, the BP nucleotide is strictly constrained in distance from the 3’ss. Second, the BP nucleotide exhibits a strong preference for adenine. Third, sequences flanking the BP are U-rich. Fourth, uracil is preferred as the second nucleotide upstream of the BP. One of the earliest steps in spliceosome assembly is the binding of SF1 to the BP (Pastuszak et al., 2011), a process for which SF1 requires only the UnA motif, providing a mechanistic explanation for the importance of the U at the second position upstream of the BP. Although downstream sequences of the BP in plants are U-rich (this study), the presence of both U-rich and C-rich downstream sequences in humans (Mercer et al., 2015) indicates that heterogeneity of downstream sequences may enable the sequence-specific selection of multiple BPs by the spliceosome, resulting in more complicated regulation of splicing in larger genomes. Besides the common features in the BP nucleotide and flanking sequences, we also found that multiple BPs (Figure 3) and tissue-specific BP usage (Figure 4) might contribute to the complexity of gene expression in plants.

Interestingly, we observed that the accumulation of lariat-derived circular RNAs was correlated with the occurrence of back-splicing events of flanking exonic regions (Figure 7), indicating that quick turnover of lariat RNAs by DBR1 is beneficial for pre-mRNA splicing to favor the production of linear mRNA. Recent mechanistic studies show that intronic complementary sequences (Liang and Wilusz, 2014; Zhang et al., 2014; Kramer et al., 2015), the homodimerization of specific proteins binding to intronic regions (Conn et al., 2015), or potential intronic RNA-RNA interactions (Ivanov et al., 2015) promote exonic back-splicing events. Our finding that the rapid turnover of lariat RNAs prevents back-splicing offers a new perspective for understanding the biological significance of intron metabolism in gene expression. Identification of lariat RNA binding proteins will provide further mechanistic evidence of the balance between linear mRNA production and intronic circRNA formation.

In addition, we observed an anti-correlation between intronic retroelement insertion and lariat RNA accumulation in plants (Figure 7 and Supplemental Figure 11). Two possibilities might explain this phenomenon. First, the insertion of TEs in intronic regions of coding genes usually leads to heterochromatization of the parent gene, and thus the transcription of the parent gene is limited, which ultimately leads to lower production of lariat RNAs. Second, the transcription of parent genes is normal, but their corresponding lariat RNAs with TE sequences are preferentially degraded by DBR1. The latter possibility is consistent with the previous finding that DBR1 was initially identified as a regulator of TE transposition in yeast (Karst et al., 2000). Together with our finding that TE-containing introns were more highly expressed in dbr1-2 (Supplemental Figure 11F), these results suggest that lariat RNAs formed from TE-containing introns might be much more sensitive to DBR1 activity.

In summary, our work provides a comprehensive map of branchpoints and lariat-derived circular RNAs in four plant species, uncovers features of branchpoints and lariat-derived circular RNAs, shows a potential link between intron metabolism and the evolution of transposable elements, and opens a novel perspective for understanding the communication between intronic circular RNAs and exonic circular RNAs.

MATERIALS AND METHODS

Materials and RNA-seq libraries

Arabidopsis thaliana Columbia (Col-0) was used as the wild type. Seeds of dbr1-2 were generated previously (Li et al., 2016). Plants were grown in a growth room under a 16 h light (bulb type: PHILIPS TLD 36W/865, with eight tubes)/8 h dark cycle. Inflorescences were collected for total RNA extraction with Trizol (Ambion). Total RNA was treated with a Ribo Zero kit (Epicenter) to obtain ribosomal RNA-depleted RNA (ribo-RNAs), then incubated with or without RNase R (Epicenter) and subjected to phenol:chloroform purification. Purified RNAs were used for library preparation with the Illumina TruSeq Stranded Total RNA HT Sample Prep Kit (P/N 15031048), and libraries were sequenced with an Illumina HiSeq 2500 sequencer at Genergy (Shanghai, China). Two replicates were performed for each sample. The RNA-seq data are deposited in the NCBI GEO database with series accession No. GSE117416.

RNA gel blotting

Total RNA was extracted from inflorescences using Trizol reagent (Invitrogen). A total of 20 µg of total RNA was loaded on 1.2% denaturing agarose gels or 5% urea-PAGE gels and transferred to a nylon membrane. ³²P α-UTP-labeled antisense RNAs used as probes or linear standards were transcribed in vitro using T7 RNA polymerase. Hybridization was performed using hybridization buffer (Ambion), and the signals were detected using a Typhoon FLA9500 (GE Healthcare). Primers used for in vitro transcription are listed in Supplemental Table 1.

Validation of lariat RNAs by RT-PCR followed by Sanger sequencing

Lariat RNAs across the BP were detected by RT-PCR as described (Suzuki et al., 2006). Total RNA treated with RNase R was used as the template. cDNA synthesis was carried out using SuperScript III (Invitrogen) with random hexamers. Reaction mixtures were incubated at 30°C for 10 min, at 42°C for 120 min, at 50°C for 30 min, at 60°C for 30 min, and at 99°C for 5 min. Lariat RNA fragments were then amplified by PCR and purified by gel purification for Sanger sequencing to identify the BP. Primer sequences used are listed in Supplemental Table 1.

Computational analysis of the RNA-seq profiles

The RNA-seq libraries were mapped to the genome of Arabidopsis thaliana (version TAIR10) using Cufflinks v2.2.1 (Trapnell et al., 2010). Cuffquant and Cuffnorm were used to quantify and normalize the FPKM values of the genes, respectively. Correlation coefficients of gene expression levels were calculated for Col-0 and dbr1-2, with and without RNase R digestion. Normalized FPKM values of genes in the Col-0 and dbr1-2 samples without RNase R treatment were compared to find deregulated genes with edgeR (Robinson et al., 2010). Genes with average abundances of at least 5 FPKM in either dbr1-2 or Col-0 and multiple-test corrected P-values smaller than 0.05 were designated as deregulated genes. Genes with abundances of at least 10 FPKM in at least one of the 8 samples and a standard deviation of at least 1 were used for further analyses. The normalized FPKM values plus one were log scaled to calculate the correlation coefficient (CC) values between samples. The CC values were passed to the pheatmap function in the pheatmap library in R to perform hierarchical clustering. These filtered genes were also used to perform Principal Component Analysis (PCA). Log-scaled normalized FPKM values plus one were passed to the prcomp function in the psych library in R to perform PCA.
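The log-scaling step used for the between-sample correlation can be sketched as follows. This is an illustration only: the text does not state the logarithm base or the type of correlation coefficient, so log2 and Pearson's coefficient are assumed here, and the function name is hypothetical.

```python
import math
from typing import Sequence


def log_scaled_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of log2(FPKM + 1) values for two samples.

    Assumes both vectors list the same genes in the same order and that
    neither vector is constant (a constant vector has zero variance).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long vectors with at least 2 values")
    lx = [math.log2(v + 1.0) for v in x]
    ly = [math.log2(v + 1.0) for v in y]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var_x = sum((a - mean_x) ** 2 for a in lx)
    var_y = sum((b - mean_y) ** 2 for b in ly)
    return cov / math.sqrt(var_x * var_y)
```

Adding one before taking the logarithm keeps genes with 0 FPKM in the comparison, since log2(0 + 1) = 0.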

Estimation of the expression levels of lariat RNAs

The “bedtools genomecov” command of bedtools (Quinlan and Hall, 2010) was used to calculate the genome coverage of RNA-seq libraries. A custom program was used to calculate FPKMs (Fragments Per Kilo basepairs per Million sequencing tags) of introns of annotated genes in TAIR10, using the genome coverage results of RNA-seq libraries. To compare global changes of intron expression, the average intron expression levels were calculated for Col-0 and dbr1-2, with and without RNase R treatment. Intronic transcripts with expression levels of FPKM ≥5 in the Col-0 samples with RNase R treatment were defined as lariat RNAs in wild type plants. The differences in intron expression levels between Col-0 and dbr1-2, with and without treatment, were evaluated with the Mann-Whitney U-test. The correlation coefficients of intron expression levels were calculated for the two samples of Col-0 and dbr1-2, with and without RNase R treatment. To find deregulated introns in dbr1-2, introns with at least 5 FPKM in dbr1-2 were kept. Then, the expression levels of introns in dbr1-2 and Col-0 with RNase R treatment were used to find deregulated introns using edgeR (Robinson et al., 2010). The introns with false discovery rate (FDR) values smaller than 0.05 and log-scaled fold change larger than 1 were deemed higher expressed introns in dbr1-2. The introns were selected using the same criteria as for genes, then used to perform hierarchical clustering and PCA using the same methods as for the genes.
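The FPKM calculation for introns can be sketched as below. The custom program itself is not described in the text, so this is only one plausible reading: the helper names are hypothetical, and approximating the fragment count as summed per-base depth divided by read length is an assumption.

```python
from typing import Sequence


def fragments_from_coverage(depths: Sequence[int], read_length: int) -> float:
    """Approximate the fragment count of an intron from per-base depth.

    Assumption: each fragment contributes about `read_length` covered bases.
    """
    return sum(depths) / read_length


def fpkm(fragments: float, length_bp: int, total_fragments: int) -> float:
    """Fragments per kilobase of feature per million sequenced fragments."""
    return fragments * 1e9 / (length_bp * total_fragments)


def is_lariat_candidate(intron_fpkm: float, threshold: float = 5.0) -> bool:
    """Apply the FPKM >= 5 cutoff stated in the text."""
    return intron_fpkm >= threshold
```

For example, a 500-bp intron supported by 50 fragments in a library of 20 million fragments has an FPKM of 5 and would just meet the cutoff.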

Correlation between the expression levels of introns and their parent genes

The average FPKM values of introns and the average FPKM values of their host genes were used to calculate the correlation coefficient values in the four Col-0 and four dbr1-2 samples without RNase R treatment, respectively. If a gene had more than one intron, the intron closest to the transcription start site was kept. Only introns and genes with average FPKM values of at least 1 were included, and only introns of at least 200 bp were used for analysis.

The computational pipeline for identifying BPs

Reads that aligned in reverse order across 2'-5' phosphodiester sites were identified with a customized computational pipeline (Figure 2A). First, a database of all introns in A. thaliana (for all annotated genes in the TAIR10 database) was generated using a self-written program. Second, RNA-seq profiles were aligned to the genome using TopHat2 for self-generated data sets (Kim et al., 2013) or HISAT2 (Kim et al., 2015) for published data sets, with unmapped reads written out using the option "--un-conc". For TopHat2, reads that could not be mapped to the genome were retrieved with bamToFastq in bedtools (Quinlan and Hall, 2010). Then, the unmapped reads were aligned to introns of TAIR10 annotated genes with BLASTN for self-generated RNA-seq data sets, using the options "-S 1 -e 1e-20", or with Bowtie 2 (Langmead and Salzberg, 2012) for published RNA-seq data sets, using the options "--local -q --norc --no-unal -p 32 -a --no-hd --no-sq". Finally, a self-written program was used to check whether the remaining regions of the partially matched reads could also be aligned to the same introns. We selected reads that spanned the 5’ss and the potential BP, requiring that each of the two matched segments in a read be at least 6 nucleotides long. The BP is then the last nucleotide of the matched segment near the 3' end of the intron. The branch positions that were detected in at least one of the selected RNA-seq libraries were retained for counting the different branch nucleotides of lariat RNAs.
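The final step of the pipeline, locating the BP from a read that spans the lariat junction, can be illustrated with a simplified sketch. It is not the self-written program used in the study: it replaces the BLASTN/Bowtie 2 local alignments with exact string matching, assumes the read is in the sense orientation of the intron, and uses a hypothetical function name and 0-based coordinates.

```python
from typing import Optional

MIN_SEGMENT = 6  # minimum length of each matched segment, as stated in the text


def find_branchpoint(read: str, intron: str) -> Optional[int]:
    """Return the 0-based intron index of the BP supported by `read`, or None.

    A lariat-junction read is modeled as a segment ending at the BP followed
    directly by a segment that starts at the first nucleotide of the intron.
    """
    for split in range(MIN_SEGMENT, len(read) - MIN_SEGMENT + 1):
        bp_side, five_prime_side = read[:split], read[split:]
        # The segment after the junction must begin exactly at the 5'ss.
        if not intron.startswith(five_prime_side):
            continue
        # The segment before the junction must match downstream of it.
        start = intron.find(bp_side, len(five_prime_side))
        if start != -1:
            # The BP is the last nucleotide of the 3'-proximal segment.
            return start + len(bp_side) - 1
    return None
```

A read that aligns colinearly within the intron returns None, because no suffix of at least 6 nucleotides starts at the 5'ss. Mismatches introduced by reverse transcriptase at the branch site are not handled in this simplified version.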

To identify BPs in rice (Oryza sativa), tomato (Solanum lycopersicum), and maize (Zea mays), the MSU Rice Genome Annotation for Oryza sativa Nipponbare (release 7), the ITAG3.20 annotation for the tomato genome (Tomato Genome Consortium), and the annotation of Zea mays cv B73 (version 4), respectively, were used to retrieve the intron sequences. Selected RNA-seq profiles of rice, tomato, and maize (as listed in Supplemental Data Set 2) were used to identify BPs in these species using the same method as for Arabidopsis. The regions from 10 bp upstream to 10 bp downstream of the detected BP were used to analyze nucleotide composition.

Identifying tissue-specific BPs in Arabidopsis and rice

The numbers of supporting reads for identified BPs in Arabidopsis and rice were grouped by tissue (Supplemental Data Set 4). The five tissues with the largest numbers of supporting reads of BPs were used to identify tissue-specific BPs using the multinomial proportion test (Pineda and Bradley, 2018). The obtained P-values were corrected using the method proposed by Benjamini and Hochberg. BPs with multiple-test corrected P-values smaller than 0.05 were deemed tissue-specific BPs.
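The multiple-test correction step can be sketched as follows. This shows only the standard Benjamini-Hochberg adjustment applied to a list of raw P-values; the multinomial proportion test that produces those P-values is not reproduced, and the function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

BPs whose adjusted value falls below 0.05 would then be called tissue-specific, following the cutoff given above.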

Calculating the splicing efficiency

The "bedtools genomecov" command of bedtools (Quinlan and Hall, 2010) was used to calculate the genome coverage of the RNA-seq libraries. The maximal number of reads that cover the +10 bp regions of exon-to-intron sites, EI, was calculated. The number of junction reads, JR, was reported by TopHat in the Cufflinks pipeline. The splicing efficiency of a gene was calculated as the log2 value of (EI/JR), as proposed by Bitton et al. (2014).
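As a worked illustration of this measure, the sketch below computes log2(EI/JR) for one gene. The function name is hypothetical, and returning None when either count is zero is an assumption, since the text does not say how such cases were handled.

```python
import math
from typing import Optional


def splicing_efficiency(ei: float, jr: float) -> Optional[float]:
    """log2(EI / JR), where EI is the exon-intron read count and JR the junction read count.

    Returns None when either count is not positive (assumed handling).
    """
    if ei <= 0 or jr <= 0:
        return None
    return math.log2(ei / jr)
```

With EI = 8 and JR = 32 the value is -2. Under this definition, lower values mean fewer unspliced exon-intron reads relative to spliced junction reads.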

Analysis of transposable elements in introns

RepeatMasker (version open-4.0.6) (Tempel, 2012) was used to analyze transposable elements in all introns longer than 100 bp, the 10580 introns with lariat accumulation (≥5 FPKM) in Col-0, and the 6720 introns with higher expression in dbr1-2. The RepeatMasker edition of RepBase (Bao et al., 2015) was used in RepeatMasker.

Accession numbers

Sequence data from this paper can be found with the accession numbers listed in Supplemental Data Set 2. The RNA-seq data are deposited in the NCBI GEO database with series accession No. GSE117416.

SUPPLEMENTAL DATA 719

Supplemental Figure 1. Schematic view of the experimental design. 720

Supplemental Figure 2. Correlation of gene and intron expression levels for each 721

group between two biological replicates. 722

Supplemental Figure 3. Gene expression patterns of the used samples and validation 723

of selected BPs in Arabidopsis. 724

Supplemental Figure 4. Examples of introns using guanines as their BPs in plants. 725

Supplemental Figure 5. Fidelity of BPs during library construction of RNA-seq 726

samples. 727

Supplemental Figure 6. Validation of two BPs in At4g39260.1 I1. 728

Supplemental Figure 7. Validation of the 7 BPs of At3g23590.1 I7. 729

Supplemental Figure 8. Examples of self-circularized introns in tomato and rice. 730

Page 25: A comprehensive map of intron branchpoints and …...2019/04/17  · 1 1 LARGE-SCALE BIOLOGY 2 3 A Comprehensive Map of Intron Branchpoints and Lariat RNAs in 4 Plants 5 6 Xiaotuo

25

Supplemental Figure 9. Length distribution of self-circularized introns in plants. 731

Supplemental Figure 10. Other characteristics of lariat RNAs in Arabidopsis. 732

Supplemental Figure 11. Back-splicing and expression of parent genes with lariat 733

RNA accumulation. 734

Supplemental Table 1. Primers used in the study. 735

Supplemental Data Set 1. Transcriptome analysis of Col-0 and dbr1-2. 736

Supplemental Data Set 2. List of RNA sequencing datasets used in this study. 737

Supplemental Data Set 3. BPs identified in four plant species. 738

Supplemental Data Set 4. Tissue-specific BPs identified in Arabidopsis and rice. 739

Supplemental Data Set 5. Self-circularized introns identified in four plant species. 740

Supplemental Data Set 6. Introns with retroelements in Arabidopsis. 741

Competing interests

The authors declare that they have no competing interests.

AUTHOR CONTRIBUTIONS

Y.Zheng and B.Z. conceived and designed the research. X.Z. performed most bioinformatic analyses. Y.Zhang, T.W., and Z.L. performed biological experiments, including preparing samples for RNA-seq and validating lariat RNAs. H.G., Q.T., and J.C. provided technical help and critical comments on this project. Y.Zheng designed and implemented the computational methods. K.C., L.L., C.L., and J.G. helped to analyze RNA-seq data. Y.Zheng and B.Z. wrote the manuscript.

ACKNOWLEDGEMENTS

We thank Prof. Sheila McCormick for editing. This work was supported by grants from the National Natural Science Foundation of China (No. 31830045, 31671261, and 31470281) to B.Z. and by a grant (No. 31460295) from the National Natural Science Foundation of China to Y.Zheng.

REFERENCES

Armakola, M., Higgins, M.J., Figley, M.D., Barmada, S.J., Scarborough, E.A., Diaz, Z., Fang, X., Shorter, J., Krogan, N.J., Finkbeiner, S., Farese, R.V., Jr., and Gitler, A.D. (2012). Inhibition of RNA lariat debranching enzyme suppresses TDP-43 toxicity in ALS disease models. Nat Genet 44, 1302-1309.

Bao, W., Kojima, K.K., and Kohany, O. (2015). Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob DNA 6, 11.

Barrett, S.P., Wang, P.L., and Salzman, J. (2015). Circular RNA biogenesis can proceed through an exon-containing lariat precursor. Elife 4, e07540.

Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol 57, 289-300.

Bitton, D.A., Rallis, C., Jeffares, D.C., Smith, G.C., Chen, Y.Y., Codlin, S., Marguerat, S., and Bahler, J. (2014). LaSSO, a strategy for genome-wide mapping of intronic lariats and branch points using RNA-seq. Genome Res 24, 1169-1179.

Chen, S., Anderson, K., and Moore, M.J. (2000). Evidence for a linear search in bimolecular 3' splice site AG selection. Proc Natl Acad Sci U S A 97, 593-598.

Chua, K., and Reed, R. (2001). An upstream AG determines whether a downstream AG is selected during catalytic step II of splicing. Mol Cell Biol 21, 1509-1514.

Conn, S.J., Pillman, K.A., Toubia, J., Conn, V.M., Salmanidis, M., Phillips, C.A., Roslan, S., Schreiber, A.W., Gregory, P.A., and Goodall, G.J. (2015). The RNA binding protein quaking regulates formation of circRNAs. Cell 160, 1125-1134.

Dumesic, P.A., Natarajan, P., Chen, C., Drinnenberg, I.A., Schiller, B.J., Thompson, J., Moresco, J.J., Yates, J.R., 3rd, Bartel, D.P., and Madhani, H.D. (2013). Stalled spliceosomes are a signal for RNAi-mediated genome defense. Cell 152, 957-968.

Galvis, A.E., Fisher, H.E., Fan, H., and Camerini, D. (2017). Conformational changes in the 5' end of the HIV-1 genome dependent on the debranching enzyme DBR1 during early stages of infection. J Virol 91.

Galvis, A.E., Fisher, H.E., Nitta, T., Fan, H., and Camerini, D. (2014). Impairment of HIV-1 cDNA synthesis by DBR1 knockdown. J Virol 88, 7054-7069.

Gardner, E.J., Nizami, Z.F., Talbot, C.C., Jr., and Gall, J.G. (2012). Stable intronic sequence RNA (sisRNA), a new class of noncoding RNA from the oocyte nucleus of Xenopus tropicalis. Genes Dev 26, 2550-2559.

Gooding, C., Clark, F., Wollerton, M.C., Grellscheid, S.N., Groom, H., and Smith, C.W. (2006). A class of human exons with predicted distant branch points revealed by analysis of AG dinucleotide exclusion zones. Genome Biol 7, R1.

Ivanov, A., Memczak, S., Wyler, E., Torti, F., Porath, H.T., Orejuela, M.R., Piechotta, M., Levanon, E.Y., Landthaler, M., Dieterich, C., and Rajewsky, N. (2015). Analysis of intron sequences reveals hallmarks of circular RNA biogenesis in animals. Cell Rep 10, 170-177.

Jacquier, A., and Rosbash, M. (1986). RNA splicing and intron turnover are greatly diminished by a mutant yeast branch point. Proc Natl Acad Sci U S A 83, 5835-5839.

Karst, S.M., Rutz, M.L., and Menees, T.M. (2000). The yeast retrotransposons Ty1 and Ty3 require the RNA Lariat debranching enzyme, Dbr1p, for efficient accumulation of reverse transcripts. Biochem Biophys Res Commun 268, 112-117.

Kim, D., Langmead, B., and Salzberg, S.L. (2015). HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12, 357-360.

Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14, R36.

Kim, H.C., Kim, G.M., Yang, J.M., and Ki, J.W. (2001). Cloning, expression, and complementation test of the RNA lariat debranching enzyme cDNA from mouse. Mol Cells 11, 198-203.

Kim, J.W., Kim, H.C., Kim, G.M., Yang, J.M., Boeke, J.D., and Nam, K. (2000). Human RNA lariat debranching enzyme cDNA complements the phenotypes of Saccharomyces cerevisiae dbr1 and Schizosaccharomyces pombe dbr1 mutants. Nucleic Acids Res 28, 3666-3673.

Kramer, M.C., Liang, D., Tatomer, D.C., Gold, B., March, Z.M., Cherry, S., and Wilusz, J.E. (2015). Combinatorial control of Drosophila circular RNA expression by intronic repeats, hnRNPs, and SR proteins. Genes Dev 29, 2168-2182.

Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357-359.

Li, Z., Wang, S., Cheng, J., Su, C., Zhong, S., Liu, Q., Fang, Y., Yu, Y., Lv, H., Zheng, Y., and Zheng, B. (2016). Intron lariat RNA inhibits microRNA biogenesis by sequestering the dicing complex in Arabidopsis. PLoS Genet 12, e1006422.

Liang, D., and Wilusz, J.E. (2014). Short intronic repeat sequences facilitate circular RNA production. Genes Dev 28, 2233-2247.

Mercer, T.R., Clark, M.B., Andersen, S.B., Brunck, M.E., Haerty, W., Crawford, J., Taft, R.J., Nielsen, L.K., Dinger, M.E., and Mattick, J.S. (2015). Genome-wide discovery of human splicing branchpoints. Genome Res 25, 290-303.

Meyer, M., Plass, M., Perez-Valle, J., Eyras, E., and Vilardell, J. (2011). Deciphering 3'ss selection in the yeast genome reveals an RNA thermosensor that mediates alternative splicing. Mol Cell 43, 1033-1039.

Morgan, J.T., Fink, G.R., and Bartel, D.P. (2019). Excised linear introns regulate growth in yeast. Nature. Epub ahead of print. doi: 10.1038/s41586-018-0828-1.

Nam, K., Lee, G., Trambley, J., Devine, S.E., and Boeke, J.D. (1997). Severe growth defect in a Schizosaccharomyces pombe mutant defective in intron lariat degradation. Mol Cell Biol 17, 809-818.

Okamura, K., Hagen, J.W., Duan, H., Tyler, D.M., and Lai, E.C. (2007). The mirtron pathway generates microRNA-class regulatory RNAs in Drosophila. Cell 130, 89-100.

Ooi, S.L., Samarsky, D.A., Fournier, M.J., and Boeke, J.D. (1998). Intronic snoRNA biosynthesis in Saccharomyces cerevisiae depends on the lariat-debranching enzyme: intron length effects and activity of a precursor snoRNA. RNA 4, 1096-1110.

Parenteau, J., Maignon, L., Berthoumieux, M., Catala, M., Gagnon, V., and Abou Elela, S. (2019). Introns are mediators of cell response to starvation. Nature. Epub ahead of print. doi: 10.1038/s41586-018-0859-7.

Pastuszak, A.W., Joachimiak, M.P., Blanchette, M., Rio, D.C., Brenner, S.E., and Frankel, A.D. (2011). An SF1 affinity model to identify branch point sequences in human introns. Nucleic Acids Res 39, 2344-2356.

Pineda, J.M.B., and Bradley, R.K. (2018). Most human introns are recognized via multiple and tissue-specific branchpoints. Genes Dev 32, 577-591.

Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842.

Robinson, M.D., McCarthy, D.J., and Smyth, G.K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140.

Ruby, J.G., Jan, C.H., and Bartel, D.P. (2007). Intronic microRNA precursors that bypass Drosha processing. Nature 448, 83-86.

Ruskin, B., and Green, M.R. (1985). An RNA processing activity that debranches RNA lariats. Science 229, 135-140.

Ruskin, B., Krainer, A.R., Maniatis, T., and Green, M.R. (1984). Excision of an intact intron as a novel lariat structure during pre-mRNA splicing in vitro. Cell 38, 317-331.

Smith, C.W., Porro, E.B., Patton, J.G., and Nadal-Ginard, B. (1989). Scanning from an independently specified branch point defines the 3' splice site of mammalian introns. Nature 342, 243-247.

Suzuki, H., Zuo, Y., Wang, J., Zhang, M.Q., Malhotra, A., and Mayeda, A. (2006). Characterization of RNase R-digested cellular RNA source that consists of lariat and circular RNAs from pre-mRNA splicing. Nucleic Acids Res 34, e63.

Taggart, A.J., DeSimone, A.M., Shih, J.S., Filloux, M.E., and Fairbrother, W.G. (2012). Large-scale mapping of branchpoints in human pre-mRNA transcripts in vivo. Nat Struct Mol Biol 19, 719-721.

Taggart, A.J., Lin, C.L., Shrestha, B., Heintzelman, C., Kim, S., and Fairbrother, W.G. (2017). Large-scale analysis of branchpoint usage across species and cell lines. Genome Res 27, 639-649.

Talhouarne, G.J., and Gall, J.G. (2014). Lariat intronic RNAs in the cytoplasm of Xenopus tropicalis oocytes. RNA 20, 1476-1487.

Talhouarne, G.J.S., and Gall, J.G. (2018). Lariat intronic RNAs in the cytoplasm of vertebrate cells. Proc Natl Acad Sci U S A 115, E7970-E7977.

Tay, M.L., and Pek, J.W. (2017). Maternally inherited stable intronic sequence RNA triggers a self-reinforcing feedback loop during development. Curr Biol 27, 1062-1067.

Tempel, S. (2012). Using and understanding RepeatMasker. Methods Mol Biol 859, 29-51.

Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M.J., Salzberg, S.L., Wold, B.J., and Pachter, L. (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28, 511-515.

Wang, H., Hill, K., and Perry, S.E. (2004). An Arabidopsis RNA lariat debranching enzyme is essential for embryogenesis. J Biol Chem 279, 1468-1473.

Ye, Y., De Leon, J., Yokoyama, N., Naidu, Y., and Camerini, D. (2005). DBR1 siRNA inhibition of HIV-1 replication. Retrovirology 2, 63.

Zhang, S.Y., Clark, N.E., Freije, C.A., Pauwels, E., Taggart, A.J., Okada, S., Mandel, H., Garcia, P., Ciancanelli, M.J., Biran, A., Lafaille, F.G., Tsumura, M., Cobat, A., Luo, J., Volpi, S., Zimmer, B., Sakata, S., Dinis, A., Ohara, O., Garcia Reino, E.J., Dobbs, K., Hasek, M., Holloway, S.P., McCammon, K., Hussong, S.A., DeRosa, N., Van Skike, C.E., Katolik, A., Lorenzo, L., Hyodo, M., Faria, E., Halwani, R., Fukuhara, R., Smith, G.A., Galvan, V., Damha, M.J., Al-Muhsen, S., Itan, Y., Boeke, J.D., Notarangelo, L.D., Studer, L., Kobayashi, M., Diogo, L., Fairbrother, W.G., Abel, L., Rosenberg, B.R., Hart, P.J., Etzioni, A., and Casanova, J.L. (2018). Inborn errors of RNA lariat metabolism in humans with brainstem viral infection. Cell 172, 952-965 e918.

Zhang, X.O., Wang, H.B., Zhang, Y., Lu, X., Chen, L.L., and Yang, L. (2014). Complementary sequence-mediated exon circularization. Cell 159, 134-147.

Zhang, Y., Zhang, X.O., Chen, T., Xiang, J.F., Yin, Q.F., Xing, Y.H., Zhu, S., Yang, L., and Chen, L.L. (2013). Circular intronic long noncoding RNAs. Mol Cell 51, 792-806.

Zheng, S., Vuong, B.Q., Vaidyanathan, B., Lin, J.Y., Huang, F.T., and Chaudhuri, J. (2015). Non-coding RNA generated following lariat debranching mediates targeting of AID to DNA. Cell 161, 762-773.

Figure Legends

Figure 1. A summary of the results of 8 RNA-seq profiles.

(A) A global view of intron expression levels in the 8 RNA-seq profiles. Only introns >100 bp were used. The average expression levels of Col-0 and dbr1-2, with and without RNase R treatment, were compared using the Mann-Whitney U-test. r1 and r2 indicate replicate 1 and replicate 2, respectively.

(B) The correlation coefficients between intron expression levels, with hierarchical clustering analysis.

(C) Principal component analysis (PCA) based on the intron expression levels.

(D) Venn diagram showing the numbers of genes with accumulation of lariat RNA in Col-0 and dbr1-2. The numbers indicate genes with average abundances of intronic transcripts ≥5 FPKM in Col-0 and dbr1-2.

(E) The numbers of down-regulated (blue) and up-regulated (red) introns in dbr1-2 RNase R (+) samples compared to Col-0 RNase R (+) samples.

(F) The lengths of all introns >100 bp and of introns de-regulated in dbr1-2 samples with RNase R treatment.

Figure 2. Characterization of BPs in Arabidopsis, tomato, rice, and maize.

(A) The computational pipeline to identify BPs. The sequencing reads were aligned to the genome with TopHat (v2). The unmapped reads were aligned to the database of introns with BLASTN. The partially mapped reads were examined with an in-house program to find whether the remaining parts of the same reads could be aligned to the same introns. The reads supporting the same BP were collectively used to infer the corresponding BP. The results from different RNA-seq profiles were combined.

(B) The percentages of different nucleotides as the identified BPs in the four plant species, i.e., Arabidopsis, tomato, rice, and maize. Numbers indicate the numbers of BPs with the indicated nucleotide, and percentages indicate the ratio of these numbers to the total number of identified BPs.

(C) The distribution of the distance from the BP to the 3’ss in the four plant species, i.e., Arabidopsis, tomato, rice, and maize. In the distribution for maize, the widths of bars from -1000 to -50 are 19 nt, and the widths of bars from -50 to 0 are 1 nt.

(D) The nucleotide preferences flanking the BP in the four plant species, i.e., Arabidopsis, tomato, rice, and maize.

Figure 3. Multiple branchpoints in the four plant species.

(A) The intron numbers (Y-axis) with the indicated BP numbers (X-axis) in the four plant species, i.e., Arabidopsis, tomato, rice, and maize.

(B) The distance distributions of multiple BPs along the intron in the four plant species. Only introns with ≥5 lariat reads were counted. The percentages of lariat reads (Y-axis) were calculated by dividing the number of lariat reads for a specific BP by the total number of lariat reads for an intron.
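
The percentage in (B) can be written out as a short sketch. The dictionary layout (BP position relative to the 3’ss mapped to a read count) and the function name are assumptions made for illustration only; the ≥5 read filter follows the legend.

```python
def bp_usage_percentages(lariat_reads_per_bp, min_total_reads=5):
    """Return the percentage of lariat reads supporting each BP of one intron.

    lariat_reads_per_bp: dict mapping a BP position to its lariat read count.
    Introns with fewer than min_total_reads lariat reads are skipped (None).
    """
    total = sum(lariat_reads_per_bp.values())
    if total < min_total_reads:
        return None
    return {bp: 100.0 * n / total for bp, n in lariat_reads_per_bp.items()}


if __name__ == "__main__":
    # Hypothetical intron with three BPs and 10 lariat reads in total.
    print(bp_usage_percentages({-154: 2, -34: 6, -33: 2}))
```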

Figure 4. Tissue-specific BP usage in Arabidopsis and rice.

(A) and (B) The percentages of supporting lariat reads at the indicated BPs in different tissues for the 9th intron of At3g01500.1 (A) and for the first intron of Os04g16748.1 (B). The nucleotides in upper case and red are tissue-specific BPs identified in the current study. The P-values are multiple-test-corrected P-values. The numbers below the intron sequence are the positions of the identified BPs relative to the 3’ss.

(C) Summary of the BPs identified for the 7th intron of At3g23590.1 by RT-PCR followed by Sanger sequencing. The 7th intron of At3g23590.1 is 311 bp long; 216th, 217th, 220th, … indicate the distance downstream of the 5’ss, and F and R indicate the primer pair used for RT-PCR to amplify lariat RNAs originating from this intron.

(D) The distributions of multiple BPs in different tissues by RT-PCR followed by Sanger sequencing. 17, 14, 21, 24, and 28 individual clones were sequenced for roots, seedlings, leaves, inflorescences, and siliques, respectively. The numbers within the circles indicate the clones carrying the indicated BP identified by Sanger sequencing.

Figure 5. The distributions of the first AGs downstream of the BP in plants.

(A) The categories of the 3’ss relative to the BP in four plant species. The 3’ss were classified into four types: (i) the first AG downstream of the BP is the 3’ss (blue), (ii) an AG other than the first AG is the 3’ss (orange), (iii) the 3’ss is non-AG (grey), and (iv) the BP is one of the nucleotides of the 3’ss (yellow).

(B) The BP of the second intron of At1g70830.1 is located exactly within the 3’ss; its supporting reads in two different RNA-seq profiles (SRR1190492 and SRR3234408) are shown.

(C) The BP of the 8th intron of Zm0001d013156.2 in maize is located exactly within the 3’ss; its supporting reads in two different RNA-seq profiles (SRR765414 and SRR765622) are shown. Numbers on the right indicate the total numbers of supporting reads for each transcript.
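
The four categories in (A) can be expressed as a small classifier. This is a sketch under stated assumptions, not the authors' program: the intron is given as a 5’-to-3’ string, the BP as a 0-based index into it, the 3’ss is taken to be the last two intron nucleotides, and the search for the first AG starts at the nucleotide after the BP. The category labels are hypothetical.

```python
def classify_three_prime_ss(intron, bp_index):
    """Classify the 3'ss of an intron relative to its BP.

    Returns one of: "first_AG", "other_AG", "non_AG_3ss", "bp_within_3ss".
    """
    seq = intron.upper().replace("U", "T")
    n = len(seq)
    if n < 2 or not 0 <= bp_index < n:
        raise ValueError("bp_index must lie inside an intron of length >= 2")
    # (iv) The BP is one of the two nucleotides of the 3'ss.
    if bp_index >= n - 2:
        return "bp_within_3ss"
    # (iii) The 3'ss dinucleotide is not AG.
    if seq[-2:] != "AG":
        return "non_AG_3ss"
    # (i) versus (ii): is the first AG after the BP the 3'ss itself?
    first_ag = seq.find("AG", bp_index + 1)
    return "first_AG" if first_ag == n - 2 else "other_AG"


if __name__ == "__main__":
    # Hypothetical 19-nt intron with the BP adenine at index 9.
    print(classify_three_prime_ss("GTAAGTCTAACTTTTTCAG", 9))
```

Checking category (iv) before (iii) is a design choice of this sketch; the legend does not state how an intron meeting both conditions was counted.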

Figure 6. The identification and validation of lariat RNAs in Arabidopsis.

(A) The distribution of the 10580 lariat RNAs identified in Col-0 across different expression levels. FPKM was used to evaluate the expression level of each lariat RNA, and the numbers indicate the total numbers of lariat RNAs with the indicated expression levels.

(B) The lengths of all introns >100 bp in the Arabidopsis thaliana genome, introns with lariat RNA expression ≥5 FPKM in Col-0, and introns with lariat RNAs in dbr1-2. * indicates a P-value of <10⁻¹⁵, Welch’s t-test.

(C) The lengths of introns with the indicated expression levels of lariat RNAs. * indicates a P-value of <10⁻¹⁵, Welch’s t-test, compared to the group of 5542 introns with lariat RNA expression at FPKM 5~10 in Col-0.

(D) Genome browser view of the abundances of four lariat-derived circular RNAs in the 8 RNA-seq profiles. The dashed regions highlight the region between the BP and the 3’ss without supporting reads. “Ch4: 9714955-9715275” indicates the genomic location of the shown region on the chromosome. Numbers in brackets indicate the normalized expression level.

(E) Schematic of the probes used in (F) and (G).

(F) RNA gel blotting detected the expression of the four lariat-derived circular RNAs shown in (D). The lower panel (28S and 18S rRNAs stained with ethidium bromide) shows the loading controls, and all total RNA samples were loaded onto the same denaturing agarose gel. The size of each band is indicated in each blot.

(G) RNA gel blotting further detected the lariat-derived circular RNAs of At4g17390.1 I2 and At3g52590.1 I3 in a denaturing PAGE gel. Equal amounts of total RNA samples from Col-0 and dbr1-2 were loaded onto the same urea-PAGE gel. The rightmost panel shows the total RNA sample treated with RNase R. Linear RNA standards of 380 nt, 238 nt, and 183 nt were used as size indicators. The red arrows indicate the detected bands.

Figure 7. The interplay among exonic back-splicing, intronic TE insertion, and lariat RNA accumulation.

(A) Fold change of back-splicing incidence at two adjacent exons. * indicates a P-value of <0.001, χ2-test, compared to all introns in the genome.

(B) The percentage of different types of TEs in different groups of introns. "All introns" represents introns >100 nt in the genome; the other groups are introns with lariat RNA accumulation at FPKM 5~10, 10~20, 20~30, 30~50, and >50, and introns with up-regulated lariat RNAs in dbr1-2, as indicated.

(C) The comparison of lengths of RE-introns and non-RE introns.

(D) The comparison of intron expression between RE-introns and non-RE introns. * indicates a P-value smaller than 10⁻¹⁰ (Mann-Whitney U-test).

(E) Three examples of introns with or without retroelements in the 8 RNA-seq profiles. “Ch2: 14713970-14715028” indicates the genomic location of the shown region on the chromosome. Numbers in brackets indicate the normalized expression level.

[Figure 1 image. Panels: (A) Log2 (FPKM+1) of introns for Col-0 and dbr1-2 replicates (r1, r2) under RNase R (-) and RNase R (+), annotated with P = 3.0 × 10⁻²²⁰ and P = 0; (B) correlation heat map with hierarchical clustering of the same 8 samples (scale 0 to 1); (C) PCA, PC1 (73.4%) versus PC2 (17.0%); (D) Venn diagram for dbr1-2 RNase R (+) and Col-0 RNase R (+) with the values 6037, 4205, and 548; (E) -Log10 (FDR) versus LogFC for down-regulated and up-regulated introns; (F) Log2 number of introns versus length of introns (nt) for all, up-regulated, and down-regulated introns.]

Figure 1. A summary of the results of 8 RNA-seq profiles.

(A) A global view of intron expression levels in the 8 RNA-seq profiles. The introns >100 bp were used.

The average expression levels of Col-0 and dbr1-2, with and without RNase R treatments, were compared

using the Mann-Whitney U-test. r1 and r2 indicate replicate 1 and replicate 2, respectively.

(B) The correlation coefficient between intron expression levels and hierarchical clustering analysis.

(C) Principal Component Analysis (PCA) based on the intron expression levels.

(D) The Venn diagram showing the numbers of genes with accumulation of lariat RNA in Col-0 and dbr1-2,

respectively. The numbers indicate gene numbers with average abundances of intronic transcripts

≥5 FPKM in the Col-0 and dbr1-2.

(E) The numbers of down- (blue) and up-regulated (red) introns in dbr1-2 RNase R (+) samples when compared to Col-0 RNase R (+) samples.

(F) The length of all introns >100 bp and introns de-regulated in dbr1-2 samples with RNase R treatments.

[Figure 2 image. Panels: (A) pipeline flowchart: 1. construct a database of introns of annotated genes; 2. align reads to the genome with TopHat2 or HISAT2; 3. align unmapped reads to the intron database with BLASTN or BOWTIE2; 4. align unmapped regions of partially mapped reads to the same intron; 5. combine the results from different RNA-seq libraries; (B) pie charts of BP nucleotides (A, U, G, C) for Arabidopsis, tomato, rice, and maize; (C) BP frequency versus distance from the 3’ss (nt) for each species; (D) raw nucleotide fraction (A, C, G, U) versus distance from the BP (nt), -10 to 10, for each species.]

Figure 2. Characterization of BPs in Arabidopsis, tomato, rice, and maize.

(A) The computational pipeline to identify BPs. The sequencing reads were aligned to the genome with

TopHat (v2). The unmapped reads were aligned to the database of introns with BLASTN. The partially

mapped reads were examined to find whether the remaining parts of the same reads could be aligned to

the same introns with a self-developed program. The reads supporting the same BP were collectively used

to infer the corresponding BP. The results from different RNA-seq profiles were combined.

(B) The percentages of different nucleotides as the identified BPs in the four plant species, i.e.,

Arabidopsis, tomato, rice and maize. Numbers indicate the numbers with the indicated nucleotide as the

BP, and the percentage indicates the ratio of the indicated BP numbers relative to the total numbers of

identified BPs.

(C) The distribution of the distance from the BP to the 3’ss in the four plant species, i.e., Arabidopsis,

tomato, rice and maize. In the distribution of maize, the widths of bars from -1000 to -50 are 19 nt, and the

widths of bars from -50 to 0 are 1 nt.

(D) The nucleotide preferences flanking the BP in the four plant species, i.e., Arabidopsis, tomato, rice and maize.

[Figure 3 image. Panels: (A) numbers of introns versus numbers of BPs per intron (1 to 10) for Arabidopsis, tomato, rice, and maize; (B) lariat reads (%) versus distance from the BP to the 3’ss (nt) for each species, with BP nucleotides A, C, G, and U distinguished.]

Figure 3. Multiple branchpoints in the four plant species.

(A) The intron numbers (Y-axis) with indicated BP numbers (X-axis) in the four plant species, i.e.,

Arabidopsis, tomato, rice and maize.

(B) The distance distributions of multiple BPs along the intron in the four plant species. Only introns with

≥5 lariat reads were counted. The percentages of lariat reads (Y-axis) were calculated by dividing the

number of lariat reads for a specific BP by the total number of lariat reads for an intron.

[Figure 4 image. Panels: (A) At3g01500.1, the 9th intron: lariat reads (%) at BPs -154, -34, and -33 in inflorescences (n = 26), leaves (n = 33), callus (n = 0), roots (n = 0), and seedlings (n = 21); (B) Os04g16748.1, the 1st intron: lariat reads (%) at BPs -700, -87, and -7 in nematode-induced giant cells (n = 54), panicle (n = 19), roots (n = 18), shoots (n = 2745), and vascular cells (n = 2); (C) At3g23590.1, the 7th intron (311 bp, between exon 7 and exon 8), with BPs 216th C, 217th T, 220th A, 223th T, 224th A, 285th T, and 287th T and primers F and R; (D) clone counts per BP for roots (17 clones), seedlings (14 clones), leaves (21 clones), inflorescences (24 clones), and siliques (28 clones).]

Figure 4. Tissue-specific BP usage in Arabidopsis and rice.

(A) and (B) The percentages of supporting lariat reads at the indicated BPs in different tissues for the 9th intron of At3g01500.1 (A) and for the first intron of Os04g16748.1 (B). Nucleotides shown in uppercase and in red are the tissue-specific BPs identified in the current study. The P-values are corrected for multiple testing. The numbers below the intron sequence are the positions of the identified BPs relative to the 3’ss.

(C) Summary of the BPs identified for the 7th intron of At3g23590.1 by RT-PCR followed by Sanger sequencing. The 7th intron of At3g23590.1 is 311 bp long; 216th, 217th, 220th… indicate the distance downstream of the 5’ss, and F and R indicate the primer pair used for RT-PCR to amplify lariat RNAs originating from this intron.

(D) The distributions of multiple BPs in different tissues, determined by RT-PCR followed by Sanger sequencing. In total, 17, 14, 21, 24, and 28 individual clones were sequenced for roots, seedlings, leaves, inflorescences, and siliques, respectively. The numbers within the circles indicate the number of clones carrying the indicated BP identified by Sanger sequencing.
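The caption reports multiple-testing-corrected P-values without naming the correction in this section. As a sketch only, the block below shows the Benjamini-Hochberg step-up adjustment, a common choice; that this is the procedure behind the values in panels (A) and (B) is an assumption, and the function name and the input P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values, in the input order.

    Assumption: BH is used here for illustration; the caption only says
    the P-values are corrected for multiple testing.
    """
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw P-values for four BP/tissue comparisons.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each adjusted value is the raw P-value scaled by m/rank, capped so that adjusted values never decrease as the raw P-values increase.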

[Figure 5 image. (A) Percentages and counts of 3’ss in each category (the 1st AG is the 3’ss; other AG is the 3’ss; the 3’ss is not AG; the BP is the 3’ss) for Arabidopsis, tomato, maize, and rice. (B) At1g70830.5, the 2nd intron, with supporting reads from SRR3234408 and SRR1190492. (C) Zm0001d013156.2, the 8th intron, with supporting reads from SRR765414 and SRR765622.]

Figure 5. The distributions of the first AGs downstream of the BP in plants.

(A) The categories of the 3’ss relative to the BP in the four plant species. The 3’ss were classified into four types based on the BP: (i) the first AG downstream of the BP is the 3’ss (blue), (ii) an AG other than the first AG is the 3’ss (orange), (iii) the 3’ss is non-AG (grey), and (iv) the BP is one of the nucleotides of the 3’ss (yellow).

(B) The BP of the second intron of At1g70830.1 is located exactly within the 3’ss; its supporting reads are shown for two different RNA-seq profiles (SRR1190492 and SRR3234408).

(C) The BP of the 8th intron of Zm0001d013156.2 in maize is located exactly within the 3’ss; its supporting reads are shown for two different RNA-seq profiles (SRR765414 and SRR765622). Numbers on the right indicate the total numbers of supporting reads for each transcript.
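The four categories in (A) can be sketched as a small classifier. This is an illustration under stated assumptions, not the authors' code: the intron is given 5’ to 3’ as a string whose last two nucleotides are the 3’ss, the BP is a 0-based index into that string, the AG search starts strictly after the BP, and the order in which the categories are checked is a choice made here. The function name and sequences are hypothetical.

```python
def classify_3ss(intron, bp_index):
    """Classify the 3'ss of an intron relative to its BP.

    intron:   intron sequence, 5' to 3'; its last two nucleotides are
              treated as the 3'ss dinucleotide (assumption).
    bp_index: 0-based position of the BP within the intron.
    """
    intron = intron.upper()
    ss_start = len(intron) - 2  # index of the first 3'ss nucleotide
    if bp_index >= ss_start:
        return "BP is within the 3'ss"      # category (iv)
    if intron[ss_start:] != "AG":
        return "3'ss is non-AG"             # category (iii)
    # First AG dinucleotide starting strictly downstream of the BP.
    first_ag = intron.find("AG", bp_index + 1)
    if first_ag == ss_start:
        return "first AG is the 3'ss"       # category (i)
    return "other AG is the 3'ss"           # category (ii)


# Made-up intron: the BP is the second A of CTAAC (index 9).
print(classify_3ss("GTAAGTCTAACTTTTTCAG", 9))
```

Tallying the returned labels over all introns with an identified BP would give per-species counts of the kind shown in the panel.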

[Figure 6 image. (A) Expression level (FPKM) of lariat RNAs: FPKM 5~10, 52.6%, 5542; FPKM 10~20, 33.3%, 3515; FPKM 20~30, 8%, 848; FPKM 30~50, 4.1%, 436; FPKM >50, 2%, 205. (B) and (C) Intron length plots (axes: intron length (nt); Log2(length of introns)); groups: all introns >100 nt, introns expressed in Col-0, up-regulated introns in dbr1-2. (D) Genome browser tracks (Col-0_r1, Col-0_r2, dbr1-2_r1, dbr1-2_r2; RNase R (+) and RNase R (-)) for At4g17390.1 I2 (Ch4: 9714955-9715275), At3g52590.1 I3 (Ch3: 19506133-19506511), At1g60995.1 I8 (Ch1: 22467482-22467907), and At5g23050.1 I8 (Ch5: 7732986-7733197), with the 5’ss and 3’ss marked. (E) Schematic of the BP, exons, probe for intron, and probe for mRNA. (F) RNA gel blots of Col-0 and dbr1-2 for At4g17390, At3g52590, At1g60995, and At5g23050, showing mRNA and lariat bands (intron 2, intron 3, intron 8, intron 8); labeled sizes: 1033 nt, 927 nt, 375 nt, 343 nt, 290 nt, 183 nt. (G) Urea-PAGE blots of At4g17390.1 I2 and At3g52590.1 I3 in Col-0 and dbr1-2, RNase R (-) and RNase R (+), with RNA standards of 380 nt, 238 nt, and 183 nt.]

Figure 6. The identification and validation of lariat RNAs in Arabidopsis.

(A) The distribution of the total of 10580 lariat RNAs identified in Col-0 across different expression levels. FPKM was used to evaluate the expression level of each lariat RNA, and the numbers indicate the total numbers of lariat RNAs at the indicated expression levels.

(B) The lengths of all introns >100 bp in the Arabidopsis thaliana genome, introns with lariat RNA expression ≥5 FPKM in Col-0, and introns with up-regulated lariat RNAs in dbr1-2. * indicates a P-value of <10⁻¹⁵, Welch’s t-test.

(C) The lengths of introns with the indicated expression levels of lariat RNAs. * indicates a P-value of <10⁻¹⁵, Welch’s t-test, compared to the group of 5542 introns with lariat RNA expression at FPKM 5~10 in Col-0.

(D) Genome browser view of the abundances of four lariat-derived circular RNAs in the eight RNA-seq profiles. The dashed regions highlight the region between the BP and the 3’ss without supporting reads. “Ch4: 9714955-9715275” indicates the genomic location of the shown region on the chromosome. Numbers in brackets indicate the range of the normalized expression level.

(E) Schematic of the probes used in (F) and (G).

(F) RNA gel blotting detected the expression of the four lariat-derived circular RNAs shown in (D). The lower panel (28S and 18S rRNAs stained with ethidium bromide) shows the loading controls, and all total RNA samples were loaded onto the same denaturing agarose gel. The size of each band is indicated in each blot.

(G) RNA gel blotting further detected the lariat-derived circular RNAs of At4g17390.1 I2 and At3g52590.1 I3 on a denaturing PAGE gel. Equal amounts of total RNA from Col-0 and dbr1-2 were loaded onto the same urea-PAGE gel. The rightmost panel shows total RNA samples treated with RNase R. Linear RNA standards of 380 nt, 238 nt, and 183 nt were used as size indicators. The red arrows indicate the detected bands.
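The grouping in (A) amounts to binning lariat RNAs by FPKM and reporting a count and percentage per bin. A minimal sketch follows; the bin labels come from the figure, while the half-open boundaries (for example, FPKM 10 falling into the 10~20 bin and FPKM 50 into the >50 bin), the function name, and the input values are assumptions made for illustration.

```python
# (lower bound inclusive, upper bound exclusive, label); boundary handling
# is an assumption, since the figure does not state which bin owns an edge.
FPKM_BINS = [
    (5.0, 10.0, "FPKM 5~10"),
    (10.0, 20.0, "FPKM 10~20"),
    (20.0, 30.0, "FPKM 20~30"),
    (30.0, 50.0, "FPKM 30~50"),
    (50.0, float("inf"), "FPKM >50"),
]


def bin_lariat_fpkm(fpkm_values):
    """Return {label: (count, percent)} for lariat RNAs with FPKM >= 5."""
    counts = {label: 0 for _, _, label in FPKM_BINS}
    for value in fpkm_values:
        for low, high, label in FPKM_BINS:
            if low <= value < high:
                counts[label] += 1
                break  # values below 5 FPKM match no bin and are ignored
    total = sum(counts.values())
    return {
        label: (n, 100.0 * n / total if total else 0.0)
        for label, n in counts.items()
    }


# Made-up FPKM values for seven introns.
print(bin_lariat_fpkm([5, 7.5, 12, 25, 49.9, 60, 3]))
```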

[Figure 7 image. (A) Fold change of exonic circularization (Y-axis) for the 5’ exon, the 5’ and 3’ exons, and the 3’ exon. (B) Percent of different TE types in introns (Y-axis) for introns with the indicated expression level of lariat RNA (FPKM); TE types: SINEs, LINEs, LTR elements, DNA transposons, unclassified TE, satellites, simple repeats, low complexity; groups shown include all introns and average FPKM 10~20, 20~30, 30~50, and >50. (C) Log2(length of intron) for non-RE introns and RE introns; P < 10⁻¹⁰⁰. (D) Expression of introns (log10(FPKM+1)) for non-RE introns and RE introns in Col-0_r1, Col-0_r2, dbr1-2_r1, and dbr1-2_r2, RNase R(+) and RNase R(-). (E) Genome browser tracks (Col-0 and dbr1-2 replicates; RNase R (+) and RNase R (-)) for At2g34880.1 I5, At2g14080.1 I1, and At4g39260.1 I1, with regions labeled Ch2: 14713970-14715028, Ch2: 5925824-5926195, and Ch3: 18274568-18274850, and LTR and LINE annotations.]

Figure 7. The interplay among exonic back-splicing, intronic TE insertion, and lariat RNA accumulation.

(A) Fold change of back-splicing incidence at two adjacent exons. * indicates a P-value of <0.001 (χ²-test) compared to all introns in the genome.

(B) The percentage of different types of TEs in different groups of introns. “All introns” represents introns >100 nt in the genome; the other groups are introns with lariat RNA accumulation at FPKM 5~10, 10~20, 20~30, 30~50, and >50, and introns with up-regulated lariat RNAs in dbr1-2, as indicated.

(C) Comparison of the lengths of RE-introns and non-RE introns.

(D) Comparison of intron expression between RE-introns and non-RE introns. * indicates a P-value smaller than 10⁻¹⁰ (Mann-Whitney U-test).

(E) Three examples of introns with or without retroelements in the eight RNA-seq profiles. “Ch2: 14713970-14715028” indicates the genomic location of the shown region on the chromosome. Numbers in brackets indicate the range of the normalized expression level.
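Panel (D) plots intron expression as log10(FPKM+1) and compares the two groups with a Mann-Whitney U-test. The sketch below computes the transform and the U statistics with mid-ranks for ties, in plain Python. It is a hedged illustration: the function names and the FPKM values are hypothetical, and no P-value is derived here.

```python
import math


def log10_fpkm(values):
    """Transform FPKM values to log10(FPKM + 1), as on the panel's Y-axis."""
    return [math.log10(v + 1.0) for v in values]


def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, using mid-ranks for ties."""
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over the run of values tied with pooled[i].
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2.0 + 1.0  # average of 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2.0
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Made-up FPKM values for RE introns and non-RE introns.
re_introns = log10_fpkm([30.0, 45.0, 80.0])
non_re_introns = log10_fpkm([2.0, 5.0, 9.0])
print(mann_whitney_u(re_introns, non_re_introns))
```

Because the test is rank-based, applying it to raw FPKM values or to log10(FPKM+1) values gives the same U statistics; the transform matters only for display.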

Parsed CitationsArmakola, M., Higgins, M.J., Figley, M.D., Barmada, S.J., Scarborough, E.A., Diaz, Z., Fang, X., Shorter, J., Krogan, N.J., Finkbeiner, S.,Farese, R.V., Jr., and Gitler, A.D. (2012). Inhibition of RNA lariat debranching enzyme suppresses TDP-43 toxicity in ALS diseasemodels. Nat Genet 44, 1302-1309.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Bao, W., Kojima, K.K., and Kohany, O. (2015). Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob DNA 6,11.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Barrett, S.P., Wang, P.L., and Salzman, J. (2015). Circular RNA biogenesis can proceed through an exon-containing lariat precursor.Elife 4, e07540.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Benhamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful apporoach to multiple testing. J RStat Soc Series B Stat Methodol, 289-300.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Bitton, D.A., Rallis, C., Jeffares, D.C., Smith, G.C., Chen, Y.Y., Codlin, S., Marguerat, S., and Bahler, J. (2014). LaSSO, a strategy forgenome-wide mapping of intronic lariats and branch points using RNA-seq. Genome Res 24, 1169-1179.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Chen, S., Anderson, K., and Moore, M.J. (2000). Evidence for a linear search in bimolecular 3' splice site AG selection. Proc Natl AcadSci U S A 97, 593-598.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Chua, K., and Reed, R. (2001). An upstream AG determines whether a downstream AG is selected during catalytic step II of splicing.Mol Cell Biol 21, 1509-1514.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Conn, S.J., Pillman, K.A., Toubia, J., Conn, V.M., Salmanidis, M., Phillips, C.A., Roslan, S., Schreiber, A.W., Gregory, P.A., and Goodall,G.J. (2015). The RNA binding protein quaking regulates formation of circRNAs. Cell 160, 1125-1134.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Dumesic, P.A., Natarajan, P., Chen, C., Drinnenberg, I.A., Schiller, B.J., Thompson, J., Moresco, J.J., Yates, J.R., 3rd, Bartel, D.P., andMadhani, H.D. (2013). Stalled spliceosomes are a signal for RNAi-mediated genome defense. Cell 152, 957-968.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Galvis, A.E., Fisher, H.E., Fan, H., and Camerini, D. (2017). Conformational changes in the 5' end of the HIV-1 genome dependent on thedebranching enzyme DBR1 during early stages of infection. J Virol 91.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Galvis, A.E., Fisher, H.E., Nitta, T., Fan, H., and Camerini, D. (2014). Impairment of HIV-1 cDNA synthesis by DBR1 knockdown. J Virol 88,7054-7069.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Gardner, E.J., Nizami, Z.F., Talbot, C.C., Jr., and Gall, J.G. (2012). Stable intronic sequence RNA (sisRNA), a new class of noncodingRNA from the oocyte nucleus of Xenopus tropicalis. Genes Dev 26, 2550-2559.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Gooding, C., Clark, F., Wollerton, M.C., Grellscheid, S.N., Groom, H., and Smith, C.W. (2006). A class of human exons with predicteddistant branch points revealed by analysis of AG dinucleotide exclusion zones. Genome Biol 7, R1.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Ivanov, A., Memczak, S., Wyler, E., Torti, F., Porath, H.T., Orejuela, M.R., Piechotta, M., Levanon, E.Y., Landthaler, M., Dieterich, C., andRajewsky, N. (2015). Analysis of intron sequences reveals hallmarks of circular RNA biogenesis in animals. Cell Rep 10, 170-177.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Jacquier, A., and Rosbash, M. (1986). RNA splicing and intron turnover are greatly diminished by a mutant yeast branch point. Proc Natl

Page 44: A comprehensive map of intron branchpoints and …...2019/04/17  · 1 1 LARGE-SCALE BIOLOGY 2 3 A Comprehensive Map of Intron Branchpoints and Lariat RNAs in 4 Plants 5 6 Xiaotuo

Acad Sci U S A 83, 5835-5839.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Karst, S.M., Rutz, M.L., and Menees, T.M. (2000). The yeast retrotransposons Ty1 and Ty3 require the RNA Lariat debranching enzyme,Dbr1p, for efficient accumulation of reverse transcripts. Biochem Biophys Res Commun 268, 112-117.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Kim, D., Langmead, B., and Salzberg, S.L. (2015). HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12, 357-360.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013). TopHat2: accurate alignment of transcriptomes inthe presence of insertions, deletions and gene fusions. Genome Biol 14, R36.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Kim, H.C., Kim, G.M., Yang, J.M., and Ki, J.W. (2001). Cloning, expression, and complementation test of the RNA lariat debranchingenzyme cDNA from mouse. Mol Cells 11, 198-203.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Kim, J.W., Kim, H.C., Kim, G.M., Yang, J.M., Boeke, J.D., and Nam, K. (2000). Human RNA lariat debranching enzyme cDNA complementsthe phenotypes of Saccharomyces cerevisiae dbr1 and Schizosaccharomyces pombe dbr1 mutants. Nucleic Acids Res 28, 3666-3673.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Kramer, M.C., Liang, D., Tatomer, D.C., Gold, B., March, Z.M., Cherry, S., and Wilusz, J.E. (2015). Combinatorial control of Drosophilacircular RNA expression by intronic repeats, hnRNPs, and SR proteins. Genes Dev 29, 2168-2182.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357-359.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Li, Z., Wang, S., Cheng, J., Su, C., Zhong, S., Liu, Q., Fang, Y., Yu, Y., Lv, H., Zheng, Y., and Zheng, B. (2016). Intron lariat RNA inhibitsmicroRNA biogenesis by sequestering the dicing complex in Arabidopsis. PLoS Genet 12, e1006422.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Liang, D., and Wilusz, J.E. (2014). Short intronic repeat sequences facilitate circular RNA production. Genes Dev 28, 2233-2247.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Mercer, T.R., Clark, M.B., Andersen, S.B., Brunck, M.E., Haerty, W., Crawford, J., Taft, R.J., Nielsen, L.K., Dinger, M.E., and Mattick, J.S.(2015). Genome-wide discovery of human splicing branchpoints. Genome Res 25, 290-303.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Meyer, M., Plass, M., Perez-Valle, J., Eyras, E., and Vilardell, J. (2011). Deciphering 3'ss selection in the yeast genome reveals an RNAthermosensor that mediates alternative splicing. Mol Cell 43, 1033-1039.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Morgan, J.T., Fink, G.R., and Bartel, D.P. (2019). Excised linear introns regulate growth in yeast. Nature. Epub ahead of print. doi:10.1038/s41586-018-0828-1.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Nam, K., Lee, G., Trambley, J., Devine, S.E., and Boeke, J.D. (1997). Severe growth defect in a Schizosaccharomyces pombe mutantdefective in intron lariat degradation. Mol Cell Biol 17, 809-818.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Okamura, K., Hagen, J.W., Duan, H., Tyler, D.M., and Lai, E.C. (2007). The mirtron pathway generates microRNA-class regulatory RNAsin Drosophila. Cell 130, 89-100.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Ooi, S.L., Samarsky, D.A., Fournier, M.J., and Boeke, J.D. (1998). Intronic snoRNA biosynthesis in Saccharomyces cerevisiae dependson the lariat-debranching enzyme: intron length effects and activity of a precursor snoRNA. RNA 4, 1096-1110.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Page 45: A comprehensive map of intron branchpoints and …...2019/04/17  · 1 1 LARGE-SCALE BIOLOGY 2 3 A Comprehensive Map of Intron Branchpoints and Lariat RNAs in 4 Plants 5 6 Xiaotuo

Parenteau, J., Maignon, L., Berthoumieux, M., Catala, M., Gagnon, V., and Abou Elela, S. (2019). Introns are mediators of cell responseto starvation. Nature. Epub ahead of print. doi: 10.1038/s41586-018-0859-7.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Pastuszak, A.W., Joachimiak, M.P., Blanchette, M., Rio, D.C., Brenner, S.E., and Frankel, A.D. (2011). An SF1 affinity model to identifybranch point sequences in human introns. Nucleic Acids Res 39, 2344-2356.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Pineda, J.M.B., and Bradley, R.K. (2018). Most human introns are recognized via multiple and tissue-specific branchpoints. Genes Dev32, 577-591.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Robinson, M.D., McCarthy, D.J., and Smyth, G.K. (2010). edgeR: a Bioconductor package for differential expression analysis of digitalgene expression data. Bioinformatics 26, 139-140.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Ruby, J.G., Jan, C.H., and Bartel, D.P. (2007). Intronic microRNA precursors that bypass Drosha processing. Nature 448, 83-86.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Ruskin, B., and Green, M.R. (1985). An RNA processing activity that debranches RNA lariats. Science 229, 135-140.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Ruskin, B., Krainer, A.R., Maniatis, T., and Green, M.R. (1984). Excision of an intact intron as a novel lariat structure during pre-mRNAsplicing in vitro. Cell 38, 317-331.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Smith, C.W., Porro, E.B., Patton, J.G., and Nadal-Ginard, B. (1989). Scanning from an independently specified branch point defines the3' splice site of mammalian introns. Nature 342, 243-247.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Suzuki, H., Zuo, Y., Wang, J., Zhang, M.Q., Malhotra, A., and Mayeda, A. (2006). Characterization of RNase R-digested cellular RNAsource that consists of lariat and circular RNAs from pre-mRNA splicing. Nucleic Acids Res 34, e63.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Taggart, A.J., DeSimone, A.M., Shih, J.S., Filloux, M.E., and Fairbrother, W.G. (2012). Large-scale mapping of branchpoints in humanpre-mRNA transcripts in vivo. Nat Struct Mol Biol 19, 719-721.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Taggart, A.J., Lin, C.L., Shrestha, B., Heintzelman, C., Kim, S., and Fairbrother, W.G. (2017). Large-scale analysis of branchpoint usageacross species and cell lines. Genome Res 27, 639-649.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Talhouarne, G.J., and Gall, J.G. (2014). Lariat intronic RNAs in the cytoplasm of Xenopus tropicalis oocytes. RNA 20, 1476-1487.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Talhouarne, G.J.S., and Gall, J.G. (2018). Lariat intronic RNAs in the cytoplasm of vertebrate cells. Proc Natl Acad Sci U S A 115, E7970-E7977.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Tay, M.L., and Pek, J.W. (2017). Maternally inherited stable intronic sequence RNA triggers a self-reinforcing feedback loop duringdevelopment. Curr Biol 27, 1062-1067.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Tempel, S. (2012). Using and understanding RepeatMasker. Methods Mol Biol 859, 29-51.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Page 46: A comprehensive map of intron branchpoints and …...2019/04/17  · 1 1 LARGE-SCALE BIOLOGY 2 3 A Comprehensive Map of Intron Branchpoints and Lariat RNAs in 4 Plants 5 6 Xiaotuo

Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M.J., Salzberg, S.L., Wold, B.J., and Pachter, L. (2010).Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.Nat Biotechnol 28, 511-515.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Wang, H., Hill, K., and Perry, S.E. (2004). An Arabidopsis RNA lariat debranching enzyme is essential for embryogenesis. J Biol Chem279, 1468-1473.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Ye, Y., De Leon, J., Yokoyama, N., Naidu, Y., and Camerini, D. (2005). DBR1 siRNA inhibition of HIV-1 replication. Retrovirology 2, 63.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Zhang, S.Y., Clark, N.E., Freije, C.A., Pauwels, E., Taggart, A.J., Okada, S., Mandel, H., Garcia, P., Ciancanelli, M.J., Biran, A., Lafaille,F.G., Tsumura, M., Cobat, A., Luo, J., Volpi, S., Zimmer, B., Sakata, S., Dinis, A., Ohara, O., Garcia Reino, E.J., Dobbs, K., Hasek, M.,Holloway, S.P., McCammon, K., Hussong, S.A., DeRosa, N., Van Skike, C.E., Katolik, A., Lorenzo, L., Hyodo, M., Faria, E., Halwani, R.,Fukuhara, R., Smith, G.A., Galvan, V., Damha, M.J., Al-Muhsen, S., Itan, Y., Boeke, J.D., Notarangelo, L.D., Studer, L., Kobayashi, M.,Diogo, L., Fairbrother, W.G., Abel, L., Rosenberg, B.R., Hart, P.J., Etzioni, A., and Casanova, J.L. (2018). Inborn errors of RNA lariatmetabolism in humans with brainstem viral infection. Cell 172, 952-965 e918.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Zhang, X.O., Wang, H.B., Zhang, Y., Lu, X., Chen, L.L., and Yang, L. (2014). Complementary sequence-mediated exon circularization.Cell 159, 134-147.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Zhang, Y., Zhang, X.O., Chen, T., Xiang, J.F., Yin, Q.F., Xing, Y.H., Zhu, S., Yang, L., and Chen, L.L. (2013). Circular intronic longnoncoding RNAs. Mol Cell 51, 792-806.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Zheng, S., Vuong, B.Q., Vaidyanathan, B., Lin, J.Y., Huang, F.T., and Chaudhuri, J. (2015). Non-coding RNA generated following lariatdebranching mediates targeting of AID to DNA. Cell 161, 762-773.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Xiaotuo Zhang, Yong Zhang, Taiyun Wang, Ziwei Li, Jinping Cheng, Haoran Ge, Qi Tang, Kun Chen, Li Liu, Chenyu Lu, Junqiang Guo, Binglian Zheng and Yun Zheng

A comprehensive map of intron branchpoints and lariat RNAs in plants

Plant Cell; originally published online March 20, 2019; DOI 10.1105/tpc.18.00711


Supplemental Data /content/suppl/2019/03/20/tpc.18.00711.DC1.html /content/suppl/2019/03/31/tpc.18.00711.DC2.html
