
Comparing the early ciRNA papers


DESCRIPTION

Lab meeting presentation on the early ciRNA papers, with details on what they found and how they did it. Mostly discussing: WHITE SLIDES: Memczak, S. et al. (2013) Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. ORANGE SLIDES: Jeck, W.R. et al. (2012) Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA. All figures taken from the respective papers.


Page 1: Comparing the early ciRNA papers

Computational pipeline Global analysis Characterization of 50

Circular RNAs are a large class of animal RNAs with regulatory potency

S Memczak∗, M Jens∗, A Elefsinioti∗, F Torti∗, J Krueger, A Rybak, L Maier, S D Mackowiak, L H Gregersen, M Munschauer, A Loewer, U Ziebold, M Landthaler, C Kocks, F le Noble, and N Rajewsky

April 30, 2013

ciRNA April 30, 2013 1 / 47

Page 2: Comparing the early ciRNA papers


Basic principles

• used 2x76 bp run data generated after ribosomal RNA depletion and random priming

• screened RNA sequencing reads for splice junctions formed by an acceptor splice site at the 5' end of an exon and a donor site at a downstream 3' end (head-to-tail)

Page 3: Comparing the early ciRNA papers

More details

1. Filtered out reads that aligned (with bowtie2) contiguously and full-length to the genome.

2. From the unmapped reads (which include normally spliced reads), extracted 20mers from both ends and aligned them independently to find unique anchor positions within spliced exons. Anchors that aligned in the reversed orientation (head-to-tail) indicated circRNA splicing (normal splicing was also found by consecutive anchors).

3. Extended the anchor alignments such that the complete read aligns and the breakpoints were flanked by GU/AG splice sites. Ambiguous breakpoints were discarded.

4. The resulting alignments were read by another custom script that jointly evaluates consecutive anchor alignments belonging to the same original read, performs extensions of the anchor alignments, and collects statistics on splice sites. After the run completes, the script outputs all detected splice junctions (linear and circular) in a UCSC BED-like format with extra columns holding quality statistics, read counts etc.
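
The anchor logic in step 2 can be illustrated with a small sketch. This is not the authors' script: the `Anchor` record, the function names, and the assumption that each anchor is summarised by a single genomic start position are all hypothetical and chosen only for illustration.

```python
from collections import namedtuple

# Hypothetical record for one aligned anchor (illustration only).
Anchor = namedtuple("Anchor", ["chrom", "pos", "strand"])


def split_anchors(read, anchor_len=20):
    """Return the 20mers taken from the 5' and 3' ends of an unmapped read."""
    if len(read) < 2 * anchor_len:
        raise ValueError("read too short for two non-overlapping anchors")
    return read[:anchor_len], read[-anchor_len:]


def classify_anchor_pair(five_prime, three_prime):
    """Classify a read from where its two anchors aligned.

    Anchors in genomic order (relative to the strand) suggest a linear
    splice; anchors in reversed order suggest a head-to-tail junction.
    """
    if five_prime.chrom != three_prime.chrom or five_prime.strand != three_prime.strand:
        return "discordant"
    if five_prime.pos == three_prime.pos:
        return "ambiguous"
    if five_prime.strand == "+":
        in_order = five_prime.pos < three_prime.pos
    else:
        in_order = five_prime.pos > three_prime.pos
    return "linear" if in_order else "circular"
```

In the pipeline described above, a "circular" call would still have to survive the anchor extension in step 3 and the filters that follow; the sketch covers only the orientation test.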

Page 4: Comparing the early ciRNA papers

Filtering demands

• GU/AG flanking the splice sites (built in)

• unambiguous breakpoint detection

• a maximum of two mismatches in the extension procedure

• the breakpoint cannot reside more than 2 nucleotides inside an anchor

• at least two independent reads (each distinct sequence only counted once per sample) support the junction

• unique anchor alignments with a safety margin to the next-best alignment of at least one anchor above 35 points (approximately equivalent to more than two extra mismatches in high-quality bases)

• a genomic distance between the two splice sites of no more than 100 kb (this cutoff affects only a small percentage of the data)

• As the ribosomal DNA cluster is part of the C. elegans genome assembly and ribosomal pre-RNAs could give rise to circular RNAs by mechanisms independent of the spliceosome, we discarded 130 candidates that mapped to the rDNA cluster on chrI:15,060,286-15,071,020.
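
The filtering demands listed above amount to a conjunction of per-junction checks. The sketch below is a minimal illustration, assuming a hypothetical `Junction` record whose field names are invented here; the real pipeline emits a BED-like table with its own columns. The rDNA exclusion for C. elegans is left out.

```python
from dataclasses import dataclass


@dataclass
class Junction:
    """Hypothetical summary of one candidate junction (field names are assumptions)."""
    start: int                 # genomic position of one splice site
    end: int                   # genomic position of the other splice site
    flanks: str                # dinucleotides flanking the breakpoint, e.g. "GU/AG"
    unambiguous: bool          # breakpoint could be placed uniquely
    extension_mismatches: int  # mismatches seen while extending the anchors
    breakpoint_offset: int     # how far the breakpoint lies inside an anchor (nt)
    distinct_reads: int        # distinct read sequences supporting the junction
    best_anchor_margin: int    # score margin to next-best alignment, best anchor


def passes_filters(j, max_span=100_000):
    """Apply the thresholds from the slide to one candidate junction."""
    return (
        j.flanks == "GU/AG"
        and j.unambiguous
        and j.extension_mismatches <= 2
        and j.breakpoint_offset <= 2
        and j.distinct_reads >= 2
        and j.best_anchor_margin > 35
        and abs(j.end - j.start) <= max_span
    )
```

Keeping every threshold in one function makes it easy to see which single criterion removes a given candidate when one of them is relaxed.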

Page 5: Comparing the early ciRNA papers

Permutation testing

• reversed either anchor

• reversed the complete read

• randomly reassigned anchors between reads

• reverse complemented the read (as a positive control)

• the reverse complement recovered the same output, as expected; the various permutations led to only very few candidate predictions, well below 0.2% of the output with unpermuted reads and in excellent agreement with the results from simulated reads

• estimated sensitivity (>75%) and FDR (<0.2%) using simulated reads and permutations of real sequencing data

• the efficiency of ribominus protocols to extract and sequence circRNAs is limited, reducing overall sensitivity
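
The read permutations listed above are simple string operations. The following sketch shows one way to write them; the function names, the 20-nt anchor length default, and the choice to swap only the 3' anchors when reassigning are assumptions made for illustration, not details taken from the paper.

```python
import random

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(read):
    """Positive control: should reproduce the unpermuted output."""
    return read.translate(_COMPLEMENT)[::-1]


def reverse_read(read):
    """Negative control: reverse the complete read without complementing."""
    return read[::-1]


def reverse_anchor(read, which, anchor_len=20):
    """Negative control: reverse only the 5' ("5p") or 3' ("3p") anchor."""
    if which == "5p":
        return read[:anchor_len][::-1] + read[anchor_len:]
    if which == "3p":
        return read[:-anchor_len] + read[-anchor_len:][::-1]
    raise ValueError("which must be '5p' or '3p'")


def reassign_anchors(reads, anchor_len=20, seed=0):
    """Negative control: shuffle the 3' anchors between reads."""
    rng = random.Random(seed)
    tails = [r[-anchor_len:] for r in reads]
    rng.shuffle(tails)
    return [r[:-anchor_len] + t for r, t in zip(reads, tails)]
```

Running the detection pipeline on the output of the three negative controls and dividing the number of calls by the number obtained from real reads gives the kind of false-positive estimate quoted on the slide.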

Page 6: Comparing the early ciRNA papers

Data generated

• generated ribominus data for HEK293 cells and combined with human leukocyte data

• detected 1,950 circRNAs with support from at least two independent junction-spanning reads

Page 7: Comparing the early ciRNA papers

Jeck 2012: cANRIL, or how they got interested

• Our group discovered a circular RNA species, circular ANRIL, whose expression is associated with that of products of the human INK4a/ARF locus and is correlated with the risk of human atherosclerosis

• Production of cANRIL in humans is associated with common SNPs predicted to affect cANRIL splicing, suggesting the possibility that cANRIL production influences PcG-mediated repression of the INK4a/ARF locus to influence atherosclerosis risk

Page 8: Comparing the early ciRNA papers

CircleSeq: The RNase R approach - Jeck 2012

Page 9: Comparing the early ciRNA papers

CircleSeq: The RNase R approach - Jeck 2012

• Method was optimized to allow for >10-fold enrichment of cANRIL in cDNA prepared from RNase R-treated vs. untreated samples

• Used approx. 300 million 100-bp reads per sample aligned to the human genome using a de novo splice-mapping algorithm, MapSplice

• compiled the list of all fusion splice junctions where splice donor and acceptor occur within 2 Mb but in the non-colinear ordering; term these junctions backsplices.

• New metric - SRPBM - spliced reads per billion mapping

  • Counts of reads mapping across an identified backsplice in untreated samples, normalized by read length and number of reads mapping, to permit quantitative comparisons between backsplices.
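
Two of the definitions above lend themselves to a short sketch: the backsplice test (donor and acceptor within 2 Mb but in non-colinear order) and the SRPBM scaling. Both functions are hypothetical illustrations. In particular, the slide does not state how read length enters the normalization, so the `srpbm` function below only performs the per-billion-mapped-reads scaling and is comparable only between samples with equal read length.

```python
def is_backsplice(donor, acceptor, strand, max_span=2_000_000):
    """Return True if a donor/acceptor pair is in non-colinear order.

    donor and acceptor are genomic coordinates of the two splice sites.
    In a colinear (forward) splice the donor lies upstream of the acceptor
    relative to the transcribed strand.
    """
    span = abs(donor - acceptor)
    if span == 0 or span > max_span:
        return False
    if strand == "+":
        colinear = donor < acceptor
    else:
        colinear = donor > acceptor
    return not colinear


def srpbm(backsplice_reads, total_mapped_reads):
    """Spliced reads per billion mapping (read-length term omitted, see lead-in)."""
    return backsplice_reads * 1e9 / total_mapped_reads
```

With roughly 300 million mapped reads per sample, as on the slide, 30 reads across one backsplice correspond to an SRPBM of 100 under this simplified definition.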

Page 10: Comparing the early ciRNA papers

Using RNase R works

• CircleSeq enriches for backsplice junctions. Two-dimensional histograms showing normalized backspliced read count (SRPBM) or normalized exon coverage (RPKM) between two samples or replicates. (A) Coverage of backsplice reads in RNase R-treated replicates over all distinct backsplice species (R² = 0.579). (B) Coverage of exons in mock-treated replicates (R² = 0.91). (C) Average backsplice coverage in RNase R-treated against mock-treated RNA-seq showing enrichment of most backsplice species by RNase R. (D) Mean normalized exon coverage in annotated exon sequences in RNase R-treated against mock-treated RNA-seq showing depletion of the majority of species by RNase R.

Page 11: Comparing the early ciRNA papers

And generates some interesting results...

• identified >100,000 unique backsplice events throughout the genome.

• Of these, 25,166 were present in both RNase R-treated biological replicates and were enriched by RNase R treatment as compared with mock treatment.

• 31% of backsplice species observed in untreated controls were not enriched by RNase R (Fig. 2C). These species likely represent mapping artifacts or nonsequential exons harbored in linear products, resulting from either RNA trans-splicing or cleavage of ecircRNAs.

• See "positive controls" cANRIL, cETS-1; not able to detect other known circular species (e.g., SRY and DCC), due to low expression of these genes in Hs68 cells

• the 25,166 replicated backsplice events detected by CircleSeq included the large majority (1025 of 1319) of putative circles identified through a previously described bioinformatic approach (Salzman 2012) - in different cell types!!! [T-cells vs fibroblasts]

• Of the 25,166 unique backsplicing events that were reproducibly enriched by RNase R digestion in both biological replicates, most were only found in RNase R-treated samples and were not observed in the absence of exonuclease digestion.

  • many of these events represent rare ecircRNAs arising from pervasive background levels of RNA circularization, which may result from an occasional error in splicing (Hsu and Hertel 2009).

Page 12: Comparing the early ciRNA papers

Filtering on confidence

• LOW stringency set requiring a single backsplice read in the control data and HIGH stringency requiring coverage on a par with splices in a moderately expressed gene

Page 13: Comparing the early ciRNA papers

Data generated - Memczak 2013

• generated ribominus data for HEK293 cells and combined with human leukocyte data

• detected 1,950 circRNAs with support from at least two independent junction-spanning reads

Page 14: Comparing the early ciRNA papers

Expression of genes predicted to give rise to circRNAs was slightly shifted towards higher expression values

“indicating that circRNAs are not just rare mistakes of the spliceosome”

Histograms of gene expression levels obtained from polyA+ RNA sequencing in HEK293 cells. The number of reads per kilobase of exon per million mapped reads (RPKM) reflects mRNA abundance. Genes that are predicted to give rise to circRNAs (red circles) are not specifically enriched for high expression (solid line: all genes). circRNAs from lowly expressed genes are detected less frequently, comparable to the loss of sensitivity observed for linear splicing (black dashed line: genes with >75% of annotated splice sites recovered; gray dashed line: >50%; light gray: >10%)

Page 15: Comparing the early ciRNA papers

Mouse and nematode

• identified 1,903 circRNAs in mouse (brains, fetal head, differentiation-induced embryonic stem cells); 81 of these mapped to human circRNAs

  • mapped mouse circRNAs were compared with independently identified human circRNAs using liftOver, yielding 229 circRNAs with precisely orthologous splice sites between human and mouse. Of these, 223 were composed exclusively of coding exons and were subsequently used for our conservation analysis (Fig. 1f).

  • When intersecting the reported sets of circRNAs supported by two independent reads in each species, we found 81 conserved circRNAs (supported by at least 4 reads in total).

• used sequencing data from various C. elegans developmental stages (Stoeckius, M. manuscript in prep) and detected 724 circRNAs, with at least two independent reads.

• Numerous circRNAs seem to be specifically expressed in a cell type or developmental stage (Fig. 1b,c, S1e)

  • hsa-circRNA 2149 is supported by 13 unique, head-to-tail spanning reads in CD19+ leukocytes but is not detected in CD34+ leukocytes, neutrophils or HEK293 cells

  • a number of nematode circRNAs seem to be expressed in oocytes but absent in 1- or 2-cell embryos

Page 16: Comparing the early ciRNA papers

Where do these circRNAs come from?

Intersection of circRNAs with known transcripts

• computational screen identifies only the splice sites that lead to circularization but not the internal exon/intron structure of circular RNAs => inferred as much as possible from annotated transcripts

• conservative assumption was that as little as possible should be spliced out

• coincidence of circRNA splice sites with exonic boundaries inside a transcript was considered an indicator of relevant agreement, and internal introns appear to be spliced out

Page 17: Comparing the early ciRNA papers

Where do these circRNAs come from?

• annotated human circRNAs using the RefSeq database and a catalogue of non-coding RNAs

• 85% of human circRNAs align sense to known genes

• 10% of all circRNAs align antisense to known transcripts; smaller fractions align to UTRs, introns, and unannotated regions of the genome.

Page 18: Comparing the early ciRNA papers

Where do these circRNAs come from?

• Sorted all overlapping transcripts hierarchically by

  • splice-site coincidence (2, 1, or 0)

  • total amount of exonic sequence between the splice sites

  • total amount of coding sequence

• The latter was used to break ties only and helped the annotation process.

• if one or both splice sites fell into an exon of the best matching transcript, the corresponding exon boundary was trimmed.

• if it fell into an intron or beyond transcript bounds, the closest exon was extended to match the circRNA boundaries.

• circRNA start/end coordinates were never altered.

• If no annotated exons overlapped the circRNA we assumed a single-exon circRNA.

• The resulting annotation of circRNAs is based on the best matching transcript and may in some cases not represent the ideal choice. Changing the annotation rules, however, did not substantially change the numbers in Fig. 1d
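
The hierarchical sort described above maps naturally onto a tuple sort key. The sketch below is only an illustration under stated assumptions: the `Candidate` record and its field names are invented, and preferring more exonic sequence is inferred from the "as little as possible should be spliced out" assumption on the earlier slide rather than stated explicitly.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    """Hypothetical summary of one annotated transcript overlapping a circRNA."""
    name: str
    splice_site_hits: int  # circRNA splice sites coinciding with exon boundaries: 2, 1 or 0
    exonic_nt: int         # exonic sequence between the two splice sites
    coding_nt: int         # coding sequence between the two splice sites (tie-breaker)


def best_transcript(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Pick the best-matching transcript, or None if nothing overlaps."""
    if not candidates:
        return None  # the slide's rule: assume a single-exon circRNA
    return max(
        candidates,
        key=lambda c: (c.splice_site_hits, c.exonic_nt, c.coding_nt),
    )
```

Because Python compares tuples element by element, the coding-sequence term only matters when the first two criteria are tied, which matches the "break ties only" rule.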

Page 19: Comparing the early ciRNA papers

How many exons form part of circRNA?

• Their splice sites typically span one to five exons and overlap coding exons (84%), but only in 65% of these cases are both splice sites that participate in the circularization known splice sites

• Jeck: introns are spliced from most circular forms

• Jeck: observed many single exon ecircRNAs, that is, where the donor site splices to the acceptor of the same exon (e.g., KIAA0182)

Page 20: Comparing the early ciRNA papers

Jeck: ratio of forward to back-spliced products

• compared the abundance of backsplice-spanning reads to reads spanning traditionally spliced junctions.

• The relative rate of backsplicing over forward splicing for these sites varied enormously, from <0.1% to >3200%, with no forward splicing products observed in some cases

• using the LOW stringency list, 14.4% of genes expressed in human fibroblasts produced circular RNA species, suggesting at least one in eight human genes produces abundant circular as well as linear transcripts.

Page 21: Comparing the early ciRNA papers

Jeck validation: circRNA-generating transcripts are non-polyadenylated

• oligo-dT priming for reverse transcription significantly reduced the levels of backspliced products relative to poly(A)-containing transcripts

• To exclude the possibility that these transcripts were the results of trans-spliced products, used a virtual Northern approach, which identifies the size of transcripts by fractionating on a denaturing agarose gel, followed by qPCR.

• Backsplice-containing transcripts appeared in faster migrating fractions as compared to the associated full-length linear transcripts, as would be predicted of circular RNA species composed of only a subset of exons. Trans-spliced products, in contrast, would be expected to be longer than full-length, as they contain repeated exons.

Page 22: Comparing the early ciRNA papers

Examples

• The AFF1 intron is spliced out (Supplementary Fig. 2e). Sequence conservation: placental mammals phyloP score; scale bar, 200 nucleotides.

Page 23: Comparing the early ciRNA papers

Jeck examples

Page 24: Comparing the early ciRNA papers

Conservation of intergenic and intronic circRNAs

Intergenic and a few intronic circRNAs display a mild but significant enrichment of conserved nucleotides

Page 25: Comparing the early ciRNA papers

Conservation of intergenic and intronic: approach

• downloaded genome-wide human (hg19) phyloP conservation score (ref. 58) tracks derived from genome alignments of placental mammals from UCSC

• read out the conservation scores along the complete circRNA and searched for blocks of at least 6-nucleotide length that exceeded a conservation score of 0.3 for intergenic and 0.5 for intronic circRNAs.

• The different cutoffs empirically adjust for the different background levels of conservation and were also used on the respective controls.

• For each circRNA, we computed the cumulative length of all such blocks and normalized it by the genomic length of the circRNA.

• Artefacts of constant positive conservation scores in the phyloP profile, apparently caused by missing alignment data, were removed with an entropy filter (this did not qualitatively affect the results).

• circRNAs annotated as intronic by the best-match procedure explained above that had any overlap with exons in alternative transcripts on either strand (5 cases) were removed from the analysis.

• phastCons score takes neighboring bases into account, estimating the probability that each nucleotide belongs to a conserved element. The phyloP score is a separate measurement of conservation at each base, ignoring neighboring bases in its calculation. It can measure acceleration (faster evolution than expected under neutral drift) as well as conservation (slower than expected evolution). PhyloP is useful for evaluating signatures of selection at particular nucleotides (e.g. third codon positions, or first positions of miRNA target sites). In the phyloP plots, sites predicted to be conserved are assigned positive scores, while sites predicted to be fast-evolving are assigned negative scores. The absolute values of the scores represent -log p-values under a null hypothesis of neutral evolution.
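
The block-based conservation measure described above (runs of at least 6 nucleotides above a cutoff, summed and normalized by circRNA length) can be sketched as a single pass over a per-base score list. The function name and the plain-list input are assumptions; reading phyloP tracks and the entropy filter are not shown.

```python
def conserved_fraction(scores, cutoff, min_block=6):
    """Fraction of positions lying in runs of >= min_block bases scoring above cutoff.

    scores: per-nucleotide phyloP scores along the circRNA.
    cutoff: 0.3 (intergenic) or 0.5 (intronic) on the slide.
    """
    if not scores:
        return 0.0
    total = 0  # cumulative length of qualifying blocks
    run = 0    # length of the current run of above-cutoff positions
    for s in scores:
        if s > cutoff:
            run += 1
        else:
            if run >= min_block:
                total += run
            run = 0
    if run >= min_block:  # a block may end at the last position
        total += run
    return total / len(scores)
```

Applying the same function, with the same cutoff, to the control regions is what keeps the comparison on the slide fair across the different background conservation levels.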

Page 26: Comparing the early ciRNA papers

circRNAs from coding loci conservation - approach

To analyse circRNAs composed of coding sequence and thus high overall conservation, we selected 223 human circRNAs with circular orthologues in mouse and entirely composed of coding sequence. Control (linear) exons were randomly selected to match the level of conservation observed in first and second codon positions (Methods, Fig. 1f inset and Supplementary Fig. 1k for conservation of the remaining coding sequence (CDS))

Page 27: Comparing the early ciRNA papers

circRNAs from coding loci conservation - results

circRNAs with conserved circularization were significantly more conserved in the third codon position than controls, indicating evolutionary constraints at the nucleotide level, in addition to selection at the protein level (Fig. 1f and Supplementary Fig. 1j, k)

Coding sequence phyloP conservation score distributions of first and second codon positions match between circRNAs and controls; in contrast, the third codon position is significantly more conserved in circRNAs (P < 3e-10, n = 223, Mann-Whitney U test; also main Fig. 1f). (k) The conservation score distributions in the remaining parts of the CDS (outside the circRNA or control) do not differ significantly for codon positions two and three. For the first codon position, the controls are actually more conserved (P < 2e-3, n = 223, Mann-Whitney U test), so the comparison is conservative.

Page 28: Comparing the early ciRNA papers

How they did coding conservation - the gritty details

• used the best-match strategy outlined above to construct an estimated exon chain for the circRNAs that overlapped exclusively coding sequence.

• Using this chain we in silico spliced out the corresponding blocks of the conservation score profile.

• kept track of the frame and sorted the conservation scores into separate bins for each codon position.

• also recorded conservation scores in the remaining pieces of coding sequence (outside the circRNA) as a control.

• However, we observed that the level of conservation is systematically different between internal parts of the coding sequence and the amino- or carboxy-terminal parts (not shown).

• We therefore randomly generated chains of internal exons, mimicking the exon-number distribution of real circRNAs, as a control.

• When analysing the circRNAs conserved between human and mouse, it became furthermore apparent that we also needed to adjust for the higher level of overall conservation. High expression generally correlates with conservation and thus, an expression cutoff was enforced on the transcripts used to generate random controls. This resulted in a good to conservative match with the actual circRNAs.
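
The frame-tracking step above, in which conservation scores are sorted into one bin per codon position while walking along the exon chain, can be sketched as follows. The function name, the list-of-lists input, and the `start_frame` convention (0 if the first base of the chain is a first codon position) are assumptions for illustration.

```python
def bin_by_codon_position(scores_by_exon, start_frame=0):
    """Sort per-base conservation scores into bins for codon positions 1, 2 and 3.

    scores_by_exon: one list of scores per exon, in transcript order, i.e. the
                    profile after the introns have been spliced out in silico.
    start_frame:    0, 1 or 2; codon-position index of the first base.
    """
    bins = {1: [], 2: [], 3: []}
    offset = start_frame  # carried across exon boundaries to keep the frame
    for exon_scores in scores_by_exon:
        for score in exon_scores:
            bins[offset % 3 + 1].append(score)
            offset += 1
    return bins
```

The three bins for circRNAs and for the randomly generated control chains can then be compared position by position, for example with a Mann-Whitney U test as reported on the results slide.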

Page 29: Comparing the early ciRNA papers

Conservation - Jeck - HIPK2 and HIPK3

• the paralogous kinases, HIPK2 and HIPK3, both produced abundant circRNA, and so do the mouse orthologues

  • sufficiently diverged to allow unique mapping but retain a similar genomic structure: a large second exon that contains the start codon flanked by large introns on either side

• increased coverage of exon 2 was not seen in polyA+ libraries, consistent with the predominant exon 2 species being circular (ENCODE data, not shown). Based on RPKM and qRT-PCR, the circular exon 2 transcript of HIPK3 was apparently 5-fold more abundant than the linear form.

• Consistent with transcripts being ecircRNAs originating from the murine Hipk2/3 genes, the amplified fragments were of the expected size and sequence and were enriched by RNase R digestion

Page 30: Comparing the early ciRNA papers

Jeck - more global conservation: compare to murine testis

• As was the case for human cells, a high number (646 of 1477) of circles found through a prior bioinformatic analysis (Salzman et al. 2012) of murine brain were identified by CircleSeq of murine testis.

• Of 2121 human circles from the MEDIUM stringency list (human fibroblasts) that could be readily mapped to the murine genome, 457 mapped to genes that produced a murine circular RNA (in testis...).

• identified 69 murine circular species (including Hipk3) with exactly homologous start and stop points of RNA circularization

Page 31: Comparing the early ciRNA papers

Experimentally tested circRNA predictions in HEK293 cells

• Head-to-tail splicing assayed by RT-qPCR with divergent primers and Sanger sequencing.

• Predicted head-to-tail junctions of 19/23 could be validated.

• 5/7 candidates exclusively predicted in leukocytes could not be detected in HEK293 cells, validating cell-type-specific expression.

Page 32: Comparing the early ciRNA papers

circRNAs are insensitive(ish) to RNase R

• Jeck 2012: Backsplices for CDR1-AS were abundant in the control samples (mean SRPBM of 198), but both the backsplice reads and nonsplicing reads within the gene were depleted by exonuclease digestion (mean SRPBM of 16). These observations are most consistent with linear trans-splicing products rather than circular RNAs or with the cleavage of this circular RNA, as has been reported

Head-to-tail splicing could be produced by trans-splicing or genomic rearrangements. To rule out these possibilities and PCR artefacts, validated the insensitivity of human circRNA candidates to digestion with RNase R (degrades linear RNA molecules) by northern blotting with probes which span the head-to-tail junctions

Quantified RNase R resistance for 21 candidates with confirmed head-to-tail splicing by qPCR. All of these were at least 10-fold more resistant than GAPDH

Page 33: Comparing the early ciRNA papers

circRNAs turn over more slowly than linear RNA

24 h after blocking transcription circRNAs were highly stable, exceeding the stability of the housekeeping gene GAPDH

Page 34: Comparing the early ciRNA papers

circRNAs turn over more slowly than linear RNA

Page 35: Comparing the early ciRNA papers

Validation in other organisms

3/3 tested mouse circRNAs with human orthologues validated in mouse brains

C. elegans

Page 36: Comparing the early ciRNA papers

circRNAs are not translated

• Engineered circular RNAs have previously been shown to have coding potential (Chen and Sarnow 1995).

• Linear products were significantly enriched in the ribosome-bound fraction for the genes assayed: HIPK3, KIAA0182, and MYO9B.

• Circular products, in contrast, were abundant in the unbound fractions but not detected in the bound fractions, indicating that these AUG-containing ecircRNAs are not translated

Page 37: Comparing the early ciRNA papers

circRNAs can be targeted by RNAi

• transfected Hs68 cells with siRNA targeting HIPK3 and ZFY that produce both linear and circularized transcripts; 3 siRNAs per gene

  • one targeting sequence only in the linear transcript

  • one targeting the backsplice sequence

  • one targeting sequence in a circularized exon shared by both linear and circular species

• It was even possible to design an siRNA to the backsplice junction of ZFY that specifically targeted the circular, but not linear, transcript.

Page 38: Comparing the early ciRNA papers

circRNAs are probably miRNA sponges

• screened for occurrences of conserved miRNA family seed matches (Methods).

• counting repetitions of conserved matches to the same miRNA family, circRNAs were significantly enriched compared to coding sequences (P < 2.96 × 10^-22, Mann-Whitney U-test, n = 3873) or 3' UTR sequences (P < 2.76 × 10^-21, Mann-Whitney U-test, n = 3182)

Page 39: Comparing the early ciRNA papers

circRNAs are cytoplasmic

• ecircRNAs either undergo nuclear export or are released to the cytoplasm during mitosis, where they enjoy extraordinary stability, likely as a result of resistance to debranching enzymes and RNA exonucleases.

Page 40: Comparing the early ciRNA papers

CDR1as is localized in the cytoplasm

• CDR1as RNA is cytoplasmic and disperse (white spots; single-molecule RNA FISH; maximum intensity merges of Z-stacks). siSCR, positive; siRNA1, negative control. Blue, nuclei (DAPI); scale bar, 5 µm

Page 41: Comparing the early ciRNA papers

Bioinformatic analysis

• DAVID revealed an enrichment of protein kinases and related proteins among the set of genes producing ecircRNAs

  • no specific subfamily of kinase was particularly associated with ecircRNA production.

• sought to identify cis-sequence elements proximal to backsplice events. Sequences in the 200 bp preceding or following backsplice sites were analyzed for enriched motifs compared to similar windows flanking noncircularized, expressed exons.

  • the highest information-bearing motif was shared by both the upstream and downstream introns and was identified as the canonical ALU repeat

• the intronic flanks adjacent to circularized exons were approximately twofold more likely to contain an ALU repeat than noncircularized exons. Circularized exons were sixfold more likely to contain complementary ALUs than control, noncircularized exons.

• Pairs of ALU elements taken from introns flanking circularized exons were significantly more likely to be complementary (in inverted orientation) than noncomplementary.

• Equally likely in single-exon and multi-exon circRNAs

Page 42: Comparing the early ciRNA papers

Bioinformatic analysis 2: circRNA exons are large

• the upstream and downstream introns flanking circularized exons tended to be large: on average more than approximately threefold longer than introns flanking control exons

• circularized exons were larger than expected: when restricted to an analysis of single exon ecircRNAs, we noted that circularized exons were approximately threefold longer than expressed exons overall, at an average length of 690 nt

Page 43: Comparing the early ciRNA papers

How are circRNAs formed?

Page 44: Comparing the early ciRNA papers

Overlap of identified circRNAs with published circular RNAs.

• exons of DCC, ETS1 and a non-coding RNA from the human INK4/ARF locus and the CDR1as locus

• circRNAs from exons of the genes CAMSAP1, FBXW4, MAN1A2, REXO4, RNF220 and ZKSCAN1 have been recently experimentally validated (ref. 10). For the four genes from this study, where we had ribominus data from the tissues in which these circRNAs were predicted (leukocytes), we recovered validated circRNAs from all of them (ZKSCAN1, CAMSAP1, FBXW4, MAN1A2).

Page 45: Comparing the early ciRNA papers

Known before

• DCC

  • scrambled transcripts were estimated to comprise less than one one-thousandth of transcripts

  • Nigro, J. M., Cho, K. R., Fearon, E. R., Kern, S. E., Ruppert, J. M., Oliner, J. D., et al. (1991). Scrambled exons. Cell, 64(3), 607-613.

• MLL

  • Caldas, C., So, C. W., MacGregor, A., Ford, A. M., McDonald, B., Chan, L. C., Wiedemann, L. M. (1998). Exon scrambling of MLL transcripts occur commonly and mimic partial genomic duplication of the gene. Gene, 208(2), 167-176.

• ETS-1

  • Cocquerelle, C., Mascrez, B., Hetuin, D., Bailleul, B. (1993). Mis-splicing yields circular RNA molecules. FASEB Journal, 7(1), 155-160.

Page 46: Comparing the early ciRNA papers

Best studied: mouse SRY

• consists of a single exon.

• During development, the RNA exists as a linear transcript that is translated into protein.

• In the adult testes, the RNA exists primarily as a circular product that is predominantly localized to the cytoplasm and is apparently not translated

• inverted repeats in the genomic sequence flanking the SRY exon direct transcript circularization
