

Global analysis of trans-splicing in Drosophila

C. Joel McManus, Michael O. Duff, Jodi Eipper-Mains, and Brenton R. Graveley1

Department of Genetics and Developmental Biology, University of Connecticut Stem Cell Institute, University of Connecticut Health Center, Farmington, CT 06030-3301

Communicated by Tom Maniatis, Columbia University Medical Center, New York, NY, June 8, 2010 (received for review April 20, 2010)

Precursor mRNA (pre-mRNA) splicing can join exons contained on either a single pre-mRNA (cis) or on separate pre-mRNAs (trans). Trans-splicing between protein-coding exons is exceedingly rare and has been demonstrated for only two Drosophila genes: mod(mdg4) and lola. It has also been suggested that trans-splicing is a mechanism for the generation of chimeric RNA products containing sequence from multiple distant genomic sites. Because most high-throughput approaches cannot distinguish cis- and trans-splicing events, the extent to which trans-splicing occurs between protein-coding exons in any organism is unknown. Here, we used paired-end deep sequencing of mRNA to identify genes that undergo trans-splicing in Drosophila interspecies hybrids. We did not observe credible evidence for the existence of chimeric RNAs generated by trans-splicing of RNAs transcribed from distant genomic loci. Rather, our data suggest that experimental artifacts are the source of most, if not all, apparent chimeric RNA products. We did, however, identify 80 genes that appear to undergo trans-splicing between homologous alleles and can be classified into three categories based on their organization: (i) genes with multiple 3′ terminal exons, (ii) genes with multiple first exons, and (iii) genes with very large introns, often containing other genes. Our results suggest that trans-splicing between homologous alleles occurs more commonly in Drosophila than previously believed and may facilitate expression of architecturally complex genes.

chimeric RNA | RNA-seq | genomics | bioinformatics | deep sequencing

Precursor mRNA (pre-mRNA) splicing is an essential process in eukaryotic gene expression. Splicing can occur either within a single pre-mRNA (in cis) or between two different pre-mRNAs (in trans) (1, 2). The best-characterized form of trans-splicing occurs commonly in nematodes and trypanosomes. In these organisms, spliced-leader RNAs are added to the 5′ ends of many, if not all, pre-mRNAs (3, 4). Examples of trans-splicing that do not involve spliced-leader RNAs, but rather occur between coding exons, are exceedingly rare, and only two Drosophila genes are known to be trans-spliced: mod(mdg4) (5, 6) and lola (7).

The Drosophila genes mod(mdg4) and lola both contain common 5′ exons and multiple alternative 3′ terminal exons. Although the exons of mod(mdg4) are encoded on both DNA strands (5, 6), and therefore require trans-splicing, all of the lola exons are encoded on the same DNA strand (7), suggesting that they are cis-spliced. However, interallelic complementation studies have demonstrated that at least some lola isoforms are generated by trans-splicing (7). This finding demonstrates that trans-spliced genes cannot be identified based on their genomic organization alone, and raises the possibility that other Drosophila genes could use trans-splicing for mRNA synthesis.

Trans-splicing may also be a mechanism for the generation of so-called chimeric RNAs, which contain sequences originating from distant genomic loci (8). However, apparent chimeric RNAs can also be generated by homology-driven template switching during RT-PCR (9–11), and adequate controls are needed to identify these experimental artifacts. One of the more complete reports describing chimeric RNAs found an enrichment of short homologous sequences (SHSs) at chimeric RNA junction sites (12). Although the authors suggested that cellular RNA polymerases switch DNA templates at SHSs (12), RT-PCR strand-switching at SHSs is a more likely explanation, given that both reverse transcriptase and Taq DNA polymerase are known to strand-switch and multiple amplification cycles were used. A more recent study described the existence of several hundred chimeric RNAs in the rice transcriptome; however, control experiments to eliminate strand-switching as an explanation were not provided (13).

We used high-throughput sequencing of Drosophila hybrid mRNA and a mixed-mRNA negative control sample to investigate the extent and specificity of trans-splicing. The trans-splicing of mod(mdg4) and lola was extremely specific, as no chimeric products between these two genes were observed. In addition, 80 other candidate trans-spliced genes were identified, 6 of which were validated. These newly identified trans-spliced genes have complex genomic architecture, suggesting that trans-splicing may facilitate expression of genes whose structure would otherwise pose challenges to the gene-expression machinery. Finally, we report a high background of chimeric mRNA products in our negative control sample, which suggests that mRNAs that appear to link distant genomic loci likely result from experimental errors.

Results

Paired-End mRNA-seq to Identify trans-Spliced Genes. To search for additional trans-spliced genes, we performed paired-end deep sequencing of mRNA isolated from F1 hybrid progeny generated from crossing Drosophila melanogaster females to Drosophila sechellia males (Fig. 1). These species were chosen because their genome assemblies are of sufficient quality and the two species have sufficient sequence divergence (∼2–3% across annotated genes) to map RNA-seq reads allele-specifically. To differentiate trans-spliced RNAs generated in the animal from chimeric products generated through library preparation artifacts (9–11) or sequencing errors, we also sequenced a negative control library prepared by mixing equal amounts of RNA isolated from the D. melanogaster and D. sechellia parents. We obtained 49 and 54 million mate-pairs from the control and hybrid libraries, respectively. All reads were separately aligned to both the D. melanogaster and D. sechellia genomes to identify reads that mapped perfectly (without mismatches) and uniquely to only one species. This alignment resulted in 9,815,247 hybrid and 9,198,164 control mate-pairs in which both reads were species-specific. Mate-pairs where both reads map to the same species are referred to as cis–mate-pairs (9,678,331 hybrid and 9,069,982 control mate-pairs). In contrast, mate-pairs where each read maps to a different species are referred to as trans–mate-pairs (136,916 hybrid and 128,182 control mate-pairs). We next mapped the reads in the cis– and trans–mate-pairs to exons of protein-coding genes. Mate-pairs in which the two reads mapped to different exons (either within

Author contributions: C.J.M. and B.R.G. designed research; C.J.M. performed research; C.J.M., M.O.D., and J.E.-M. contributed new reagents/analytic tools; C.J.M., M.O.D., and B.R.G. analyzed data; and C.J.M. and B.R.G. wrote the paper.

The authors declare no conflict of interest.

Freely available online through the PNAS open access option.

Data deposition: The data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database, www.ncbi.nlm.nih.gov/geo (accession no. GSE20421).

1To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1007586107/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1007586107 PNAS | July 20, 2010 | vol. 107 | no. 29 | 12975–12979


the same gene or between different annotated genes) were considered as candidates for being generated by splicing (49.4% of the hybrid and 56.4% of the control mate-pairs).
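The cis/trans classification described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name, the `mel`/`sec` labels, and the toy input are assumptions, while the library counts are the ones reported in the text.

```python
from collections import Counter

def classify_mate_pair(species_read1, species_read2):
    """Return 'cis', 'trans', or None for one mate-pair.

    Each argument is the species ('mel' or 'sec') that a read mapped to
    perfectly and uniquely, or None if the read was not species-specific.
    The labels are hypothetical names chosen for this sketch.
    """
    if species_read1 is None or species_read2 is None:
        return None  # at least one read is not species-specific
    return "cis" if species_read1 == species_read2 else "trans"

# Toy input (hypothetical): species assignment for both reads of four mate-pairs.
pairs = [("mel", "mel"), ("mel", "sec"), ("sec", None), ("sec", "sec")]
counts = Counter(classify_mate_pair(a, b) for a, b in pairs)

# Species-specific mate-pair counts reported in the text.
reported = {
    "hybrid": {"cis": 9_678_331, "trans": 136_916, "total": 9_815_247},
    "control": {"cis": 9_069_982, "trans": 128_182, "total": 9_198_164},
}
trans_fraction = {}
for library, n in reported.items():
    # cis and trans pairs together account for all species-specific pairs
    assert n["cis"] + n["trans"] == n["total"]
    trans_fraction[library] = n["trans"] / n["total"]
    print(library, f"trans fraction = {trans_fraction[library]:.4f}")
```

With the reported counts, the trans fraction is roughly 1.4% in both the hybrid and the control library, which is why a raw trans–mate-pair by itself is weak evidence and the per-gene comparison against the control matters.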

Frequency and Specificity of mod(mdg4) and lola trans-Splicing. Examination of the two known trans-spliced genes, mod(mdg4) and lola, revealed that this approach can indeed identify trans-splicing events. For mod(mdg4), we obtained 50 trans–mate-pairs from the hybrid but only 2 from the control (Fig. S1). Similarly, for lola we obtained 43 trans–mate-pairs from the hybrid library and none from the control library. Importantly, six mod(mdg4) and four lola trans-splicing events, including one previously identified event in mod(mdg4), were supported by as few as one trans–mate-pair.

Previous studies demonstrated trans-splicing for only 6 of 28 and 4 of 22 known mod(mdg4) and lola 3′ terminal exon groups, respectively, and many trans-spliced products were detected only in the context of overexpressed transgenes, which may not reflect natural phenomena (5–7, 14). Our results show that 22 of 24 (92%) and 12 of 17 (71%) of the expressed, annotated mod(mdg4) and lola isoforms, respectively, have at least one trans–mate-pair or reside on the antisense strand, and are therefore trans-spliced (Fig. 2). Thus, mod(mdg4) and lola mRNAs appear to be generated almost entirely by trans-splicing.

As the mod(mdg4) and lola 3′ terminal exons all have the same reading frame, chimeric mRNAs synthesized by trans-splicing of mod(mdg4) common exons to lola variable exons (and vice versa) would be refractory to nonsense-mediated decay. We therefore assessed the frequency of aberrant mod(mdg4) and lola trans-splicing by searching for mate-pairs between mod(mdg4) and lola. Importantly, we did not observe any mate-pairs from either the same or opposite species between mod(mdg4) and lola in the hybrid dataset. Furthermore, although we did observe some single mate-pairs between mod(mdg4) or lola and other genes, these were more prevalent in the control (49 mate-pairs) than in the hybrid (26 mate-pairs), suggesting that these are most likely artifacts (Table S1). Thus, trans-splicing of mod(mdg4) and lola is highly specific.
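The hybrid-versus-control comparisons in this section can be made on a common scale by dividing each count by the number of species-specific mate-pairs in its library. The sketch below is only an illustration of that idea; the per-million normalization and the helper names are assumptions and are not described as part of the authors' analysis.

```python
# Species-specific mate-pair totals reported for each library.
LIBRARY_SIZE = {"hybrid": 9_815_247, "control": 9_198_164}

# Mate-pair counts quoted in the text for each comparison.
observed = {
    "mod(mdg4) trans": {"hybrid": 50, "control": 2},
    "lola trans": {"hybrid": 43, "control": 0},
    "mod(mdg4) or lola with other genes": {"hybrid": 26, "control": 49},
}

def per_million(count, library):
    """Scale a count to mate-pairs per million species-specific mate-pairs."""
    return count * 1_000_000 / LIBRARY_SIZE[library]

def hybrid_enriched(counts):
    """True if the normalized hybrid signal exceeds the control background."""
    return per_million(counts["hybrid"], "hybrid") > per_million(counts["control"], "control")

for name, counts in observed.items():
    print(name, "hybrid-enriched" if hybrid_enriched(counts) else "not above control")

# Isoform percentages quoted in the text, recomputed from the raw fractions.
percent_modmdg4 = round(100 * 22 / 24)
percent_lola = round(100 * 12 / 17)
```

Under this normalization the mod(mdg4) and lola trans–mate-pairs are enriched in the hybrid, while the pairings with other genes are not, matching the interpretation given in the text.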

Detection and Validation of Novel trans-Splicing Events. We next searched for new examples of trans-splicing within the same gene. Two thousand one hundred seventy-seven genes had at least one trans–mate-pair and were considered candidate trans-spliced genes. However, several factors including strand-switching, deep sequencing errors, or reference genome errors resulted in false-positives (Fig. S2). We therefore visually evaluated each candidate gene in a genome browser to remove those with potential false-positive signals (see Materials and Methods, Fig. S2, and Tables S2 and S3). This visual curation step resulted in a final collection of 80 trans-splicing candidate genes.

We used a species-specific RT-PCR/sequencing assay (15) to validate the existence of trans-spliced mRNAs for mod(mdg4), lola, and six candidate genes. To confirm trans-splicing, we required that an RT-PCR product was obtained from the hybrid RNA, but not from the individual parents or the control. The hybrid RT-PCR products were cloned and sequenced to verify that SNPs between the primers and the exon boundaries showed a clean transition at exon-exon junctions. Using these stringent criteria, we confirmed trans-splicing for three undocumented isoforms from mod(mdg4) and lola, and all of the tested candidate genes (Fig. 3 and Fig. S3).
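The two validation criteria above (a hybrid-only RT-PCR product, and a clean switch of species-specific SNPs at the exon-exon junction) can be expressed as a simple check. This is a hedged sketch: the data layout, the `mel`/`sec`/`mix` keys, and the function names are hypothetical and only illustrate the logic stated in the text.

```python
def clean_transition_at_junction(snps, junction):
    """Check that SNP alleles switch species exactly at the junction.

    snps: list of (position, species) calls along the cloned product,
          where species is 'mel' or 'sec' (hypothetical labels).
    junction: position of the exon-exon junction in the same coordinates.
    """
    upstream = {species for pos, species in snps if pos < junction}
    downstream = {species for pos, species in snps if pos >= junction}
    # One species on each side of the junction, and not the same species.
    return len(upstream) == 1 and len(downstream) == 1 and upstream != downstream

def supports_trans_splicing(product_in, snps, junction):
    """Apply both criteria: hybrid-only product and a clean allele switch."""
    hybrid_only = product_in["hybrid"] and not (
        product_in["mel"] or product_in["sec"] or product_in["mix"]
    )
    return hybrid_only and clean_transition_at_junction(snps, junction)

# Toy example (hypothetical data): product seen only in the hybrid lane,
# D. melanogaster SNPs upstream and a D. sechellia SNP downstream.
lanes = {"mel": False, "sec": False, "mix": False, "hybrid": True}
calls = [(10, "mel"), (40, "mel"), (120, "sec")]
print(supports_trans_splicing(lanes, calls, junction=100))
```

A clone whose alleles switch before the junction, or a product that also appears in the mixed-RNA control, fails the check, which mirrors how strand-switching artifacts would be excluded.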

Fig. 1. Deep sequencing to search for trans-spliced genes and chimeric RNAs. Sequencing libraries were prepared from poly(A)-selected RNA from F1 hybrids of D. melanogaster and D. sechellia, and from a mixture of parental RNA (control). Libraries were subjected to paired-end deep sequencing, and species-specific sequence reads were identified by comparing genomic alignments. Sequence mate-pairs in which both reads mapped to the same (cis) or different (trans) species were mapped to genes to identify pairs indicative of splicing.

Candidate Chimeric RNA Products Carry Hallmarks of RT-PCR Artifacts. We searched for cases of trans-splicing of exons located in different annotated genes on the same chromosome or on different chromosomes. As with mod(mdg4) and lola trans-splicing, we expect that the genes involved in any new cases would be specific (not promiscuous), would involve splicing of RNA derived from the transcribed strand of the annotated isoforms, and would not involve genes from the mitochondrial genome. Of the 128,958 pairs of genes connected by at least one intergenic mate-pair, 74,383 (58%) had at least one mate-pair derived from the noncoding strand and 1,307 (1%) involved genes from the mitochondrial genome. Nearly all (54,558) of the remaining 54,575 gene pairs were promiscuous, in that at least one of the genes in a pair was involved in more than one intergenic pairing. Strikingly, 16 of the 17 coding, nonpromiscuous intergenic pairs involved single mate-pairs between adjacent or nested genes on the same chromosome, and none of these were trans–mate-pairs (opposite allele pairs), suggesting that the genes connected by these mate-pairs may be misannotated, are part of the same transcription unit, and are therefore actually cases of intragenic cis-splicing (Table S4). The remaining coding, nonpromiscuous intergenic gene pair involves a single cis–mate-pair between two paralogs of His3 (CG33845 and CG33821) that differ in sequence by a single nucleotide, suggesting that this mate-pair resulted from a sequencing error. Given these results, we next investigated whether the intergenic trans–mate-pairs in our dataset could result from strand-switching artifacts generated by RT-PCR during library preparation.

Strand-switching is dependent on two major factors: template homology and concentration. We found that the most frequent cases of intergenic trans–mate-pairs involved different members of highly homologous gene families. For example, Actin paralogs located on different chromosomes were the most abundant intergenic mate-pairs in our dataset. We also observed strong correlations between gene template concentration, measured in total mapped reads, and the number of intergenic trans–mate-pairs (Pearson’s r = 0.88 and 0.81 for different genes on different chromosomes and on the same chromosome, respectively). For comparison, the correlation between template concentration and same-gene trans–mate-pairs was relatively weak (Pearson’s r = 0.33). Finally, we compared the tissue-specific expression patterns of the interchromosomal gene pairs to examine whether mRNAs from these genes were expressed in the same tissues (16). We find that ∼7.4% of interchromosomal gene pairs are not coexpressed in D. melanogaster (Table S5). Together, these observations suggest that the vast majority of intergenic trans–mate-pairs are derived from RT-PCR strand-switching artifacts and sequencing errors. Thus, we do not find reliable evidence of chimeric RNA production in adult Drosophila.
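The three filters applied to intergenic gene pairs in this section (noncoding-strand reads, mitochondrial genes, promiscuous genes) can be sketched as follows. The record layout and gene names are hypothetical, and counting promiscuity among the pairs that survive the first two filters is an assumption; the text does not specify the exact order of operations.

```python
from collections import Counter

def nonpromiscuous_coding_pairs(pairs):
    """Filter intergenic gene pairs using the criteria named in the text.

    pairs: list of dicts with keys
      'genes'            -> 2-tuple of gene names
      'noncoding_strand' -> True if any mate-pair came from the noncoding strand
      'mitochondrial'    -> True if either gene is mitochondrial
    """
    kept = [
        p for p in pairs
        if not p["noncoding_strand"] and not p["mitochondrial"]
    ]
    # Assumption: a gene is promiscuous if it occurs in more than one
    # of the remaining pairings.
    partner_count = Counter(gene for p in kept for gene in p["genes"])
    return [p for p in kept if all(partner_count[g] == 1 for g in p["genes"])]

# Toy input with hypothetical gene names.
toy_pairs = [
    {"genes": ("geneA", "geneB"), "noncoding_strand": False, "mitochondrial": False},
    {"genes": ("geneA", "geneC"), "noncoding_strand": False, "mitochondrial": False},
    {"genes": ("geneD", "geneE"), "noncoding_strand": False, "mitochondrial": False},
    {"genes": ("geneF", "geneG"), "noncoding_strand": True, "mitochondrial": False},
    {"genes": ("geneH", "geneI"), "noncoding_strand": False, "mitochondrial": True},
]
survivors = nonpromiscuous_coding_pairs(toy_pairs)
print([p["genes"] for p in survivors])

# Counts quoted in the text: pairs left after the strand filter, and after
# removing promiscuous pairs.
remaining_after_strand_filter = 128_958 - 74_383
nonpromiscuous = 54_575 - 54_558
```

In the toy input only the geneD/geneE pair survives, because geneA appears in two pairings. The reported numbers follow the same pattern: 54,575 pairs remain after the strand filter and only 17 of them are nonpromiscuous.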

Discussion

The approach described in this study is unique in providing a genome-wide survey of trans-splicing, and reveals that trans-splicing between protein-coding exons is more widespread than previously appreciated. At the same time, our results indicate that trans-splicing in Drosophila is extremely specific. Interestingly, homologous chromosomes are paired in Drosophila somatic cells (17) and chromosomal pairing appears to be required for efficient lola trans-splicing (7). This suggests the possibility that chromosomal pairing may be a general requirement for efficient, specific trans-splicing between homologous genes in Drosophila.

The candidate trans-spliced genes we identified can be grouped into three categories. The first class consists of genes that contain at least two alternative 3′ terminal exons, like mod(mdg4) and lola (Fig. 3A). The most notable example from this class is CG42235, in which trans–mate-pairs mapped to the CG42235-RD and CG42235-RE isoforms, both of which were validated. The second class contains genes with at least two alternative 5′ terminal exons, such as ome (Fig. 3B). The final category includes genes with large introns, which frequently contain nested genes, such as Nmdmc (Fig. 3C). Intriguingly, the architecture of trans-spliced genes in each class creates obstacles for the gene-expression machinery. For example, collisions of transcription complexes may occur in nested genes. For genes containing alternative 5′ terminal exons, use of distal exons requires active repression of proximal exons. Finally, for genes containing alternative 3′ terminal exons, it is necessary to actively repress all proximal 3′ splice sites, premature 3′ end formation, and transcription termination before synthesis and splicing of the distal exons. In each of these cases, trans-splicing of separate pre-mRNAs generated using distinct promoters and transcription termination sites would overcome all of these obstacles.

In some cases, the frequency of trans-splicing is very low. This finding may reflect a low background of “noisy” trans-splicing or a low level of strand-switching or sequencing errors that occurred only in the hybrid sample. Alternatively, the trans–mate-pairs from these genes could have resulted from cis-splicing of transcripts expressed in a small population of cells in which somatic recombination has occurred between the D. melanogaster and D. sechellia alleles. Although we cannot exclude these possibilities, we note that our validation experiments were performed using biological replicate samples. Thus, it seems unlikely that the same experimental errors or somatic recombination events would occur in multiple biological samples.

Our approach also allowed us to evaluate the extent of strand-switching that occurs in deep sequencing experiments. Most cross-chromosomal trans–mate-pairs we observed result from RT-PCR artifacts and do not represent biologically generated chimeric mRNAs. Consequently, we do not find credible evidence of intergenic chimeric RNA production in adult Drosophila. Because we observed a large number of false-positive chimeric RNA signals, our data further suggest that reports of chimeric RNAs should be treated with caution, especially when the supporting data are generated using RT-PCR. However, our results do not preclude the existence of chimeric RNAs in other species. For example, exons from the mosquito bursicon mRNA were recently found to be encoded on two separate chromosomes, suggesting that trans-splicing is required for bursicon mRNA synthesis (18). Another recent report described a chimeric RNA composed of exons from the human JAZF1 and JJAZ1 genes located on chromosomes 7 and 17, respectively (15). This chimeric RNA can be formed in in vitro splicing reactions, suggesting the possibility that the chimeric RNA can be produced via trans-splicing in vivo. However, this result does not entirely eliminate the possibility that the chimeric product was produced during RT-PCR amplification. Improvements in direct RNA sequencing (19, 20) should eventually allow the direct detection of any genuine chimeric

Fig. 2. Trans-splicing of mod(mdg4) and lola. The sequencing results obtained for mod(mdg4) (A) and lola (B) are shown. The horizontal gray line separates the sense and antisense exons of mod(mdg4). The 3′ terminal exon groups for which deep sequencing data support trans-splicing (green), only cis-splicing (red), or are not expressed in the hybrid (gray) are shown. Isoforms for which trans-splicing was previously reported are depicted with an asterisk. The number of cis– (red) and trans– (green) mate-pairs observed for each isoform of mod(mdg4) and lola are shown (bar graphs).


RNAs, without the introduction of strand-switching artifacts inherent to reverse transcription and PCR amplification.

Regardless of the precise mechanism by which trans-splicing occurs and the purpose of trans-splicing, the results presented here identify several additional protein-coding genes that are trans-spliced in Drosophila. Because we have only examined trans-splicing in adult females, these results certainly underestimate the frequency of trans-splicing. Thus, deeper sequencing to analyze trans-splicing throughout Drosophila development will likely identify additional trans-spliced genes. Conducting similar experiments in other species, including humans, whose genomes contain many genes with long introns (e.g., c-Abl), multiple promoters (e.g., PCDHGA), and multiple 3′ terminal exons (e.g., IGHA1), may reveal that trans-splicing between protein-coding exons is even more widespread.

Materials and Methods

Flies/Crosses. Flies were reared on standard cornmeal/molasses medium at 25 °C. The F1 hybrids used resulted from crossing 7 females of the D. melanogaster strain 14021–0231.36 (y[1]; Gr22b[1] Gr22d[1] cn[1] CG33964[R4.2] bw[1] sp[1]; LysC[1] MstProx[1] GstD5[1] Rh6[1]) with approximately 30 males of the D. sechellia strain 14021–0248.25 (wild-type). Only female hybrids are viable from this cross.

Library Preparation and Sequencing. mRNA sequencing libraries were prepared according to manufacturer specifications (Illumina). Total RNA was prepared from whole flies using TRIzol (Invitrogen) and treated with DNase I to remove any contaminating DNA. Nine micrograms of total RNA from hybrid and control (4.5 μg D. melanogaster RNA + 4.5 μg D. sechellia RNA) females was used as input for library preparation. Poly(A)+ RNA selected using Dynal magnetic beads (Invitrogen) was fragmented using RNA fragmentation reagent (Ambion), and reverse-transcribed using random primers and SuperScript II (Invitrogen). The resulting cDNA was size-selected (∼370 bp) on 2% agarose (TAE) gels. Libraries were subsequently prepared for sequencing using the Paired-end Genomic DNA Library kit (Illumina). Libraries were sequenced in six (hybrid) and four (control) lanes on an Illumina GAIIx using a 37-cycle paired-end sequencing protocol, and one (hybrid) and two (control) lanes using a 76-cycle paired-end protocol. Sequence reads from 76-cycle runs were trimmed to their first 37 bases for comparison with the other data.
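The trimming step in the last sentence above is straightforward to illustrate. The sketch below assumes reads are held as four-field FASTQ records (header, sequence, separator, qualities); the function name and the toy read are hypothetical, and the authors' actual tooling is not described in the text.

```python
def trim_fastq_record(record, length=37):
    """Trim one FASTQ record to its first `length` bases.

    record: tuple of (header, sequence, separator, quality string).
    Reads already at or below `length` are returned unchanged in content.
    """
    header, sequence, separator, quality = record
    return (header, sequence[:length], separator, quality[:length])

# Toy 76-base read (hypothetical), as produced by a 76-cycle run.
long_read = ("@read1/1", "ACGT" * 19, "+", "I" * 76)
trimmed = trim_fastq_record(long_read)

# A 37-base read from a 37-cycle run passes through unchanged.
short_read = ("@read2/1", "A" * 37, "+", "I" * 37)
untouched = trim_fastq_record(short_read)
```

Trimming both read sets to the same length keeps the perfect-match, unique-mapping criterion comparable across lanes, since longer reads would otherwise be more likely to overlap a species-specific difference.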

[Figure 3 image. Panels: (A) CG42235, with common exons and alternative 3′ terminal exon groups (RA to RE isoforms); (B) ome, with alternative transcription initiation sites and common exons; (C) Nmdmc, with Rel nested within an intron. Panels show trans mate-pairs and RT-PCR gel lanes labeled D. mel, D. sec, Mix, and Hybrid.]

Fig. 3. Examples of newly identified trans-spliced genes. Trans-splicing was validated using RT-PCR with primers specific to D. melanogaster (red) and D. sechellia (blue). Trans-splicing is validated by the presence of RT-PCR products when using opposite species forward and reverse primers with hybrid, but not mixed control (Mix) cDNA. Several clones of these putative trans-splicing products were sequenced to verify a clean transition of species-specific sequences at splicing junctions. (A) CG42235 contains a set of common 5′ exons which are trans-spliced to multiple alternative 3′ terminal exon groups. (B) The ome gene has multiple alternative transcription initiation exons which are trans-spliced to a set of common 3′ terminal exons. (C) Nmdmc is an example of trans-splicing of nested genes, as the gene Rel is located within the intron.

12978 | www.pnas.org/cgi/doi/10.1073/pnas.1007586107 McManus et al.


mRNA-seq Data Analysis. Sequence image analysis was performed using the Firecrest, Bustard, and GERALD programs (Illumina). Sequences were aligned separately to both the D. melanogaster (2006, dm3) and D. sechellia (droSec1) genome assemblies (21) using Bowtie (22). Allele-specific sequence read assignments were performed as previously described (23). Briefly, sequence reads were aligned requiring no mismatches, and alignment results were compared to identify sequences that aligned to only one genome and mapped to a single genomic location. The coordinates of D. sechellia-specific reads were converted to their syntenic D. melanogaster coordinates using the lift-over tool (http://genome.ucsc.edu). Species-specific sequence reads were mapped to all annotated exons (FlyBase 5.11) using a custom Perl script, “exonhitter” (23).
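The allele-assignment rule in this paragraph (a read is species-specific only if it aligns without mismatches to exactly one of the two genomes, and at a single location there) can be illustrated with a short sketch. The function name `assign_allele` and the representation of alignments as lists of `(chromosome, position)` tuples are assumptions for illustration; the original analysis used Bowtie output and the scripts described in ref. 23.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of one perfect-match alignment.
Hit = Tuple[str, int]  # (chromosome, position)


def assign_allele(mel_hits: List[Hit], sec_hits: List[Hit]) -> Optional[str]:
    """Return the species a read is assigned to, or None if it is ambiguous.

    mel_hits / sec_hits: zero-mismatch alignments of the read to the
    D. melanogaster and D. sechellia assemblies, respectively.
    """
    if len(mel_hits) == 1 and not sec_hits:
        return "D. melanogaster"
    if len(sec_hits) == 1 and not mel_hits:
        return "D. sechellia"
    # Aligns to both genomes, to neither, or to multiple locations.
    return None
```

Reads that match both genomes carry no allele information, and reads that map to several locations cannot be placed reliably, so both are discarded under this rule.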

Additional custom scripts were used to identify cis- and trans-mate-pairs and for further downstream analyses. Mate-pairs were first examined to identify pairs whose reads mapped to different exons. These pairs were further separated into “same gene” and “different gene” categories if the ends of the pair mapped to the same or different genes, respectively. “Different gene” read-pairs were parsed into same- and different-chromosome categories, if the genes to which they mapped were located on the same or different chromosomes. The number of mate-pairs mapping to each exon pair was counted.
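The categorization described above can be sketched as a small classifier plus a counter. This is a minimal illustration, not the authors' script: the `ReadHit` record, the category labels, and the assumption that exon identifiers are unique across genes are all introduced here for clarity.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional, Tuple


class ReadHit(NamedTuple):
    """Hypothetical annotation of one read end."""
    exon: str   # exon identifier, assumed unique across genes
    gene: str
    chrom: str


def classify_pair(a: ReadHit, b: ReadHit) -> Optional[str]:
    """Categorize a mate-pair; pairs within a single exon are ignored."""
    if a.exon == b.exon:
        return None
    if a.gene == b.gene:
        return "same gene"
    if a.chrom == b.chrom:
        return "different gene, same chromosome"
    return "different gene, different chromosome"


def count_exon_pairs(pairs: Iterable[Tuple[ReadHit, ReadHit]]) -> Counter:
    """Count mate-pairs per unordered exon pair."""
    counts: Counter = Counter()
    for a, b in pairs:
        if classify_pair(a, b) is not None:
            counts[tuple(sorted((a.exon, b.exon)))] += 1
    return counts
```

Sorting the two exon identifiers makes the count independent of which mate happened to be sequenced first.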

The total number of cis- and trans- “same gene” mate-pairs was calculated for each gene. All genes with at least one hybrid trans-mate-pair were considered as trans-splicing candidates. Custom browser tracks were generated to view the location of allele-specific sequence reads and trans-mate-pairs on the University of California–Santa Cruz genome browser. The trans-splicing candidate genes were visually evaluated to identify genes whose hybrid and negative control trans-mate-pairs align to the same sets of SNPs, which is indicative of strand-switching and mapping bias because of reference genome errors (Fig. S2). Candidate trans-splicing events containing these potential sources of error were not considered further.
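The per-gene tally and the candidate threshold can be sketched as below. The sketch assumes, consistent with the hybrid design, that a “same gene” mate-pair is cis when both ends are assigned to the same species and trans when they are assigned to different species; the function names and the input tuple layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tally_cis_trans(
    pairs: Iterable[Tuple[str, str, str]]
) -> Dict[str, Dict[str, int]]:
    """Count cis and trans mate-pairs per gene.

    pairs: (gene, species_of_read1, species_of_read2) for each
    "same gene" mate-pair with species-specific ends.
    """
    tallies: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"cis": 0, "trans": 0}
    )
    for gene, species1, species2 in pairs:
        kind = "cis" if species1 == species2 else "trans"
        tallies[gene][kind] += 1
    return dict(tallies)


def trans_candidates(
    hybrid_tallies: Dict[str, Dict[str, int]], min_trans: int = 1
) -> List[str]:
    """Genes with at least `min_trans` trans mate-pairs in the hybrid sample."""
    return sorted(
        gene for gene, t in hybrid_tallies.items() if t["trans"] >= min_trans
    )
```

Running the same tally on the mixed control library would give the background of trans mate-pairs produced by artifacts, which is what the visual comparison against control SNP positions in the text is meant to screen out.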

The mRNA-seq protocol used in this study results in sequences that are not strand-specific (i.e., one does not know from which strand an observed mRNA-seq read was generated). However, the relative strands of each sequence mate-pair can be analyzed. If a mate-pair was generated from a continuous mRNA, the mate-pair reads should map to opposite strands in the reference genome. We used this relative strand information to determine whether the reads in putative chimeric read-pairs could both come from a protein-coding sequence. For example, if two genes were encoded on the positive DNA strand, the reads in a chimeric mate-pair derived from the coding sequence of both genes would align to opposite DNA strands. If the reads in a mate-pair aligned to the same DNA strand, the sequence from one read in the pair must have originated from the noncoding strand of a gene. We calculated the frequency of coding and noncoding mate-pairs for each apparent chimeric junction between two different genes using a custom Perl script. Custom scripts were also used to identify genes with multiple chimeric junctions (gene1 to gene2, gene1 to gene3, and so forth). Tissue-specific gene expression patterns were downloaded from FlyAtlas (http://flyatlas.org/) (16) and custom Perl scripts were used to compare the expression patterns of genes in each potential chimeric gene pair. Genes were considered to be expressed in a tissue if all four of the microarray experiments reported expression.
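The relative-strand test can be illustrated with a small function. The text gives the case of two genes on the positive strand; the sketch below generalizes it to genes on either strand by asking whether exactly one read matches the sense orientation of its gene, which is an assumption of this sketch rather than a rule stated in the paper. The function name and the `"+"`/`"-"` encoding are likewise hypothetical.

```python
def classify_chimeric_pair(
    gene1_strand: str, gene2_strand: str, read1_strand: str, read2_strand: str
) -> str:
    """Classify a chimeric mate-pair as "coding" or "noncoding".

    All strands are "+" or "-". In a paired-end fragment from one continuous
    mRNA, one mate reads in the sense orientation and the other in the
    antisense orientation. The pair is "coding" when exactly one read is in
    the sense orientation relative to its own gene.
    """
    for strand in (gene1_strand, gene2_strand, read1_strand, read2_strand):
        if strand not in ("+", "-"):
            raise ValueError(f"unexpected strand: {strand!r}")
    read1_sense = read1_strand == gene1_strand
    read2_sense = read2_strand == gene2_strand
    return "coding" if read1_sense != read2_sense else "noncoding"
```

For two genes on the positive strand this reduces to the example in the text: reads on opposite strands are classified as coding, and reads on the same strand as noncoding.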

Validation. RNA from different biological replicates was reverse-transcribed using SuperScript II RT (Invitrogen) to prepare cDNA for validation PCR. Species-specific primers (Table S6) were designed for 23 isoforms of 20 candidate genes [including mod(mdg4) and lola]. Species-specific PCR amplification was successful for 11 genes, failed completely (no product was generated) for 3 genes, and was nonspecific for 6 genes. RT-PCR products generated from hybrid cDNA were cloned and sequenced to verify clean SNP transitions at exon-exon junctions (Fig. S3).

ACKNOWLEDGMENTS. We thank members of the B.R.G. laboratory for discussions and comments on the manuscript, Thom Theara for assistance with the Illumina GAIIx, and the University of Connecticut Health Center Translational Genomics Core Facility for use of the instrument. This work was supported by National Institutes of Health Grant GM062516 (to B.R.G.).

1. Konarska MM, Padgett RA, Sharp PA (1985) Trans splicing of mRNA precursors in vitro. Cell 42:165–171.
2. Solnick D (1985) Trans splicing of mRNA precursors. Cell 42:157–164.
3. Sutton RE, Boothroyd JC (1986) Evidence for trans splicing in trypanosomes. Cell 47:527–535.
4. Nilsen TW (2001) Evolutionary origin of SL-addition trans-splicing: Still an enigma. Trends Genet 17:678–680.
5. Dorn R, Reuter G, Loewendorf A (2001) Transgene analysis proves mRNA trans-splicing at the complex mod(mdg4) locus in Drosophila. Proc Natl Acad Sci USA 98:9724–9729.
6. Labrador M, et al. (2001) Protein encoding by both DNA strands. Nature 409:1000.
7. Horiuchi T, Giniger E, Aigaki T (2003) Alternative trans-splicing of constant and variable exons of a Drosophila axon guidance gene, lola. Genes Dev 17:2496–2501.
8. Gingeras TR (2009) Implications of chimaeric non-co-linear transcripts. Nature 461:206–211.
9. Cocquet J, Chong A, Zhang G, Veitia RA (2006) Reverse transcriptase template switching and false alternative transcripts. Genomics 88:127–131.
10. Odelberg SJ, Weiss RB, Hata A, White R (1995) Template-switching during DNA synthesis by Thermus aquaticus DNA polymerase I. Nucleic Acids Res 23:2049–2057.
11. Tasic B, et al. (2002) Promoter choice determines splice site selection in protocadherin alpha and gamma pre-mRNA splicing. Mol Cell 10:21–33.
12. Li X, Zhao L, Jiang H, Wang W (2009) Short homologous sequences are strongly associated with the generation of chimeric RNAs in eukaryotes. J Mol Evol 68:56–65.
13. Zhang G, et al. (2010) Deep RNA sequencing at single base-pair resolution reveals high complexity of the rice transcriptome. Genome Res 20:646–654.
14. Gabler M, et al. (2005) Trans-splicing of the mod(mdg4) complex locus is conserved between the distantly related species Drosophila melanogaster and D. virilis. Genetics 169:723–736.
15. Li H, Wang J, Mor G, Sklar J (2008) A neoplastic gene fusion mimics trans-splicing of RNAs in normal human cells. Science 321:1357–1361.
16. Chintapalli VR, Wang J, Dow JA (2007) Using FlyAtlas to identify better Drosophila melanogaster models of human disease. Nat Genet 39:715–720.
17. Metz CW (1916) Chromosome studies on the Diptera II. The paired association of chromosomes in the Diptera and its significance. J Exp Zool 21:213–279.
18. Robertson HM, Navik JA, Walden KK, Honegger HW (2007) The bursicon gene in mosquitoes: An unusual example of mRNA trans-splicing. Genetics 176:1351–1353.
19. Ozsolak F, et al. (2009) Direct RNA sequencing. Nature 461:814–818.
20. Mamanova L, et al. (2010) FRT-seq: Amplification-free, strand-specific transcriptome sequencing. Nat Methods 7:130–132.
21. Clark AG, et al.; Drosophila 12 Genomes Consortium (2007) Evolution of genes and genomes on the Drosophila phylogeny. Nature 450:203–218.
22. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25.
23. McManus CJ, et al. (2010) Regulatory divergence in Drosophila revealed by mRNA-seq. Genome Res 20:816–825.

McManus et al. PNAS | July 20, 2010 | vol. 107 | no. 29 | 12979


Corrections

GENETICS

Correction for “Global analysis of trans-splicing in Drosophila,” by C. Joel McManus, Michael O. Duff, Jodi Eipper-Mains, and Brenton R. Graveley, which appeared in issue 29, July 20, 2010, of Proc Natl Acad Sci USA (107:12975–12979; first published July 1, 2010; 10.1073/pnas.1007586107).

The authors note that, within the supporting information, the Web link “http://intron.ccam.uchc.edu/Graveley/Publications/Publications.html” should be removed. Tables S1–S6 have been added to the online publication. The online version has been corrected.

www.pnas.org/cgi/doi/10.1073/pnas.1304972110


7958–7959 | PNAS | May 7, 2013 | vol. 110 | no. 19 www.pnas.org
