Use and Complexity of existing RNA-tools
M. Marz
University of Leipzig
Tianjin, China, 09.11.2009
M. Marz (University of Leipzig) Use and Complexity of existing RNA-tools Tianjin, China 09.11.2009 1 / 30
Evolution of most important ncRNAs in biological networks
[Figure: phylogenetic tree rooted at LUCA, annotated with ncRNA families.
Eukaryotes: Choanoflagellata, Animalia, Fungi, Amoebozoa, Plantae, Rhodophyta, Heterokonta, Apicomplexa, Ciliates, Kinetoplastida, Euglenozoa, Metamonada
Archaea: Nanoarchaeota, Crenarchaeota, Euryarchaeota
Bacteria: Proteobacteria, Chlamydia, Actinobacteria, Cyanobacteria, Firmicutes
Animals: Vertebrata, Urochordata, Cephalochordata, Echinodermata, Hemichordata, Nematoda, Arthropoda, Platyhelminthes, Annelida, Mollusca, Cnidaria, Porifera
Fungi: Taphrinomycotina, Saccharomycotina, Pezizomycotina, Basidiomycota, Glomeromycota, Chytridiomycota, Microsporidia
Plants: Angiosperms, Coniferales, Bryophyta, Charales, Chlorophyta
ncRNA families: SmY, RNase P, RNAi, telomerase RNA, snoRNAs, U7, microRNA mechanism, minor snRNAs, miRNAs, vault, Y RNA, Yfr1, tmRNA, 6S, SRP, rRNA, gRNAs, major snRNAs, SL RNA (?), tRNA, RNase MRP, 7SK]
Protein-coding Genes
[Figure: expression of a protein-coding gene.
Stages: Transcription, Processing, Translation
Compartments: nucleus, cytoplasm
Labels: chromosome, histone, enhancer, TATA, Pol II, DNA, RNA, pre-mRNA and mRNA (5' CAP, 3' AAA), exon, intron, spliceosome (U1, U2, U5, U6, (SL)), export, ribosome, protein (AS)
ncRNAs shown: 7SK, U7, U4, rRNA, tRNA, miRNA]
Non-(protein)-coding Genes
[Figure: expression of a non-protein-coding gene.
Stages: Transcription, Processing
Compartments: nucleus, cytoplasm
Labels: chromosome, histone, (enhancer), (TATA), (Pol II/Pol III), pre-ncRNA (5' (CAP), 3' (AAA)), ncRNA, action (four sites)]
Programs for Homology Search
How to choose from 86 programs?
Pipeline for Homology Search
[Flowchart with yes/no decision edges. Boxes:]
Genome-wide ncRNA search
General homology search (known RNA needed)
Search types: sequence-conserved search, structure-conserved search, pattern and structure search, synteny conservation
Tools: Blast, GotohScan, RNAmotif, Hypa, Infernal, rnabob, fragrep, RNAfold -C, Ensembl Compara, Biofuice, Synblast
Specific programs (no RNA input): tRNAscan-SE, SRP-scan, Bcheck, RNAmicro, snoReport
Decision points: Maybe absent? Multiple copies?
Remove duplicates (pseudogenes, assembly copies): blastclust (upstream/downstream)
Promoter search: rnabob (known promoter), MEME (unknown promoter)
Multiple alignment: Clustalw, Locarnate
Manual analysis: Ralee mode, RNAsubopt, RNAduplex
Target prediction: Snoplex, RNAup, RNAduplex (RIP)
BLAST [1]
Sequence-based local alignments (blastn, blastp, blastx, tblastn, PSI-BLAST)
Index-based databases (NCBI, Rfam, Noncode, ...)
Heuristic Smith-Waterman algorithm
\[
F_{i,j} = \max
\begin{cases}
0, \\
F_{i-1,j-1} + \sigma(p_i, q_j), \\
F_{i-1,j} - d, \\
F_{i,j-1} - d
\end{cases}
\]
Seed: 11 nt (blastn), 28 nt (megablast), 3 aa (other programs)
Insertions/deletions: constant costs per nucleotide/amino acid
[1] Altschul et al. (1990)
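The local alignment recurrence on the BLAST slide can be written out as a short sketch. This is the exact Smith-Waterman fill with a constant gap cost d, not BLAST itself: BLAST avoids filling the whole matrix by extending from seed hits. The match/mismatch values standing in for σ and the default d are illustrative assumptions.

```python
def smith_waterman_score(p, q, match=1, mismatch=-1, d=2):
    """Best local alignment score of p and q with a constant gap cost d.

    sigma(p_i, q_j) is modelled as `match` or `mismatch` (assumed values).
    """
    rows, cols = len(p) + 1, len(q) + 1
    # F[i][j]: best score of a local alignment ending at p[i-1], q[j-1].
    F = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sigma = match if p[i - 1] == q[j - 1] else mismatch
            F[i][j] = max(
                0,                        # start a new local alignment
                F[i - 1][j - 1] + sigma,  # (mis)match
                F[i - 1][j] - d,          # gap in q
                F[i][j - 1] - d,          # gap in p
            )
            best = max(best, F[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

The full fill costs O(n × m) time, which is why an index of short seeds is used to restrict where extension is attempted.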
GotohScan [2]
Full dynamic programming approach
Semi-global alignment
Affine gap costs for long insertions/deletions
\[
\begin{aligned}
D_{i,j} &= \max\{\, S_{i-1,j} + \gamma_o,\; D_{i-1,j} + \gamma_e \,\} \\
F_{i,j} &= \max\{\, S_{i,j-1} + \gamma_o,\; F_{i,j-1} + \gamma_e \,\} \\
S_{i,j} &= \max\{\, D_{i,j},\; F_{i,j},\; S_{i-1,j-1} + \sigma(p_i, q_j) \,\}
\end{aligned}
\]
[Figure: three histograms of log(# alignments) against alignment score, for U4atac, U17 snoRNA and RNase MRP]
Slow: O(n × m) time and memory
[2] Hertel et al. (2009)
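A minimal sketch of the three-matrix recurrence from the GotohScan slide, under stated assumptions: γ_o and γ_e are taken as negative scores, so a gap of length k contributes γ_o + (k − 1)·γ_e; "semi-global" is read as aligning the whole query p against any substring of the target q; the boundary conditions and all numeric defaults are illustrative and are not taken from GotohScan.

```python
NEG_INF = float("-inf")


def gotoh_semiglobal(p, q, match=2, mismatch=-1, gamma_o=-4, gamma_e=-1):
    """Score of the best alignment of all of p to some substring of q."""
    m, n = len(p), len(q)
    S = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    D = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # gap in q (p[i] unmatched)
    F = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # gap in p (q[j] unmatched)
    for j in range(n + 1):
        S[0][j] = 0                       # free start anywhere in the target
    for i in range(1, m + 1):
        D[i][0] = gamma_o + (i - 1) * gamma_e
        S[i][0] = D[i][0]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = max(S[i - 1][j] + gamma_o, D[i - 1][j] + gamma_e)
            F[i][j] = max(S[i][j - 1] + gamma_o, F[i][j - 1] + gamma_e)
            sigma = match if p[i - 1] == q[j - 1] else mismatch
            S[i][j] = max(D[i][j], F[i][j], S[i - 1][j - 1] + sigma)
    return max(S[m])                      # free end anywhere in the target


if __name__ == "__main__":
    # Query split by a 2-nt insertion in the target: 8 matches, one gap.
    print(gotoh_semiglobal("ACGTACGT", "GGACGTTTACGTGG"))
```

All three matrices are kept in full here, which matches the O(n × m) time and memory noted on the slide; keeping only two rows would reduce memory if just the score is needed.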
Genomic Context
Genome Browser (Ensembl, UCSC, FlyBase, WormBase, ...)
[Figure: Ensembl Homo sapiens version 53.36o (NCBI36), chromosome 17: 1,531,851 - 2,531,850 (1.00 Mb, forward strand, band p13.3).
Contigs: AC130689.8.1.202269, AC090617.16.1.204630, AC015799.23.1.180157
Ensembl/Havana genes: TLCD2, C17orf91, AC130689.8, RTN4RL1, DPH1, HIC1, SMG6, TSR1, SGSM2, AC006435.7, METT10D, PAFAH1B1, PRPF8, WDR81, SERPINF2, SERPINF1, SMYD4, RPA1, OVCA2, SRR, MNT
ncRNA genes: hsa-mir-22, hsa-mir-132, hsa-mir-212, SNORD91, SRP_euk_arch, AC015799.23, AC005696.1]
Information from closely related species
Alignment: ClustalW [3] / ClustalX
Synblast [4], other special synteny programs
[Figure: synteny example with loci labeled cel 10, cel 11, cel 13 and cre 12, cre 13, cre 23, cre 27, cre 29]
[3] Thompson et al. (1994)
[4] Lehmann et al. (2008)
Flanking region/Promoter and TFBS search
Motif search: rnabob [5]
New motif: MEME [6]
Enhancer elements: Tracker [7]
[Figure: schematic with elements labeled A, B, C, D, E]
TFBS: Transfac (commercial)
Polymerase II/III transcript
Identification of assembly artefacts
Prediction of pseudogenes (!)
[5] Eddy (1992)
[6] Bailey & Elkan (1994)
[7] Prohaska, in prep.
Sequence vs. Structure
Example: U12 snRNA of C. capitata and X. tropicalis (nt 25-78)
[Figure: secondary structure drawings of the C. capitata U12 snRNA, the X. tropicalis U12 snRNA, and the alignment of C. capitata and X. tropicalis U12 snRNA]
RNAfold [a], RNAalifold [b]
[a] Hofacker (2003)
[b] Hofacker (2003)
Pre-built specific RNA finders
tRNAscan-SE [8]
BRUCE [9] (tmRNAs)
Bcheck [10] (RNase P)
SRPRNA [11]
No query needed
Usually for whole genomes
[8] Lowe & Eddy (1997)
[9] Laslett et al. (2002)
[10] Yusuf et al. (in prep.)
[11] Regalia et al. (2002)
Structure Based Search Programs
Erpin [12]
Infernal [13]
U3 snoRNA Bitscore 123.10 [6,218]
[Figure: Infernal alignment output for the hit, in two blocks, each with structure annotation, model consensus, match line and target sequence]
Trichoplax adhaerens U3 snoRNA, bitscore 123.10.
Query dependent
No information about structure as input
[12] Gautheret & Lambert (2001)
[13] Nawrocki et al. (2009)
Support Vector Machines: SnoReport [14]
MFE
z-score
GC-content
Box scores and distances
Stems and lengths
Loops and lengths
[Flowchart:]
Input: single sequences
1. find and score motifs (continue if score > threshold)
2. truncate sequence
3. create constraint, fold
4. check structure (otherwise reject)
5. extract features: mfe, z-score, GC-content; box scores + distances; stem and loop length(s)
6. SVM: if (P > 0.5), putative CD / HACA snoRNA
Model: + CD / HACA snoRNAs; − other ncRNAs, HACA / CD snoRNAs
HACA: SE=78% SP=89%
CD: SE=87% SP=95%
Input: single sequences; whole genomes possible
[14] Hertel & Stadler (2008)
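Step 1 of the SnoReport flowchart ("find and score motifs") and the GC-content feature can be illustrated with a toy sketch. It assumes the commonly cited C/D box consensus motifs (C box RUGAUGA, D box CUGA) and scores a window by the number of positions matching the consensus; this is not SnoReport's actual scoring scheme, and the example sequence is made up.

```python
# IUPAC codes needed for the consensus motifs (RNA alphabet).
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "AG", "Y": "CU", "N": "ACGU"}

C_BOX = "RUGAUGA"  # commonly cited C box consensus (assumption for this sketch)
D_BOX = "CUGA"     # commonly cited D box consensus (assumption for this sketch)


def box_score(window, consensus):
    """Number of positions in `window` compatible with the consensus."""
    return sum(1 for base, code in zip(window, consensus) if base in IUPAC[code])


def best_box(seq, consensus):
    """Return (score, start) of the best window; leftmost window wins ties."""
    k = len(consensus)
    scored = [(box_score(seq[i:i + k], consensus), i)
              for i in range(len(seq) - k + 1)]
    return max(scored, key=lambda t: (t[0], -t[1]))


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    return sum(1 for base in seq if base in "GC") / len(seq)


if __name__ == "__main__":
    toy = "AAGUGAUGACCCCCCUGAAA"  # hypothetical sequence, not a real snoRNA
    print(best_box(toy, C_BOX), best_box(toy, D_BOX), gc_content(toy))
```

In a pipeline like the one on the slide, a candidate would only proceed to folding if its box scores exceed a threshold; the distance between the two start positions would be one of the "distances" features.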
Support Vector Machines: RNAmicro [15]
MFE
z-score
GC-content
best 23 nt block
Stems and lengths
Loops and lengths
[Flowchart:]
Input: aligned sequences
1. check structure (alifold; otherwise reject)
2. extract features
3. SVM: if (P > 0.5), putative miRNA precursor
Model: + miRNA alignments; − other ncRNA alignments, shuffled miRNA alignments
SE=84% SP=99%
Input: multiple sequence alignments; multiple genomes also possible
[15] Hertel et al. (2006)
Support Vector Machines: RNAz [16]
SCI (structure conservation index)
Mean pairwise identity
Number of sequences
Average z-score
[Figure: Ciona intestinalis, known and newly predicted ncRNAs by RNAz. Labels: u3, sc01, sc03, mir, 5S, tRNA; counts: 1384, 1249]
Mainly alignment dependent
Many false positives
[16] Washietl et al. (2005)
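Two of the RNAz features can be sketched directly. The structure conservation index is the consensus folding energy of the alignment divided by the mean minimum free energy of the single sequences; the energies themselves would come from RNAalifold and RNAfold and are passed in as plain numbers here. The gap handling in the identity calculation (columns with a gap in both sequences are skipped, a gap against a base counts as a mismatch) is an assumption of this sketch and may differ from RNAz.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Percent identity of two aligned sequences of equal length."""
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not cols:
        return 0.0
    same = sum(1 for x, y in cols if x == y)
    return 100.0 * same / len(cols)


def mean_pairwise_identity(alignment):
    """Mean pairwise identity (MPI) over all sequence pairs, in percent."""
    pairs = list(combinations(alignment, 2))
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


def structure_conservation_index(consensus_energy, single_energies):
    """SCI = consensus energy / mean single-sequence MFE."""
    mean_mfe = sum(single_energies) / len(single_energies)
    return consensus_energy / mean_mfe


if __name__ == "__main__":
    toy_alignment = ["ACGU", "ACGU", "ACGA"]   # hypothetical toy alignment
    print(mean_pairwise_identity(toy_alignment))
    print(structure_conservation_index(-10.0, [-20.0, -20.0]))
```

An SCI near 1 with a low MPI points to a structure that is conserved despite sequence divergence; because both numbers are computed from the alignment, the result depends strongly on alignment quality, as the slide notes.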
[Figure: clustering of Ciona intestinalis RNAz candidates (loci ci_554098 to ci_558117; most are marked ***; annotated entries include ci_555970-5S and ci_557837-sc19), with a consensus structure drawing and an alidot dot plot (alidot.ps) for each selected cluster:
cluster152 N=6 MPI=26.40 SCI=0.42
cluster107 N=12 MPI=21.10 SCI=0.29
cluster127 N=13 MPI=21.34 SCI=0.18
cluster144 N=4 MPI=28.11 SCI=0.87
cluster115 N=9 MPI=42.30 SCI=0.71
cluster134 N=8 MPI=22.71 SCI=0.39
cluster139 N=6 MPI=25.90 SCI=0.34
Tree (scale bar 0.1) with leaves: cluster152, cluster144, cluster139, cluster134, cluster127, cluster115, cluster107, mir-7 candidate, mir-126 candidate, let-7, mir-124-b, mir-124-a]
Hand-made secondary structures
rnabob [17], Fragrep [18], RNAmotif [19], Vienna RNA Package

Tetrapoda        CCCTCCCGAAGCTGCGC----------GCTCGG-TCG
Teleostei        CCCTCCCGAAGCYCRGC----------GCTCGG-TGG
Mustelus         CCCTCCCGAAGCTCAGC----------GCTCGG-TCG
Lampetra         CCCTCCCGATGCTCTGC----------GCTCGG-TGG
Myxine           CCTCGCCGATGCCCCGC----------GCTCGGATCG
Branchiostoma    CTCTCCCGACGCCTCGC----------GCTCGG-TCG
Ciona intest.    ---TCCCGATGCTTGCG---------CGCTCGG-TTG
Ciona savignyi   ----CCCGATGCCATGC----------GCTCGG-TCG
Saccoglossus     CTCTCCCGATGCTTAGC----------GCTCGG-TCG
Lottia           -TCTCCCGCTGCCTCGTC---------GCACGG-TAG
Helix            ---TCCCGCTGCACCCCCGGGGA---CGCACGG-TCG
Aplysia          AGCTCTCGATGCACTGGCGGGTC----GCACGG-TCG
Capitella        AGGCGCCGATGCACCCGTCGAGGGCCCGCTCGG-CCG
Helobdella       GCAACGGCATGCACTTCCACCTGTC--GCTGGC-CAG
STRUCTURE        -----<<<<-<<--------------->>>>>>----

Mammalia         TCCAAATGAGGCGCTGC-ATGTG-GCAGTCTGCCTTTCTTT
Gallus           TCCAAGTGAGGCACTGC-ATGGG-GCAGTCTGCCATTGTTT
Anolis           TCCAAGTCAGGCGCTGC-ACGGG-GCAGTCTGCCATTCTTT
Xenopus          TCCAAGTGTGGCGCTGC-ATGTG-GCAGTGTGCCTTTCTTT
Oryzias          TCCAACTGCGGCGCTGC-ACGTG-GCAGTCTGCCTTCCTTT
Gasterosteus     TCCAAATGAGGCGCTGC-ACGTG-GCAGTCTGCCTTCCTTT
Fugu             TCCAATTGCGGCGCTGC-ACGTG-GCAGTCTGCCTTACTTT
Tetraodon        TCCAATTGCGGCGCTGC-ACGTG-GCAGTCTGCCTTCCTTT
Danio            TCCAAATGAGGCACTGC-ATGTG-GCAGTCTGCCTTTCTTT
Gadus            TCCAAATGAGGCGCTGC-ACGTG-GCAGTCTGCCGTAATTT
Mustelus         TCCAAGTCAGGCACTGC-ACGTG-GCAGTCTGCCGTTCTTT
Lampetra         TCCAGATC-GGCACTGC-ACGTG-GCAGTCTGCCTGT-TTT
Petromyzon       TCCAGATC-GGCGCTGC-ACGTG-GCAGTTCGCCTGT-TTT
Myxine           TCCAAC-ACGGCGCTGC-ACGTG-GCAGTTTGCCTT--GTT
Ciona_int        TCCATA-TAGGCACTGC-ACGGG-GCAGTATGCCTTCATTT
Ciona_sav        TCCATA-TAGGCACTGC-ACGGG-GCAGTATGCCTTCATTT
Branchiostoma_l  TCCAAT-ACGGCGCTGCCACGCGGGCAGCCTGCCAT---TT
Branchiostoma_f  TCCAAT-ACGGCGCTGCCACGCAGGCGGCCTGCCATT-TTT
Saccoglossus     TCCATC-ATGGCGCTGCCTTG-GGGTAGCTTGCCTTCACTT
Lottia           TCCAAT-ACGGCACTAC-AAGTG-GTAGTTTGCCTTCCTTT
Helix            TCCATTGGAGGCATTAC-ACGTG-GTAATCTGCCTTTCTTT
Capitella        TCCACA-CTGGCACCGC-ATGTG-GTGGTATGCCATTGTTT
STRUCTURE        ---------<<<<<<<<<----->>>>>>-->>>-------

[Figure: consensus secondary structure drawing of a stem, labeled 3' STEM]
[17] Eddy (1992)
[18] Mosig et al. (2006)
[19] Macke et al. (2001)
basalDeuterostomes
Lophotrochozoa
Vertebrate
5’STEM
1 300100 200−100
STEM A
STEM B (Vertebrata only)
5’STEM STEM A STEM B 3’STEMVertebrata
basal Deuterostomes
Lophotrochozoa
PSE
PSE
PSE
TATA
TATA
TATA
(a)
(b)
(c)
(d)
(e)
17Eddy (1992)18Mosig et al. (2006)19Macke et al. (2001)
M. Marz (University of Leipzig) Use and Complexity of existing RNA-tools Tianjin, China 09.11.2009 18 / 30
Hand-made secondary structures
rnabob [17], Fragrep [18], RNAmotif [19], Vienna RNA Package
Tetrapoda       CCCTCCCGAAGCTGCGC----------GCTCGG-TCG
Teleostei       CCCTCCCGAAGCYCRGC----------GCTCGG-TGG
Mustelus        CCCTCCCGAAGCTCAGC----------GCTCGG-TCG
Lampetra        CCCTCCCGATGCTCTGC----------GCTCGG-TGG
Myxine          CCTCGCCGATGCCCCGC----------GCTCGGATCG
Branchiostoma   CTCTCCCGACGCCTCGC----------GCTCGG-TCG
Ciona intest.   ---TCCCGATGCTTGCG---------CGCTCGG-TTG
Ciona savignyi  ----CCCGATGCCATGC----------GCTCGG-TCG
Saccoglossus    CTCTCCCGATGCTTAGC----------GCTCGG-TCG
Lottia          -TCTCCCGCTGCCTCGTC---------GCACGG-TAG
Helix           ---TCCCGCTGCACCCCCGGGGA---CGCACGG-TCG
Aplysia         AGCTCTCGATGCACTGGCGGGTC----GCACGG-TCG
Capitella       AGGCGCCGATGCACCCGTCGAGGGCCCGCTCGG-CCG
Helobdella      GCAACGGCATGCACTTCCACCTGTC--GCTGGC-CAG
STRUCTURE       -----<<<<-<<--------------->>>>>>----

Mammalia         TCCAAATGAGGCGCTGC-ATGTG-GCAGTCTGCCTTTCTTT
Gallus           TCCAAGTGAGGCACTGC-ATGGG-GCAGTCTGCCATTGTTT
Anolis           TCCAAGTCAGGCGCTGC-ACGGG-GCAGTCTGCCATTCTTT
Xenopus          TCCAAGTGTGGCGCTGC-ATGTG-GCAGTGTGCCTTTCTTT
Oryzias          TCCAACTGCGGCGCTGC-ACGTG-GCAGTCTGCCTTCCTTT
Gasterosteus     TCCAAATGAGGCGCTGC-ACGTG-GCAGTCTGCCTTCCTTT
Fugu             TCCAATTGCGGCGCTGC-ACGTG-GCAGTCTGCCTTACTTT
Tetraodon        TCCAATTGCGGCGCTGC-ACGTG-GCAGTCTGCCTTCCTTT
Danio            TCCAAATGAGGCACTGC-ATGTG-GCAGTCTGCCTTTCTTT
Gadus            TCCAAATGAGGCGCTGC-ACGTG-GCAGTCTGCCGTAATTT
Mustelus         TCCAAGTCAGGCACTGC-ACGTG-GCAGTCTGCCGTTCTTT
Lampetra         TCCAGATC-GGCACTGC-ACGTG-GCAGTCTGCCTGT-TTT
Petromyzon       TCCAGATC-GGCGCTGC-ACGTG-GCAGTTCGCCTGT-TTT
Myxine           TCCAAC-ACGGCGCTGC-ACGTG-GCAGTTTGCCTT--GTT
Ciona_int        TCCATA-TAGGCACTGC-ACGGG-GCAGTATGCCTTCATTT
Ciona_sav        TCCATA-TAGGCACTGC-ACGGG-GCAGTATGCCTTCATTT
Branchiostoma_l  TCCAAT-ACGGCGCTGCCACGCGGGCAGCCTGCCAT---TT
Branchiostoma_f  TCCAAT-ACGGCGCTGCCACGCAGGCGGCCTGCCATT-TTT
Saccoglossus     TCCATC-ATGGCGCTGCCTTG-GGGTAGCTTGCCTTCACTT
Lottia           TCCAAT-ACGGCACTAC-AAGTG-GTAGTTTGCCTTCCTTT
Helix            TCCATTGGAGGCATTAC-ACGTG-GTAATCTGCCTTTCTTT
Capitella        TCCACA-CTGGCACCGC-ATGTG-GTGGTATGCCATTGTTT
STRUCTURE        ---------<<<<<<<<<----->>>>>>-->>>-------
[Figure, panels (a) to (e): consensus stem drawings and a positional overview (coordinate axis from -100 to 300) of the 5' stem, stem A, stem B (Vertebrata only), and 3' stem, with upstream PSE and TATA elements, for Vertebrata, basal Deuterostomes, and Lophotrochozoa]
Structure information necessary
[17] Eddy (1992); [18] Mosig et al. (2006); [19] Macke et al. (2001)
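The structure information these tools need can be read from the STRUCTURE line of such an alignment. The following is a minimal Python sketch, not part of any of the listed tools: it assumes that `<` and `>` mark paired alignment columns (as in the alignments above), and the function names `pairs_from_structure` and `paired_fraction` are hypothetical. It recovers the paired columns and checks how many of them are complementary in one sequence.

```python
def pairs_from_structure(structure):
    """Return sorted (i, j) column pairs encoded by '<' and '>' in a STRUCTURE line."""
    stack, pairs = [], []
    for col, symbol in enumerate(structure):
        if symbol == "<":
            stack.append(col)
        elif symbol == ">":
            if not stack:
                raise ValueError(f"unmatched '>' at column {col}")
            pairs.append((stack.pop(), col))
    if stack:
        raise ValueError(f"unmatched '<' at column {stack[-1]}")
    return sorted(pairs)


# Watson-Crick and G-T wobble pairs, written in the DNA alphabet of the alignment.
CANONICAL = {"AT", "TA", "GC", "CG", "GT", "TG"}


def paired_fraction(sequence, pairs):
    """Fraction of annotated column pairs that are complementary in one aligned sequence."""
    ok = sum((sequence[i] + sequence[j]) in CANONICAL for i, j in pairs)
    return ok / len(pairs)


# First alignment above: the STRUCTURE line and the Tetrapoda row.
structure = "-----<<<<-<<--------------->>>>>>----"
tetrapoda = "CCCTCCCGAAGCTGCGC----------GCTCGG-TCG"

pairs = pairs_from_structure(structure)
print(pairs)
print(paired_fraction(tetrapoda, pairs))
```

Running the same check over every row of an alignment shows which stems are supported by compensatory mutations and which rows deviate from the annotated structure.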
7SK RNA
[Figure: 7SK RNA secondary structure models with the structural elements M1, M2a, M2b, M2c, M3, M4, M5, M6, M7, and M8 and the expansion domains marked; a table (columns Element, Vert., Deut., Ins., Meta.) shows which elements are old or new in each group]
The 7SK-Automaton
[Figure: automaton with states 1 to 10. Transitions are labelled with the motifs GAUC, "M5", and polyU and with distance constraints (d = 7-30 nt, or d = 1-6 nt || d > 30 nt). The outcomes are "7SK?" and "no 7SK".]
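The state layout of the automaton cannot be recovered from the slide text, so the following Python sketch only illustrates its basic building block: accepting a second motif when it lies within a distance window of a first one. The function name `motif_pairs`, the definition of polyU as four or more U, and the reading of d as the gap between the end of one motif and the start of the next are all assumptions made for illustration.

```python
import re


def motif_pairs(seq, first, second, dmin=7, dmax=30):
    """Return (start_first, start_second, gap) for motif pairs with dmin <= gap <= dmax.

    The gap is the number of nucleotides between the end of the first motif
    and the start of the second one (an assumed reading of "d" on the slide).
    """
    hits = []
    for a in re.finditer(first, seq):
        for b in re.finditer(second, seq):
            gap = b.start() - a.end()
            if dmin <= gap <= dmax:
                hits.append((a.start(), b.start(), gap))
    return hits


# Toy sequences: GAUC followed by a polyU stretch at different distances.
accepted = motif_pairs("GAUC" + "A" * 10 + "UUUUU", "GAUC", "U{4,}")
too_close = motif_pairs("GAUC" + "A" * 3 + "UUUUU", "GAUC", "U{4,}")
too_far = motif_pairs("GAUC" + "A" * 31 + "UUUUU", "GAUC", "U{4,}")
print(accepted, too_close, too_far)
```

A full automaton would chain several such steps and route candidates that violate a window (d = 1-6 nt or d > 30 nt on the slide) to a different state instead of discarding them.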
7SK RNA in Caenorhabditis
[Figure: predicted 7SK RNA secondary structure in Caenorhabditis with the elements M1, M2a, M2b, M2c, M3, and M5; a second panel with lanes 1 and 2 (labelled Ce and Hs) and size markers at 163, 143, and 116 nt and at U1, U4, and U5]
RNA:RNA interaction
snoRNA targets: snoplex [a], snoScan [b], SnoGPS [c]
[a] Tafer et al. (in prep); [b] Lowe & Eddy (1999); [c] Schattner et al. (2005)
miRNA targets: PicTar [a], RNAhybrid [b], miRanda [c], ...
[a] Krek et al. (2005); [b] Rehmsmeier et al. (2004); [c] Betel et al. (2008)
Generally: RNAup [20], RNAduplex [21], RNAcofold [21], RIP
[Figure: secondary structure drawing of the interaction between U4 and U6]
[20] Mückstein et al. (2006); [21] Hofacker (2003)
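As a toy illustration of what hybridization tools search for, the Python sketch below finds the longest contiguous antiparallel helix between two RNAs. It is deliberately not energy-based and does not reproduce RNAup, RNAduplex, or RNAcofold; the function name `longest_duplex` and the example sequences are hypothetical, and G-U wobble pairs are allowed by assumption.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def longest_duplex(a, b):
    """Return (length, start_a, start_b) of the longest gap-free antiparallel helix.

    Both sequences are given 5'->3'. Position a[i] pairs with the reversed
    strand of b, so the helix is found as the longest "pairing" diagonal.
    """
    rb = b[::-1]
    best = (0, 0, 0)
    prev = [0] * (len(rb) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(rb) + 1)
        for j in range(1, len(rb) + 1):
            if (a[i - 1], rb[j - 1]) in PAIRS:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    # start_b is the 5' end of the paired stretch within b.
                    best = (cur[j], i - cur[j], len(b) - j)
        prev = cur
    return best


# Toy example: b is the exact reverse complement of a.
print(longest_duplex("AAGCUAGG", "CCUAGCUU"))
```

Real target predictors add bulges, interior loops, and a thermodynamic energy model on top of this idea, and some also account for the cost of opening existing intramolecular structure.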
ncRNA challenges in silico
Target prediction
Secondary structure prediction of highly divergent ncRNAs
Pseudoknot prediction
3-dimensional structure prediction
RNA:Protein interaction
Target Prediction: U7 RNA
Processing of the 3’ end of histone pre-mRNAs
Smallest RNA polymerase II transcript known to date: 57-70 nt
Only one stem; several highly conserved sequence elements (Sm site, reverse complement of the HDE)
[Figure: U7 interacting with the histone H3 transcript; labelled elements: Sm, HDE, stem loop, a 33 nt spacer, and the 5’ and 3’ ends]
Target Prediction: SL-SmY System in Nematodes?
Complex Secondary Structure Prediction: 7SK RNA
[Figure: structure-annotated multiple sequence alignment (with #=GC SS_cons consensus lines) of 7SK RNA from Homo, Mus, Anolis, Xenopus, Danio, B_lanceolatu, Ciona_intes, Culex, Nasonia, Pediculus, Capitella, Platynereis, Myxine, Lottia, Helix, Mytilus_gall, Helobdella, Petrolisthes, dmoj_scaffol, and dmel_3R_3300 (names truncated as on the slide). The consensus structure marks the stems M4 and M4', M5, a Drosophila-specific M5 expansion, M6, and M7.]
Pseudoknot: Telomerase RNA
Replication of chromosomal ends
Aberrant telomerase activity is associated with cancer
Telomerase enzyme: telomerase RNA and TERT
RNA component: highly variable in length, 100 nt to 2 000 nt
[Figure: telomerase RNA secondary structure schematics for yeast, ciliate, and vertebrate, each with a template region and a pseudoknot. Further labels: CS1 to CS7, CS5a, Ku80, TB, S1 to S3, I, II, IIIa, IIIb, IV, P5, P6b, P6.1, CR4, CR5, CAB, and snoRNA H/ACA.]
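A pseudoknot is a structure in which two base pairs (i, j) and (k, l) cross, that is i < k < j < l, which plain nested bracket notation cannot express. The Python sketch below makes this definition concrete. It assumes the common extended notation in which a second bracket type `[ ]` marks the crossing helix; the function names `parse_pairs` and `has_pseudoknot` are hypothetical, and the input is assumed to be balanced.

```python
def parse_pairs(notation):
    """Return sorted (i, j) base pairs from a notation using '()' and '[]'."""
    openers = {"(": ")", "[": "]"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}  # one stack per bracket type
    pairs = []
    for pos, ch in enumerate(notation):
        if ch in openers:
            stacks[ch].append(pos)
        elif ch in closers:
            pairs.append((stacks[closers[ch]].pop(), pos))
    return sorted(pairs)


def has_pseudoknot(pairs):
    """True if any two base pairs cross (i < k < j < l)."""
    return any(i < k < j < l for i, j in pairs for k, l in pairs)


knotted = parse_pairs("((..[[..))..]]")   # second helix opens inside the first, closes outside
nested = parse_pairs("((..))..[[..]]")    # two separate hairpins, no crossing
print(has_pseudoknot(knotted), has_pseudoknot(nested))
```

Because crossing pairs break the nesting assumption behind the standard dynamic programming recursions, general pseudoknot prediction is much more expensive, which is one motivation for family-specific descriptors.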
Specific Telomerase RNA Pseudoknot Finder
[Figure: pseudoknot descriptor in panels A, B, and C, built from the segments s1 to s8, with paired regions marked <<<<< and >>>>>, spacer regions, short conserved motifs, and the 5’ and 3’ ends]
Organism         Genome size (bp)   Obtained frequency
S. purpuratus         809 952 877              170 820
C. intestinalis       141 233 565               22 330
C. savignyi           255 955 828               82 776
N. crassa           1 860 657 949              342 708
N. discreta           556 883 022              183 461
N. tetrasperma        487 800 222              133 339
\[ M = \begin{pmatrix} t \\ c \\ a \\ \mathrm{mferel} \\ \mathrm{hom} \\ \mathrm{rdrc} \\ \mathrm{haca} \end{pmatrix} \]
Acknowledgements
Thanks to:
Christian Reidys
Qin Jing
Peter Stadler
and the whole Bioinformatics Group Leipzig
Thank You!