RNA Secondary Structures
• Nesting convention:
  For all base pairs (i, j) and (k, l) with i < k:
  i < j < k < l  or  i < k < l < j
• Prerequisite for dynamic programming
• Pseudoknot:
  i < k < j < l
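The two conditions above can be checked mechanically. The following is a minimal sketch, assuming base pairs are given as tuples (i, j) with i < j and that no position occurs in more than one pair; the function names `crosses` and `has_pseudoknot` are illustrative and not taken from the slides.

```python
from itertools import combinations

def crosses(p, q):
    """True if base pairs p and q violate the nesting convention (i < k < j < l)."""
    (i, j), (k, l) = sorted([p, q])  # order the two pairs so that i < k
    return i < k < j < l

def has_pseudoknot(pairs):
    """True if any two base pairs of the structure cross."""
    return any(crosses(p, q) for p, q in combinations(pairs, 2))

nested = [(0, 8), (2, 5)]      # i < k < l < j: nested
adjacent = [(0, 3), (4, 8)]    # i < j < k < l: side by side
knotted = [(0, 11), (1, 10), (5, 16)]
print(has_pseudoknot(nested), has_pseudoknot(adjacent), has_pseudoknot(knotted))
```

A structure for which `has_pseudoknot` returns False satisfies the nesting convention and can be handled by standard dynamic programming.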
Pseudoknot topologies
• H-type pseudoknots
• Simple recursive pseudoknots
• Planar pseudoknots
• Complex non-planar knots
Planar pseudoknots
• Bi-secondary structures
• Superposition of two disjoint secondary structures (without knots)
• Allows for chained pseudoknots
• Book thickness (page number) = 2
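One way to test whether a set of base pairs forms a bi-secondary structure is to try to split it into two knot-free sets, one per page. The sketch below does this by 2-coloring the graph whose edges connect crossing base pairs. The helper name `split_into_two_pages` and the tuple representation of base pairs are assumptions made for illustration.

```python
from collections import deque
from itertools import combinations

def crosses(p, q):
    """True if base pairs p and q cross (i < k < j < l)."""
    (i, j), (k, l) = sorted([p, q])
    return i < k < j < l

def split_into_two_pages(pairs):
    """Return two knot-free lists of pairs, or None if two pages are not enough."""
    pairs = list(pairs)
    conflicts = {p: [] for p in pairs}
    for p, q in combinations(pairs, 2):
        if crosses(p, q):
            conflicts[p].append(q)
            conflicts[q].append(p)
    page = {}
    for start in pairs:
        if start in page:
            continue
        page[start] = 0
        queue = deque([start])
        while queue:  # breadth-first 2-coloring of the conflict graph
            p = queue.popleft()
            for q in conflicts[p]:
                if q not in page:
                    page[q] = 1 - page[p]
                    queue.append(q)
                elif page[q] == page[p]:
                    return None  # odd cycle of crossings: book thickness > 2
    return ([p for p in pairs if page[p] == 0],
            [p for p in pairs if page[p] == 1])

h_type = [(0, 11), (1, 10), (5, 16), (6, 15)]
non_planar = [(0, 3), (1, 4), (2, 5)]  # three mutually crossing pairs
print(split_into_two_pages(h_type))
print(split_into_two_pages(non_planar))
```

The H-type example splits into its two helices, while three mutually crossing pairs cannot be placed on two pages.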
Non-planar pseudoknots
• Crossing lines in plane
• Book thickness > 2
• Few known biological examples
• In general not covered by prediction algorithms
Pseudoknotted basepairs in vivo
RNA               Average # bp    % pseudoknot
RNase P           116             14.4
Group I intron    96              6.2
SRP RNA           69              1.9
SSU rRNA          402             1.4
tRNA              21              0
Source: Mathews et al., JMB 288, 1999
Biological functions by example
• Local pseudoknots on mRNA:
  • Signals for frameshifting, readthrough
  • Replication control
• Ribozymes (catalytically active pseudoknots):
  • Hepatitis delta virus
  • Group I introns (self-splicing)
• Telomerase RNA
Hepatitis delta virus ribozyme
• Circular genome
• Rolling-circle replication yields multimeric genomes, which self-cleave into genome-length pieces
• Fastest known self-cleaving ribozyme
Computational prediction
• Proven: “general pseudoknot prediction in energy-based models is NP-complete”
• Solutions:
  • Sacrifice optimality: heuristics (e.g. genetic algorithms, stochastic simulations)
  • Restrict the energy model: maximum weighted matching (MWM)
  • Restrict the class of predictable pseudoknots
Algorithm: PKNOTS
• First DP algorithm for pseudoknot prediction
• Extension of standard energy-based folding algorithms
• Allows for a wide class of pseudoknots
• Best known and widely used
• Rivas & Eddy, 1999
Graphical notation of recursions
• wx: best folding between positions i and j
• vx: best folding between positions i and j, given that i and j pair
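To illustrate how the two matrices relate, here is a deliberately simplified sketch that maximizes the number of base pairs instead of minimizing free energy and that does not handle pseudoknots. It is not the PKNOTS recursion; only the names wx and vx follow the slide. The pairing rule, the `min_loop` parameter and the function name `fold` are assumptions.

```python
# Watson-Crick and G-U wobble pairs (assumed pairing rule for this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
NEG_INF = float("-inf")

def fold(seq, min_loop=3):
    """Fill wx (best score for i..j) and vx (best score for i..j, i paired with j)."""
    n = len(seq)
    wx = [[0] * n for _ in range(n)]
    vx = [[NEG_INF] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # vx: i and j pair, the interior is folded optimally.
            if span > min_loop and (seq[i], seq[j]) in PAIRS:
                vx[i][j] = 1 + wx[i + 1][j - 1]
            # wx: i unpaired, j unpaired, i pairs with j, or a bifurcation.
            best = max(wx[i + 1][j], wx[i][j - 1], vx[i][j])
            for k in range(i + 1, j):
                best = max(best, wx[i][k] + wx[k + 1][j])
            wx[i][j] = best
    return wx, vx

wx, vx = fold("GGGAAACCC")
print(wx[0][8], vx[0][8])
```

For the hairpin sequence above, both entries equal 3, because the best folding of the whole sequence pairs its first and last position.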
Conclusion: PKNOTS
• Ambiguous!
• Analysis: O(n^4) space and O(n^6) time
• Limit: ~150 bases
• How complex can the folds of sequences shorter than 150 bases be?
pknotsRG
• Idea: improve runtime by restricting the class of pseudoknots
• Dynamic programming algorithm
• Implements a newer energy model
Simple (recursive) pseudoknot
• Two helices (a-a’, b-b’) and three loops (u, v, w)
• Recursive: (pseudoknotted) structures in loops possible
Canonization Rule 1
• |a| = |a’| and |b| = |b’|
• f = l - (e - i)
• h = j - (g - k)
  • Implies no bulges in the pseudoknot stems
Canonization Rule 2
• Helices a-a’ and b-b’ have maximal extent
• maxhel(i, j): length of the maximal helix from i to j
• e = i + maxhel(i, l)
• g = k + maxhel(k, j)
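Rules 1 and 2 together determine e, f, g and h once i, j, k and l are fixed. The following is a minimal sketch under stated assumptions: positions are 0-based, maxhel counts consecutive stacked pairs (i, j), (i+1, j-1), ..., and Watson-Crick plus G-U pairs are allowed. The function `canonical_boundaries` is a hypothetical helper, and the overlap handling of Rule 3 is not included.

```python
# Watson-Crick and G-U wobble pairs (assumed pairing rule for this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def maxhel(seq, i, j):
    """Number of consecutive stacked pairs (i, j), (i+1, j-1), ..."""
    length = 0
    while i + length < j - length and (seq[i + length], seq[j - length]) in PAIRS:
        length += 1
    return length

def canonical_boundaries(seq, i, j, k, l):
    """Derive e, f, g, h from the four free boundaries i < k < l < j."""
    e = i + maxhel(seq, i, l)  # Rule 2: helix a-a' has maximal extent
    g = k + maxhel(seq, k, j)  # Rule 2: helix b-b' has maximal extent
    f = l - (e - i)            # Rule 1: |a| = |a'|
    h = j - (g - k)            # Rule 1: |b| = |b'|
    return e, f, g, h

# a = GGG, u = AA, b = CUC, v = A, a' = CCC, w = AA, b' = GAG
seq = "GGGAACUCACCCAAGAG"
print(canonical_boundaries(seq, i=0, j=16, k=5, l=11))
```

With this indexing, e and g are the first positions after helices a and b, and f and h are the last positions before a’ and b’.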
Canonization Rule 3
• Resolve possible overlap of maximal helices
• For each (i, j), only two moving boundaries (k, l) remain
• O(n^4) time and O(n^2) space
Conclusion: pknotsRG
• Limit raised to over 800 nucleotides
• Predicts many known biological structures
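A rough back-of-the-envelope comparison shows why the lower complexity raises the length limit. The sketch below evaluates only the leading terms n^6 and n^4 for time, and n^4 and n^2 for space, at the sequence lengths named in the slides. Constant factors are ignored, so the figures indicate orders of magnitude and are not measured runtimes or memory use.

```python
# Leading-term operation counts (constant factors ignored).
pknots_time = 150 ** 6      # O(n^6) time at the ~150-base limit
pknotsrg_time = 800 ** 4    # O(n^4) time at 800 nucleotides

# Leading-term table sizes.
pknots_space = 150 ** 4     # O(n^4) space at 150 bases
pknotsrg_space = 800 ** 2   # O(n^2) space at 800 nucleotides

print(f"time:  {pknots_time:.3e} vs {pknotsrg_time:.3e}")
print(f"space: {pknots_space:.3e} vs {pknotsrg_space:.3e}")
```

By this estimate, folding 800 nucleotides with an O(n^4) algorithm takes fewer elementary steps than folding 150 bases with an O(n^6) algorithm.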