42
RNA secondary structure prediction

RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

RNA secondary structure prediction

Page 2: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

RNA Basics

• RNA bases A,C,G,U

• Canonical Base Pairs

– A-U

– G-C

– G-U

“wobble” pairing

– Bases can only pair with one other base.

Image: http://www.bioalgorithms.info/

2 Hydrogen Bonds 3 Hydrogen Bonds – more stable

Page 4: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

RNA Basics

• transfer RNA (tRNA)

• messenger RNA (mRNA)

• ribosomal RNA (rRNA)

• small interfering RNA (siRNA)

• micro RNA (miRNA)

• small nucleolar RNA (snoRNA)

http://www.genetics.wustl.edu/eddy/tRNAscan-SE/

Page 5: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Non-coding RNAs • A non-coding RNA (ncRNA) is a functional RNA molecule that is not

translated into a protein; small RNA (sRNA) is often used for bacterial ncRNAs.

• tRNA (transfer RNA), rRNA (ribosomal RNA), snoRNA (small RNA molecules that guide chemical modifications of other RNAs)

• microRNAs (miRNA, μRNA, single-stranded RNA molecules of 21-23 nucleotides in length, regulate gene expression)

• siRNAs (short interfering RNA or silencing RNA, double-stranded, 20-25 nucleotides in length, involved in the RNA interference (RNAi) pathway, where it interferes with the expression of a specific gene. )

• piRNAs (expressed in animal cells, forms RNA-protein complexes through interactions with Piwi proteins, which have been linked to transcriptional gene silencing of retrotransposons and other genetic elements in germ line cells)

• long ncRNAs (non-protein coding transcripts longer than 200 nucleotides)

Page 6: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

RNA Secondary Structure

Hairpin loop

Junction (Multiloop)

Bulge Loop

Single-Stranded

Interior Loop

Stem

Image– Wuchty

Pseudoknot

Page 7: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Sequence Alignment as a method to determine structure

• Bases pair in order to form backbones and determine the secondary structure

• Aligning bases based on their ability to pair with each other gives an algorithmic approach to determining the optimal structure

Page 8: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Complex folds

Page 9: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Pseudoknots

i

j

j’

i’ i j j’ i’

?

Page 10: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

RNA secondary structure representation

• 2D

• Circle plot

• Dot plot

• Mountain

• Parentheses

• Tree model (((…)))..((

….))

Page 11: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Main approaches to RNA secondary structure prediction

• Energy minimization

– dynamic programming approach

– does not require prior sequence alignment

– require estimation of energy terms contributing to secondary structure

• Comparative sequence analysis

– using sequence alignment to find conserved residues and covariant base pairs.

– most trusted

• Simultaneous folding and alignment (structural alignment)

Page 12: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Assumptions in energy minimization approaches

• Most likely structure similar to energetically most stable structure

• Energy associated with any position is only influenced by local sequence and structure

• Neglect pseudoknots

Page 13: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Base-pair maximization • Find structure with the most base pairs

– Only consider A-U and G-C and do not distinguish them

• Nussinov algorithm (1970s)

– Too simple to be accurate, but stepping-stone for later algorithms

Page 14: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

• Problem definition – Given sequence X=x1x2…xL,compute a structure that has

maximum (weighted) number of base pairings

• How can we solve this problem? – Remember: RNA folds back to itself!

– S(i,j) is the maximum score when xi..xj folds optimally

– S(1,L)?

– S(i,i)?

Nussinov algorithm

1 L i j

S(i,j)

Page 15: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

“Grow” from substructures (1) (2) (4) (3)

1 L i j i+1 j-1 k

Page 16: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Dynamic programming

• Compute S(i,j) recursively (dynamic programming) – Compares a sequence against itself in a dynamic

programming matrix

• Three steps

Page 17: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Nussinov RNA Folding Algorithm

• Initialization: γ(i, i-1) = 0 for I = 2 to L;

γ(i, i) = 0 for I = 2 to L.

1 2 3 4 5 6 7 8 9

G G G A A A U C C

1 G

2 G

3 G

4 A

5 A

6 A

7 U

8 C

9 C

i

j

Image Source: Durbin et al. (2002) “Biological Sequence Analysis”

Page 18: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Nussinov RNA Folding Algorithm

1 2 3 4 5 6 7 8 9

G G G A A A U C C

1 G

2 G 0

3 G 0

4 A 0

5 A 0

6 A 0

7 U 0

8 C 0

9 C 0

j

i

• Initialization: γ(i, i-1) = 0 for I = 2 to L;

γ(i, i) = 0 for I = 2 to L.

Image Source: Durbin et al. (2002) “Biological Sequence Analysis”

Page 19: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Nussinov RNA Folding Algorithm

1 2 3 4 5 6 7 8 9

G G G A A A U C C

1 G 0

2 G 0 0

3 G 0 0

4 A 0 0

5 A 0 0

6 A 0 0

7 U 0 0

8 C 0 0

9 C 0 0

j

i

• Initialization: γ(i, i-1) = 0 for I = 2 to L;

γ(i, i) = 0 for I = 2 to L.

Image Source: Durbin et al. (2002) “Biological Sequence Analysis”

Page 20: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Nussinov RNA Folding Algorithm

• Recursive Relation:

• For all subsequences from length 2 to length L:

)],1(),([max

),()1,1(

)1,(

),1(

max),(

jkki

jiji

ji

ji

ji

jki

Case 1

Case 2

Case 3

Case 4

Page 21: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Nussinov RNA Folding Algorithm

1 2 3 4 5 6 7 8 9

G G G A A A U C C

1 G 0 0

2 G 0 0 0

3 G 0 0 0

4 A 0 0 0

5 A 0 0 0

6 A 0 0 1

7 U 0 0 0

8 C 0 0 0

9 C 0 0

)],1(),([max

),()1,1(

)1,(

),1(

max),(

jkki

jiji

ji

ji

ji

jki

j

i

Image Source: Durbin et al. (2002) “Biological Sequence Analysis”

Page 22: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Nussinov RNA Folding Algorithm

1 2 3 4 5 6 7 8 9

G G G A A A U C C

1 G 0 0 0

2 G 0 0 0 0

3 G 0 0 0 0

4 A 0 0 0 0

5 A 0 0 0 1

6 A 0 0 1 1

7 U 0 0 0 0

8 C 0 0 0

9 C 0 0

)],1(),([max

),()1,1(

)1,(

),1(

max),(

jkki

jiji

ji

ji

ji

jki

j

i

Image Source: Durbin et al. (2002) “Biological Sequence Analysis”

Page 23: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Nussinov RNA Folding Algorithm

1 2 3 4 5 6 7 8 9

G G G A A A U C C

1 G 0 0 0 0

2 G 0 0 0 0 0

3 G 0 0 0 0 0

4 A 0 0 0 0 1

5 A 0 0 0 1 1

6 A 0 0 1 1 1

7 U 0 0 0 0

8 C 0 0 0

9 C 0 0

)],1(),([max

),()1,1(

)1,(

),1(

max),(

jkki

jiji

ji

ji

ji

jki

j

i

Image Source: Durbin et al. (2002) “Biological Sequence Analysis”

Page 24: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Example Computation

1 2 3 4 5 6 7 8 9

G G G A A A U C C

1 G 0 0 0 0

2 G 0 0 0 0 0

3 G 0 0 0 0 0

4 A 0 0 0 0

5 A 0 0 0 1 1

6 A 0 0 1 1 1

7 U 0 0 0 0

8 C 0 0 0

9 C 0 0

j

i

)]7,1(),4([max

)7,4()6,5(

)6,4(

)7,5(

max)7,4(

74 kkk

Image Source: Durbin et al. (2002) “Biological Sequence Analysis”

Page 25: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Example Computation

1 2 3 4 5 6 7 8 9

G G G A A A U C C

1 G 0 0 0 0

2 G 0 0 0 0 0

3 G 0 0 0 0 0

4 A 0 0 0 0

5 A 0 0 0 1 1

6 A 0 0 1 1 1

7 U 0 0 0 0

8 C 0 0 0

9 C 0 0

j

i

)]7,1(),4([max

)7,4()6,5(

)6,4(

)7,5(

max)7,4(

74 kkk

A U

A

A

i

i+1 j

Image Source: Durbin et al. (2002) “Biological Sequence Analysis”

Page 26: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Example Computation

1 2 3 4 5 6 7 8 9

G G G A A A U C C

1 G 0 0 0 0

2 G 0 0 0 0 0

3 G 0 0 0 0 0

4 A 0 0 0 0

5 A 0 0 0 1 1

6 A 0 0 1 1 1

7 U 0 0 0 0

8 C 0 0 0

9 C 0 0

j

i

)]7,1(),4([max

)7,4()6,5(

)6,4(

)7,5(

max)7,4(

74 kkk

Image Source: Durbin et al. (2002) “Biological Sequence Analysis”

Page 27: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Example Computation

1 2 3 4 5 6 7 8 9

G G G A A A U C C

1 G 0 0 0 0

2 G 0 0 0 0 0

3 G 0 0 0 0 0

4 A 0 0 0 0

5 A 0 0 0 1 1

6 A 0 0 1 1 1

7 U 0 0 0 0

8 C 0 0 0

9 C 0 0

j

i

)]7,1(),4([max

)7,4()6,5(

)6,4(

)7,5(

max)7,4(

74 kkk

i+1 j-1

i j A U

A A

Image Source: Durbin et al. (2002) “Biological Sequence Analysis”

Page 28: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Example Computation

1 2 3 4 5 6 7 8 9

G G G A A A U C C

1 G 0 0 0 0

2 G 0 0 0 0 0

3 G 0 0 0 0 0

4 A 0 0 0 0

5 A 0 0 0 1 1

6 A 0 0 1 1 1

7 U 0 0 0 0

8 C 0 0 0

9 C 0 0

j

i

)]7,1(),4([max

)7,4()6,5(

)6,4(

)7,5(

max)7,4(

74 kkk

Image Source: Durbin et al. (2002) “Biological Sequence Analysis”

Page 29: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Example Computation

1 2 3 4 5 6 7 8 9

G G G A A A U C C

1 G 0 0 0 0

2 G 0 0 0 0 0

3 G 0 0 0 0 0

4 A 0 0 0 0 1

5 A 0 0 0 1 1

6 A 0 0 1 1 1

7 U 0 0 0 0

8 C 0 0 0

9 C 0 0

j

i

)]7,1(),4([max

)7,4()6,5(

)6,4(

)7,5(

max)7,4(

74 kkk

Image Source: Durbin et al. (2002) “Biological Sequence Analysis”

Page 30: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Completed Matrix

1 2 3 4 5 6 7 8 9

G G G A A A U C C

1 G 0 0 0 0 0 0 1 2 3

2 G 0 0 0 0 0 0 1 2 3

3 G 0 0 0 0 0 1 2 2

4 A 0 0 0 0 1 1 1

5 A 0 0 0 1 1 1

6 A 0 0 1 1 1

7 U 0 0 0 0

8 C 0 0 0

9 C 0 0

)],1(),([max

),()1,1(

)1,(

),1(

max),(

jkki

jiji

ji

ji

ji

jki

j

i

Image Source: Durbin et al. (2002) “Biological Sequence Analysis”

Page 31: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Traceback

• value at γ(1, L) is the total base pair count in the maximally base-paired structure

• as in other DP, traceback from γ(1, L) is necessary to recover the final secondary structure

• pushdown stack is used to deal with bifurcated structures

Page 32: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Retrieving the Structure

1 2 3 4 5 6 7 8 9

G G G A A A U C C

1 G 0 0 0 0 0 0 1 2 3

2 G 0 0 0 0 0 0 1 2 3

3 G 0 0 0 0 0 1 2 2

4 A 0 0 0 0 1 1 1

5 A 0 0 0 1 1 1

6 A 0 0 1 1 1

7 U 0 0 0 0

8 C 0 0 0

9 C 0 0

j

i

Image Source: Durbin et al. (2002) “Biological Sequence Analysis”

STACK

(1,9)

CURRENT PAIRS

Page 33: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Retrieving the Structure

1 2 3 4 5 6 7 8 9

G G G A A A U C C

1 G 0 0 0 0 0 0 1 2 3

2 G 0 0 0 0 0 0 1 2 3

3 G 0 0 0 0 0 1 2 2

4 A 0 0 0 0 1 1 1

5 A 0 0 0 1 1 1

6 A 0 0 1 1 1

7 U 0 0 0 0

8 C 0 0 0

9 C 0 0

j

i

Image Source: Durbin et al. (2002) “Biological Sequence Analysis”

STACK

(2,9)

CURRENT

(1,9)

PAIRS

Page 34: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Retrieving the Structure

1 2 3 4 5 6 7 8 9

G G G A A A U C C

1 G 0 0 0 0 0 0 1 2 3

2 G 0 0 0 0 0 0 1 2 3

3 G 0 0 0 0 0 1 2 2

4 A 0 0 0 0 1 1 1

5 A 0 0 0 1 1 1

6 A 0 0 1 1 1

7 U 0 0 0 0

8 C 0 0 0

9 C 0 0

j

i

Image Source: Durbin et al. (2002) “Biological Sequence Analysis”

STACK

(3,8)

CURRENT

(2,9)

C

G

G

PAIRS

(2,9)

Page 35: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Retrieving the Structure

1 2 3 4 5 6 7 8 9

G G G A A A U C C

1 G 0 0 0 0 0 0 1 2 3

2 G 0 0 0 0 0 0 1 2 3

3 G 0 0 0 0 0 1 2 2

4 A 0 0 0 0 1 1 1

5 A 0 0 0 1 1 1

6 A 0 0 1 1 1

7 U 0 0 0 0

8 C 0 0 0

9 C 0 0

j

i

Image Source: Durbin et al. (2002) “Biological Sequence Analysis”

STACK

(4,7)

CURRENT

(3,8)

C

G

G

C G

PAIRS

(2,9)

(3,8)

Page 36: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Retrieving the Structure

1 2 3 4 5 6 7 8 9

G G G A A A U C C

1 G 0 0 0 0 0 0 1 2 3

2 G 0 0 0 0 0 0 1 2 3

3 G 0 0 0 0 0 1 2 2

4 A 0 0 0 0 1 1 1

5 A 0 0 0 1 1 1

6 A 0 0 1 1 1

7 U 0 0 0 0

8 C 0 0 0

9 C 0 0

j

i

Image Source: Durbin et al. (2002) “Biological Sequence Analysis”

STACK

(5,6)

CURRENT

(4,7) U

C

G

A

G

C G

PAIRS

(2,9)

(3,8)

(4,7)

Page 37: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Retrieving the Structure

1 2 3 4 5 6 7 8 9

G G G A A A U C C

1 G 0 0 0 0 0 0 1 2 3

2 G 0 0 0 0 0 0 1 2 3

3 G 0 0 0 0 0 1 2 2

4 A 0 0 0 0 1 1 1

5 A 0 0 0 1 1 1

6 A 0 0 1 1 1

7 U 0 0 0 0

8 C 0 0 0

9 C 0 0

j

i

Image Source: Durbin et al. (2002) “Biological Sequence Analysis”

STACK

(6,6)

CURRENT

(5,6)

A

U

C

G

A

G

C G

PAIRS

(2,9)

(3,8)

(4,7)

Page 38: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Retrieving the Structure

1 2 3 4 5 6 7 8 9

G G G A A A U C C

1 G 0 0 0 0 0 0 1 2 3

2 G 0 0 0 0 0 0 1 2 3

3 G 0 0 0 0 0 1 2 2

4 A 0 0 0 0 1 1 1

5 A 0 0 0 1 1 1

6 A 0 0 1 1 1

7 U 0 0 0 0

8 C 0 0 0

9 C 0 0

j

i

Image Source: Durbin et al. (2002) “Biological Sequence Analysis”

STACK

-

CURRENT

(6,6)

A

U

C

G

A

G

C G

A PAIRS

(2,9)

(3,8)

(4,7)

Page 39: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Retrieving the Structure

1 2 3 4 5 6 7 8 9

G G G A A A U C C

1 G 0 0 0 0 0 0 1 2 3

2 G 0 0 0 0 0 0 1 2 3

3 G 0 0 0 0 0 1 2 2

4 A 0 0 0 0 1 1 1

5 A 0 0 0 1 1 1

6 A 0 0 1 1 1

7 U 0 0 0 0

8 C 0 0 0

9 C 0 0

j

i

A

U

C

G

A

G

C G

A

Image Source: Durbin et al. (2002) “Biological Sequence Analysis”

Page 40: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Evaluation of Nussinov

• unfortunately, while this does maximize the base pairs, it does not create viable secondary structures

• in Zuker’s algorithm, the correct structure is assumed to have the lowest equilibrium free energy (ΔG) (Zuker and Stiegler, 1981; Zuker 1989a)

Page 41: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Limitation of Nussinov Algorithm

Page 42: RNA secondary structure prediction€¦ · Main approaches to RNA secondary structure prediction • Energy minimization –dynamic programming approach –does not require prior

Pseudo code for Nussinov Algorithm