97

Inverse Folding and Sequence-Structure

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Inverse Folding and Sequence-StructureMaps of Ribonucleic Acids

Peter SchusterInstitut für Theoretische Chemie und Molekulare

Strukturbiologie der Universität Wien

Inverse Problem Workshop

IPAM, UCLA, 22.10.2003

Web-Page for further information:

http://www.tbi.univie.ac.at/~pks

1. The role of RNA in the cell and the notion of structure

2. RNA folding

3. Inverse folding of RNA

4. Sequence structure maps, neutral networks, and intersection

5. Reference to experimental data

6. Concluding remarks

1. The role of RNA in the cell and the notion of structure

2. RNA folding

3. Inverse folding of RNA

4. Sequence structure maps, neutral networks, and intersection

5. Reference to experimental data

6. Concluding remarks

RNA

RNA as scaffold for supramolecular complexes

ribosome? ? ? ? ?

RNA as adapter molecule

GA

C...

CU

G ..

.

leu

genetic code

RNA as transmitter of genetic information

DNA

...AGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUC...

messenger-RNA

protein

transcription

translation

RNA as of genetic informationworking copy

RNA as carrier of genetic information

RNA

RNA

viruses and retroviruses

as information carrier inevolution and

evolutionary biotechnologyin vitro

RNA as catalyst

ribozyme

The RNA

DNA protein

world as a precursor of the current + biology

RNA as regulator of gene expression

gene silencing by small interfering RNAs

RNA is modified by epigenetic control

RNA

RNA

editing

Alternative splicing of messenger

RNA is the catalytic subunit in supramolecular complexes

Functions of RNA molecules

OCH2

OHO

O

PO

O

O

N1

OCH2

OHO

PO

O

O

N2

OCH2

OHO

PO

O

O

N3

OCH2

OHO

PO

O

O

N4

N A U G Ck = , , ,

3' - end

5' - end

NaØ

NaØ

NaØ

NaØ

RNA

nd 3’-endGCGGAU AUUCGCUUA AGUUGGGA G CUGAAGA AGGUC UUCGAUC A ACCAGCUC GAGC CCAGA UCUGG CUGUG CACAG

3'-end

5’-end

70

60

50

4030

20

10

Definition of RNA structure

5'-e

RNA sequence

Empirical parameters

Biophysical chemistry: thermodynamics and

kinetics

RNA structure

Inverse folding of RNA:

Biotechnology,design of biomolecules

with predefined structures and functions

RNA folding:

Structural biology,spectroscopy of biomolecules, understanding

molecular function

Sequence, structure, and function

Definition and physical relevance of RNA secondary structures

RNA secondary structures are listings of Watson-Crick and GU wobble base pairs, which are free of knots and pseudokots.

„Secondary structures are folding intermediates in the formation of full three-dimensional structures.“

D.Thirumalai, N.Lee, S.A.Woodson, and D.K.Klimov. Annu.Rev.Phys.Chem. 52:751-762 (2001):

5'-End

5'-End 3'-End

3'-End

70

60

50

4030

20

10

GCGGAUUUAGCUCAGDDGGGAGAGCMCCAGACUGAAYAUCUGGAGMUCCUGUGTPCGAUCCACAGAAUUCGCACCASequence

Secondary structure

The RNA secondary structure lists the double helical stretches or stacks of a folded single strand molecule

The three-dimensional structure of a short double helical stack of B-DNA

James D. Watson, 1928- , and Francis Crick, 1916- ,Nobel Prize 1962

1953 – 2003 fifty years double helix

Canonical Watson-Crick base pairs:

cytosine – guanineuracil – adenine

W.Saenger, Principles of Nucleic Acid Structure, Springer, Berlin 1984

5'-End

5'-End 3'-End

3'-End

70

60

50

4030

20

10

GCGGAUUUAGCUCAGDDGGGAGAGCMCCAGACUGAAYAUCUGGAGMUCCUGUGTPCGAUCCACAGAAUUCGCACCASequence

Secondary structure

5'-End

5'-End

5'-End

3'-End

3'-End

3'-End

70

60

50

4030

20

10

GCGGAUUUAGCUCAGDDGGGAGAGCMCCAGACUGAAYAUCUGGAGMUCCUGUGTPCGAUCCACAGAAUUCGCACCASequence

Secondary structure

Symbolic notation

Í

A symbolic notation of RNA secondary structure that is equivalent to the conventional graphs

Tertiary elements in RNA structure

1. Different classes of pseudoknots

2. Different classes of non-Watson-Crick base pairs

3. Base triplets, G-quartets, A-platforms, etc.

4. End-on-end stacking of double helices

5. Divalent metal ion complexes, Mg2+, etc.

6. Other interactions involving phosphate, 2‘-OH, etc.

Tertiary elements in RNA structure

1. Different classes of pseudoknots

2. Different classes of non-Watson-Crick base pairs

3. Base triplets, G-quartets, A-platforms, etc.

4. End-on-end stacking of double helices

5. Divalent metal ion complexes, Mg2+, etc.

6. Other interactions involving phosphate, 2‘-OH, etc.

3'-end

"H-type pseudoknot"

5'-end

3'-end

pseudoknot

"Kissing loops"

5'-end

·· (((( ····· [[ ·)))) ···· ((((( · ]] ·····))))) ···

Two classes of pseudoknots in RNA structures

Tertiary elements in RNA structure

1. Different classes of pseudoknots

2. Different classes of non-Watson-Crick base pairs

3. Base triplets, G-quartets, A-platforms, etc.

4. End-on-end stacking of double helices

5. Divalent metal ion complexes, Mg2+, etc.

6. Other interactions involving phosphate, 2‘-OH, etc.

Twelve families of base pairsWatson-Crick / Hogsteen / Sugar

edgeCis / Transorientation

N.B. Leontis, E. Westhof, Geometric nomenclature and classification of RNA base pairs. RNA 7:499-512, 2001.

Tertiary elements in RNA structure

1. Different classes of pseudoknots

2. Different classes of non-Watson-Crick base pairs

3. Base triplets, G-quartets, A-platforms, etc.

4. End-on-end stacking of double helices

5. Divalent metal ion complexes, Mg2+, etc.

6. Other interactions involving phosphate, 2‘-OH, etc.

5'-End

3'-End7060

50

4030

20

10

5'-End

3'-End

70

60

50

4030

20

10

End-on-end stacking of double helical regions yields the L-shape of tRNAphe

1. The role of RNA in the cell and the notion of structure

2. RNA folding

3. Inverse folding of RNA

4. Sequence structure maps, neutral networks, and intersection

5. Reference to experimental data

6. Concluding remarks

How to compute RNA secondary structures

Efficient algorithms based on dynamic programming are available for computation of minimum free energy and many suboptimal secondary structures for given sequences.

M.Zuker and P.Stiegler. Nucleic Acids Res. 9:133-148 (1981)

M.Zuker, Science 244: 48-52 (1989)

Equilibrium partition function and base pairing probabilities in Boltzmann ensembles of suboptimal structures.J.S.McCaskill. Biopolymers 29:1105-1190 (1990)

The Vienna RNA Package provides in addition: inverse folding (computing sequences for given secondary structures), computation of melting profiles from partitionfunctions, all suboptimal structures within a given energy interval, barrier tress of suboptimal structures, kinetic folding of RNA sequences, RNA-hybridization and RNA/DNA-hybridization through cofolding of sequences, alignment, etc.. I.L.Hofacker, W. Fontana, P.F.Stadler, L.S.Bonhoeffer, M.Tacker, and P. Schuster. Mh.Chem.125:167-188 (1994)

S.Wuchty, W.Fontana, I.L.Hofacker, and P.Schuster. Biopolymers 49:145-165 (1999)

C.Flamm, W.Fontana, I.L.Hofacker, and P.Schuster. RNA 6:325-338 (1999)

Vienna RNA Package: http://www.tbi.univie.ac.at

G

GG

G

GG

G GG

G

GG

G

GG

G

U

U

U

U

UU

U

U

U

UU

A

A

AA

AA

AA

A

A

A

A

UC

C

CC

C

C

C

C

C

CC

C

5’-end 3’-end

Folding of RNA sequences into secondary structures of minimal free energy, DG0300

G

GG

G

GG

G GG

G

GG

G

GG

G

U

U

U

U

UU

U

U

U

UU

A

A

AA

AA

AA

A

A

A

A

UC

C

CC

C

C

C

C

C

CC

C

5’-end 3’-end

i j

k l

Edges: i·j,k·l á S .... base pairs

(i) i· i+1 á S .... backbone(ii) #base pairs per node = {0,1}(iii) if i·j and l·k á S, then

i<k<j ñ i<l<j ....pseudoknot exclusion

Folding of RNA sequences into secondary structures of minimal free energy, DG0300

G

GG

G

GG

G GG

G

GG

G

GG

G

U

U

U

U

UU

U

U

U

UU

A

A

AA

AA

AA

A

A

A

A

UC

C

CC

C

C

C

C

C

CC

C

5’-end 3’-end

free energy of stacking < 0

L∑∑∑∑ ++++=∆

loopsinternalbulges

loopshairpin

pairsbaseofstacks

,3000 )()()( iblklij ninbnhgG

Folding of RNA sequences into secondary structures of minimal free energy, DG0300

hairpinloop

hairpinloop

stack

stack

stack

hairpin loop

stack

free end

freeend

freeend

hairpin loop

hairpinloop

stack

stack

free end

freeend joint

hairpin loop

stackstack

stack

internal loop

bulgem

ultiloop

Elements of RNA secondary structures as used in free energy calculations

Maximum matching

An example of a dynamic programming computation of the maximum number of base pairs

Back tracking yields the structure(s).

i i+1 i+2 k

Xi,k-1

j-1 j

Xk+1,j

j+1

[ k+1,j ][i,k-1]

( ){ }1,,11,1,1, )1(max,max ++−−≤≤+ ++= jkjkkijkijiji XXXX ρ

Minimum free energy computations are based on empirical energies

GGCGCGCCCGGCGCC

GUAUCGAAAUACGUAGCGUAUGGGGAUGCUGGACGGUCCCAUCGGUACUCCA

UGGUUACGCGUUGGGGUAACGAAGAUUCCGAGAGGAGUUUAGUGACUAGAGGRNAStudio.lnk

Maximum matching j 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15i G G C G C G C C C G G C G C C1 G * * 1 1 1 1 2 3 3 3 4 4 5 6 62 G * * 0 1 1 2 2 2 3 3 4 4 5 63 C * * 0 1 1 1 2 3 3 3 4 5 54 G * * 0 1 1 2 2 2 3 4 5 55 C * * 0 1 1 2 2 3 4 4 46 G * * 1 1 1 2 3 3 3 47 C * * 0 1 2 2 2 2 38 C * * 1 1 1 2 2 29 C * * 1 1 2 2 2

10 G * * 1 1 1 211 G * * 0 1 112 C * * 0 113 G * * 114 C * *15 C *

An example of a dynamic programming computation of the maximum number of base pairs

Back tracking yields the structure(s).

i i+1 i+2 k

Xi,k-1

j-1 j

Xk+1,j

j+1

[ k+1,j ][i,k-1]

( ){ }1,,11,1,1, )1(max,max ++−−≤≤+ ++= jkjkkijkijiji XXXX ρ

Minimum free energy computations are based on empirical energies

GGCGCGCCCGGCGCC

GUAUCGAAAUACGUAGCGUAUGGGGAUGCUGGACGGUCCCAUCGGUACUCCA

UGGUUACGCGUUGGGGUAACGAAGAUUCCGAGAGGAGUUUAGUGACUAGAGGRNAStudio.lnk

S1(h)

S9(h)

Free

ene

rgy

GD

0

Minimum of free energy

Suboptimal conformations

S0(h)

S2(h)

S3(h)

S4(h)

S7(h)

S6(h)

S5(h)

S8(h)

G

GG

G

GG

G GG

G

GG

G

GG

G

U

U

U

U

UU

U

U

U

UU

A

A

AA

AA

AA

A

A

A

A

UC

C

CC

C

C

C

C

C

CC

C

5’-end 3’-end

The minimum free energy structures on a discrete space of conformations

Free

ene

rgy

GD

0

"Reaction coordinate"

Sk

S{Saddle point T {k

Free

ene

rgy

GD

0

Sk

S{

T {k

"Barrier tree"

Definition of a ‚barrier tree‘

5.10

2

8

1415

1817

23

19

2722

38

45

25

3633

3940

4341

3.30

7.40

5

3

7

4

109

6

1312

3.10

11

2120

16

2829

26

3032

424644

24

353437

49

31

4748

S0S1

Kinetic folding

S0

S1

S2

S3S4S5 S6

S7S8

S10S9

Suboptimal structures

lim t Á¸ finite folding time

5.90

A typical energy landscape of a sequence with two (meta)stable comformations

Kinetics RNA refolding between a long living metastable conformation and the minmum free energy structure

1. The role of RNA in the cell and the notion of structure

2. RNA folding

3. Inverse folding of RNA

4. Sequence structure maps, neutral networks, and intersection

5. Reference to experimental data

6. Concluding remarks

Minimum free energycriterion

Inverse folding of RNA secondary structures

The idea of inverse folding algorithm is to search for sequences that form a given RNA secondary structure under the minimum free energy criterion.

Structure

CUGGGAAAAAUCCCCAGACCGGGGGUUUCCCCGG

Compatible sequenceStructure

5’-end

3’-end

CUGGGAAAAAUCCCCAGACCGGGGGUUUCCCCGG

G

G

G

G

G

G

G

C

C

C

C G

G

G

G

C

C

C

C

C

C

C

U A

U

U

G

U

AA

A

A

U

Compatible sequenceStructure

5’-end

3’-end

CUGGGAAAAAUCCCCAGACCGGGGGUUUCCCCGG

G

G

G

G

G

G

G

C

C

C

C

U

U

G

G

G

G

G

C

C

C

C

C

C

C

U

U

AA

A

A

A

U

Compatible sequenceStructure

5’-end

3’-end

Single nucleotides: A U G C, , ,

Single bases pairs are varied independently

CUGGGAAAAAUCCCCAGACCGGGGGUUUCCCCGG

G

G

C

C

C

C G

G

G

G

C

C

G

G

G

G

G

C

C

C

C

C

U A

U

U

G

U

AA

A

A

U

Compatible sequenceStructure

5’-end

3’-end

Base pairs: AU , UAGC , CGGU , UG

Base pairs are varied in strict correlation

CUGGGAAAAAUCCCCAGACCGGGGGUUUCCC

GG

U

CUGGGAAAAAUCCCCAGACCGGGGGUUUCCCCGG

G G

G G

G G

G G

G G

G G

G G

C U

C C

C C

C C

U U

U U

G G

G G

G G

G G

G G

C C

C C

C C

C C

C C

C C

C C

U U

U U

A AA A

A A

A A

A A

U U

Compatible sequencesStructure

5’-end 5’-end

3’-end 3’-end

CUGGGAAAAAUCCCCAGACCGGGGGUUUCCGCGG

G

G

G

G

G

G

G

CG

C

C

U

U

G

G

GG

G

C

C

C

C

C

C

C

U

U

AA

A

A

A

U

Structure Incompatible sequence

5’-end

3’-end

.... GC UC ....CA

.... GC UC ....GU

.... GC UC ....GA .... GC UC ....CU

d =1H

d =1H

d =2H

City-block distance in sequence space 2D Sketch of sequence space

Single point mutations as moves in sequence space

0

421 8 16

10

19

9

14

6

13

5

11

3

7

12

21

17

22

18

25

20

26

24

28

272315 29 30

31

Binary sequences are encodedby their decimal equivalents:

= 0 and = 1, for example,

"0" 00000 =

"14" 01110 = ,

"29" 11101 = , etc.

,

C

CCCCC

C C

C

G

GGG

GGG G

Mutant class

0

1

2

3

4

5

Hypercube of dimension n = 5 Decimal coding of binary sequences

Sequence space of binary sequences of chain lenght n = 5

CGTCGTTACAATTTA GTTATGTGCGAATTC CAAATT AAAA ACAAGAG.....

CGTCGTTACAATTTA GTTATGTGCGAATTC CAAATT AAAA ACAAGAG.....

G A G T

A C A C

Hamming distance d (I ,I ) = H 1 2 4

d (I ,I ) = 0H 1 1

d (I ,I ) = d (I ,I )H H1 2 2 1

d (I ,I ) d (I ,I ) + d (I ,I )H H H1 3 1 2 2 3¶

(i)

(ii)

(iii)

The Hamming distance between sequences induces a metric in sequence space

Target structure Sk

Initial trial sequences

Target sequence

Stop sequence of anunsuccessful trial

Intermediate compatible sequences

Approach to the target structure Sk in the inverse folding algorithm

Inverse folding algorithm

I0 Á I1 Á I2 Á I3 Á I4Á ... Á Ik Á Ik+1 Á ... Á It

S0Á S1Á S2Á S3Á S4Á ... Á SkÁ Sk+1Á ... Á St

Ik+1 = Mk(Ik) and DdS(Sk,Sk+1) = dS(Sk+1,St) - dS(Sk,St) < 0

M ... base or base pair mutation operator

dS (Si,Sj) ... distance between the two structures Si and Sj

‚Unsuccessful trial‘ ... termination after n steps

Minimum free energycriterion

Inverse folding of RNA secondary structures

1st2nd3rd trial4th5th

The inverse folding algorithm searches for sequences that form a given RNA secondary structure under the minimum free energy criterion.

1. The role of RNA in the cell and the notion of structure

2. RNA folding

3. Inverse folding of RNA

4. Sequence structure maps, neutral networks, and intersection

5. Reference to experimental data

6. Concluding remarks

Minimal hairpin loop size:

nlp Æ 3

Minimal stack length:nst Æ 2

Recursion formula for the number of acceptable RNA secondary structures

Computed numbers of minimum free energy structures over different nucleotide alphabets

P. Schuster, Molecular insights into evolution of phenotypes. In: J. Crutchfield & P.Schuster, Evolutionary Dynamics. Oxford University Press, New York 2003, pp.163-215.

Hamming distance d (S ,S ) = H 1 2 4

d (S ,S ) = 0H 1 1

d (S ,S ) = d (S ,S )H H1 2 2 1

d (S ,S ) d (S ,S ) + d (S ,S )H H H1 3 1 2 2 3¶

(i)

(ii)

(iii)

The Hamming distance between structures in parentheses notation forms a metric in structure space

RNA sequences as well as RNA secondary structures can be visualized as objects in metric spaces. At constant chain length the sequence space is a (generalized) hypercube.

The mapping from RNA sequences into RNA secondary structures is many-to-one. Hence, it is redundant and not invertible.

RNA sequences, which are mapped onto the same RNA secondary structure, are neutral with respect to structure. The pre-images of structures in sequence space are neutral networks. They can be represented by graphs where the edgesconnect sequences of Hamming distance dH = 1.

Sk I. = ( )ψfk f Sk = ( )

Sequence space Structure space Real numbers

Mapping from sequence space into structure space and into function

Sk I. = ( )ψfk f Sk = ( )

Sequence space Structure space Real numbers

Sk I. = ( )ψfk f Sk = ( )

Sequence space Structure space Real numbers

The pre-image of the structure Sk in sequence space is the neutral network Gk

Neutral networks are sets of sequences forming the same structure. Gk is the pre-image of the structure Sk in sequence space:

Gk = y-1(Sk) π {yj | y(Ij) = Sk}

The set is converted into a graph by connecting all sequences of Hamming distance one.

Neutral networks of small RNA molecules can be computed by exhaustive folding of complete sequence spaces, i.e. all RNA sequences of a given chain length. This number, N=4n , becomes very large with increasing length, and is prohibitive for numerical computations.

Neutral networks can be modelled by random graphs in sequence space. In this approach, nodes are inserted randomly into sequence space until the size of the pre-image, i.e. the number of neutral sequences, matches the neutral network to be studied.

Random graph approach to neutral networks

Sketch of sequence spaceStep 00

Random graph approach to neutral networks

Sketch of sequence spaceStep 01

Random graph approach to neutral networks

Sketch of sequence spaceStep 02

Random graph approach to neutral networks

Sketch of sequence spaceStep 03

Random graph approach to neutral networks

Sketch of sequence spaceStep 04

Random graph approach to neutral networks

Sketch of sequence spaceStep 05

Random graph approach to neutral networks

Sketch of sequence spaceStep 10

Random graph approach to neutral networks

Sketch of sequence spaceStep 15

Random graph approach to neutral networks

Sketch of sequence spaceStep 25

Random graph approach to neutral networks

Sketch of sequence spaceStep 50

Random graph approach to neutral networks

Sketch of sequence spaceStep 75

Random graph approach to neutral networks

Sketch of sequence spaceStep 100

λj = 27 = 0.444 ,/12 λk = ø l (k)j

| |Gk

λ κ cr = 1 - -1 ( 1)/ κ-

λ λk cr . . . .>

λ λk cr . . . .<

network is connectedGk

network is connectednotGk

Connectivity threshold:

Alphabet size : = 4k ñ kAUGC

G S Sk k k= ( ) | ( ) = y y-1 U { }I Ij j

k lcr

2 0.5

3 0.423

4 0.370

GC,AU

GUC,AUG

AUGC

Mean degree of neutrality and connectivity of neutral networks

A connected neutral network

Giant Component

A multi-component neutral network

Reference for postulation and in silico verification of neutral networks

GkNeutral Network

Structure S k

Gk Cà k

Compatible Set Ck

The compatible set Ck of a structure Sk consists of all sequences which form Sk as its minimum free energy structure (the neutral network Gk) or one of itssuboptimal structures.

Structure S 0

Structure S 1

The intersection of two compatible sets is always non empty: C0 Ú C1 â Ù

Reference for the definition of the intersection and the proof of the intersection theorem

CUGGGAAAAAUCCCCAGACCGGGGGUUUCCCCGG

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

C

C

C

C

C

C

C

C

UU

UU

U

U

G

G

G

G

G

C

C

C

C

C

C

C

CC

C

C

C

CU

UU

AA

A

AA

A

A

A

A

A

U

3’- end

Min

imum

free

ene

rgy

conf

orm

atio

n S 0

Subo

ptim

al c

onfo

rmat

ion

S 1

C

G

A sequence at the intersection of two neutral networks is compatible with both structures

5.10

5.902

8

1415

1817

23

19

2722

38

45

25

3633

3940

4341

3.30

7.40

5

3

7

4

109

6

1312

3.10

11

2120

16

2829

26

3032

424644

24

353437

49

31

4748

S0S1

basin '1'

long livingmetastable structure

basin '0'

minimum free energy structure

Barrier tree for two long living structures

1. The role of RNA in the cell and the notion of structure

2. RNA folding

3. Inverse folding of RNA

4. Sequence structure maps, neutral networks, and intersection

5. Reference to experimental data

6. Concluding remarks

A ribozyme switch

E.A.Schultes, D.B.Bartel, Science 289 (2000), 448-452

Two ribozymes of chain lengths n = 88 nucleotides: An artificial ligase (A) and a natural cleavage ribozyme of hepatitis-d-virus (B)

The sequence at the intersection:

An RNA molecules which is 88 nucleotides long and can form both structures

Two neutral walks through sequence space with conservation of structure and catalytic activity

Nature , 323-325, 1999402

Catalytic activity in the AUG alphabet

Nature , 841-844, 2002420

Catalytic activity in theDU alphabet

5'-End

3'-End

70

60

50

4030

20

10

RNA clover-leaf secondary structures of sequences with chain length n=76

tRNAphe

Target structure Sk

Initial trial sequences

Target sequence

Stop sequence of anunsuccessful trial

Intermediate compatible sequences

Approach to the target structure Sk in the inverse folding algorithm

5'-End5'-End 5'-End5'-End

3'-End3'-End 3'-End3'-End

7070 7070

60606060

5050 5050

4040 4040 3030 3030

2020 20

20

1010 1010

Alphabet Probability of successful trials in inverse folding

AU

AUG

AUGC

UGC

GC

- -

- -

0.794 0.007

0.548 0.011

0.067 0.007

Ä

Ä

Ä

- -

0.003 0.001

0.884 0.008

0.628 0.012

Ä

Ä

Ä

0.086 0.008Ä

0.051 0.006

0.374 0.016

0.982 0.004

0.818 0.012

0.127 0.006

Ä

Ä

Ä

Ä

Ä

Accessibility of cloverleaf RNA secondary structures through inverse folding

5'-End5'-End 5'-End5'-End

3'-End3'-End 3'-End3'-End

7070 7070

60606060

5050 5050

4040 4040 3030 3030

2020 20

20

1010 1010

Alphabet Degree of neutrality l

AU

AUG

AUGC

UGC

GC

- -

- -

0.275 0.064

0.263 0.071

0.052 0.033

Ä

Ä

Ä

- -

0.217 0.051

0.279 0.063

0.257 0.070

Ä

Ä

Ä

0.057 0.034Ä

0.073 0.032

0.201 0.056

0.313 0.058

0.250 0.064

0.068 0.034

Ä

Ä

Ä

Ä

Ä

`

Degree of neutrality of cloverleaf RNA secondary structures over different alphabets

1. The role of RNA in the cell and the notion of structure

2. RNA folding

3. Inverse folding of RNA

4. Sequence structure maps, neutral networks, and intersection

5. Reference to experimental data

6. Concluding remarks

Concluding remarks

1. At constant chain lengths the number of RNA sequences exceeds the number of secondary structures by orders of magnitude.

2. The pre-images of common structures in sequence space are extended and connected neutral networks.

3. The intersection of the sets of compatible sequences of two structures is always non-empty.

4. Inverse folding allows for the design of RNA molecules with predefined structures and properties.

Acknowledgement of support

Fonds zur Förderung der wissenschaftlichen Forschung (FWF)

Projects No. 09942, 10578, 11065, 1309313887, and 14898

Jubiläumsfonds der Österreichischen Nationalbank

Project No. Nat-7813

European Commission: Project No. EU-980189

Siemens AG, Austria

The Santa Fe Institute and the Universität Wien

The software for producing RNA movies was developed by Robert Giegerich and coworkers at the Universität Bielefeld

Universität Wien

Coworkers

Universität WienWalter Fontana, Santa Fe Institute, NM

Christian Reidys, Christian Forst, Los Alamos National Laboratory, NM

Peter Stadler, Bärbel Stadler, Universität Leipzig, GE

Ivo L.Hofacker, Christoph Flamm, Universität Wien, AT

Andreas Wernitznig, Michael Kospach, Universität Wien, ATUlrike Langhammer, Ulrike Mückstein, Stefanie Widder

Jan Cupal, Kurt Grünberger, Andreas Svrček-Seiler, Stefan WuchtyAndreas DeStefano

Ulrike Göbel, Institut für Molekulare Biotechnologie, Jena, GEWalter Grüner, Stefan Kopp, Jaqueline Weber

Web-Page for further information:

http://www.tbi.univie.ac.at/~pks