47
Principles of nucleic acid structure Biophysical Chemistry 1, Fall 2010 Reading assignment: Chap. 3 Web assignment: http://w3dna.rutgers.edu

Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Embed Size (px)

Citation preview

Page 1: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Principles of nucleic acid structureBiophysical Chemistry 1, Fall 2010

Reading assignment: Chap. 3Web assignment: http://w3dna.rutgers.edu

Page 2: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Nucleic acids: phosphates, sugars, basesThe phosphodiester linkage is directional.

The 3´-oxygen of nucleotide i is joined to the 5´-oxygen of nucleotide i+1.

(i)

(I+1)

Page 3: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

The sugar-phosphate backbone

main categories, E and T. If four atoms of the aldofuranose ring lie approxi-mately in a plane, the conformation is described as envelope (E), due to the pic-torial resemblance to an envelope, otherwise the conformation is described astwist (T) (Fig. 3.5).

The puckering of the ribose is conveniently described by the phase angle P,which is defined in terms of the internal ribose conformation angles ν0–ν4 (seeFig. 3.6):

P must be in the interval 0°–360°, so if ν0 < 0 then P = P + 180°. The various con-formational nomenclatures of the ribose ring can conveniently be drawn along acircle as a function of P (Fig. 3.7).

Adenine and guanine are the two most common purine bases, but inosine isfound in some nucleic acid molecules. DNA has two types of pyrimidines, cyto-sine and thymine. In RNA, thymine is normally replaced by a similar base, uracil,which lacks the methyl group found in thymine (Fig. 3.8).

P =+ - +

+arctan

( ) ( )[sin( ) sin( )]

.u u u u

u p p2 4 1 3

02 5 2 5

Basics of Nucleic Acid Structure 61

FA

Chain direction

nucleotide unit

O3’

O5’

O4’

O3’

P

Base

1’

2’

3’4’

5’

OO Pα

β

δε

ζ

χ

ζα = O3’i-1—P—O5’—C5’

β = P—O5’—C5’—C4’

γ = O5’—C5’—C4’-C3’

δ = C5’—C4’—C3’—C2’

ε = C4’—C3’—O3’—Pi+1

ζ = C3’—O3’—Pi+1-O3’i+1

γ

P

OP2

OP1

O5’

C5’

C3’

O3’

Residue i

Residue i-1

FIGURE 3.2 Section of the DNA backbone, showing the atom naming and the naming of var-ious torsion angles. The cyclic ring is the deoxyribose moiety of the backbone, and successiveribose units are connected via phosphate groups. The backbone is therefore often referred to as“the sugar-phosphate backbone.” Bottom: The direction of the chain is by convention definedby two oxygen atoms on the ribose, O5′ and O3′, and specifies the direction in which tran-scription takes place (from 5′ to 3′ in the new strand).

b541_Chapter-03.qxd 11/20/2008 10:56 AM Page 61

Page 4: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Sugar puckering

There are two preferred ways of arranging the base in relation to the ribose,syn and anti (Figs. 3.9 and 3.10). In pyrimidine nucleotides only the anti confor-mation is found, because this avoids collision between the oxygen and the ribose.Purines can have both orientations, but the anti conformation is the most commonof them.

Mammalian DNA is known to contain a methylated base, 5-methylcytosine(m5C). Bacterial DNA contains this one and two other methylated bases, namely,

Basics of Nucleic Acid Structure 63

FA

C5’2’

3’ O4’

B C5’

2’

3’

O4’

B

C5’ 2’

3’ O4’

B C5’

2’

3’ B

C5’ 2’

3’

O4’ B C5’

2’

3’ O4’ B

2E

2T3

E3

3E

3T2

E2

BB C5’

C5’

S conformations N conformations

FIGURE 3.5 Various sugar ring puckering conformations. Those on the left are denoted S (forsouth); those on the right, N (for north). The C3′-endo conformation is seen at the top right, andthe C2′-endo conformation at the top left. The notation of E and T conformations is also given.Superscript numbers preceding E or T refer to carbon atoms on the same side of the referenceplane (horizontal line) as C5′. Subscripts following E or T denote atoms on the opposite side ofthe reference plane.

O4’

C1’

C2’C3’

C4’

ν4

ν3 ν1

ν0

ν2

ν0 = C4’—O4’—C1’—C2’

ν1 = O4’—C1’—C2’—C3’

ν2 = C1’—C2’—C3’—C4’

ν3 = C2’—C3’—C4’—O4’

ν4 = C3’—C4’—O4’—C1’

FIGURE 3.6 Naming scheme for the torsion angles of the sugar ring in nucleotides.

b541_Chapter-03.qxd 11/20/2008 10:56 AM Page 63

Page 5: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Sugar puckering64 A Textbook of Structural Biology

FA

72˚

108˚

180˚

252˚

288˚

324˚

144˚216˚

36˚

C4’-exo

C4’-endo

C1’-exo

C2’-endoC3’-exo

C4’-endo

O4’-exo

C1’-endo

C2’-exo C3’-endo

S

N

T

E3T34E404

E0

T01E1

T21E2T23E3

T43

E4T40

E0

T10

E1T12

E2 T32

FIGURE 3.7 Diagram showing the correlation between the phase angle P and the ribose con-formation. Conformational angles of P are divided into two categories, north (N) and south (S).

N

N

NH

N

NH2

N

NH

NH

N

NH2

O

NH

NH

O

O NH

NH

O

O

N

NH

NH2

O

Adenine Guanine

Uracil Thymine Cytosine

5

61

2

34

56

7

89

1

23

4

56

7

89

1

23

4

5

61

2

34

5

61

2

34

FIGURE 3.8 The most common bases found in nucleic acids: the top row is purines; thebottom row pyrimidines. The atom-numbering scheme of purines and pyrimidines is given.

b541_Chapter-03.qxd 11/20/2008 10:56 AM Page 64

Page 6: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

purines and pyrimidines

64 A Textbook of Structural Biology

FA

72˚

108˚

180˚

252˚

288˚

324˚

144˚216˚

36˚

C4’-exo

C4’-endo

C1’-exo

C2’-endoC3’-exo

C4’-endo

O4’-exo

C1’-endo

C2’-exo C3’-endo

S

N

T

E3T34E404

E0

T01E1

T21E2T23E3

T43

E4T40

E0

T10

E1T12

E2 T32

FIGURE 3.7 Diagram showing the correlation between the phase angle P and the ribose con-formation. Conformational angles of P are divided into two categories, north (N) and south (S).

N

N

NH

N

NH2

N

NH

NH

N

NH2

O

NH

NH

O

O NH

NH

O

O

N

NH

NH2

O

Adenine Guanine

Uracil Thymine Cytosine

5

61

2

34

56

7

89

1

23

4

56

7

89

1

23

4

5

61

2

34

5

61

2

34

FIGURE 3.8 The most common bases found in nucleic acids: the top row is purines; thebottom row pyrimidines. The atom-numbering scheme of purines and pyrimidines is given.

b541_Chapter-03.qxd 11/20/2008 10:56 AM Page 64

Page 7: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

The glycosidic torsion parameterBasics of Nucleic Acid Structure 65

FA

FIGURE 3.9 The anti and syn conformations of adenine monophosphate.

syn

anti

300˚

240˚

60˚

120˚

180˚

O4’4’

3’

2’

-60˚

-120˚

300˚

240˚

60˚

120˚

180˚

O4’4’

3’

2’

-60˚

Purine Pyrimidine

C2

C4

χ

FIGURE 3.10 Diagram defining the torsion angle χ around the N-glycosidic bond. The penta-gon illustrates the ribose unit, and the base is seen edge-on. The sequence of atoms chosen todefine this angle is O4′–C1′–N9–C4 for purine and O4′–C1′–N1–C2 for pyrimidine deriva-tives. Thus when χ = 0° the O4′–C1′ bond is eclipsed with the N9–C4 bond for purine and theN1–C2 bond for pyrimidine derivatives. The syn conformation is defined as χ = ±90° and antias χ = 180 ± 90°; thus, the syn conformational region is given by the upper half-circle, and theanti conformation by the lower half. The purine on the left is therefore in the syn conformation,and the pyrimidine on the right, in the anti conformation.

b541_Chapter-03.qxd 11/20/2008 10:56 AM Page 65

Page 8: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Watson-Crick base pairing

phosphodiester linkages between adjacent nucleotides. In the sugar-phosphatepart the phosphate groups connect to the 3′ carbon of one deoxyribose moietyand the 5′ carbon of the next moiety, thereby linking successive deoxyribosestogether. The two ends of a chain differ; the end where the 5′ carbon is not con-nected to another nucleotide is called the 5′ end. The other end is called the 3′end. The two ends may have or lack free phosphate groups.

Each of the four bases in DNA has a unique set of hydrogen bond donors andacceptors that allows it to form base pairs with the other bases. In double-strandedDNA we have AT (adenine-thymine) base pairs with two hydrogen bonds and GC(guanine-cytidine) base pairs with three hydrogen bonds (Fig. 3.13). These inter-actions are called Watson–Crick base pairs to honor the scientists who first sug-gested that these base pairs are the basis of heredity.

In a double helix, the flat bases that base pair in the center are perpendicularto the helical axes. The two chains, which form a cylindrical spiral, run in oppo-site directions. The 5′ end of one chain is paired with the 3′ end of the other strand,and vice versa. This can be represented with an arrow for each chain, runningfrom the 5′ end to the 3′ end. The arrows point in different directions — they areantiparallel. The orientation parameters are crucial during the copying of a singlechain/strand.

In an ideal DNA double helix, there are approximately 10 base pairs per turnof DNA. Usually, the helical conformation is right-handed, that is, it twists to theright as do the threads on most screws. The sugar-phosphate backbone of thetwo antiparallel chains forms ridges on the edges of the helix. There are twogrooves between the ridges formed by the ribose-phosphate backbone. The twogrooves are of different widths and thus traditionally called the major and minorgrooves. The major groove is wider and the bases are more accessible than in theminor groove. The exposed edges of the base pairs contain different hydrogen

68 A Textbook of Structural Biology

FA

N

NHN

N

N

O

R

H

H

N

N

N

O R

H

H

Guanine Cytosine

N

N

N

N

N

H H

R

N

NH

O

OR

Adenine Thymine

major groove

minor groove

major groove

minor groove

FIGURE 3.13 The Watson–Crick base pairs. The sugar moieties are represented by R. Noticethat the GC base pair on the left interacts via three hydrogen bonds, whereas the AT base pairon the right has only two. This makes the GC base pair and thus GC-rich DNA morestable than the AT base pair and AT-rich DNA.

b541_Chapter-03.qxd 11/20/2008 10:56 AM Page 68

Page 9: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

A and B-form helices

Base pairs displaced from A-DNA helical axis.

A-DNA minor groovewider and more

shallow, major groovenarrower and more

deep cf. B!DNA

A-DNA B-DNA

-+salt and/or

+alcohol

.+H2O

A-DNA base pairs inclined with respect to helical axis and untwisted cf. B DNA.

The A.B transition: first known change of DNA double-helical state.

10 bp/turn11 bp/turn

Page 10: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

A and B-form helices

bond donors and acceptors, which can interact with protein molecules. For eachbase pair, four different combinations of donors and acceptors exist at the exposededges in the major and minor grooves. This creates a variety of patterns alongthe DNA double helix, available for highly specific DNA-protein interactions (seeSec 7.2).

Two factors are mainly responsible for the stability of the double helix struc-ture: the base pairing between complementary strands and — especially — thestacking between adjacent base pairs. DNA molecules show considerable varia-tion in conformation, dependent on the actual base sequence, the AT/GC content,and the presence of counter ions and stabilizing/destabilizing proteins. Since theAT base pair only contains two hydrogen bonds, it is easy to distort. AT-richsequences are functionally important, and often serve as binding sites of DNA-binding proteins. In the crystal structure of the TATA box binding protein com-plex, the protein induces a significant bending of the DNA (see Sec. 7.5.2).

Double-stranded DNA can adopt several conformations, also called forms(Fig. 3.14). These conformations are determined by the activity of water and thenature of counterions. In vivo, the conformation is also influenced by the presence ofproteins. Even if various forms have been observed under in vitro conditions, it isstill not known whether they play any physiological role under in vivo conditions.

In aqueous solutions the DNA helix most often occurs in the B form, which has10 bases per turn. One full turn measures 3.4 nm in the axial direction. The helical

Basics of Nucleic Acid Structure 69

FA

FIGURE 3.14 CPK representations of A-, B- and Z-DNA.

b541_Chapter-03.qxd 11/20/2008 10:56 AM Page 69

Page 11: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Base stackingThe overlap of successive base pairs depends on duplex form.

Top-down “stacking diagrams” of dG2·dC2 and dA2·dT2 units in canonical A and B forms.

Page 12: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Concentrate on the basepair structures

just help to neutralize the negative phosphate charge and thereby help topack the molecule into a more dense structure, other proteins actively changethe structure by causing local openings to form in the double helix. DNA heli-cases (Fig. 3.19) hydrolyze ATP when they are bound to single strands of DNA.The ATP hydrolysis changes the helicase structure and thereby enables it to per-form mechanical work (see Secs. 6.1.1 and 7.5). Helicases propel themselvesthrough the helix by separating the two strands and exposing the locally dena-tured DNA to other proteins, such as DNA polymerases. The situation is evenmore complex in eukaryotes, where the majority of DNA is located in thenucleus and packed into distinct units called chromosomes. A human chromosome

Basics of Nucleic Acid Structure 73

FA

x

x

x

x

x

x x

x

x

x

x

x

x

x

x

x

y

yy

yy

yy y

yy

y

y

y

y

yy

y

z

zz

zz

zz z

zz

z

zx

z

z

zz

z

Propeller

Opening

Buckle

Stagger

Shear

Stretch

Shift

Slide

Rise Twist

Roll

Tilt

xdisplacement inclination

ydisplacement tip

Images created with 3DNAillustrating positive valuesof designated parameters

3'

5'

5'

3'

III Coordinate frame

FIGURE 3.17 The Cambridge convention’s definition of the standard parameters used todescribe the relative orientation of bases in a double helix. (Reprinted with permission fromLu X-J and Olson WK. (2003) 3DNA: a software package for the analysis, rebuilding, and visu-alization of three-dimensional nuclei acid structures. Nucl Acids Res 31: 5108–5121. Copyright(2003) Oxford University Press.)

b541_Chapter-03.qxd 11/20/2008 10:56 AM Page 73

Page 13: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Getting DNA to bend

•••••

•••••

•••••

•••••

C

Combined B.A and B.C deformations tighten the bending of DNA:

Combined B.A and B.C deformations tighten the bending of DNA:

Global bend:360°/75 bp

left-handed superhelix

A

Page 14: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

The nucleosome core particle

is composed of three α-helices connected by two loops. This structure is preservedin all eukaryotes. There are hundreds of hydrogen bonds, salt linkages andhydrophobic interactions formed between the histone octamer and the correspon-ding DNA helix. The DNA structure is not uniformly wound but several distor-tions and kinks can be seen.

Each of the core histones has a long N-terminal tail, which is subject toseveral modifications in the cell. These changes strongly influence the nucleo-some and its higher structures and play important regulatory roles, for exam-ple during gene expression. The highly variable histone modifications mayaffect the interactions between the DNA and the histones as well as betweendifferent nucleosomes. In addition, the pattern of modifications may be recog-nized for the binding of other proteins. In particular, histone modifications canregulate the transcription of a certain part of the genome (Chap. 7 and App. E).The combination of histone modifications has been called the histone code.It has been suggested that these modifications work as combinatorial signalsthat direct the binding of different molecules and subsequent events. Whenthe DNA is replicated for a daughter cell the structure of the chromatin isalso copied. Thus, the histone code is also copied. This phenomenon is calledepigenetics.

The nucleosomes are packed into higher level chromatin structures. The firstlevel is the 30-nm fiber, where an additional histone, H1, plays the important role

76 A Textbook of Structural Biology

FA

FIGURE 3.21 Left: A dimer of histone proteins H3 (blue) and H4 (light blue). Right: Nucleosomestructure. The octameric complex of histone proteins forms the center and the DNA is woundaround. The color scheme of the histone subunits in the core particle is the same as in Fig. 3.20(PDB: 1KX3).

b541_Chapter-03.qxd 11/20/2008 10:56 AM Page 76

Page 15: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

DNA binding to the histone core proteins

Page 16: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Comparison to elastic rod models

Bending rigidity: A = M(2πνn)2

LP4n

; a = A/(kBT )

Twisting rigidity: C = I(2Lνn)2

n2

Stretching rigidity: Y = ML(2νn)2

n2

Lord Rayleigh, The Theory of Sound, 1894

Page 17: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Bending rigidity for linear duplex DNA

n=1 n=2 n=3

Bending motions

n GB frequencies Analytical frequencies1 0.114 0.116 0.1002 0.294 0.295 0.2753 0.522 0.532 0.539

d(GACT) 60 base pairs

a =594 Å 550 ÅA = 2.44∗10−19erg.cm 2.26∗10−19erg.cm

Page 18: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Stretching rigidity for linear duplex DNA

Stretching motions

d(GACT) 60 base pairs

n GB frequencies Analytical frequencies1 0.664 0.6192 1.289 1.2373 1.807 1.846

Y = 1502 pN 1000−1500 pN

n=1 n=2 n=3

Page 19: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Salt dependence of bending and stretching are not the same

Exp: Baumann, Smith, Bloomfield, Bustamante, PNAS 94, 6185 (1997)

Theory: Bomble & Case, Biopolymers 89, 722 (2008)

Page 20: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Sequence dependence of bending rigidity

d(GACT)

d(GC)

d(AT)

dG.dC

dA.dT

d(CTG)

d(CGG)

r1 r2 r3 r4 r5 r6 r7 r8 r9 r10

r11

r12

r13

r14

r15

r16

r17

r18

r19

r20

r21

r22

r23

r24

r25

r26

r27

r28

r29

r30

S equence (60 bp)

450

500

550

600

650

700

750

800Persisten

celeng

th(Å

)

d(CGG)

dA.dT

dG.dC

Page 21: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Now consider circular DNA

νn = f (Ω,R,∆Tw ,n,ρ)R is the circle radius

∆Tw is the excess twist

Ω = C/A

ρ is the mass density

Matsumoto, Tobias, Olson,JCTC 1, 117 (2005)

Page 22: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

In-plane and out-of-plane modes for circular DNA

n GB frequencies Analytical frequencies GB frequencies Analytical frequencies2 0.165 0.172 0.121 0.166 0.178 0.1163 0.394 0.452 0.366 0.443 0.471 0.3774 0.672 0.724 0.697 0.704 0.778 0.734

Relaxed minicircle with 94 base pairs Overtwisted minicircle with 94 base pairs

In plane bending motions

n=3n=2

n=2

Out of plane bending motions

n GB frequencies Analytical frequencies GB frequencies Analytical frequencies2 0.217 0.231 0.211 0.211 0.248 0.2353 0.525 0.545 0.496 0.520 0.542 0.5514 0.793 0.851 0.864 0.802 0.901 0.958

Relaxed minicircle with 94 base pairs Overtwisted minicircle with 94 base pairs

n=3

Page 23: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Moving on to RNA

10 nucleotides per turn, RNA prefers the A-form with 11–12 nucleotides per turn.In DNA, the base pairs are centered over the helix axis. In an RNA double helix,the base pairs slide ∼ 5 Å away from the helix center. All these factors contributeto the tighter packing of the RNA double helix.

The surface of an RNA helix is also quite different from the DNA double helix.The major groove of RNA is very narrow and deep, accentuated by the fact thatRNA does not have the thymine methyl group, which resides in the major groove.In contrast, the minor groove is wide and shallow. For this reason, the major andminor grooves in RNA are more descriptively referred to as the deep groove and theshallow groove. The deep groove of RNA is a favored binding site of cations, watermolecules and protein side chains.

3.4.2 Primary, Secondary and Tertiary Structure

We can classify the different structural levels of RNA in the same way as withproteins. The secondary structure is determined by base pairing. In addition tothe normal Watson–Crick base pairs, mismatches such as GU base pairs arefound. The secondary structure can form complicated patterns like the cruci-form of tRNA. The secondary structure is formed by hydrogen bonds, with anenergy of ∼ −12kJ/mol. The formation of just a single base pair tends to be ener-getically unfavorable. The stacking of base pairs gives the largest stabilizingenergy contribution of ∼ −23kJ/mol. The stacking energies of base pairs are notsymmetric, i.e. stacking of GC on AU is not the same as the stacking energy ofAU on GC.

The most complex structural level is the tertiary one. The tertiary structureis formed by the spatial organization of the secondary structural elements, i.e.the stem-loop regions. A loop may form hydrogen bonds to another part of the

80 A Textbook of Structural Biology

FA

PO

C5’ OBase

OP

5.9Å3’

2’

PO

C5’O

OP

Base7.0Å

3’

2’

C3’-endo C2’-endo

FIGURE 3.23 The two types of sugar pucker most commonly found in nucleic acids. TheC3′-endo pucker is prevalent in RNA and A-form DNA, whereas the C2′-endo pucker ischaracteristic of B-form DNA. It is seen that the C3′-endo pucker produces a significantlyshorter phosphate-phosphate distance in the backbone, resulting in a more compact helicalconformation.

b541_Chapter-03.qxd 11/20/2008 10:56 AM Page 80

Page 24: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

RNA has more base-pairing possibilities

bond acceptors on guanine (N7 and O6), and an acceptor and a donor on adenine(N7 and N6). In reverse Hoogsteen base pairing, the two chains are parallel,meaning that the pyrimidine partner is flipped, and the ribose rings end up in thetrans position.

Figure 3.25 shows a standard Watson–Crick GC base pair (cis WC/WC) com-pared to the GC reverse base pair (trans WC/WC) that can be obtained between par-allel backbone strands. Since parallel strands do not result from the usual stem-loopfolding pattern of RNA, reverse Watson–Crick interactions are tertiary interactions.

Another fact, apparent in three-dimensional structures of RNA, is that the dis-tance between backbones is different in the cis and trans configurations (Fig. 3.26).The distance between backbone strands is shorter in the AU Hoogsteen pair thanin the AU reverse Hoogsteen pair.

The term “wobble” base pairing was proposed by Francis Crick to account forthe non-complementary GU base pairings observed in codon–anticodon interac-tions, manifested in the degeneracy of the genetic code (Chap. 8). Figure 3.27 showsthe GU wobble base pair, which is one of the most common alternative base pair-ing patterns, and the GU reverse wobble, where the uracil group is simply flipped

Basics of Nucleic Acid Structure 83

FA

N

NHN

N

N

O

R

H

H N

N

N

O R

H

H

GC reverse

N

NHN

N

N

O

R

H

H

N

N

N

O R

H

H

GC Watson-Crick

FIGURE 3.25 Left: Canonical Watson–Crick GC base pair (cis). Right: GC reverse Watson–Crickbase pair (trans).

AU Hoogsteen AU reverse Hoogsteen

N

NN

NN

R

H

H

N

NH

O

O R

AU reverse base pair

N

N

N

N

N

R

H

H

N

NH

O

O R

N

N

N

N

N

R

H

H

N

NH

O

O R

FIGURE 3.26 Left: AU Hoogsteen base pair. Center: AU reverse Hoogsteen base pair.Right: AU reverse Watson–Crick base pair. The blue dashed line shows the line of symmetryused to define the cis/trans conformation of the base pair. The AU Hoogsteen base pair is thuscis-H/WC, and the AU reverse Hoogsteen is trans H/WC.

b541_Chapter-03.qxd 11/20/2008 10:56 AM Page 83

Page 25: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

RNA has more base-pairing possibilities

bond acceptors on guanine (N7 and O6), and an acceptor and a donor on adenine(N7 and N6). In reverse Hoogsteen base pairing, the two chains are parallel,meaning that the pyrimidine partner is flipped, and the ribose rings end up in thetrans position.

Figure 3.25 shows a standard Watson–Crick GC base pair (cis WC/WC) com-pared to the GC reverse base pair (trans WC/WC) that can be obtained between par-allel backbone strands. Since parallel strands do not result from the usual stem-loopfolding pattern of RNA, reverse Watson–Crick interactions are tertiary interactions.

Another fact, apparent in three-dimensional structures of RNA, is that the dis-tance between backbones is different in the cis and trans configurations (Fig. 3.26).The distance between backbone strands is shorter in the AU Hoogsteen pair thanin the AU reverse Hoogsteen pair.

The term “wobble” base pairing was proposed by Francis Crick to account forthe non-complementary GU base pairings observed in codon–anticodon interac-tions, manifested in the degeneracy of the genetic code (Chap. 8). Figure 3.27 showsthe GU wobble base pair, which is one of the most common alternative base pair-ing patterns, and the GU reverse wobble, where the uracil group is simply flipped

Basics of Nucleic Acid Structure 83

FA

N

NHN

N

N

O

R

H

H N

N

N

O R

H

H

GC reverse

N

NHN

N

N

O

R

H

H

N

N

N

O R

H

H

GC Watson-Crick

FIGURE 3.25 Left: Canonical Watson–Crick GC base pair (cis). Right: GC reverse Watson–Crickbase pair (trans).

AU Hoogsteen AU reverse Hoogsteen

N

NN

NN

R

H

H

N

NH

O

O R

AU reverse base pair

N

N

N

N

N

R

H

H

N

NH

O

O R

N

N

N

N

N

R

H

H

N

NH

O

O R

FIGURE 3.26 Left: AU Hoogsteen base pair. Center: AU reverse Hoogsteen base pair.Right: AU reverse Watson–Crick base pair. The blue dashed line shows the line of symmetryused to define the cis/trans conformation of the base pair. The AU Hoogsteen base pair is thuscis-H/WC, and the AU reverse Hoogsteen is trans H/WC.

b541_Chapter-03.qxd 11/20/2008 10:56 AM Page 83

Page 26: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

RNA has more base-pairing possibilities

around the axis of the amine hydrogen bond. The GU wobble base pairing resultsin the loss of a hydrogen bond from the guanine, but the vacant amino group oftenforms hydrogen bonds to other bases nearby, perhaps in concert with the neigh-boring imino group. The GU wobble base pairings can be viewed as a canonicalWatson–Crick pattern, with a shift of the pyrimidine partner.

The GU wobble pair has geometric properties that enables it to fit very wellinto a regular A-form helix, and therefore frequently substitute for regular Watson–Crick pairs. In certain chemical environments, the A base can become protonatedat the N1 position. When this happens, the A base is able to form hydrogen bondswith a C base, forming an (A+):C base pair. The geometry of the (A+):C base pairis isosteric with the GU wobble pair.

3.4.4 Base Triplets: A Prominent Tertiary Structural Motif

The alternative base pairing patterns described in the previous section lead to a richselection of multiple base interactions. Whenever a base pair is formed from twonucleotides, several hydrogen donors or acceptors are still available for alternateinteractions to occur, either from the amino acid side chains of proteins, or fromother nucleotides. The most important of these multi-base interactions are basetriples, important in maintaining the tertiary structure of RNA molecules. Severalexamples of triple base interactions will be presented in the following sections.

The possibility of triple base interactions were first realized in 1957 by Felsenfeld,who demonstrated that a (poly-A):(poly-U) duplex molecule could interact with asecond poly-U strand to create a triple strand complex (Fig. 3.28). The additionalpoly-U strand interacts with the major groove of the duplex by forming Hoogsteenbase pairs with the poly-A strand of the duplex. Later it was found that severalother sequence combinations can lead to triple helix formation, as long as theycontain two pyrimidines and one purine, for example C:G-C. Triple helix forma-tion has been proposed to have a role in gene repair.

84 A Textbook of Structural Biology

FA

N

NHN

N

NH2

O

R

N

NH

O

O R

GU wobble

N

NHN

N

NH2

O

R

N

NH

O

O R

GU reverse wobble

FIGURE 3.27 Left: GU wobble. Right: GU reverse wobble.

b541_Chapter-03.qxd 11/20/2008 10:56 AM Page 84

Page 27: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

RNA uses chemically-modified bases

thought to preserve a flanking cleavage site or perhaps to repress and to regulatetranslation of genes in the vicinity.

3.4.5 RNA Contains Modified BasesPosttranscriptional modification of RNA results in an even greater diversity ofmodified bases than in DNA. This is especially true in functional RNAs such astRNA and rRNA, where both ribose and base can be modified. The modified basescan profoundly change the chemical characteristics of the RNA molecule and cancontribute to the stability of the molecule as well as partake in its external inter-actions and reactions.

A telling example of the importance of posttranslational modification is givenby an E. coli isoleucine-specific tRNA2

Ile, specific for the codon AUA. This tRNAcontains a modified base, lysidine, at the first position of the anticodon. The baselysidine is a cytidine that has been posttranslationally modified by the addition ofthe amino acid lysine to the C2 position (Fig. 3.32). If the lysidine residue is replacedby the native cytidine, a marked reduction of isoleucine incorporation is the result,and surprisingly, the appearance of methionine-accepting activity. How can this be?It turns out that in E. coli, the cognate isoleucyl tRNA ligase recognizes and chargestRNAIle only if the lysidine residue is present in the anticodon loop. Furthermore,the anticodon sequence of tRNA2

Ile is CAT, which normally codes for methionine.If lysidine is absent, the tRNA will instead be recognized and mischarged by theCAT-recognizing methionyl tRNA ligase. So a single posttranslational modificationis responsible for both the codon and amino acid specificity of this tRNA.

Basics of Nucleic Acid Structure 87

FA

NH

NH

O

O

H

H

H

H N

NH

S

O

R

N NH

NH2

O

R

N

N

NH2

O

R

N

NH

N

N

O

R

NH

NH

O

O

R

N

N

N

N

NH

R

NH

O O-

NH3

+

N

N+

NH2

R

N

N

N

N

N

O

R

NH2

H3COOC

COOCH3

dihydrouracil (D) 4-thiouracil (S4U) 3-methylcytosine (m3C) 5-methylcytosine (m5C) Inosine (I)

pseudouridine (Ψ) N6-methyladenine (m6A) lysidine (L) wyosine (Y)

FIGURE 3.32 Examples of modified bases in RNA. Modifications are marked in red. R standsfor ribose.

b541_Chapter-03.qxd 11/20/2008 10:56 AM Page 87

Page 28: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

RNA motifs: tetraloops

with kissing hairpin duplexes that can form in solution are yeast tRNAAsp (GUC)and E. coli tRNAVal (GAC) complex. In these, the tRNAs have complementaryanticodons.

The RNA backbone has six degrees of freedom for each residue whereas thepolypeptide backbone only has two. This extreme flexibility allows single-strandedRNA regions to adopt a wide range of conformations. Nevertheless, a couple ofsingle-stranded RNA motifs are particularly common. The first is the S-turn,where an S shape is formed by two consecutive bends in the phosphate-sugarbackbone and distinguished by inverted sugar puckers. The S-turn motif is foundin the ribosomal loop E motif and the sarcin-ricin loop (see Sec. 8.3). The otherimportant single-stranded RNA motif is the U-turn, which is a sharp bend in thebackbone between the first and second nucleotides, followed by a distinctivestacking of the second and third nucleotides. Hydrogen bonds between the firstand third residues often stabilize the motif. U-turns are typical of GNRA loopsand are also found in the TΨC loop of tRNA.

3.4.7.1 The GNRA tetraloop

Tetraloops are a particularly common motif in RNA structures. An especiallywell-known case of this hairpin loop is the GNRAa loop motif, which closes the

Basics of Nucleic Acid Structure 91

FA

aR stands for puRine; N stands for aNy; Y stands for pYrimidine.

FIGURE 3.36 Three-dimensional structures of various tetraloop folds. Left : GNRA loop from 5SrRNA (PDB: 1JJ2). The first G in the loop stabilizes the loop by hydrogen bonding to the fourthmember. Middle: ANYA loop from MS2-RNA complex (PDB: 1DZS). Bases one and two form astacking interaction, while bases of three and four of the loop are looped out and poised to inter-act with other species. Right: UNCG tetraloop from 16S rRNA (PDB: 1BYJ). The first U and thelast G in the tetraloop interact via hydrogen bonds, while bases of one and two in the loop forma stacking interaction. The third base in the loop is available for interaction with other species.

b541_Chapter-03.qxd 11/20/2008 10:56 AM Page 91

Page 29: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Stems, bulges, loops, internal loops

An example of an internal loop that has gained a lot of attention is the loop Emotif, first recognized in the 1980s when it was found that similar, conservedinternal loops in eukaryotic 5S rRNA and in PSTV viral RNA shared a surprisingdisposition for cross-linking when exposed to UV light (Fig. 3.38). Later, the loopE motif was found in the sarcin-ricin loop of ribosomal 23S rRNA, which isinvolved in the binding of elongation factors EF-Tu and EF-G. In E. coli 5S rRNA,loop E is known to constitute a specific binding site for ribosomal protein L25(Fig. 3.39). It was therefore apparent that this motif was an important site of activ-ity and molecular recognition. Both 5S rRNA and the sarcin-ricin loop have beenintensively studied by both high resolution NMR and crystallography.

Loop E is an asymmetric internal loop characterized by a highly conservedstack of seven non-Watson–Crick base pairs. The hydrogen-bonding pattern isparticularly well conserved. The first base pair is an AG sheared pair, followed bya UA trans Hoogsteen pair, a bulged G base, and finally a trans Hoogsteen AA basepair with locally parallel backbone conformation (Fig. 3.38). The AU reverseHoogsteen base pair observed in loop E is the most abundant AU interaction inthe ribosomal RNAs after the normal Watson–Crick base pair.

94 A Textbook of Structural Biology

FA

CCA

C

C C C C C

CC

C CC C C C

CCCCC

C

C

C

C

CCC

CC C C

CCC

C

C C

C

C

CC

G

GG

G G G G GG

G

G G

G

GGGG

GG

G G G G

G

G

G GG

GGGGG

G

GG

GG

G

G A

A

AA A

A AA A

A A A

A AA

A A A

AAA

A

UU

U UU

UU

U

U

U

U UU

UU

UU

3’ 5’

Loop D Helix 4 Loop E Helix 5 Loop A Helix 2 Loop B Helix 3 Loop C

Helix 1

10

20 3040

5060708090

100 110

120

FIGURE 3.39 Top: Secondary structure of H. marismortui 5S rRNA observed in the crystal struc-ture of large ribosomal subunit. Bottom: Schematic representation of the three-dimensionalstructure of 5S rRNA. The orientation is similar to the figure above.

b541_Chapter-03.qxd 11/20/2008 10:56 AM Page 94

Page 30: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Even mRNA has structure

shallow groove (Fig. 3.49). In a fourth, less frequent type 0 motif the ribose ofthe A base interacts with a ribose at the receptor helix backbone (Fig. 3.49). The type0 interaction is not base specific because it is the ribose of the inserted residue thatfills the shallow groove, and in fact, type III is not base specific either. Nevertheless,in type 0 and type III interactions, the contacts between the inserted base and thereceptor helix are optimized when the base is an adenine. The inserted A base ofthe A-minor motif has a strong preference for GC receptor base pairs, which areoptimally complementary in shape and hydrogen bonding pattern.

3.4.8 RNA is a Carrier of Genetic InformationRNA has a crucial role as a facsimile, or messenger, of genetic information fromthe chromosomes to the ribosomes. Messenger RNA (mRNA) is a molecule ofRNA encoding a copy of a single gene or a transcriptional unit. mRNA is firsttranscribed from a DNA template by RNA polymerase enzymes (Chap. 7) andcarries coding information to the ribosomes where protein synthesis takes place(Chap. 8). In some viruses single- or double-stranded RNA constitutes the viralgenome (Chap. 15).

In the 5′ end (the “head” end), the eukaryotic messenger RNA contains amethylguanosine (m7G) cap, a modified guanine nucleotide that has been addedto the mRNA at the time of transcription (Fig. 3.50). The presence of the cap iscrucial for the recognition of the mRNA by the ribosome, and in addition it pro-tects the mRNA from being degraded by ribonucleases. In the 3′ end, the enzymepolyadenylate polymerase adds a sequence of adenine nucleotides to the tail of thepre-mRNA. This poly(A) tail can be up to several hundred nucleotides long.

In addition to the coding region, mRNA contains regions that are not translatedto protein but have a role in mRNA stability, mRNA localization and translationalefficiency. These untranslated regions (UTRs) are present before the start codon(5′ UTRs) and after the stop codon (3′ UTRs). They contain areas of well-defined

102 A Textbook of Structural Biology

FA

AAAAAAAAAAAAAAAAAAAAAAAA

3’ UTR5’UTRm7G cap poly(A) tail

FIGURE 3.50 Schematic representation of eukaryotic mRNA showing the 5′ cap, the codingregion (red), and the 5′ and 3′ UTRs.

b541_Chapter-03.qxd 11/20/2008 10:56 AM Page 102

Page 31: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

More on RNA secondary structure

Secondary Structure: small subunit ribosomal RNA

Saccharomyces cerevisiae:

Mitochondrion

Domain: Bacteria

Eucarya Phyla: Fungi, Ascomycota

April 1994 (V00704)

5’

3’

I

II

III

A G U A A A A A A U U U A U A A G A A U A

U G A

U G U U G G U U C A

G A

U U A A

G C

G C

U

A A A

U A

A G

G A

C A

U G A

C G C

A U A C

G

A

G U C A U

A C G U U U A U U A U U G A U

A A G A U A A U A A A

U A U G U G G U

G

U A

A A C G

U G A

G U A

A U U U

U A U U A G G

A A

U U A A

U G

A A C U A U A G A A U A A G

C U A

A A

U A C U U A A U A U

A U U A U U A U A U A A A A

A U

A A U U U A U A U A A U A A

A

A

A

G G A U A U A U A U A U A A U A U A U A U U U A

U C U A U A G

U

C A A

G

C C A A U A A U

G G

U U

U

A G

G U

A G

U A

G G

U U

U A U

U A A G A G

U U A

A A

C C

U A

G C

C A A C

G A U

C C

A U

A A U C G A

U A A U G A A A

G U U A G A

A C G A U C

A C

G

U U G A C U C U G A A A

U A U A G U C A A U A

U C

U A U A

A G

A U

A C A G

C A G U G A G

G

A A U A

U U

G G A C A A

U G A U C G A A

A G A U U G A

U C C A G U

U A C

U U

A U

U A G

G A U G A U A U A U A A A A A U A U U U U A U U U

U A U U U A G U U C C G G G G C C C G G C C A C G

G A A C C G G A C C C G A A A G G A G A A A U A U

U A A A U A U U U A U A A U A A U A A U A A U A A

U A A U A U A U A U A U A U A A A U U G A U U A A

A A A U A A A A U C C A U A A A U A A U U A A A A

U A A U G A U A U U A A U U A C C A U A U A U A U

U U U A U A U G G A U A U A U A U A U U A A U A A

U A A U A U U A A U U U A U U A U U A U U A A U A

A U A U A U U U U A A

U A G U C C U G A C

U A A U A U

U U G U G

C C

A G C

A G U C G

C G G U

A A

C A C A A A G

A G G G C G

A G

C G

U U

A A

U C

A

U

A A U G G U

U U A

A A

G G

A U

C C

G U A

G A

A U

G A A U U A U A U A U U A U A A U U

U A

G A G U U A A U A A

A A U

U A A U U A A

A G A A

U U A U A A U A G U A

A A G A U G A

A A U A A U A

A U A A U A A U U A U A A

G A C U

A A U

A U

A U G U G

A A

A A U A U U A A U U A A

A U A U U A A C U

G

A

C A

U U G A

G G

G A

U U A A

A A C U A G A G U

A G C G A A

A C G G

A U U

C G A U A C

C C G U

G U A

G

U U C U A G U A G U

A A A C

U A

U G A A U A C A A

U U

A U

U U

A U

A A

U A U A U

A U

U A

U A U

A U

A A

A U

A A U

A A A

U G A

A A

A U

G A

A A G U A U U C C A C C U G A A G A G U A C G U U A G

C A

A U A A U G A A

A C U C A A A A C A A U A G A C G

G U U A C A G A C U U A A

G C

A G

U G

G A A C A U G U U A U U

U A A U

U C

G A

U A A U C C

A C G

A C U A A

C U U U A

C C A U A U U U

U G A A

U A

U U

A U

A A U A A U U A U

U A U A A U U A

U U

A U

A U U

A C A G G

C G U U A C

A U U

G U U

G U C U U U A

G U U C G U G C U G

C A A A G U

U

U

U A G A

U U

A A A U

G U G C

A U A

A A C G A C C

A A A A

C U C C

A U

A U

A U A U A

A U U U U U U A A U A

U

A U A U U A A A A A

U A

U U

A U U U A U U

A A U A U A A A

G A

A A

G G A A U U A A G A C

A A A U C

A U A

A

U G A U

C C U U

A U A

A U A U G G

G U A

A U

A G A C G U G C U A U A A

U A

A A

A U

G A U A

A U A A

A A

U U A

U A U A

A A

A U

A U A U U U A

A U

U A U A U U U A A

U U A A U A A U

A U

A A

A A C A U U U U

A A

U U U U U

A A U A U

A U

U U U

U U U A

U U

A U

A U

A U U

A A

U A

U G A A U

U A U A A U C

U G A

A A U U C G A U U A U A

U G A A A A

A A

G

A

A U

U G

C U

A G

U

A

A U A

C G

U A

A A U

U A

G U A

U G U

U A

C G

G U

G A A U A

U

U

C U A

A C U G U U U C G C A

C U A A U C A C

U

C A

U C

A

C G C G

U U G A

A A C A U A U U A U U

A U C U

U A U U A U U U A U A U A A U A U U U U U U A A U

A

A A

U A U U A

A U

A

A

U U A U U A A U U U A U A U U U A U U U A U A U C

A G A A A U A A U A U G A A

U U A A

U G C G

A A

G U

U G

A A

A U

A C

A

G

U U A C C G U A G G G G

A A C C U G C G G U G G

G C U U A U A A A U A U C U U A A A U A U U C U U A C A

Fortunately, a lot of interestingRNA is shorter than this, or canbe thought of in terms ofdomains

predicting secondarystructure from sequence

the inverse folding problem:finding sequences that arecompatible with structure

making the 2D⇒ 3Dtransition

force field simulations ofnucleic acids

Page 32: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Nearest-neighbor energy function

Ther modynamics of stems and loopsSerra & Tur ner, Meth. Enzymol. 259, 242 (1995)

Lu, Tur ner & Mathews, NAR 34, 4912 (2006)

A AU U

Hairpin: G AC-GG-CG-C

∆G loop = ∆G loop(6 ) + ∆G(CGGA

) + ∆G(GA

) = +2. 9 kcal/mol

∆Gstem = ∆G(GGCC

) + ∆G(GCCG

) = −6. 3 kcal/mol

∆Ghairpin = ∆G loop + ∆Gstem

Similar calculations for ∆H , ∆S , and Tmelt

Page 33: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

A slightly more complex structure:

Predicting RNA secondary structure by free energy minimization 161

CU UA A AC GUUU

G

C

G

C

U

G

UCAC

AUAC

A U CU U G C U

A

G

C

C

G

C

G

U

A

U

A

GG C A

UAAU

AU

CAAA

U

AUUACGGCCG A

CAA

UAAU

UAGUGCUGUGA

AUAC

G

U

A

U

G

C

G

C

A

U

U

A

G

C

G

C

UUU

A

U

G

AGCUGAUCGGC

UAG

GCGGUUAUAGCCG

GG

C

A

U

A

U

C

G

UUC

GAGAU

AG

A

AAU

AU

A

G

U

C

G

U

A

A

U

G

C

G

C

AA

A

CUG

ACCA

AA U G

G

C UCU

U UC

U U A U

CAAA

UA

5

3

Hairpin Loop

Multibranch Loop(Helical Junction)

Double Helix

Internal Loop

Bulge LoopGU

GCAG

AG

AAAC

Fig. 1 A typical RNA secondary structure. This is the secondary structure of the 3′ untranslated region of the Drosophila teissieri R2 element[124,125]. Examples of secondary structure motifs are labeled. Base pairs form helical regions. Unpaired regions are called loops; a hairpin loopchanges the backbone direction by 180, a bulge loop is an interruption in base pairing in one strand, an internal loop is the interruption of basepairing in both strands, and a multibranch loop (helical junction) is a loop from which more than two helices exit. The structure was drawn usingthe XRNA program, available from the Santa Cruz RNA Center at http://rna.ucsc.edu/rnacenter/

G

C

C

G

G

C

A GA

U

G

C

G

C

G CC

GAAAGG

-3.4

-2.4

-2.1

-3.3

-2.5

+5.4-0.2-1.7

∆G˚ = -1.7 - 3.4 - 2.4 - 0.2 - 2.1 - 3.3 - 2.5 + 5.4= -10.2 kcal/mol

5

3‘

Fig. 2 Sample nearest neighbor calculation. The free energy incrementsof each motif are indicated and the total stability is the sum of eachincrement. Stabilizing interactions are provided by base pairs and basestacks, e.g. the 3′ dangling G and the GA mismatch in the hairpin loop.Loops are largely destabilizing because of the entropic cost associatedwith constraining the unpaired nucleotides in the loop. For example, thesix-membered hairpin loop has a +5.4 kcal/mol free energy cost for loopclosure. The 2×2 internal loop is overall stabilizing, however, becauseof the favorability of tandem GA mismatches

ago [36]. In the intervening time, the models for base pairand loop stability have been refined on the basis of opti-cal melting experiments [37–46]. Parameters are chosen toexpress the sequence dependence of stability and the parame-ters have generally become increasingly sequence-dependentas more experimental data have become available. In general,the error of each parameter is less than 0.5 kcal/mol [33–35].The development of the nearest neighbor parameter modelshas been reviewed previously [47,48].

3 Dynamic programming algorithm

For the prediction of RNA secondary structures fromsequence, it is clear that a brute-force method for free energyminimization will never be tractable for anything but veryshort sequences [49]. Consider that it has been estimatedthat the number of possible secondary structures for a givensequence is approximately 1.8N , where N is the length of thesequence [50]. For a short sequence of 100 nucleotides, thisis already 3.4×1025 structures. Given that a single computerprocessor can calculate the free energy for 1×104 structures

Page 34: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

From sequence to secondary structure:

1 A very nice folding program (Windows only) is RNAstructure:http://rna.urmc.rochester.edu/rnastructure.html

1 sequence editing, secondary structure prediction2 dynalign, for multiple sequence alignment3 partition function analysis, gives much improved confidence levels

for the predictions

2 Prediction of secondary structure as a web service:http://www.bioinfo.rpi.edu/~zukerm/ (classical approach) or athttp://sfold.wadsworth.org/ (statistical sampling of the Boltzmanndistribution)

3 Reviews: D.H. Mathews, Theor. Chem. Acc. 116, 160 (2006); J.Mol. Biol. 359, 526 (2006)

Page 35: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

RNAmotif: scanning genomes for secondary structure motifsA simple example: looking for tetraloops

desch5 (minlen=4, maxlen=6) ss (seq ="ˆuucg$" ) h3

retur ns:

AEURR16S 0.000 0 1035 12 tccc ttcg gggaAGIRRDX 0.000 0 180 14 ggagc ttcg gctccAGIRRDX 0.000 0 181 12 gagc ttcg gctcAHNRRDX 0.000 0 181 12 gagc ttcg gctcANNRRO 0.000 0 39 12 c ccc ttcg ggggAOSRR18S 0.000 0 175 12 gccc ttcg gggcAPBRR16SD 0.000 1 1321 12 tacc ttcg ggtaAPSRRE 0.000 0 227 12 cacc ttcg ggtgAQF16SRRN 0.000 1 39 12 a gcg ttcg cgctAQF16SRRN 0.000 1 40 14 c agcg ttcg cgctg

.....

Page 36: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Motif for an artifically generated DNA enzyme

Hypothesis: this motif would self-destruct and never be found in realgenomes

Page 37: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Looking for a analogue of an artificial DNA enzyme

parms

wc += gu;

descr

h5( tag=’h1’, len=5, mispair=1 )

ss( len=1, seq="r" )h5( tag=’h2’, minlen=5, maxlen=8, seq="^y", mispair=1 )

ss( minlen=4, maxlen=200 )

h3( tag=’h2’, seq="r$" )ss( tag=’ggct’, len=15, mismatch=1, seq="^ggctagcnacaacrr$" )

h3( tag=’h1’ )

Results:

Arabidopsis thaliana chr II sect 146/255:AC004747 1.000 1 15537 171 ttgtt a tccgc ttt...(135)...tgggtgga

ggctaTccacaacaa ggtgg

Page 38: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

How can we generate descriptors?

search with strict2o struct descriptor

choose seq withlow T.E. and lookfor conserved nt

include conserved ntin descriptor, searchwith looser 2o structconstraints

This general procedurecan often be used to findconserved nucleotides infamiles of structures, andto search for specifictypes of RNA in genomicdatabases.

Page 39: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Example: the tRNA motif106 A Textbook of Structural Biology

FA

conserved

partly conserved

Anticodon

10

20

40

70

50

30

60

15

25

45

75

55

35

65

D-loop

Variable loop

Anticodon stem

TΨC-loop

Acceptor stem

G

GG

G

G

m2Gm5G

m5C

G

G

GG

G G

G

Gm

GG

G G

G

A

A

AAA

A

A

A

AA

AA

A

A

A AmA

CC

C

CC

CC

C

C

C

Cm

C

C

C

C

UU

UU

U

U

m7G

GA

C U

U

U

U U

U

Y

Ψ

ΨT

DD

m2G2

(a)

FIGURE 3.52 (a) Secondary structure of tRNAPhe from yeast. (b) Schematic representation of thethree-dimensional folding of the tRNA molecule, using the same color scheme as in (a).

b541_Chapter-03.qxd 11/20/2008 10:57 AM Page 106

106 A Textbook of Structural Biology

FA

conserved

partly conserved

Anticodon

10

20

40

70

50

30

60

15

25

45

75

55

35

65

D-loop

Variable loop

Anticodon stem

TΨC-loop

Acceptor stem

G

GG

G

G

m2Gm5G

m5C

G

G

GG

G G

G

Gm

GG

G G

G

A

A

AAA

A

A

A

AA

AA

A

A

A AmA

CC

C

CC

CC

C

C

C

Cm

C

C

C

C

UU

UU

U

U

m7G

GA

C U

U

U

U U

U

Y

Ψ

ΨT

DD

m2G2

(a)

FIGURE 3.52 (a) Secondary structure of tRNAPhe from yeast. (b) Schematic representation of thethree-dimensional folding of the tRNA molecule, using the same color scheme as in (a).

b541_Chapter-03.qxd 11/20/2008 10:57 AM Page 106

Page 40: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Looking for tRNA’s

Page 41: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Generating an initial tRNA descriptor for E. coli

1. Descriptor with no sequence requirement. GU’s are allowed, and nomispairs are allowed.

Result: 26 tRNAs not found, 5 false positives,all with higher Turner energies than true tRNAs.

2. Analyze sequence conservation for 50 hits withlowest Turner energies.Found 11 nucleotides thatare 100% conserved.

3. Include conserved nucleotides in descriptor,but allow a mispair in each helix.

Result: 2 tRNAs not found, no false positives.

NN

N N NNN

NNN

lgth=7

lgth=3-4

lgth=5

lgth=5

lgth=8-11

lgth=7

lgth=4-22

3’

5’ NN NN

NN

N

N

UN

N U UCN

ANN

lgth=7

lgth=3-4

lgth=5

lgth=5

lgth=8-11

lgth=7

lgth=4-22

3’

5’ NC CA

NU

G

C

Page 42: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Generating an optimized tRNA descriptor for E. ColiOptimized tRNA descriptor for E. coli

NN

N

NN

N

N

N UC

NN

NN

lgth=7

lgth=3−4

lgth=5

lgth=5

lgth=8−11

lgth=7

lgth=4−22

helixsingle strand

3’

5’ NCCA

(GU basepairs not allowed; one mispair per helix; no sequence mismatches)Perfor mance for K12 and O157:H7: no false positives,

one missing tRNA for K-12,one previously unidentified tRNA located

Page 43: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

A more general bacterial tRNA descriptor

UN

N

NU

G

C

U UCN

ANN

lgth=7

lgth=3−4

lgth=5

lgth=5

lgth=8−11

lgth=7

lgth=4−22

lgth=4

helixsingle strand

3’

5’

a.a. stem

D arm

a.c. armV loop

T arm

GU pairsgenerally notallowed

One mispairper helix isallowed

Onesequencemismatch isallowed

Page 44: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Results for bacteria using this descriptor

Results using Optimized Descriptor

Organism total false false# tRNA neg. pos.

−−−−−−−− −−−−−− −−−−− −−−−−E.Coli K−12 86 1 15E.Coli O157:H7 95 0 20B. Subtilis 87 1 18Aquifex aeolicus 43 0 4H. influenzae 56 0 7Myc. pneumoniae 36 1 6

False positives can be further distinguished from true tRNAsusing Turner energies.

(We show below how to eliminate false positives.)

Page 45: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Analysis using nearest-neighbor energies

true tRNA−40

−20

0

20

−20

0

20

Turn

er E

nerg

ies

−20

0

20

false pos. true tRNA false pos.

Bacillus subtilis

testE. coli (K−12)

Haemophilus influenzae

E. coli (O157:H7)

Aquifex aeolicus

Mycoplasma pneumoniae

Page 46: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

Moving to eukaryotes

What about eukaryotes?organism tRNAs tRNAs with introns

S. pombe 153 39S. cerevisiae 273 59arabidopsis 620 83drosophila 284 15C. elegans 584 34human 496 28

S. cerevisiae:• tRNAscan-SE found 275 tRNAs, 59 with introns• we modified our descriptor to allow an 8-60 base insert

after the sixth postion of the anticodon loop• rnamotif then found all but 3 of the tRNA’s

identified by tRNAscan-SE• rnamotif also found 1301 false positives,

but all had high Tur ner energies

Page 47: Biophysical Chemistry 1, Fall 2010 Web assignment: http ...casegroup.rutgers.edu/lnotes/dnab.pdf · The sugar-phosphate backbone main categories, E and T. If four atoms of the aldofuranose

“Solving” the inverse folding problem (yeast):

tRNA folding energies for S. cerevisiae

true tRNA

−30

−20

−10

0

10

20

30

Turn

er E

nerg

y

01505−15202530354045505560657075808590951001051101151201251301351401451501551601651701751801851901952002052102152202252302352402452502552602652702752802852902953003053103153203253303353403453503553603653703753803853903954004054104154204254304354404454504554604654704754804854904955005055105155205255305355405455505555605655705755805855905956006056106156206256306356406456506556606656706756806856906957007057107157207257307357407457507557607657707757807857907958008058108158208258308358408458508558608658708758808858908959009059109159209259309359409459509559609659709759809859909951000100510101015102010251030103510401045105010551060106510701075108010851090109511001105111011151120112511301135114011451150115511601165117011751180118511901195120012051210121512201225123012351240124512501255126012651270127512801285129012951300false pos