Modeling Nucleic Acid Structures
Prediction of Non-Canonical Base Pairs in RNA
Dhananjay Bhattacharyya
Computational Science Division
Saha Institute of Nuclear Physics
Kolkata
E-mail: [email protected]
A Biological Cell
Major Biological Processes
• Cell Division (Mitosis or Meiosis)
• Conversion of Chemical Energy to Mechanical Energy (action of muscle)
• Reasoning or Thinking in Brain (through exchange of Electrical Signals)
• Protein synthesis
• Others
Molecules present in a Cell
• Proteins, (such as Hemoglobin)
• Nucleic Acids (Deoxyribonucleic Acid or Ribonucleic Acid)
• Carbohydrate (Mono-Saccharide, Di-Saccharide, Poly-Saccharide)
• Lipid (in Cell Membrane)
• Water (maximum amount in a cell)
• Salt (Cations: Na+, K+, Mg2+ and Anions: Cl-)
• Other small molecules: Heme, Cholesterol, ATP, etc.
[Figure: gene expression, showing the Gene, Promoter sequence, RNA polymerase and mRNA]
Central Dogma of Molecular Biology: DNA → RNA → Protein
[Figure: DNA bases A, T, G, C]
DNA as observed
R. Benfante, N. Landsberger, G. Tubiello and G. Badaracco
Nucl. Acids Res. 17, 8273 (1989)
C. Bustamante, J. Vesenka, C.L.Tang, W.
Rees, M. Guthold, and R. Kellers
Biochemistry 31, 22 (1992)
2d sin θ = nλ
L. Bragg
J. Kendrew
M. Perutz
L. Pauling
F. Crick
…And many
DNA double helices without any ligand
IUPAC-IUB suggested NUPARM
1. Dickerson, Bansal, Calladine, et al. (1989) EMBO J. 8: 1
2. Olson, Bansal, Burley, et al. (2001) J. Mol. Biol. 313: 229-237
Weisstein, Eric W. "Euler Angles." From MathWorld--A Wolfram Web
Resource. http://mathworld.wolfram.com/EulerAngles.html
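Euler rotation (x-convention, as in the MathWorld reference):
E(φ, θ, ψ) = Rz(ψ) Rx(θ) Rz(φ)

Rz(φ) =
|  cos φ   sin φ   0 |
| −sin φ   cos φ   0 |
|    0       0     1 |

Rx(θ) =
| 1     0       0   |
| 0   cos θ   sin θ |
| 0  −sin θ   cos θ |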
Quaternion Transformation
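Q = a + ib + jc + kd (representing a rotation)
with i² = j² = k² = −1, etc.
Q = cos(α/2) + u sin(α/2), where u = unit vector along the rotation axis and α = rotation angle
Q⁻¹ = a − ib − jc − kd (for a unit quaternion)
Position vector v = 0 + i vx + j vy + k vz
Rotated coordinates v' = Q v Q⁻¹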
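The quaternion rotation above can be sketched in a few lines of Python. This is only an illustration of the formula on the slide, not code from NUPARM; the function names `quat_mul` and `rotate` are hypothetical.

```python
import math

def quat_mul(p, q):
    """Hamilton product of two quaternions stored as (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def rotate(v, axis, angle_deg):
    """Rotate vector v about `axis` by `angle_deg` using v' = Q v Q^-1."""
    half = math.radians(angle_deg) / 2.0
    norm = math.sqrt(sum(x * x for x in axis))
    ux, uy, uz = (x / norm for x in axis)      # unit vector u
    s = math.sin(half)
    q = (math.cos(half), ux * s, uy * s, uz * s)
    q_inv = (q[0], -q[1], -q[2], -q[3])        # conjugate = inverse for a unit quaternion
    pure = (0.0, v[0], v[1], v[2])             # position vector as a pure quaternion
    _, x, y, z = quat_mul(quat_mul(q, pure), q_inv)
    return (x, y, z)
```

For example, `rotate((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), 90.0)` turns the x axis into the y axis, up to rounding.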
Base Pair Step Parameter (NUPARM v.1)
[Figure: axes X1, Y1 and X2, Y2 of the two successive base pairs]
Xm = (X1 + X2) / |X1 + X2|
Ym = (Y1 + Y2) / |Y1 + Y2|
Zm = Xm × Ym
Tilt = 2 sin⁻¹(Zm · Y1)
Roll = 2 sin⁻¹(Zm · X1)
Twist = cos⁻¹((X1 × Zm) · (X2 × Zm))
Bansal, Bhattacharyya & Ravi (1995) CABIOS 11, 281
Shift, Slide and Rise are defined in a similar way
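As a rough sketch of the rotational step parameters, the formulas for Tilt, Roll and Twist can be transcribed directly into Python. The function name `step_rotations` is hypothetical, and two details are assumptions rather than part of the slide: the cross products in the Twist formula are normalised before taking the dot product, and all arguments of the inverse trigonometric functions are clamped to [-1, 1] to guard against rounding.

```python
import math

def _add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def _unit(u):
    n = math.sqrt(_dot(u, u))
    return tuple(a / n for a in u)

def _clamp(x):
    return max(-1.0, min(1.0, x))

def step_rotations(x1, y1, x2, y2):
    """Return (tilt, roll, twist) in degrees from the base pair axes."""
    xm = _unit(_add(x1, x2))          # Xm = (X1 + X2) / |X1 + X2|
    ym = _unit(_add(y1, y2))          # Ym = (Y1 + Y2) / |Y1 + Y2|
    zm = _cross(xm, ym)               # Zm = Xm x Ym
    tilt = 2.0 * math.asin(_clamp(_dot(zm, y1)))
    roll = 2.0 * math.asin(_clamp(_dot(zm, x1)))
    a = _unit(_cross(x1, zm))         # normalisation is an assumption
    b = _unit(_cross(x2, zm))
    twist = math.acos(_clamp(_dot(a, b)))   # unsigned angle
    return tuple(math.degrees(t) for t in (tilt, roll, twist))
```

Two frames that differ only by a 36° rotation about their common z axis give Tilt and Roll of about 0° and a Twist of about 36°. Because the Twist comes from an inverse cosine, this sketch does not recover its sign.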
Definition and Nomenclature of Intra
Base Pair Parameters (IUPAC-IUB)
and NUPARM version 2.0
Mukherjee, Bansal & Bhattacharyya (2006) J Comput. Aided Mol. Des., 20, 629-45.
Base pair parameters Determination
Buckle = 2 sin⁻¹(Zm · Y1)
Opening = 2 sin⁻¹(Zm · X1)
Propeller = cos⁻¹((X1 × Zm) · (X2 × Zm))
Shear = −Xm · M
Stagger = Ym · M
Stretch = Zm · M
Xm = (X1 + X2) / |X1 + X2|
Ym = (Y1 + Y2) / |Y1 + Y2|
Zm = {(X1 + X2) × (Y1 + Y2)} / {|X1 + X2| |Y1 + Y2|}
Mukherjee, Bansal & Bhattacharyya (2006), J Comput Aided Mol
Des, DOI 10.1007/s10822-006-9083-x
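The translational base pair parameters are projections of a vector M onto the mean axes. A minimal sketch follows, assuming M is the vector joining the origins of the two base frames (the slide does not define M) and using a hypothetical function name `basepair_translations`.

```python
import math

def _add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def basepair_translations(x1, y1, x2, y2, m):
    """Shear, Stagger and Stretch from the base axes and the vector M.

    M is assumed to join the origins of the two base frames.
    """
    sx, sy = _add(x1, x2), _add(y1, y2)
    nx, ny = math.sqrt(_dot(sx, sx)), math.sqrt(_dot(sy, sy))
    xm = tuple(a / nx for a in sx)                    # Xm
    ym = tuple(a / ny for a in sy)                    # Ym
    zm = tuple(a / (nx * ny) for a in _cross(sx, sy)) # Zm
    return {
        "shear": -_dot(xm, m),
        "stagger": _dot(ym, m),
        "stretch": _dot(zm, m),
    }
```

With identical base frames along the coordinate axes and M = (-0.5, 0.1, 2.8), this returns a shear of 0.5, a stagger of 0.1 and a stretch of 2.8.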
Partial list of DNA crystal structures
available at http://ndbserver.rutgers.edu
bd0001 12: A C C G A C G T C G G T
bd0003 12: A C C G G T A C C G G T
bd0004 12: C G C G A A T T C G C G
bd0006 10: G G C C A A T T G G
bd0011 12: C G C A A A T A T G C G
bd0014 12: C G C G A A T T C G C G
bd0015 10: C C G C C G G C G G
bd0017 9: C G C G C G G A G
bd0018 11: G C G A A T T C G C G
bd0019 12: G G C G A A T T C G C G
bd0022 12: A C C G G C G C C A C A
bd0023 10: C C A G T A C T G G
bd0024 10: C C G A A T G A G G
Standard Reference Frame of a Watson-Crick Base Pair
[Figure: base pair reference frame showing Propeller and the 5' ends of the two strands (Calladine & Drew 1982 J. Mol. Biol.)]
Basepair parameters of bdl001
C G C G A A T T C G C G
Roll Variation in Crystal Structures
[Figure: histograms of Occurrence versus Roll for three doublets: d(AA).d(TT) (Roll from -15 to 12.9), d(CG).d(CG) (Roll from -17 to 20.1) and d(GC).d(GC) (Roll from -27 to 14.2)]
1. Input geometry parameters: step parameters (Roll, Tilt, Twist, etc.) and base pair parameters (Propeller, Buckle, etc.)
2. Input ideal base pair coordinates
3. Convert geometry parameters to helical sense (analytical relations)
4. Apply rotations/translations to the two base pairs in helical sense
5. Repeat the procedure for polymer generation
[Figure: helix with 100° rotation per residue, 1.5 Å rise per residue and 5.4 Å pitch]
Helix: (R, θ, z) → (x, y, z)
Orientation of nth residue == Orientation
of 1st residue
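The conversion from helical (cylindrical) coordinates to Cartesian coordinates can be illustrated with a short Python sketch. The function names and the radius value are hypothetical; the 100° rotation and 1.5 Å rise are the values shown on the slide, and together they imply the 5.4 Å pitch.

```python
import math

def helix_points(n, radius, twist_deg, rise):
    """Cartesian (x, y, z) positions of n residues on an ideal helix.

    Residue i sits at cylindrical coordinates (radius, i*twist, i*rise).
    """
    points = []
    for i in range(n):
        theta = math.radians(i * twist_deg)
        points.append((radius * math.cos(theta),
                       radius * math.sin(theta),
                       i * rise))
    return points

def pitch(twist_deg, rise):
    """Axial advance per full turn: (360 / twist) residues times the rise."""
    return 360.0 / twist_deg * rise

# Illustrative values: 100 degrees and 1.5 A per residue (from the slide),
# with a hypothetical radius of 2.3 A.
coords = helix_points(19, 2.3, 100.0, 1.5)
```

After 18 steps the rotation totals 1800°, that is five full turns, so the 19th point lies directly above the first; this is the sense in which the orientation of a later residue repeats that of the first.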
[Equations: analytical relations converting the local parameters (Tilt T, Roll R, Twist and the translations) into the corresponding helical parameters (subscript h); garbled in the source]
Curved DNA models built from crystal parameters: (A3G7)n, (A6G4)n, (A10)n
Analysis and Generation of Double Helical Structure of DNA: Sequence Directed DNA Curvature – Prediction of Promoter Regions in Genomic Sequences
Natural sequences having different RL
Synthetic sequence (CGCAAAAAAG)n with large RL
Applied to predict promoter sequences
Parameter Variability
Double Helical DNA Flexibility depends on Base Sequence

Sequence                          Persistence Length, P
Mixed Sequence                    566 Å
Poly(dG).poly(dC)                 816 Å (Rigid, expt.)
Poly(dA).poly(dT)                 1174 Å (Rigid, expt.)
Poly(dA-dT).poly(dA-dT)           ~1100 Å
Poly(dA-dG).poly(dC-dT)           ~800 Å
Poly(dG-dC).poly(dG-dC)           412 Å
Poly(dA-dC).poly(dG-dT)           ~600 Å
Poly(dC-dT-dG).poly(dC-dA-dC)     397 Å (Flexible, expt.)
Poly(dC-dG-dG).poly(dG-dC-dC)     410 Å (Flexible, expt.)

D. Bhattacharyya, S. Kundu, A.R. Thakur & R. Majumdar (1999) J. Biomol. Struct. Dynam. 17, 289.
“A DNA Structural Atlas for E-coli”, Pedersen, A.G. et
al (2000) J. Molecular Biology, 299, 907-930
TATA Box binding to TBP
[Figure: gene expression, showing the Promoter sequence, RNA polymerase, Intron and mRNA]
Cellular functions: DNA → RNA → Protein
Structural Motifs of RNA
• Helix
• Hairpin loop
• Bulge loop
• Continuous stack
• Kissing loop
Hydrogen Bond
[Figure: energy (kcal/mol) curves for a D-H...A hydrogen bond, with distances R1 and R2, comparing: van der Waals; approximate hydrogen bond; van der Waals + Coulomb; approximate H-bond + Coulomb]
Possibility of Unusual Base Pairing in RNA
Base Pair Finder
• Take a base edge
• Identify the H-bonding centers (N3G & N2G)
• Look for H-bond partners through distance calculation (N6A & N7A)
• Check linearity of pseudo-angles:
  C6G-N3G-N6A
  N3G-N6A-N1A
  N1G-N2G-N7A
  N2G-N7A-N9A
• Confirm orientation through angle calculation
Gives rise to:
1822 A:U W-W(C);
6056 G:C W-W(C) and
847 G:U W-W(C) base pairs
Das, Mukherjee, Mitra & Bhattacharyya (2006) J Biomol Struct Dynam, 24, 149-161
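The distance and pseudo-angle checks of the base pair finder can be sketched as below. This is a simplified illustration, not the published program: the function names are hypothetical, and the cutoffs (3.8 Å for the distance, 120° for the pseudo-angles) are assumed values chosen only for the example.

```python
import math

def _dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def _angle(a, b, c):
    """Angle a-b-c in degrees, measured at b."""
    u = tuple(p - q for p, q in zip(a, b))
    v = tuple(p - q for p, q in zip(c, b))
    cosang = sum(p * q for p, q in zip(u, v)) / (_dist(a, b) * _dist(c, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosang))))

def hbond_candidate(ref1, atom1, atom2, ref2, max_dist=3.8, min_angle=120.0):
    """Test one contact, e.g. ref1=C6G, atom1=N3G, atom2=N6A, ref2=N1A.

    The cutoffs are illustrative assumptions, not published values.
    """
    if _dist(atom1, atom2) > max_dist:
        return False
    # Both pseudo-angles must be close enough to linear.
    return (_angle(ref1, atom1, atom2) >= min_angle
            and _angle(atom1, atom2, ref2) >= min_angle)

def is_base_pair(contacts, **cutoffs):
    """Require at least two contacts that all pass the geometric tests."""
    return len(contacts) >= 2 and all(
        hbond_candidate(*c, **cutoffs) for c in contacts)
```

Each contact is a tuple of four coordinate triples in the order (ref1, atom1, atom2, ref2), so the two contacts listed above would be passed as the atoms C6G, N3G, N6A, N1A and N1G, N2G, N7A, N9A.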
127 variants with TWO H-bonds between the bases/sugars
G:C W:W C
A:U W:W C
G:U W:W C
A:G H:S T
A:U H:W T
A:A H:H T
G:A W:W C
G:A S:W T
A:A W:W T
A:U W:W T
A:A H:W T
A:U H:W C
G:G S:S T
G:G H:W T
A:C W:W T
C:U W:W T
A:C H:W T
G:G H:W C
G:C W:W T
A:G s:s T
[Figure panels: A:A H:H T, A:G S:S T, A:G H:S T, A:U H:W T, A:U H:W T]
Double helical fragment from
ribosomal RNA (PDB ID: 1N32)
S. Halder and D. Bhattacharyya (2010) J. Phys. Chem. B 114: 14028
Different Non-canonical Base Pairing Motifs in RNA Double Helices

Motif | RNA Type | Organism
G:A S:HT, A:G H:ST | 23S rRNA | Haloarcula marismortui, Thermus thermophilus, Escherichia coli, Deinococcus radiodurans
G:A S:HT, A:G H:ST | 16S rRNA | Escherichia coli, Thermus thermophilus
G:A S:HT, A:G H:ST | Riboswitch | Synthetic
A:A s:hT, A:U H:WT, A:G H:ST | 16S rRNA | Escherichia coli, Thermus thermophilus
G:A S:HT, A:G H:ST, A:G H:ST | 23S rRNA | Haloarcula marismortui, Escherichia coli, Thermus thermophilus
U:G S:WC, U:U W:WC | 23S rRNA | Haloarcula marismortui
G:A S:HT, A:U H:WT, A:G H:ST | 23S rRNA | Thermus thermophilus
A:G W:WC, A:G W:WC | 23S rRNA | Thermus thermophilus, Escherichia coli
G:A S:HT, G:A S:HT, A:G H:ST, U:U W:WC | 16S rRNA | Thermus thermophilus, Escherichia coli
G:A S:HT, G:A S:HT, A:G H:ST | 23S rRNA | Thermus thermophilus
U:U W:WC, U:U W:WC | 23S rRNA | Haloarcula marismortui
U:U W:WC, U:U W:WC | IRES RNA | Cricket paralysis virus
G:A S:HT, A:G H:ST, G:G H:zT | 23S rRNA | Escherichia coli
G:G z:HT, U:A W:HT, A:G H:ST | 16S rRNA | Thermus thermophilus (in very few structures)
U:C W:WC, U:U W:WC | 23S rRNA | Deinococcus radiodurans (in very few structures)
B-DNA: 1BNA.pdb
A-DNA: 1ZEX.pdb
Regular RNA oligonucleotide: 1QCU.pdb
Backbone generated by CHARMM through Restrained Energy Minimization
RNA fragments with Non Watson-Crick Base Pairs: fragments from 2AW4.pdb and 1N32.pdb
Helix | RMSD with regenerated structure (All-atom) | RMSD with regenerated structure (Base-atom) | Average RMSD with similar crystal structures (All-atom) | Average RMSD with similar crystal structures (Base-atom)
B-DNA (1BNA) | 0.535 | 0.148 | 0.438 | 0.318
A-DNA (1ZEX) | 0.399 | 0.175 | x | x
A-RNA (1QCU) | 0.332 | 0.110 | x | x
U:U W:WC (1J5A) | 0.475 | 0.357 | 1.082 | 1.035
A:G W:WC (1FJG) | 0.497 | 0.396 | 1.035 | 0.904
G:U W:WC (1N33) | 0.529 | 0.251 | 1.007 | 0.988
G:A S:HT, A:G H:ST (2AW4) | 0.560 | 0.169 | 0.391 | 0.377
A:A s:hT, A:U H:WT, A:G H:ST (1N32) | 0.602 | 0.249 | 0.225 | 0.218
1N32-Helix (two rows of values per base pair)
Base pair      Buckle   Open    Propeller  Stagger  Shear   Stretch
1G:18U W:WC    -8.47    3.77    -12.25     0.07     -2.23   2.85
               -8.19    3.80    -12.53     0.08     -2.20   2.91
2C:17G W:WC    -4.28    2.66    -19.84     0.00     0.56    2.80
               -4.16    2.62    -19.93     -0.01    0.57    2.80
3A:16A s:hT    -19.05   0.97    3.15       -0.16    2.75    2.83
               -19.04   0.63    3.40       -0.15    2.70    2.84
4A:15U H:WT    -4.18    3.54    -7.30      -0.25    -0.41   2.87
               -3.73    3.47    -7.45      -0.24    -0.36   2.93
5A:14G H:ST    -15.18   14.72   -10.81     0.13     2.77    3.10
               -11.74   15.56   -14.48     0.13     3.43    3.05
6C:13G W:WC    15.09    1.93    -17.29     -0.35    -0.13   2.72
               15.04    1.90    -16.94     -0.36    -0.11   2.73
7C:12G W:WC    8.40     3.17    -1.20      0.01     -0.05   2.88
               7.84     3.24    -2.92      0.01     -0.13   3.04
8G:11C W:WC    -3.30    0.87    -7.51      -0.47    -0.06   2.68
               -3.33    0.77    -7.35      -0.47    -0.07   2.69
9G:10C W:WC    -22.23   0.40    -9.15      -0.14    0.19    2.82
               -22.07   0.34    -9.34      -0.14    0.18    2.82
Electrostatic Potential
RNA Secondary Structure Prediction: Modern Biology is based on it!
[Figure: frequency histograms of Roll (-12 to 36) and Twist (-4 to 44) for a non-canonical base step and a canonical base step; base pairs labelled A:A s:hT and G:C W:WC]
P_i(nwc) / P_i(wc) = exp[−(F_nwc − F_wc) / (k_B T)]
[Figure labels: U:A W:HT, G:A H:ST]
RNA 2D-structure prediction: Free-energy of stacking
MD Simulations from Crystal Structure
S. Halder and D. Bhattacharyya (2010) J. Phys. Chem. B 114: 14028
[Figure: frequency histogram of C1'-C1' distance in Å (x-axis from 3 to 11 Å)]
30S subunit of ribosome
Requirements:
Free-energy of Stacking
Modeling base pair steps
1FJG 1J5A 1N33 1N32 2AW4
6.08 5.15 5.53 5.27 5.35
5.75 5.67 5.62 5.05 5.80
5.38 5.09 5.52 7.31 5.50
5.27 6.39 5.33 5.34 5.21
5.33 5.16 5.30 5.00 6.10
5.89 5.39 5.65 5.90 4.99
5.54 5.72 5.47 5.19 5.49
5.30 4.99 5.13 5.86 5.50
6.25 6.29 5.57 5.92 5.45
4.79 5.26 5.60 5.78 5.76
5.54 5.29 5.70 5.40 6.40
5.62 5.36 5.62 5.32 5.67
5.28 5.85 6.37 5.57
5.56 5.35 5.36 5.14
5.34 4.98 4.68
5.45 5.11 7.17
5.51 5.27
5.45 5.64
5.72 6.39
5.60 4.90
Overlap between Successive Base Pairs
N1 surface dots
N2 surface dots
N surface dots
Overlap area = {(N1+N2)-N}/5
Implemented in NUPARM v2
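The overlap formula can be written as a one-line function. The sketch below assumes that N1 and N2 are the numbers of surface dots on each base pair taken alone and N is the number on the two base pairs taken together; the slide does not spell this out, and the function name `overlap_area` is hypothetical. The divisor 5 is taken directly from the slide.

```python
def overlap_area(n1, n2, n, divisor=5.0):
    """Overlap area = {(N1 + N2) - N} / 5, as given on the slide.

    n1, n2: surface dots on each base pair alone (assumed meaning)
    n:      surface dots on the two base pairs together (assumed meaning)
    """
    return ((n1 + n2) - n) / divisor
```

For instance, dot counts of 300, 320 and 520 give an overlap of 20.0 in the units used by the formula.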
[Figure: Overlap (Å²) and C1'-C1' distances of the 1st and 2nd strands (Å) plotted against Twist (0 to 90) for A:A H:HT and C:G W:WC]
[Figure: Overlap (Å²) and C1'-C1' distances of the 1st and 2nd strands (Å) plotted against Twist (0 to 90) for A:G W:ST]
Generated structures for different combinations of Roll and Slide: (-20, -2.5), (+20, -2.5), (+20, +2.5), (-20, +2.5)
Calculated interaction energy by quantum chemical methods (DFT, etc.)
ΔE = Ec − ΣEm (energy of the complex minus the sum of the monomer energies)
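As a small worked example of this energy difference, the sketch below computes ΔE from a complex energy and a list of monomer energies given in hartree and converts the result to kcal/mol. The function name and the numerical energies are hypothetical and purely illustrative; they are not results from the cited work.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509  # standard conversion factor

def interaction_energy(e_complex, e_monomers):
    """Return (dE in hartree, dE in kcal/mol) with dE = Ec - sum(Em)."""
    delta = e_complex - sum(e_monomers)
    return delta, delta * HARTREE_TO_KCAL_PER_MOL

# Hypothetical energies in hartree, for illustration only.
d_hartree, d_kcal = interaction_energy(-100.02, [-50.0, -50.0])
```

A negative ΔE means the stacked complex is lower in energy than the separated monomers; here the illustrative numbers give about -0.02 hartree, or roughly -12.55 kcal/mol.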
Stacking Energy using Dispersion Corrected DFT
Mukherjee, SenthilKumar, Bansal & Bhattacharyya (2013) Biopolymers (E-published)
Summary
• Analyze structures of DNA double helices and obtain their average structural parameters → predict structure and flexibility of chromosomal sequences
• Obtain structures of different non-Watson-Crick base pairs stacked on canonical ones
• Obtain feasible structures with good stacking
• Obtain stacking interaction energy using DFT
• Predict relative free energy of different sequences
• Predict secondary structure of RNA ????
Funded and Supported by
CSIR, DBT and CDAC-Pune & DAE
Dr. Shayantani Mukherjee
Sukanya Halder
Prof. Manju Bansal (IISc)
Jhuma Das
Sankar Basu
Sanchita Mukherjee
SenthilKumar, D.K. (IISc)
Prof. Rahul Banerjee
Prof. Devapriya Choudhuri (JNU)
Prof. Abhijit Mitra (IIIT-H)
Purshottam Sharma (IIIT-H)
Arvind Marathe (IISc)
Sudhanshu Sankar (JNU)
Dr. Sudipta Samanta
Rahul Pal
Thank You