Free energy calculations
General methods
Free energy is the most important quantity that characterizes a dynamical process.
Two types of free energy calculations:
1. Path independent methods for calculating relative binding free energies, e.g. free energy perturbation (FEP) and thermodynamic integration (TI).
2. Path dependent methods for calculating absolute binding free energies, e.g. umbrella sampling (US) with the weighted histogram analysis method (WHAM), and steered MD with Jarzynski's equation (JE).
Example: Binding of a K+ ion to gramicidin A (gA)
Initial state (A): K+ ion in bulk, a water molecule at the binding
site.
Final state (B): K+ ion at the binding site, water in place of the ion.
In the path independent method, we calculate the binding energy
of the K+ ion from the free energy difference between the two
states:
K+ (bulk) + gA[W]  →  gA[K+] + W (bulk)
        A                      B
(gA[X]: gA with X at the binding site)
$$\Delta G(A \to B) = G(B) - G(A)$$
In the path dependent method, we first choose a continuous
path from the binding site to bulk water (reaction coordinate). In
the case of gramicidin, the channel axis is the obvious choice for
the reaction coordinate. The free energy profile of the K+ ion
along this path is calculated using a method such as umbrella
sampling. The free energy of binding is given by the difference
in free energy between the binding site and the bulk.
K+ (bulk, A)  →  K+ (gramicidin binding site, B)
$$\Delta G(A \to B) = G(B) - G(A)$$
Free energy perturbation (FEP)
Free energy differences can be calculated relatively easily and
several methods have been developed for this purpose. The
starting point for most approaches is Zwanzig’s perturbation
formula for the free energy difference between two states A and B:
$$\Delta G(A \to B) = G_B - G_A = -kT \ln \left\langle e^{-(H_B - H_A)/kT} \right\rangle_A$$
$$\Delta G(B \to A) = G_A - G_B = -kT \ln \left\langle e^{-(H_A - H_B)/kT} \right\rangle_B$$
$$\Delta G(A \to B) = -\Delta G(B \to A)$$
The last equality should hold if there is sufficient sampling. However, if the two states are not similar enough, this is difficult to achieve and there will be a large hysteresis effect (i.e. the forward and backward results will be very different).
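The perturbation formula can be evaluated directly from stored energy differences. Below is a minimal Python sketch (NumPy assumed; the function names are hypothetical). It computes the exponential average with a log-sum-exp shift and compares the forward and backward estimates, which is the hysteresis check described above. The Gaussian test data are synthetic, chosen because the exact answer (mu - s^2/2 in kT units) is known.

```python
import numpy as np


def zwanzig_delta_g(delta_h, kT=1.0):
    """Exponential averaging: return -kT ln < exp(-delta_h / kT) >.

    delta_h holds H_target - H_reference evaluated on configurations sampled
    in the reference state. Subtracting the largest exponent first (a
    log-sum-exp shift) avoids overflow when some delta_h are very negative.
    """
    x = -np.asarray(delta_h, dtype=float) / kT
    x_max = x.max()
    return -kT * (x_max + np.log(np.mean(np.exp(x - x_max))))


def forward_backward(delta_h_ab, delta_h_ba, kT=1.0):
    """Return dG(A->B) from the forward run, -dG(B->A) from the backward run, and their gap."""
    g_forward = zwanzig_delta_g(delta_h_ab, kT)  # H_B - H_A sampled in A
    g_backward = -zwanzig_delta_g(delta_h_ba, kT)  # H_A - H_B sampled in B
    return g_forward, g_backward, abs(g_forward - g_backward)


if __name__ == "__main__":
    # Synthetic, hypothetical data in kT units. For a Gaussian forward
    # distribution N(mu, s^2) the exact answer is mu - s^2 / 2, and consistent
    # backward samples follow N(-(mu - s^2), s^2).
    rng = np.random.default_rng(1)
    mu, s = 2.0, 1.0
    fwd = rng.normal(mu, s, 200_000)
    bwd = rng.normal(-(mu - s**2), s, 200_000)
    print(forward_backward(fwd, bwd), "exact:", mu - s**2 / 2)
```

A large gap between the two estimates signals that the states are too dissimilar for a single perturbation step.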
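Derivation of the perturbation formula
From statistical mechanics, the Helmholtz free energy is given by
$$G = -kT \ln Z = -kT \ln \int dp\, dq\, e^{-H/kT}$$
where Z is the partition function (we assume that the Helmholtz free energy is the same as the Gibbs free energy and use G for it). Then
$$\Delta G(A \to B) = G_B - G_A = -kT \ln \frac{\int dp\, dq\, e^{-H_B/kT}}{\int dp\, dq\, e^{-H_A/kT}} = -kT \ln \frac{\int dp\, dq\, e^{-(H_B - H_A)/kT}\, e^{-H_A/kT}}{\int dp\, dq\, e^{-H_A/kT}} = -kT \ln \left\langle e^{-(H_B - H_A)/kT} \right\rangle_A$$
The derivation itself is exact. In practice, the average converges only if the states A and B are similar, i.e. the configurations that are important in B are also sampled adequately in A.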
FEP with alchemical transformation
To obtain accurate results with the perturbation formula, the
energy difference between the states should be < 2 kT, which is
not satisfied for most biomolecular processes. To deal with this
problem, one introduces a hybrid Hamiltonian
$$H(\lambda) = (1 - \lambda) H_A + \lambda H_B$$
and performs the transformation from A to B gradually by changing the parameter λ from 0 to 1 in small steps. That is, one divides [0,1] into n subintervals with {λi, i = 0, 1, ..., n}, and for each λi value, calculates the free energy difference from the ensemble average
$$\Delta G(\lambda_i \to \lambda_{i+1}) = -kT \ln \left\langle \exp\left[-\frac{H(\lambda_{i+1}) - H(\lambda_i)}{kT}\right] \right\rangle_{\lambda_i}$$
The total free energy change is then obtained by summing the contributions from each subinterval:
$$\Delta G(0 \to 1) = \sum_{i=0}^{n-1} \Delta G(\lambda_i \to \lambda_{i+1})$$
The number of subintervals is chosen such that the free energy change at each step is < 2 kT; otherwise the method may lose its validity. Points to be aware of:
1. Most codes use equal subintervals for λi, but the changes in ΔGi are usually highly non-linear. One should try to choose the λi such that ΔGi remains around 1-2 kT for all values.
2. The simulation times (equilibration + production) have to be chosen carefully. It is not possible to extend them in case of non-convergence (one has to start over).
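To illustrate the staged (alchemical) procedure, here is a sketch on a hypothetical toy system where the answer is known: two harmonic oscillators H_A = k_A x^2/2 and H_B = k_B x^2/2, for which ΔG = (kT/2) ln(k_B/k_A). Exact Gaussian sampling replaces the MD run in each window. This is an assumption made only to keep the example self-contained. With equal λ subintervals the per-window ΔGi come out non-uniform (largest near λ = 0), which is point 1 above in miniature.

```python
import numpy as np


def staged_fep_harmonic(k_a, k_b, n_windows=10, n_samples=100_000, kT=1.0, seed=0):
    """Staged FEP along H(lam) = (1 - lam) H_A + lam H_B with H_X = k_X x^2 / 2.

    Toy stand-in for MD: in each window, x is drawn exactly from the Boltzmann
    distribution of H(lam_i), a Gaussian with variance kT / k(lam_i).
    Returns the per-window dG_i and their sum dG(0 -> 1).
    """
    rng = np.random.default_rng(seed)
    lam = np.linspace(0.0, 1.0, n_windows + 1)  # equal subintervals

    def spring(lam_value):
        return (1.0 - lam_value) * k_a + lam_value * k_b  # force constant of H(lam)

    dgs = []
    for l0, l1 in zip(lam[:-1], lam[1:]):
        x = rng.normal(0.0, np.sqrt(kT / spring(l0)), n_samples)
        dh = 0.5 * (spring(l1) - spring(l0)) * x**2  # H(lam_{i+1}) - H(lam_i)
        dgs.append(-kT * np.log(np.mean(np.exp(-dh / kT))))
    dgs = np.array(dgs)
    return dgs, float(dgs.sum())


if __name__ == "__main__":
    dgs, total = staged_fep_harmonic(k_a=1.0, k_b=10.0)
    # Exact result for this toy model: (kT / 2) ln(k_b / k_a)
    print(np.round(dgs, 3), total, 0.5 * np.log(10.0))
```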
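Thermodynamic integration (TI)
Another way to obtain the free energy difference is to integrate the derivative of the hybrid Hamiltonian H(λ):
$$\Delta G = \int_0^1 \left\langle \frac{\partial H(\lambda)}{\partial \lambda} \right\rangle_\lambda d\lambda$$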
This integral is evaluated most efficiently using Gaussian quadrature. In typical calculations for ions, 7-point quadrature is sufficient; for other systems, check that 9-point quadrature gives the same result.
The advantage of TI over FEP is that the production run can be extended as long as necessary and the convergence of the free energy can be monitored (when the cumulative ΔG flattens, it has converged).
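The integrand follows from differentiating G(λ) = -kT ln Z(λ):
$$\frac{\partial G}{\partial \lambda} = \frac{\int dp\, dq\, (\partial H/\partial \lambda)\, e^{-H/kT}}{\int dp\, dq\, e^{-H/kT}} = \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_\lambda$$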
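A sketch of TI with Gauss-Legendre quadrature in Python, using NumPy's `leggauss` for the nodes and weights on [-1, 1], mapped onto [0, 1]. The averaging function is a stand-in: here it uses a hypothetical harmonic toy model (exact ΔG = (kT/2) ln(k_B/k_A)), whereas in practice each node λ would be one equilibrium simulation. Running it with 7 and 9 points is the consistency check suggested above.

```python
import numpy as np


def ti_gauss_legendre(mean_dh_dlam, n_points=7):
    """dG = int_0^1 <dH/dlam>_lam dlam with n-point Gauss-Legendre quadrature.

    mean_dh_dlam(lam) should return the ensemble average <dH/dlam> at that
    lam (in practice, one equilibrium simulation per quadrature node).
    Returns the integral and the lam nodes that had to be simulated.
    """
    t, w = np.polynomial.legendre.leggauss(n_points)  # nodes/weights on [-1, 1]
    lam = 0.5 * (t + 1.0)  # map the nodes onto [0, 1]
    values = np.array([mean_dh_dlam(lv) for lv in lam])
    return 0.5 * float(np.sum(w * values)), lam  # 0.5 = dlam/dt


def harmonic_mean_dh_dlam(k_a, k_b, n_samples=100_000, kT=1.0, seed=0):
    """Toy model H(lam) = ((1 - lam) k_a + lam k_b) x^2 / 2, sampled exactly."""
    rng = np.random.default_rng(seed)

    def average(lam):
        k = (1.0 - lam) * k_a + lam * k_b
        x = rng.normal(0.0, np.sqrt(kT / k), n_samples)
        return np.mean(0.5 * (k_b - k_a) * x**2)  # dH/dlam for linear mixing

    return average


if __name__ == "__main__":
    avg = harmonic_mean_dh_dlam(1.0, 10.0)
    g7, _ = ti_gauss_legendre(avg, 7)
    g9, _ = ti_gauss_legendre(avg, 9)
    print(g7, g9, 0.5 * np.log(10.0))  # 7- vs 9-point check against the exact value
```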
Example: Free energy change in mutation of a ligand
A very common question is how a mutation in a ligand (or
protein) changes the free energy of the protein-ligand complex.
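protein + A → protein-A complex: ΔG_A
protein + B → protein-B complex: ΔG_B
A → B in bulk: ΔG_bulk(A→B); A → B in the binding site: ΔG_bs(A→B)
Thermodynamic cycle:
$$\Delta G_B - \Delta G_A = \Delta G_{bs}(A \to B) - \Delta G_{bulk}(A \to B)$$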
Applications
1. Ion selectivity of potassium channels:
$$\Delta G_{sel}(K^+ \to Na^+) = \Delta G_b(Na^+) - \Delta G_b(K^+) = \Delta G_{bs}(K^+ \to Na^+) - \Delta G_{bulk}(K^+ \to Na^+)$$
2. Selectivity of amino acid transporters (e.g. glutamate transporter):
$$\Delta G_{sel}(Glu \to Asp) = \Delta G_b(Asp) - \Delta G_b(Glu) = \Delta G_{bs}(Glu \to Asp) - \Delta G_{bulk}(Glu \to Asp)$$
3. Free energy change when a sidechain is mutated in a bound ligand. Similar calculation as above. Important in developing drug leads from peptides.
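The cycle reduces to simple arithmetic once the two alchemical legs have been computed. A minimal sketch with hypothetical function names and input values (kcal/mol, with kT ≈ 0.596 kcal/mol assumed for about 300 K). It also converts ΔΔG into a ratio of binding constants using ΔG_b = -kT ln(K_b C0).

```python
import math


def relative_binding_free_energy(dg_site, dg_bulk):
    """Thermodynamic cycle: ddG = dG_b(B) - dG_b(A) = dG_bs(A->B) - dG_bulk(A->B)."""
    return dg_site - dg_bulk


def binding_constant_ratio(ddg, kT):
    """K_b(B) / K_b(A) = exp(-ddG / kT), which follows from dG_b = -kT ln(K_b C0)."""
    return math.exp(-ddg / kT)


if __name__ == "__main__":
    kT = 0.596  # kcal/mol, assuming T of about 300 K
    # Hypothetical alchemical results for K+ -> Na+ (kcal/mol), not data from the slides
    ddg = relative_binding_free_energy(dg_site=-15.0, dg_bulk=-18.0)
    print(ddg, binding_constant_ratio(ddg, kT))  # ddG > 0: the site prefers K+
```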
2. Path dependent methods
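Consider the previous example of binding of a K+ ion to the gramicidin channel. In the path dependent method, the K+ ion is moved from bulk to the binding site in small steps and the free energy profile, W(z) (also called the potential of mean force or PMF), is constructed.
The relative binding free energy is given by
$$\Delta G(\text{bulk} \to \text{bs}) = W(\text{bs}) - W(\text{bulk})$$
The binding constant and the absolute binding free energy are determined from the PMF by invoking a 1D approximation, in which the volume integral over the binding site is replaced by a cylinder of radius R along the channel axis:
$$K_{eq} = \int_{Vol} d^3r\, e^{-W(\mathbf{r})/kT} \approx \pi R^2 \int_{bs} dz\, e^{-W(z)/kT}, \qquad \Delta G_{bind} = -kT \ln (K_{eq} C_0)$$
where C0 is the standard concentration and W is zero in bulk.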
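The 1D formula can be applied to a tabulated PMF with a simple trapezoidal integral. A sketch assuming z in Å, W in kT units and zeroed in bulk, and C0 = 1 M converted to molecules per Å^3. The well shape and the value of R in the usage lines are hypothetical.

```python
import numpy as np

AVOGADRO = 6.02214076e23
C0_PER_A3 = AVOGADRO / 1e27  # 1 mol/L in molecules per cubic angstrom (1 L = 1e27 A^3)


def trapezoid(y, x):
    """Plain trapezoidal rule (avoids depending on a particular NumPy version)."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def binding_from_pmf(z, w, radius, kT, c0=C0_PER_A3):
    """1D approximation: K_eq = pi R^2 int_bs exp(-W/kT) dz and dG_bind = -kT ln(K_eq c0).

    z (angstrom) should cover only the binding-site region, and w must be
    measured relative to bulk (W = 0 in bulk). Returns K_eq in cubic
    angstrom and dG_bind in the same energy units as kT.
    """
    k_eq = np.pi * radius**2 * trapezoid(np.exp(-np.asarray(w, dtype=float) / kT), z)
    return k_eq, -kT * np.log(k_eq * c0)


if __name__ == "__main__":
    # Hypothetical PMF: a parabolic well 6 kT deep and about 2 angstrom wide,
    # with a hypothetical radius R = 0.5 angstrom; energies in kT units.
    z = np.linspace(-1.0, 1.0, 201)
    w = -6.0 + 6.0 * z**2
    print(binding_from_pmf(z, w, radius=0.5, kT=1.0))
```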
Calculation of PMF from umbrella sampling
One samples the ligand position along a reaction coordinate and determines the potential of mean force (PMF) from the Boltzmann equation
$$\frac{\rho(z)}{\rho(z_0)} = e^{-[W(z) - W(z_0)]/kT} \quad\Rightarrow\quad W(z) = W(z_0) - kT \ln \frac{\rho(z)}{\rho(z_0)}$$
Here ρ(z) is the probability density of the ligand at z, and z0 is a reference point, e.g. a point in bulk where W vanishes.
In general, a particle cannot be adequately sampled at high-energy points. To counter that, one introduces harmonic potentials, which restrain the particle at desired points, and then removes their effect (unbiasing). For convenience, one introduces umbrella potentials at regular intervals along the reaction coordinate (e.g. ~0.5 Å). The PMFs obtained in each interval are unbiased and optimally combined using the Weighted Histogram Analysis Method (WHAM).
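A compact 1D WHAM sketch in Python (NumPy assumed). It iterates the standard pair of WHAM equations for the unbiased probability P(b) in each histogram bin and the window free energies F_i until the F_i stop changing. To keep it self-contained, the "simulation" data are exact Gaussian samples for a hypothetical toy PMF W(z) = z^2/2 (kT units) under each harmonic umbrella. With real MD data only `wham_1d` would be used.

```python
import numpy as np


def wham_1d(samples, centers, k_spring, edges, kT=1.0, tol=1e-8, max_iter=20_000):
    """Combine harmonic umbrella windows into one PMF with 1D WHAM.

    samples[i] holds reaction-coordinate values from window i, biased by
    U_i(z) = 0.5 * k_spring * (z - centers[i])**2. Returns the bin centres and
    the PMF, shifted so that its minimum is zero (bins with no data give inf).
    """
    beta = 1.0 / kT
    edges = np.asarray(edges, dtype=float)
    z = 0.5 * (edges[:-1] + edges[1:])
    counts = np.array([np.histogram(s, bins=edges)[0] for s in samples], dtype=float)
    n_window = counts.sum(axis=1)  # N_i: samples per window
    n_bin = counts.sum(axis=0)  # sum_i n_i(b): total counts per bin
    bias = 0.5 * k_spring * (z[None, :] - np.asarray(centers, dtype=float)[:, None]) ** 2
    boltz = np.exp(-beta * bias)  # exp(-beta U_i(z_b))
    f = np.zeros(len(centers))  # window free energies F_i
    for _ in range(max_iter):
        # P(b) = sum_i n_i(b) / sum_j N_j exp(beta (F_j - U_j(z_b)))
        denom = np.sum(n_window[:, None] * np.exp(beta * f)[:, None] * boltz, axis=0)
        p = n_bin / denom
        p /= p.sum()
        # exp(-beta F_j) = sum_b P(b) exp(-beta U_j(z_b))
        f_new = -kT * np.log(np.sum(p[None, :] * boltz, axis=1))
        f_new -= f_new[0]  # only differences between windows matter
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    with np.errstate(divide="ignore"):
        w = -kT * np.log(p)
    return z, w - w[np.isfinite(w)].min()


def toy_umbrella_samples(a, centers, k_spring, n, kT=1.0, seed=0):
    """Exact samples for a toy PMF W(z) = a z^2 / 2 under each harmonic umbrella."""
    rng = np.random.default_rng(seed)
    # W + U_i is quadratic, so each biased distribution is Gaussian.
    return [rng.normal(k_spring * c / (a + k_spring), np.sqrt(kT / (a + k_spring)), n)
            for c in centers]


if __name__ == "__main__":
    centers = np.arange(-3.0, 3.01, 0.5)  # windows every 0.5 (toy length units)
    samples = toy_umbrella_samples(1.0, centers, k_spring=10.0, n=20_000)
    z, w = wham_1d(samples, centers, 10.0, np.linspace(-4.0, 4.0, 81))
    print(np.round(w[30:51:5], 2), np.round(0.5 * z[30:51:5] ** 2, 2))
```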
Points to consider in umbrella sampling
The two main parameters in umbrella sampling are the force constant, k, and the distance between windows, d. In bulk, the position of the ligand will have a Gaussian distribution given by
$$P(z) = \frac{1}{\sqrt{2\pi}\,\sigma_z} e^{-(z - z_0)^2 / 2\sigma_z^2}, \qquad \sigma_z = \sqrt{k_B T / k}$$
The overlap between two such Gaussian distributions separated by d is
$$\text{overlap} = 1 - \mathrm{erf}\left(\frac{d}{\sqrt{8}\,\sigma_z}\right)$$
The parameters should be chosen such that 5% < overlap < 10%.
If the overlap is too small, the PMF will have discontinuities.
If it is too large, the simulations are not very efficient.
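The two formulas above can be turned into a quick planning tool for choosing k and d. A sketch using only the standard library. `max_spacing` is a hypothetical helper that finds, by bisection, the largest spacing that still gives a target overlap (the overlap decreases monotonically with d). The value kT = 0.596 kcal/mol (about 300 K) in the usage lines is an assumption.

```python
import math


def window_width(k_spring, kT):
    """sigma_z = sqrt(kT / k) for a harmonic umbrella on a flat (bulk) PMF."""
    return math.sqrt(kT / k_spring)


def window_overlap(k_spring, spacing, kT):
    """Fractional overlap of two equal Gaussians whose centres are `spacing` apart."""
    sigma = window_width(k_spring, kT)
    return 1.0 - math.erf(spacing / (math.sqrt(8.0) * sigma))


def max_spacing(k_spring, kT, min_overlap=0.05, tol=1e-10):
    """Largest window spacing that still gives at least `min_overlap` (bisection)."""
    lo, hi = 0.0, 20.0 * window_width(k_spring, kT)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if window_overlap(k_spring, mid, kT) >= min_overlap:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # k = 20 kcal/mol/A^2 and kT = 0.596 kcal/mol (about 300 K, an assumption)
    print(window_width(20.0, 0.596), window_overlap(20.0, 0.5, 0.596))
    print(max_spacing(20.0, 0.596, min_overlap=0.05))  # widest spacing with >= 5% overlap
```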
Steered MD (SMD) simulations and Jarzynski's equation
Steered MD is a more recent method where a harmonic force is applied to an atom on a peptide and the reference point of this force is pulled with a constant velocity. It has been used to study unfolding of proteins and binding of ligands. The discovery of Jarzynski's equation in 1997 enabled determination of the PMF from SMD, which has boosted its applications.
Jarzynski's equation:
$$e^{-\Delta F/kT} = \left\langle e^{-W/kT} \right\rangle$$
Work done by the harmonic force, where ds = v dt is the displacement of the reference point:
$$W = \int_0^t \mathbf{F} \cdot d\mathbf{s}, \qquad \mathbf{F} = -k\,[\mathbf{r}(t) - (\mathbf{r}_0 + \mathbf{v}t)]$$
This method seems to work well in simple systems and when ΔG is large, but beware of its applications in complex systems!
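Jarzynski's equation can be tried out on a system where the exact answer is known. The sketch below replaces SMD with a hypothetical toy: an overdamped Langevin particle in a flat landscape, dragged by a harmonic trap moving at constant velocity. The true ΔF is therefore zero, while the mean work is positive (dissipation). The work is accumulated as the change in trap energy each time the trap is moved, and the exponential average uses a log-sum-exp shift. All parameters are in reduced units and chosen only for illustration.

```python
import numpy as np


def jarzynski_delta_f(work, kT=1.0):
    """Jarzynski estimate dF = -kT ln < exp(-W / kT) > (log-sum-exp for stability)."""
    x = -np.asarray(work, dtype=float) / kT
    x_max = x.max()
    return -kT * (x_max + np.log(np.mean(np.exp(x - x_max))))


def drag_trap_work(n_traj=20_000, k=1.0, v=1.0, t_total=2.0, dt=1e-3, D=1.0, kT=1.0, seed=0):
    """Work for pulling a harmonic trap at constant velocity through a FLAT landscape.

    Toy overdamped-Langevin stand-in for SMD: H(x, lam) = k (x - lam)^2 / 2 with
    lam(t) = v t. The exact free energy change is zero, so any positive mean
    work is dissipation.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.sqrt(kT / k), n_traj)  # equilibrium start, trap at 0
    work = np.zeros(n_traj)
    n_steps = int(round(t_total / dt))
    for step in range(n_steps):
        lam_old, lam_new = v * step * dt, v * (step + 1) * dt
        # Move the trap at fixed x: W += H(x, lam_new) - H(x, lam_old)
        work += 0.5 * k * ((x - lam_new) ** 2 - (x - lam_old) ** 2)
        # Let the particle relax in the moved trap (Euler-Maruyama step)
        force = -k * (x - lam_new)
        x = x + (D / kT) * force * dt + np.sqrt(2.0 * D * dt) * rng.normal(size=n_traj)
    return work


if __name__ == "__main__":
    w = drag_trap_work()
    print("mean work:", w.mean(), "Jarzynski dF:", jarzynski_delta_f(w), "(exact dF = 0)")
```

The mean work overestimates ΔF, while the exponential average recovers it here because the system equilibrates quickly. That is the favourable, "simple system" case.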
Beware of: 1) Problems with force fields
The force fields that are commonly used in MD simulations (e.g.,
CHARMM, AMBER, GROMOS) neglect the polarization interaction.
While the effects of induced polarization have been included in a mean field sense by boosting the partial charges, such an approximation is expected to work only in the environment where the force field has been optimized but not in a different situation.
The most relevant case is the force fields for proteins, which are optimized for bulk water. One has to be wary of using the same force fields for membrane proteins because lipid molecules have a very different polarization characteristic compared to water (dielectric constants are 2 and 80, respectively).
Other cases that require caution are interfaces and highly charged ions.
2) Problems with sampling
At zero temperature, the potential function U is sufficient to characterize the system completely. At room temperature, the fundamental quantity is the free energy, F = U - TS, which creates the sampling problem: the entropy term can only be obtained by sampling all the thermally accessible configurations, not just the lowest-energy one. Example: F = 24, U = 41 and TS = 17 (kJ/mol) for liquid water at STP.
Statistical weight:
$$P(x) \sim e^{-U(x)/kT}$$
But if S2 >> S1, we may have F2 < F1 even when U2 > U1.
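A small numerical illustration of the last point, on a hypothetical 1D landscape in kT units: a deep but narrow well and a shallower but much wider one. Integrating the Boltzmann weight over each basin gives its free energy, F = -kT ln ∫ e^{-U/kT} dx. The wide basin wins despite its higher energy, so a simulation that only samples the deepest well misses the dominant state.

```python
import numpy as np


def basin_free_energy(x, u, in_basin, kT=1.0):
    """F = -kT ln int_basin exp(-U/kT) dx, using a rectangle rule on a uniform grid."""
    dx = x[1] - x[0]
    return -kT * np.log(np.sum(np.exp(-u[in_basin] / kT)) * dx)


# Hypothetical 1D landscape in kT units (not from the slides)
x = np.linspace(-2.0, 8.0, 100_001)
u = np.full_like(x, 10.0)  # high plateau outside the two wells
narrow = (x >= -1.05) & (x < -0.95)  # width 0.1: low energy, low entropy
wide = (x >= 2.0) & (x < 7.0)  # width 5.0: higher energy, high entropy
u[narrow] = -3.0  # U1
u[wide] = -1.0  # U2 > U1
f1 = basin_free_energy(x, u, narrow)  # analytic: -3 - ln(0.1) = -0.70 kT
f2 = basin_free_energy(x, u, wide)  # analytic: -1 - ln(5.0) = -2.61 kT
print(f1, f2, "F2 < F1:", f2 < f1)
```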
Examples from gramicidin A channel
• Dimer formed by two right-handed β helices
• Each monomer consists of 15 amino acid residues (plus a C-terminal ethanolamine)
• Pore is 26 Å long, 4 Å in diameter
• Structure is stabilized by hydrogen bonds
• Occupied by a single-file water chain (~7 molecules)
• Water dipoles are aligned with the channel axis
• Conducts monovalent cations at diffusion rates (divalent ions bind and block)
1. Potential energy profile for a K+ ion in gramicidin A
BD simulations (profile obtained by inverting data): Uw = 8 kT, Ub = 5 kT
MD simulations (potential of mean force): Uw = 5 kT, Ub = 22 kT
Free energy calculations
Free energy differences are calculated using the thermodynamic integration (TI) and free energy perturbation (FEP) methods. For example, a K+ ion in bulk is translocated to the gA center while the water molecule at that position is translocated to the ion's position. This is done as a two-step process (via a neutral water, W0) to minimize fluctuations:
gA center: W → W0 → K+
bulk: K+ → W0 → W
To check hysteresis effects, free energy differences are calculated in both the forward (ΔG+) and backward (-ΔG-) directions.
FE (kT)   ΔG+ (forward)   -ΔG- (backward)   ΔGav
TI        11.2            12.2              11.7
FEP       13.2            14.0              13.6
PMF                                         13.0
Free energy of translocating a K+ ion to the gA center
Running average over 700 ps (solid: forward, bulk to gA; dashed: backward).
1. Convergence: the free energy plot should become flat.
2. No hysteresis: the two results should agree within 1 kcal/mol.
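The two criteria can be checked with a few lines of code. A sketch (NumPy assumed, hypothetical names): `running_average` gives the curve that should flatten, and `hysteresis_check` converts the forward/backward gap from kT to kcal/mol. The temperature of 300 K is an assumption, since the slides quote energies in kT without stating T.

```python
import numpy as np

KB_KCAL = 0.0019872  # Boltzmann constant in kcal/(mol K)


def running_average(values):
    """Cumulative mean after each sample; the curve should flatten once converged."""
    values = np.asarray(values, dtype=float)
    return np.cumsum(values) / np.arange(1, len(values) + 1)


def hysteresis_check(dg_forward_kT, dg_backward_kT, temperature=300.0, limit_kcal=1.0):
    """True if forward and backward estimates (given in kT) agree within limit_kcal."""
    gap_kcal = abs(dg_forward_kT - dg_backward_kT) * KB_KCAL * temperature
    return gap_kcal <= limit_kcal, gap_kcal


if __name__ == "__main__":
    # TI and FEP values from the table above (kT); T = 300 K is an assumption
    for method, fwd, bwd in [("TI", 11.2, 12.2), ("FEP", 13.2, 14.0)]:
        print(method, hysteresis_check(fwd, bwd))
```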
Distribution of water dipole moments in bulk and in gramicidin
In the presence of a K+ ion, the dipole moment of hydration waters decreases in bulk but increases in the gramicidin A channel. Ab initio simulations in gramicidin show the importance of polarization interactions.
Electrostatic energy of a K+ ion + 6 waters
PMF results for K+, Ca2+ and Cl- ions
Each window is simulated for 400 ps.
Well depths: Ub(K) ~ 7 kT, Ub(Ca) ~ 2 kT
*** Problems with divalent ions ***
Ca2+ binding to gA and blocking of K+ ions cannot be explained.
Lessons from the gramicidin simulations
1. Current force fields which ignore polarization are not expected
to work in narrow pores where water and ions form a single file.
Ab initio MD calculations indicate that hydration waters of a K+
ion are more polarized in gA than in bulk.
2. Hydration waters around a divalent ion are more polarized than
those of a monovalent ion.
Example: dipole moment of water from ab initio calculations:
Bulk water: 3.0 Debye
Hydration shell of K+ ion: 2.8 Debye
Hydration shell of Ca2+ ion: 3.4 Debye
Thus the current force fields, which are optimised for
monovalent ions, cannot work well for divalent ions.
2. Sampling problem in a simple vs complex system: test of Jarzynski's equation
(Panels: carbon nanotube; gramicidin A channel)
Comparison of K+ ion PMFs obtained from umbrella sampling & WHAM and from Jarzynski's equality using steered MD simulations
(Panels: carbon nanotube; gramicidin A channel; curves labelled by pulling speed v in Å/ns)
Sampling is more difficult in non-equilibrium methods
1. In a carbon nanotube, the interactions of the K+ ion (and its hydration waters) with the C atoms on the wall are short range, hence equilibration of the system is quite fast. In such a situation, Jarzynski's equation works as well as umbrella sampling, and because it is simpler to implement, it would be the method of choice.
2. In the gramicidin channel, the K+ ion (and its hydration waters) interact with the charged atoms on the protein wall. Because the Coulomb interaction is long range, equilibration takes more time. In such cases, Jarzynski's equation is not very reliable, and umbrella sampling should be preferred for accurate results.
Equilibration and convergence issues in PMF
calculations
• Finite resources means we need to make optimal choices for
equilibration and production times in free energy calculations.
• Equilibration is the initial part of the simulation data, where the system is still evolving (not yet equilibrated); it must be thrown away. Choosing it too short will bias the result, and choosing it too long will waste computing time.
• During production, the system is fluctuating around equilibrium. It must be run long enough to allow the system to sample all energetically important states; otherwise the calculations will not be accurate. Convergence tests can be used for this purpose, but note that there are no absolute criteria (if you are in doubt, running longer is the only choice).
Example: ab initio calculation of PMFs for Na-Cl and Ca-Cl
The ion-ion potentials in force fields are determined from combination rules with no direct experimental input. This is not satisfactory, and any guidance from ab initio calculations would be very useful.
In the examples below, the PMFs for the dissociation of Na-Cl and Ca-Cl ion pairs are calculated from ab initio MD (Car-Parrinello MD) simulations using the constraint-force method (faster than umbrella sampling). The average force needed to keep the ions at a fixed distance, r, is calculated for a range of r values at 0.1-0.2 Å intervals, and these forces are integrated to determine the PMF.
Note that ion-water dynamics is fast, which makes these picosecond ab initio calculations feasible. They would not be feasible for proteins.
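Integrating the mean force into a PMF is a cumulative quadrature. A sketch assuming the input is the mean force along r acting on the ion pair (depending on the code, the recorded constraint force may have the opposite sign) and that W is set to zero at the largest sampled r. No 2kT ln r Jacobian term is added; whether one is needed depends on the PMF convention being compared. The test force in the usage lines is hypothetical.

```python
import numpy as np


def pmf_from_mean_force(r, mean_force):
    """W(r) = -int_{r_max}^{r} <F(r')> dr', with W(r_max) = 0 (cumulative trapezoid).

    r must be increasing; mean_force is the average force along r on the ion
    pair at each r (positive = pushing the ions apart), so that -dW/dr = <F>.
    """
    r = np.asarray(r, dtype=float)
    f = np.asarray(mean_force, dtype=float)
    segment = 0.5 * (f[1:] + f[:-1]) * np.diff(r)  # integral of <F> over [r_j, r_j+1]
    # W(r_j) = W(r_j+1) + segment_j, so W(r_j) is the sum of all later segments
    return np.concatenate([np.cumsum(segment[::-1])[::-1], [0.0]])


if __name__ == "__main__":
    # Hypothetical check: W(r) = 2 exp(-(r - 3)) gives <F> = -dW/dr = 2 exp(-(r - 3))
    r = np.linspace(2.5, 8.0, 56)  # 0.1 A spacing
    w = pmf_from_mean_force(r, 2.0 * np.exp(-(r - 3.0)))
    print(np.round(w[:5], 3))
```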
Example: PMF for dissociation of Na-Cl
Total run is 6 ps. The data are divided into 1 ps blocks to check equilibration.
Here 2 ps of data are dropped and the PMF is obtained from the last 4 ps (black line). (Figure legend: 1-2 ps equilibration, 3-6 ps production; panels at r = 3.1, 3.9 and 4.7 Å.)
Another way to check the equilibration is to drop successively more data for equilibration and see if the result changes.
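Both equilibration checks are easy to script. A sketch (NumPy assumed, hypothetical names): `block_means` gives one average per block, as in the 1 ps blocks above, and `discard_scan` recomputes the average after dropping successively more blocks. Equilibration is adequate once the value stops changing. The force record in the usage lines is synthetic.

```python
import numpy as np


def block_means(series, n_blocks):
    """Means of n_blocks consecutive equal-length blocks (trailing remainder dropped)."""
    series = np.asarray(series, dtype=float)
    usable = len(series) - len(series) % n_blocks
    return series[:usable].reshape(n_blocks, -1).mean(axis=1)


def discard_scan(series, n_blocks):
    """Average of the remaining data after dropping the first 0, 1, ..., n_blocks - 1 blocks."""
    means = block_means(series, n_blocks)
    return np.array([means[j:].mean() for j in range(n_blocks)])


if __name__ == "__main__":
    # Hypothetical 6 ps force record (one sample per fs) that relaxes during the first ps
    rng = np.random.default_rng(0)
    t = np.arange(6000) * 1e-3  # time in ps
    force = 1.0 + 2.0 * np.exp(-t / 0.3) + rng.normal(0.0, 0.5, t.size)
    print(block_means(force, 6))  # one value per 1 ps block
    print(discard_scan(force, 6))  # settles once the transient has been discarded
```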
Comparison of ab initio and classical PMF’s for Na-Cl
None of the classical force fields can match the ab initio PMF. In particular, AMBER has a deep contact minimum, which leads to crystallization.
Accelerated MD for speeding up convergence
Using boost (biasing) potentials in the low-energy regions of the potential energy surface, barriers can be lowered, leading to faster convergence.
For the Na-Cl PMF considered here, this leads to a ~4-fold speed-up.
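A toy illustration of boosting and reweighting, using Metropolis Monte Carlo on a hypothetical 1D double well (5 kT barrier, kT units) rather than MD. The boost is the commonly used accelerated-MD form of Hamelberg and co-workers, ΔV = (E - V)^2/(α + E - V) for V < E. The slides do not say which biasing potential was used, so this choice is an assumption. Averages on the boosted surface are reweighted by e^{ΔV/kT} to recover the unbiased ones. All walkers start in one well to show that they cross the lowered barrier.

```python
import numpy as np


def amd_boost(v, e_boost, alpha):
    """Boost dV = (E - V)^2 / (alpha + E - V) where V < E, and 0 elsewhere."""
    gap = np.maximum(e_boost - v, 0.0)
    return gap**2 / (alpha + gap)


def double_well(x, height=5.0):
    """Toy potential (kT units): minima at x = +/-1, barrier `height` at x = 0."""
    return height * (x**2 - 1.0) ** 2


def amd_reweighted(n_walkers=2000, n_steps=4000, burn_in=500, step=0.3,
                   e_boost=5.0, alpha=1.0, kT=1.0, seed=0):
    """Metropolis sampling on V + dV, reweighted by exp(dV / kT); returns <x>, <x^2>."""
    rng = np.random.default_rng(seed)
    x = np.full(n_walkers, -1.0)  # every walker starts in the left well

    def boosted(y):
        v = double_well(y)
        return v + amd_boost(v, e_boost, alpha)

    e_cur = boosted(x)
    sum_w = sum_wx = sum_wx2 = 0.0
    for it in range(n_steps):
        trial = x + rng.uniform(-step, step, n_walkers)
        e_trial = boosted(trial)
        accept = rng.random(n_walkers) < np.exp(-(e_trial - e_cur) / kT)
        x = np.where(accept, trial, x)
        e_cur = np.where(accept, e_trial, e_cur)
        if it >= burn_in:
            weight = np.exp(amd_boost(double_well(x), e_boost, alpha) / kT)
            sum_w += weight.sum()
            sum_wx += (weight * x).sum()
            sum_wx2 += (weight * x**2).sum()
    return sum_wx / sum_w, sum_wx2 / sum_w


def exact_averages(kT=1.0):
    """Reference <x> and <x^2> for the unboosted double well by grid integration."""
    grid = np.linspace(-3.0, 3.0, 60_001)
    p = np.exp(-double_well(grid) / kT)
    return (grid * p).sum() / p.sum(), (grid**2 * p).sum() / p.sum()


if __name__ == "__main__":
    print("aMD reweighted:", amd_reweighted(), "exact:", exact_averages())
```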
Example: PMF for dissociation of Ca-Cl
Convergence could not be obtained after 23 ps of ab initio simulation.
Inspection of the forces shows large variations according to n(Ca).
Ca hydration numbers:
n(Ca) = 5 for r < 3.7 Å
n(Ca) = 5 or 6 for r = 3.7-4.9 Å
n(Ca) = 6 for r > 4.9 Å
Ca-Cl PMF and its dependence on n(Ca)
The PMF with n(Ca) = 5 in the intermediate region is unphysical.
Switching to n(Ca) = 6 yields a reasonable PMF.
Comparison of ab initio and classical PMFs for Na-Cl
The CHARMM force field does a better job than that of Dang &
Smith but it still needs to be improved.
Example: PMF for a K+ ion in the Kv1.2 potassium channel
The trigger for permeation of K+ ions is the entry of the K+ ion from the cavity into the S4 binding site.
To find out whether the K+ ion
can bind to S4 while S1 – S3 are
occupied, or they have to move
to S0 - S2 to enable binding, two
PMF’s are constructed with the
final states S1 – S3 – S4 and
S0 – S2 – S4.
The first check in PMF
calculations is whether
there are sufficient
overlaps between the
neighbouring windows.
This can be achieved by
visually checking the
density plots for all the
windows (top) or by directly
calculating the overlaps
and plotting them in a bar
graph (bottom).
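The bar-graph check can be computed directly from the window histograms. A sketch (NumPy assumed, hypothetical names) that measures the overlap of adjacent windows as the shared area of their normalised histograms and flags pairs below a threshold. The 5% default follows the guideline quoted for umbrella sampling, and the sample data are synthetic.

```python
import numpy as np


def neighbour_overlaps(samples, edges):
    """Overlap sum_b min(p_i(b), p_{i+1}(b)) of normalised histograms of adjacent windows."""
    probs = []
    for s in samples:
        h = np.histogram(s, bins=edges)[0].astype(float)
        probs.append(h / h.sum())
    return np.array([np.minimum(p, q).sum() for p, q in zip(probs[:-1], probs[1:])])


def poorly_overlapping(overlaps, threshold=0.05):
    """Indices i of window pairs (i, i + 1) whose overlap falls below the threshold."""
    return np.flatnonzero(np.asarray(overlaps) < threshold)


if __name__ == "__main__":
    # Hypothetical umbrella data: Gaussian windows with sigma = 1 and uneven spacing
    rng = np.random.default_rng(0)
    centres = [0.0, 2.0, 4.0, 9.0]
    samples = [rng.normal(c, 1.0, 50_000) for c in centres]
    ov = neighbour_overlaps(samples, np.linspace(-5.0, 14.0, 381))
    print(np.round(ov, 3), poorly_overlapping(ov))  # the 4 -> 9 gap is flagged
```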
Next decide on the
equilibration time and start
collecting density data. How
long do we run?
An efficient way to decide is to
run in small blocks and check
for convergence in the
accumulated data.
In the example here, the total
run is 600 ps and 100 ps is
dropped for equilibration.
To show convergence, data are
added in 100 ps blocks and a
PMF is constructed from the
accumulated data at every 100
ps
One accumulates a great deal of trajectory data during the PMF calculations, and it would be a pity if all one extracts from it is the reaction coordinate of the ion.
A detailed picture of the
reaction process can be
obtained using visualisation
methods (e.g. making a
video!)
Here is an example showing
that in the S1 – S3 – S4 PMF,
the cavity ion does not trap
the water at S4 as it moves
in.
The K+ ions in the filter move
together, e.g., from S1 – S3 to
S0 – S2 or S2 – S4.
This can be studied by
constructing the PMF for the
center of mass of the ions.
But how do we know that this
is a reliable reaction
coordinate?
A simple way to show this is
to plot the distribution of ion-
ion distances and show that it
remains Gaussian as the pair
moves across the filter (as if
they are connected by a
spring)
Example: PMF for binding of charybdotoxin to K+ channel
From the previous examples, we have seen that ions equilibrate quite fast (~100 ps) and a production run of < 1 ns is sufficient for the PMF. For complex ligands, the situation is obviously more complicated.
For one thing, the ligand may be
distorted, which will lead to
erroneous results.
One also requires much longer equilibration of the system (typically > 1 ns) and longer production runs (> 1 ns).
Convergence of the toxin PMF
Force constant k = 20 kcal/mol/Å²; umbrella windows at 0.5 Å intervals.
Each color represents 400 ps of sampling. The first 1.2 ns is dropped for equilibration and the PMF is obtained from the last 2 ns (black line).
Equilibration and convergence issues in FEP & TI
1. FEP calculations
• In FEP, one has to decide on the number of windows and the
equilibration time in advance. The windows are created
serially, so if the equilibration time is inadequate, it has to be
repeated using longer equilibration time and the initial data
are wasted.
• A second potential problem in FEP calculations is the requirement that ΔGi remains around 1-2 kT for all windows. Because the change in the free energy is nonlinear, it is very difficult to guess the number of windows one should use. For the same reason, using fixed intervals is not optimal. Exponentially spaced intervals would reduce the required number of windows by half.
Example: Na+ binding energy in glutamate transporter
Windows   ΔG(Na+; b.s. → bulk)
40 eq.    22.9
60 eq.    26.3
65 exp.   27.1
(eq. = equal spacing of λ, exp. = exponential spacing)
Free energy change ΔG at each step of the FEP calculation: exponential (exp.) versus equal spacing for λ. The interval [0, 0.5] is mapped to an exponential for 40 windows (fold it over to get the interval [0.5, 1]).
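The exact exponential mapping used for these runs is not given, so the schedule below is a hypothetical one that fits the description: [0, 0.5] is filled exponentially and then folded over onto [0.5, 1]. With this choice the points cluster near λ = 0 and 1 and spread out near 0.5. The parameter `a` (an assumption) controls how strongly, and a small `a` approaches equal spacing.

```python
import numpy as np


def exponential_lambda_schedule(n_half, a=3.0):
    """Hypothetical lambda schedule: exponential on [0, 0.5], mirrored onto [0.5, 1].

    lam_j = 0.5 * (exp(a j / n_half) - 1) / (exp(a) - 1) for j = 0..n_half, so the
    steps are small near lam = 0 and grow towards 0.5; the second half is the
    mirror image 1 - lam ("folding over"). Returns 2 * n_half + 1 values,
    i.e. 2 * n_half windows.
    """
    j = np.arange(n_half + 1)
    first = 0.5 * np.expm1(a * j / n_half) / np.expm1(a)
    second = 1.0 - first[::-1]  # runs from 0.5 up to 1
    return np.concatenate([first, second[1:]])  # keep 0.5 only once


if __name__ == "__main__":
    lam = exponential_lambda_schedule(20)  # 40 windows in total
    steps = np.diff(lam)
    print(steps[0], steps[19], steps[-1])  # small at the ends, largest around 0.5
```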
2. TI calculations
• In TI, one only needs to specify the number of windows in advance. The data can be divided into equilibration and production parts later. Moreover, one can continue accumulating data if there is a problem with convergence, so no data are wasted.
• Convergence can be monitored by plotting the running average of the free energy. Flattening out of the curve is usually taken as a sign of convergence.
• Because a small number of windows is used in TI, equilibration may prove difficult in some systems. An initial FEP calculation with a large number of windows can resolve this problem (take the starting configuration of each TI window from the nearest FEP window).
Example: Na+ and Asp binding energies in the glutamate transporter
TI calculation of the binding free energy of the Na+ ion to binding site 1 in Gltph. Integration is done using Gaussian quadrature with 7 points. Thick lines show the running averages, which flatten out as the data accumulate. Thin lines show averages over 50 ps blocks of data.
Asp binding energy in glutamate transporter
TI calculation of the
binding free energy of
Asp to the binding site
in Gltph.
Asp is substituted with
5 water molecules.
The first 400 ps of data account for equilibration, and the following 1 ns of data is used for production.