Free energy calculations

General methods

Free energy is the most important quantity that characterizes a dynamical process.

Two types of free energy calculations:

1. Path independent methods for calculation of relative binding free energies, e.g. free energy perturbation (FEP) and thermodynamic integration (TI).

2. Path dependent methods for calculation of absolute binding free energies, e.g. umbrella sampling (US) with the weighted histogram analysis method (WHAM), and steered MD with Jarzynski’s equation (JE).

Example: Binding of a K+ ion to gramicidin A (gA)

Initial state (A): K+ ion in bulk, a water molecule at the binding site.

Final state (B): K+ ion at the binding site, water in place of the ion.

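In the path independent method, we calculate the binding energy of the K+ ion from the free energy difference between the two states:

$$\mathrm{gA(W)} + \mathrm{K^+}(\mathrm{bulk}) \;\longrightarrow\; \mathrm{gA(K^+)} + \mathrm{W}(\mathrm{bulk}) \qquad (A \to B)$$

$$\Delta G(A \to B) = G(B) - G(A)$$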

In the path dependent method, we first choose a continuous path from the binding site to bulk water (the reaction coordinate). In the case of gramicidin, the channel axis is the obvious choice for the reaction coordinate. The free energy profile of the K+ ion along this path is calculated using a method such as umbrella sampling. The free energy of binding is given by the difference in free energy between the binding site (B: K+ in gramicidin) and the bulk (A: K+ in bulk):

$$\Delta G(A \to B) = G(B) - G(A)$$

Free energy perturbation (FEP)

Free energy differences can be calculated relatively easily and several methods have been developed for this purpose. The starting point for most approaches is Zwanzig’s perturbation formula for the free energy difference between two states A and B:

$$\Delta G(A \to B) = G_B - G_A = -kT \ln \left\langle \exp[-(H_B - H_A)/kT] \right\rangle_A$$

$$\Delta G(B \to A) = G_A - G_B = -kT \ln \left\langle \exp[-(H_A - H_B)/kT] \right\rangle_B$$

$$\Delta G(A \to B) = -\Delta G(B \to A)$$

The last equality should hold if there is sufficient sampling. However, if the two states are not similar enough, this is difficult to achieve and there will be a large hysteresis effect (i.e. the forward and backward results will be very different).

Derivation of the perturbation formula

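From statistical mechanics, the Helmholtz free energy is given by (we will assume it is the same as the Gibbs free energy and use G for it)

$$G = -kT \ln Z, \qquad Z = \int e^{-H/kT}\, dp\, dq \qquad (Z:\ \text{partition function})$$

The free energy difference between the two states then follows from the ratio of their partition functions:

$$\Delta G(A \to B) = G_B - G_A = -kT \ln \frac{Z_B}{Z_A} = -kT \ln \frac{\int e^{-(H_B - H_A)/kT}\, e^{-H_A/kT}\, dp\, dq}{\int e^{-H_A/kT}\, dp\, dq} = -kT \ln \left\langle \exp[-(H_B - H_A)/kT] \right\rangle_A$$

where it is assumed that the states A and B are similar, so that the configurations sampled in state A also cover the important configurations of state B.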
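As a minimal sketch of the perturbation formula, the following Python snippet estimates the free energy difference in both directions for a toy system. The two 1D harmonic wells, the unit kT, and the helper names (`fep_delta_g`, `h_a`, `h_b`) are illustrative assumptions and do not come from the lecture. The wells have equal stiffness, so the exact answer is zero and any residual reflects sampling error.

```python
import math
import random

def fep_delta_g(delta_h, kT=1.0):
    """Zwanzig estimate -kT ln<exp(-dH/kT)>, using a log-sum-exp shift for stability."""
    a = [-dh / kT for dh in delta_h]
    a_max = max(a)
    mean_exp = sum(math.exp(x - a_max) for x in a) / len(a)
    return -kT * (a_max + math.log(mean_exp))

# Toy Hamiltonians (assumption): equal-stiffness wells shifted by 1, so dG = 0 exactly.
def h_a(x):
    return 0.5 * x * x

def h_b(x):
    return 0.5 * (x - 1.0) ** 2

rng = random.Random(0)
n = 20000
xs_a = [rng.gauss(0.0, 1.0) for _ in range(n)]  # Boltzmann samples of state A (kT = 1)
xs_b = [rng.gauss(1.0, 1.0) for _ in range(n)]  # Boltzmann samples of state B (kT = 1)

dg_forward = fep_delta_g([h_b(x) - h_a(x) for x in xs_a])   # dG(A -> B), averaged in A
dg_backward = fep_delta_g([h_a(x) - h_b(x) for x in xs_b])  # dG(B -> A), averaged in B
hysteresis = dg_forward + dg_backward                        # vanishes for perfect sampling
print(dg_forward, dg_backward, hysteresis)
```

Increasing the shift between the two wells makes the states less similar, and the forward and backward estimates then drift apart for the same number of samples.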

FEP with alchemical transformation

To obtain accurate results with the perturbation formula, the energy difference between the states should be < 2 kT, which is not satisfied for most biomolecular processes. To deal with this problem, one introduces a hybrid Hamiltonian

$$H(\lambda) = (1 - \lambda) H_A + \lambda H_B$$

and performs the transformation from A to B gradually by changing the parameter $\lambda$ from 0 to 1 in small steps. That is, one divides [0,1] into n subintervals with $\{\lambda_i,\ i = 0, \dots, n\}$, and for each $\lambda_i$ value, calculates the free energy difference from the ensemble average

$$\Delta G(\lambda_i \to \lambda_{i+1}) = -kT \ln \left\langle \exp[-(H(\lambda_{i+1}) - H(\lambda_i))/kT] \right\rangle_{\lambda_i}$$

The total free energy change is then obtained by summing the contributions from each subinterval

$$\Delta G(0 \to 1) = \sum_{i=0}^{n-1} \Delta G(\lambda_i \to \lambda_{i+1})$$

The number of subintervals is chosen such that the free energy change at each step is < 2 kT, otherwise the method may lose its validity. Points to be aware of:

1. Most codes use equal subintervals for $\lambda_i$. But the changes in $\Delta G_i$ are usually highly non-linear. One should try to choose $\lambda_i$ such that $\Delta G_i$ remains around 1-2 kT for all values.

2. The simulation times (equilibration + production) have to be chosen carefully. It is not possible to extend them in case of non-convergence (one has to start over).
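The staged perturbation scheme can be sketched on a toy problem. Everything specific below is an assumption made for illustration: a 1D harmonic oscillator whose force constant changes from `K_A` to `K_B`, exact Gaussian sampling in place of an MD run, and the helper names. For this model the exact result is (kT/2) ln(K_B/K_A), which gives a check on the summed window contributions.

```python
import math
import random

K_A, K_B, KT = 1.0, 4.0, 1.0   # toy force constants (assumed); energies in units of kT

def k_lambda(lam):
    # Hybrid Hamiltonian H(lam) = (1 - lam) H_A + lam H_B = 0.5 * k(lam) * x^2
    return (1.0 - lam) * K_A + lam * K_B

def window_delta_g(lam_i, lam_next, n_samples, rng):
    """dG(lam_i -> lam_next) from an ensemble average taken at lam_i."""
    sigma = math.sqrt(KT / k_lambda(lam_i))      # Boltzmann distribution at lam_i
    dk = k_lambda(lam_next) - k_lambda(lam_i)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, sigma)
        total += math.exp(-0.5 * dk * x * x / KT)  # exp(-[H(lam_next) - H(lam_i)]/kT)
    return -KT * math.log(total / n_samples)

rng = random.Random(1)
n_windows = 10
lams = [i / n_windows for i in range(n_windows + 1)]   # equal subintervals
steps = [window_delta_g(lams[i], lams[i + 1], 5000, rng) for i in range(n_windows)]
dg_total = sum(steps)
dg_exact = 0.5 * KT * math.log(K_B / K_A)
print([round(s, 3) for s in steps], dg_total, dg_exact)
```

Even in this simple model the per-window contributions are unequal for equally spaced windows, which is the non-linearity referred to in point 1.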

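Thermodynamic integration (TI)

Another way to obtain the free energy difference is to integrate the derivative of the hybrid Hamiltonian $H(\lambda)$:

$$\Delta G = \int_0^1 \left\langle \frac{\partial H(\lambda)}{\partial \lambda} \right\rangle_\lambda d\lambda$$

where the integrand follows from differentiating $G(\lambda) = -kT \ln Z(\lambda)$:

$$\frac{dG}{d\lambda} = \frac{\int \frac{\partial H}{\partial \lambda}\, e^{-H(\lambda)/kT}\, dp\, dq}{\int e^{-H(\lambda)/kT}\, dp\, dq} = \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_\lambda$$

This integral is evaluated most efficiently using a Gaussian quadrature. In typical calculations for ions, 7-point quadrature is sufficient. (But check that 9-point quadrature gives the same result for others.)

The advantage of TI over FEP is that the production run can be extended as long as necessary and the convergence of the free energy can be monitored (when the cumulative $\Delta G$ flattens, it has converged).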
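A minimal sketch of TI with 7-point Gauss-Legendre quadrature follows. The toy system (a harmonic oscillator whose force constant changes from `K_A` to `K_B`), the exact Gaussian sampling that stands in for a simulation at each λ, and all names are assumptions for illustration. The nodes and weights are the standard 7-point Gauss-Legendre values on [-1, 1], mapped here to [0, 1].

```python
import math
import random

K_A, K_B, KT = 1.0, 4.0, 1.0   # toy force constants (assumed); energies in units of kT

# Standard 7-point Gauss-Legendre nodes and weights on [-1, 1]
GL7 = [
    (0.0, 0.4179591836734694),
    (0.4058451513773972, 0.3818300505051189),
    (-0.4058451513773972, 0.3818300505051189),
    (0.7415311855993945, 0.2797053914892766),
    (-0.7415311855993945, 0.2797053914892766),
    (0.9491079123427585, 0.1294849661688697),
    (-0.9491079123427585, 0.1294849661688697),
]

def mean_dh_dlambda(lam, n_samples, rng):
    """<dH/dlam>_lam, where dH/dlam = H_B - H_A = 0.5 (K_B - K_A) x^2."""
    k = (1.0 - lam) * K_A + lam * K_B
    sigma = math.sqrt(KT / k)                 # Boltzmann distribution of H(lam)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, sigma)
        total += 0.5 * (K_B - K_A) * x * x
    return total / n_samples

rng = random.Random(2)
dg_ti = 0.0
for t, w in GL7:
    lam = 0.5 * (t + 1.0)                     # map node from [-1, 1] to [0, 1]
    dg_ti += 0.5 * w * mean_dh_dlambda(lam, 5000, rng)

dg_exact = 0.5 * KT * math.log(K_B / K_A)
print(dg_ti, dg_exact)
```

Because each λ value is an independent simulation, more samples can be appended to any node later without discarding the earlier data.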

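Example: Free energy change in mutation of a ligand

A very common question is how a mutation in a ligand (or protein) changes the free energy of the protein-ligand complex.

Thermodynamic cycle:

protein + A → protein·A, with binding free energy $\Delta G_A$
protein + B → protein·B, with binding free energy $\Delta G_B$
A → B in bulk: $\Delta G_{bulk}(A \to B)$
A → B in the binding site: $\Delta G_{bs}(A \to B)$

$$\Delta G_B - \Delta G_A = \Delta G_{bs}(A \to B) - \Delta G_{bulk}(A \to B)$$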

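Applications

1. Ion selectivity of potassium channels

$$\Delta G_{sel}(\mathrm{K} \to \mathrm{Na}) = \Delta G_b(\mathrm{Na}) - \Delta G_b(\mathrm{K}) = \Delta G_{bs}(\mathrm{K} \to \mathrm{Na}) - \Delta G_{bulk}(\mathrm{K} \to \mathrm{Na})$$

2. Selectivity of amino acid transporters (e.g. glutamate transporter)

$$\Delta G_{sel}(\mathrm{Asp} \to \mathrm{Glu}) = \Delta G_b(\mathrm{Glu}) - \Delta G_b(\mathrm{Asp}) = \Delta G_{bs}(\mathrm{Asp} \to \mathrm{Glu}) - \Delta G_{bulk}(\mathrm{Asp} \to \mathrm{Glu})$$

3. Free energy change when a sidechain is mutated in a bound ligand. The calculation is similar to the above. Important in developing drug leads from peptides.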

2. Path dependent methods

Consider the previous example of binding of a K+ ion to the gramicidin channel. In the path dependent method, the K+ ion is moved from bulk to the binding site in small steps and the free energy profile, W(z) (also called the potential of mean force or PMF), is constructed.

The relative binding free energy is given by

$$\Delta G(\mathrm{bulk} \to \mathrm{bs}) = W(\mathrm{bs}) - W(\mathrm{bulk})$$

The binding constant and the absolute binding free energy are determined from the PMF by invoking a 1D approximation

$$K_{eq} = \int_{Vol} e^{-W(\mathbf{r})/kT}\, d^3r \approx \pi R^2 \int_{bs}^{bulk} e^{-W(z)/kT}\, dz$$

$$\Delta G_{bind} = -kT \ln (K_{eq} C_0)$$
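The 1D approximation for the binding constant can be sketched numerically as below. The interpretation of R as the transverse radius of the 1D approximation and of C0 as the standard concentration (1 M), the Gaussian-well PMF, and the function name `binding_free_energy` are all assumptions made for this illustration; the integral is evaluated with the trapezoid rule on a tabulated W(z).

```python
import math

KT = 0.596      # kcal/mol near 300 K
C0 = 6.022e-4   # 1 M expressed in molecules per cubic angstrom (assumed standard state)

def binding_free_energy(z, w, radius, kT=KT, c0=C0):
    """Return (K_eq, dG_bind) with K_eq = pi R^2 * integral of exp(-W(z)/kT) dz."""
    boltz = [math.exp(-wi / kT) for wi in w]
    integral = sum(0.5 * (boltz[i] + boltz[i + 1]) * (z[i + 1] - z[i])
                   for i in range(len(z) - 1))          # trapezoid rule
    k_eq = math.pi * radius ** 2 * integral             # cubic angstrom
    return k_eq, -kT * math.log(k_eq * c0)

# Hypothetical PMF: a Gaussian well of depth 8 kT at z = 0 that flattens to W = 0 in bulk.
z = [0.1 * i for i in range(151)]                        # 0 to 15 angstrom
w = [-8.0 * KT * math.exp(-zi * zi) for zi in z]
k_eq, dg_bind = binding_free_energy(z, w, radius=2.0)
print(k_eq, dg_bind)
```

The result depends on where the integration is cut off in bulk only weakly when the well is deep, since the bound region dominates the Boltzmann integral.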

Calculation of PMF from umbrella sampling

One samples the ligand position along a reaction coordinate and determines the potential of mean force (PMF) from the Boltzmann equation

$$\frac{\rho(z)}{\rho(z_0)} = e^{-[W(z) - W(z_0)]/kT} \quad \Longrightarrow \quad W(z) = W(z_0) - kT \ln \frac{\rho(z)}{\rho(z_0)}$$

Here z0 is a reference point, e.g. a point in bulk where W vanishes.

In general, a particle cannot be adequately sampled at high-energy points. To counter that, one introduces harmonic potentials, which restrain the particle at desired points, and then removes their biasing effect. For convenience, one introduces umbrella potentials at regular intervals along the reaction coordinate (e.g. ~0.5 Å). The PMF’s obtained in each interval are unbiased and optimally combined using the Weighted Histogram Analysis Method (WHAM).
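The Boltzmann inversion step can be sketched as follows. This is only the histogram-to-PMF conversion for unbiased samples, not an implementation of umbrella unbiasing or WHAM. The harmonic test system, bin settings, and the function name `pmf_from_samples` are assumptions for illustration.

```python
import math
import random

def pmf_from_samples(samples, z_min, z_max, n_bins, z_ref, kT=1.0):
    """Boltzmann inversion: W(z) = -kT ln[rho(z)/rho(z_ref)], so that W(z_ref) = 0."""
    width = (z_max - z_min) / n_bins
    counts = [0] * n_bins
    for z in samples:
        i = math.floor((z - z_min) / width)
        if 0 <= i < n_bins:
            counts[i] += 1
    ref = counts[math.floor((z_ref - z_min) / width)]
    if ref == 0:
        raise ValueError("reference bin is empty")
    centers = [z_min + (i + 0.5) * width for i in range(n_bins)]
    # Bins that were never visited get W = +inf
    pmf = [-kT * math.log(c / ref) if c > 0 else math.inf for c in counts]
    return centers, pmf

# Toy data (assumption): Boltzmann samples of W(z) = 0.5 z^2 with kT = 1.
rng = random.Random(4)
samples = [rng.gauss(0.0, 1.0) for _ in range(100000)]
centers, pmf = pmf_from_samples(samples, -3.1, 3.1, 31, z_ref=0.0)
for zc, wz in zip(centers[15:26:5], pmf[15:26:5]):
    print(round(zc, 2), round(wz, 3), round(0.5 * zc * zc, 3))
```

The sparsely populated bins in the tails show why restraining potentials are needed: the statistical error of W grows quickly wherever the unbiased density is low.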

Points to consider in umbrella sampling

Two main parameters in umbrella sampling are the force constant, k, and the distance between windows, d. In bulk, the position of the ligand will have a Gaussian distribution given by

$$P(z) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(z - z_0)^2 / 2\sigma^2}, \qquad \sigma = \sqrt{k_B T / k}$$

The overlap between two Gaussian distributions separated by d is

$$\%\ \mathrm{overlap} = 1 - \mathrm{erf}\!\left(\frac{d}{\sqrt{8}\,\sigma}\right)$$

The parameters should be chosen such that 10% > % overlap > 5%. If the overlap is too small, the PMF will have discontinuities. If it is too large, simulations are not very efficient.
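The overlap formula is easy to tabulate when choosing k and d. The sketch below assumes kT of about 0.596 kcal/mol (roughly 300 K) and force constants in kcal/mol/Å²; the function name `window_overlap` and the scanned values are illustrative choices.

```python
import math

def window_overlap(k, d, kT):
    """Fractional overlap 1 - erf(d / (sqrt(8) sigma)), with sigma = sqrt(kT / k)."""
    sigma = math.sqrt(kT / k)
    return 1.0 - math.erf(d / (math.sqrt(8.0) * sigma))

KT = 0.596  # kcal/mol near 300 K (assumed)
for k in (5.0, 10.0, 20.0, 40.0):        # force constant in kcal/mol/A^2
    for d in (0.25, 0.5, 1.0):           # window spacing in angstrom
        print(k, d, round(100.0 * window_overlap(k, d, KT), 1))
```

The overlap falls as either k or d increases, so a stiffer restraint has to be paired with more closely spaced windows to stay within the target range.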

Steered MD (SMD) simulations and Jarzynski’s equation

Steered MD is a more recent method where a harmonic force is applied to an atom on a peptide and the reference point of this force is pulled with a constant velocity. It has been used to study unfolding of proteins and binding of ligands. The discovery of Jarzynski’s equation in 1997 enabled determination of PMF from SMD, which has boosted its applications.

Jarzynski’s equation:

$$e^{-\Delta F/kT} = \left\langle e^{-W/kT} \right\rangle$$

Work done by the harmonic force:

$$W = \int_i^f \mathbf{F} \cdot d\mathbf{s}, \qquad \mathbf{F} = -k[\mathbf{r} - (\mathbf{r}_0 + \mathbf{v}t)]$$

This method seems to work well in simple systems and when $\Delta G$ is large, but beware of its applications in complex systems!
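A minimal sketch of the Jarzynski estimator applied to a set of work values is given below. The work values are synthetic: they are drawn from a Gaussian distribution, for which the exact result ΔF = ⟨W⟩ − σ²/(2kT) is known, rather than from steered MD pulls. The function name and parameters are assumptions for illustration.

```python
import math
import random

def jarzynski_delta_f(work, kT=1.0):
    """dF = -kT ln<exp(-W/kT)>, evaluated with a log-sum-exp shift for stability."""
    a = [-w / kT for w in work]
    a_max = max(a)
    mean_exp = sum(math.exp(x - a_max) for x in a) / len(a)
    return -kT * (a_max + math.log(mean_exp))

# Synthetic work values (assumption): Gaussian with mean 2 kT and width 1 kT.
rng = random.Random(3)
mean_w, sigma_w = 2.0, 1.0
work = [rng.gauss(mean_w, sigma_w) for _ in range(50000)]

df_je = jarzynski_delta_f(work)
df_exact = mean_w - sigma_w ** 2 / 2.0     # exact for Gaussian work with kT = 1
mean_work = sum(work) / len(work)
print(df_je, df_exact, mean_work)
```

The exponential average is dominated by the rare low-work pulls, so the number of trajectories needed grows rapidly as the work distribution broadens, which is one reason for caution in complex systems.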

Beware of: 1) Problems with force fields

The force fields that are commonly used in MD simulations (e.g., CHARMM, AMBER, GROMOS) neglect the polarization interaction. While the effects of induced polarization have been included in a mean field sense by boosting the partial charges, such an approximation is expected to work only in the environment where the force field has been optimized but not in a different situation.

The most relevant case is the force fields for proteins, which are optimized for bulk water. One has to be wary of using the same force fields for membrane proteins because lipid molecules have a very different polarization characteristic compared to water (dielectric constants are 2 and 80, respectively).

Other cases that require caution are: interfaces and highly charged ions.

2) Problems with sampling

At zero temperature, the potential function U is sufficient to characterize the system completely. At room temperature, the fundamental quantity is the free energy, F = U − TS, which creates the sampling problem. Example: F=24, U=41, and TS=17 (kJ/mol) for liquid water at STP.

Statistical weight:

$$P(x) \sim e^{-U(x)/kT}$$

But if S2 >> S1 we may have F2 < F1, even when U2 > U1.

Examples from gramicidin A channel

• Dimer formed by two right-handed β helices
• Each monomer consists of 16 amino acid residues
• Pore is 26 Å long, 4 Å in diameter
• Structure is stabilized by hydrogen bonds
• Occupied by a single-file water chain (~7 waters)
• Water dipoles are aligned with the channel axis
• Conducts monovalent cations at diffusion rates (divalent ions bind and block)

1. Potential energy profile for a K+ ion in gramicidin A

BD simulations (inverting data gives): Uw = 8 kT, Ub = 5 kT
MD simulations (potential of mean force): Uw = 5 kT, Ub = 22 kT

Free energy calculations

Free energy differences are calculated using the thermodynamic integration (TI) and free energy perturbation (FEP) methods, e.g. a K+ ion in bulk is translocated to the gA center while the water molecule at that position is translocated to the ion’s position.

Two step process (via a neutral water, W0) to minimize fluctuations:

W → W0 → K+ (gA center)
K+ → W0 → W (bulk)

To check hysteresis effects, free energy differences are calculated both in forward (ΔG+) and backward (−ΔG−) directions.

FE (kT)   ΔG+    −ΔG−   ΔGav
TI        11.2   12.2   11.7
FEP       13.2   14.0   13.6
PMF                     13.0

Free energy of translocating a K+ ion to the gA center

[Figure: running average of the free energy for 700 ps. Solid: forward (bulk to gA). Dashed: backward.]

1. Convergence: the free energy plot should become flat.

2. No hysteresis: the two results should agree within 1 kcal/mol.

Distribution of water dipole moments in bulk and in gramicidin

In the presence of a K+ ion, the dipole moment of hydration waters decreases in bulk but increases in the gramicidin A channel. Ab initio simulations in gramicidin show the importance of the polarization interaction.

Electrostatic energy of a K+ ion + 6 waters

PMF results for K+, Ca2+ and Cl− ions

Each window is simulated for 400 ps.

Well depths:
Ub(K) ~ 7 kT
Ub(Ca) ~ 2 kT

Ca2+ binding to gA and blocking of K+ ions cannot be explained.

*** Problems with divalent ions ***

Lessons from the gramicidin simulations

1. Current force fields, which ignore polarization, are not expected to work in narrow pores where water and ions form a single file. Ab initio MD calculations indicate that hydration waters of a K+ ion are more polarized in gA than in bulk.

2. Hydration waters around a divalent ion are more polarized than those of a monovalent ion.

Example: dipole moment of water from ab initio calculations:

Bulk water: 3.0 Debye
Hydration shell of K+ ion: 2.8 Debye
Hydration shell of Ca2+ ion: 3.4 Debye

Thus the current force fields, which are optimised for monovalent ions, cannot work well for divalent ions.

2. Sampling problem in a simple vs complex system: Test of Jarzynski’s Equation

Comparison of K+ ion PMF’s obtained from umbrella sampling & WHAM and from Jarzynski’s equality using steered MD simulations

[Figure panels: Carbon nanotube | Gramicidin A channel. Curves are labeled by pulling velocity v (Å/ns).]

Sampling is more difficult in non-equilibrium methods

1. In a carbon nanotube, the interactions of the K+ ion (and the hydration waters) with the C atoms on the wall are short range, hence equilibration of the system is quite fast. In such a situation, Jarzynski’s equation works as well as umbrella sampling, and because it is simpler to implement, it would be the method of choice.

2. In the gramicidin channel, the K+ ion (and the hydration waters) interact with the charged atoms on the protein wall. Because the Coulomb interaction is long range, equilibration takes more time. In such cases, Jarzynski’s equation is not very reliable, and umbrella sampling should be preferred for accurate results.

Equilibration and convergence issues in PMF calculations

• Finite resources mean we need to make optimal choices for equilibration and production times in free energy calculations.

• Equilibration is the initial part of the simulation, where the system is still evolving (not equilibrated yet); these data must be thrown away. Choosing it too short will bias the result and too long will waste computing time.

• During production, the system fluctuates around equilibrium. It must be run long enough to allow the system to sample all energetically important states. Otherwise the calculations will not be accurate. Convergence tests can be used for this purpose but note that there are no absolute criteria that one can use (running longer is the only choice if you are in doubt).

Example: ab initio calculation of PMF’s for Na-Cl and Ca-Cl

The ion-ion potentials in force fields are determined from combination rules with no direct experimental input. This is not satisfactory and any guidance from ab initio calculations would be very useful.

In the examples below the PMF’s for the dissociation of Na-Cl and Ca-Cl ion pairs are calculated from ab initio MD (Car-Parrinello MD) simulations using the constraint-force method (faster than umbrella sampling). The average force needed to keep the ions at a fixed distance, r, is calculated for a range of r values at 0.1 - 0.2 Å intervals and these are integrated to determine the PMF.

Note that ion-water dynamics is fast, which makes these picosecond ab initio calculations feasible. They would not be feasible for proteins.

Example: PMF for dissociation of Na-Cl

Total run is 6 ps. The data are divided into 1 ps blocks to check equilibration. Here 2 ps of data are dropped (1-2 ps: equilibration) and the PMF is obtained from the last 4 ps (3-6 ps: production, black line).

Another way to check the equilibration is to drop successively more data for equilibration and see if the result changes.

[Figure panels: r = 3.1 Å, r = 3.9 Å, r = 4.7 Å]
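Both equilibration checks (block averages, and dropping successively more leading data) can be sketched in a few lines. The series below is synthetic and only mimics a quantity that relaxes during the first two of six blocks; the function names `block_means` and `drop_scan` are illustrative assumptions.

```python
def block_means(series, block):
    """Average of each consecutive block of length `block`."""
    n_blocks = len(series) // block
    return [sum(series[i * block:(i + 1) * block]) / block for i in range(n_blocks)]

def drop_scan(series, block):
    """Mean of the remaining data after discarding 0, 1, 2, ... leading blocks."""
    n_blocks = len(series) // block
    means = []
    for n_drop in range(n_blocks):
        kept = series[n_drop * block:n_blocks * block]
        means.append(sum(kept) / len(kept))
    return means

# Synthetic series (assumption): linear relaxation over 200 steps, then a plateau at 1.0.
series = [5.0 - 0.02 * i for i in range(200)] + [1.0] * 400
per_block = block_means(series, 100)   # six blocks, like six 1 ps blocks
scan = drop_scan(series, 100)          # result vs. amount of data dropped
print(per_block)
print(scan)
```

Once the scan stops changing as more blocks are dropped, the discarded portion is long enough; dropping more only reduces the statistics.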

Comparison of ab initio and classical PMF’s for Na-Cl

None of the classical force fields can match the ab initio PMF. In particular, AMBER has a deep contact minimum, which leads to crystallization.

Accelerated MD for speeding up convergence

Using biasing potentials in the low energy regions of the potential energy surface, barriers can be lowered, leading to faster convergence.

For the Na-Cl PMF considered here, this leads to a ~4-fold speed up.

Example: PMF for dissociation of Ca-Cl

Convergence could not be obtained after 23 ps of ab initio simulation. Inspection of the forces shows large variations according to the Ca hydration number, n(Ca):

n(Ca) = 5 for r < 3.7 Å
n(Ca) = 5 or 6 for r = 3.7 – 4.9 Å
n(Ca) = 6 for r > 4.9 Å

Ca-Cl PMF and its dependence on n(Ca)

The PMF with n(Ca) = 5 in the intermediate region is unphysical. Switching to n(Ca) = 6 yields a reasonable PMF.

Comparison of ab initio and classical PMF’s for Na—Cl

The CHARMM force field does a better job than that of Dang & Smith but it still needs to be improved.

Example: PMF for a K+ ion in the Kv1.2 potassium channel

The trigger for permeation of K+ ions is the entry of the K+ ion in the cavity to the S4 binding site. To find out whether the K+ ion can bind to S4 while S1 – S3 are occupied, or whether the ions there have to move to S0 – S2 to enable binding, two PMF’s are constructed with the final states S1 – S3 – S4 and S0 – S2 – S4.

The first check in PMF calculations is whether there are sufficient overlaps between the neighbouring windows. This can be done by visually checking the density plots for all the windows (top) or by directly calculating the overlaps and plotting them in a bar graph (bottom).
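One simple way to calculate the overlap between neighbouring windows directly is to sum, over bins, the smaller of the two normalized histogram values. This measure, the synthetic Gaussian window data, and the function names are assumptions for illustration; the lecture does not specify which overlap measure was used.

```python
import math
import random

def normalized_histogram(samples, z_min, z_max, n_bins):
    """Histogram of the samples inside [z_min, z_max), normalized to unit sum."""
    width = (z_max - z_min) / n_bins
    counts = [0] * n_bins
    for z in samples:
        i = math.floor((z - z_min) / width)
        if 0 <= i < n_bins:
            counts[i] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples inside the histogram range")
    return [c / total for c in counts]

def histogram_overlap(samples_a, samples_b, z_min, z_max, n_bins):
    """Shared area of two normalized histograms: 0 = disjoint, 1 = identical."""
    p = normalized_histogram(samples_a, z_min, z_max, n_bins)
    q = normalized_histogram(samples_b, z_min, z_max, n_bins)
    return sum(min(pi, qi) for pi, qi in zip(p, q))

# Synthetic windows (assumption): Gaussian densities 0.5 A apart with sigma = 0.2 A.
rng = random.Random(5)
window_centers = [0.5 * i for i in range(4)]
windows = [[rng.gauss(c, 0.2) for _ in range(5000)] for c in window_centers]
overlaps = [histogram_overlap(windows[i], windows[i + 1], -1.0, 2.5, 70)
            for i in range(len(windows) - 1)]
print([round(100.0 * o, 1) for o in overlaps])   # one bar per neighbouring pair
```

A pair of windows with a much smaller value than its neighbours marks the place where an extra window or a softer restraint is needed.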

Next, decide on the equilibration time and start collecting density data. How long do we run?

An efficient way to decide is to run in small blocks and check for convergence in the accumulated data.

In the example here, the total run is 600 ps and 100 ps is dropped for equilibration. To show convergence, data are added in 100 ps blocks and a PMF is constructed from the accumulated data at every 100 ps.

One accumulates a great deal of trajectory data during the PMF calculations, and it would be a pity if all one extracted from it were the reaction coordinate of the ion.

A detailed picture of the reaction process can be obtained using visualisation methods (e.g. making a video!).

Here is an example showing that in the S1 – S3 – S4 PMF, the cavity ion does not trap the water at S4 as it moves in.

The K+ ions in the filter move together, e.g., from S1 – S3 to S0 – S2 or S2 – S4. This can be studied by constructing the PMF for the center of mass of the ions.

But how do we know that this is a reliable reaction coordinate?

A simple way to show this is to plot the distribution of ion-ion distances and show that it remains Gaussian as the pair moves across the filter (as if they are connected by a spring).

Example: PMF for binding of charybdotoxin to K+ channel

From the previous examples, we have seen that ions equilibrate quite fast (~100 ps) and a < 1 ns production run is sufficient for the PMF. For complex ligands, the situation is obviously more complicated.

For one thing, the ligand may be distorted, which will lead to erroneous results.

One also requires much longer equilibration of the system (typically > 1 ns), and longer production runs (> 1 ns).

Convergence of the toxin PMF

Force constant: k = 20 kcal/mol/Å²
Umbrella windows at 0.5 Å intervals

Each color represents 400 ps of sampling. The first 1.2 ns is dropped for equilibration and the PMF is obtained from the last 2 ns (black line).

Equilibration and convergence issues in FEP & TI

1. FEP calculations

• In FEP, one has to decide on the number of windows and the equilibration time in advance. The windows are created serially, so if the equilibration time is inadequate, the calculation has to be repeated using a longer equilibration time and the initial data are wasted.

• A second potential problem in FEP calculations is the requirement that $\Delta G_i$ remains around 1-2 kT for all windows. Because the change in the free energy is nonlinear, it is very difficult to guess the number of windows one should use. For the same reason, using fixed intervals is not optimal. Exponentially spaced intervals would reduce the required number of windows by half.

Example: Na+ binding energy in glutamate transporter

Windows     ΔG(Na+; b.s. → bulk)
40 eq.      22.9
60 eq.      26.3
65 exp.     27.1

(eq.: equally spaced windows; exp.: exponentially spaced windows)

[Figure: free energy change ΔG at each step of the FEP calculation, exponential (exp.) versus equal spacing for λ]

The interval [0, 0.5] is mapped to an exponential for 40 windows. (Fold it over to get the interval [0.5, 1].)
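One possible way to generate exponentially spaced λ values on [0, 0.5] and fold them over to [0.5, 1] is sketched below. The functional form, the `rate` parameter, the number of windows, and the function name are all assumptions made for illustration; the mapping actually used for the results above is not specified.

```python
import math

def folded_exponential_lambdas(n_half, rate=3.0):
    """n_half exponentially growing intervals on [0, 0.5], mirrored onto [0.5, 1]."""
    # expm1 keeps the first point exactly at 0 and the last point exactly at 0.5
    half = [0.5 * math.expm1(rate * i / n_half) / math.expm1(rate)
            for i in range(n_half + 1)]
    mirrored = [1.0 - lam for lam in reversed(half[:-1])]   # fold over about 0.5
    return half + mirrored

lams = folded_exponential_lambdas(20)
widths = [b - a for a, b in zip(lams, lams[1:])]
print([round(x, 4) for x in lams[:5]], round(widths[0], 4), round(widths[19], 4))
```

With this choice the windows are narrowest near λ = 0 and λ = 1 and widest near λ = 0.5; a larger `rate` concentrates more windows at the end points.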

2. TI calculations

• In TI, one only needs to specify the number of windows in advance. The data can be divided into equilibration and production parts later. Moreover, one can continue accumulating data if there is a problem with convergence, thus there is no wastage of data.

• Convergence can be monitored by plotting the running average of the free energy. Flattening out of the curve is usually taken as a sign of convergence.

• Because a small number of windows is used in TI, equilibration may prove difficult in some systems. An initial FEP calculation with a large number of windows can resolve this problem (choose the TI windows from the nearest FEP window).

Example: Na+ and Asp binding energies in glutamate transporter

TI calculation of the binding free energy of Na+ ion to the binding site 1 in GltPh. Integration is done using Gaussian quadrature with 7 points.

Thick lines show the running averages, which flatten out as the data accumulate. Thin lines show averages over 50 ps blocks of data.

Asp binding energy in glutamate transporter

TI calculation of the binding free energy of Asp to the binding site in GltPh. Asp is substituted with 5 water molecules.

The first 400 ps of data account for equilibration and 1 ns of data is used for production.
