Lecture5

1. MIT 10.637 Lecture 5 Classical molecular dynamics Heather J Kulik 09-18-14

2. Why molecular dynamics?
Protein folding: how proteins fold and misfold (Prof. Vijay Pande). Voelz, Bowman, Beauchamp, Pande. JACS (2010).

3. Molecular dynamics
Classical particles can be simulated by solving Newton's second law, F = ma:
$$ m_i \frac{d^2 \mathbf{r}_i}{dt^2} = \mathbf{F}_i $$
The force is the negative gradient of the potential energy at position r:
$$ \mathbf{F} = -\nabla V(\mathbf{r}) $$
r is a vector containing the Cartesian coordinates of all particles (i.e. of length 3N_atom). The potential energy function V comes (for now) from our force field parameters.

4. Structure of an MD code
1. Initialize positions and velocities, temperature, density, etc.
2. Compute forces.
3. Integrate equations of motion.
4. Move atoms.
5. Repeat 2-4 until equilibrated (desired properties are stable, potential energy and kinetic energy are stable).
6. Continue 2-4 as production run -> collecting data to average over.

5. Initialization
Avoid random initial positions: overlapping atoms make the energy diverge. Initial positions generated from a structure avoid overly short distances between molecules and inside molecules.
Velocities start out small or zero. The system can then be heated slowly, giving more and more temperature (velocity) to the particles.
Randomizing initial velocities: the equipartition theorem relates temperature to the velocity,
$$ \left\langle \tfrac{1}{2} m v_\alpha^2 \right\rangle = \tfrac{1}{2} k_B T $$
for each velocity component $\alpha$. Choose random numbers from a uniform distribution, make sure the net velocity results in a total momentum of zero, then scale the velocities until the kinetic energy matches the initial temperature.
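The velocity recipe above can be sketched in a few lines. This is a minimal illustration, not code from the lecture: the function names, the reduced units (k_B = 1), and the choice of 3N - 3 degrees of freedom after removing center-of-mass motion are all assumptions.

```python
import math
import random

K_B = 1.0  # Boltzmann constant in reduced units (assumption)


def kinetic_energy(masses, velocities):
    """Total kinetic energy, sum over atoms of (1/2) m v^2."""
    return 0.5 * sum(m * sum(c * c for c in v) for m, v in zip(masses, velocities))


def init_velocities(masses, temperature, seed=0):
    """Uniform random velocities, zero net momentum, rescaled to the target T."""
    rng = random.Random(seed)
    n = len(masses)
    vel = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(n)]

    # Subtract the center-of-mass velocity so the total momentum is zero.
    total_mass = sum(masses)
    for k in range(3):
        p_k = sum(m * v[k] for m, v in zip(masses, vel))
        for v in vel:
            v[k] -= p_k / total_mass

    # Rescale so that KE = (N_dof / 2) k_B T, with N_dof = 3N - 3 (assumption).
    target = 0.5 * (3 * n - 3) * K_B * temperature
    scale = math.sqrt(target / kinetic_energy(masses, vel))
    return [[c * scale for c in v] for v in vel]
```

Rescaling by a single factor keeps the total momentum at zero, so the two conditions do not interfere with each other.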

6. Statistical mechanics: ensembles
Ways in which a fixed volume can be described with statistical mechanics:
Microcanonical ensemble (NVE): Fixed number of particles (N), fixed energy (E). Equal probability for each possible state with that energy and composition.
Canonical ensemble (NVT): Fixed composition (N), in thermal equilibrium with a heat bath of a given temperature (T). The energy can vary but the number of particles stays the same; the probability of a state depends on its energy (origin of the Boltzmann distribution).
Grand canonical ensemble (μVT): Variable composition, in thermal and chemical equilibrium with a reservoir. The reservoir has a fixed temperature and a chemical potential for each particle type. States can vary in energy and number of particles.
Macroscopic properties of these ensembles can be calculated as weighted averages based on the partition function.

7. Ergodic hypothesis
We assume the average obtained by following a small number of particles over a long time is the same as averaging over a large number of particles for a short time. Time-averaging is equivalent to ensemble-averaging. Or alternatively: no matter where a system is started, it can get to any other point in phase space.

8. Choosing an ensemble
Ensemble menu: choose one from each row.
Particle number N    or   Chemical potential μ
Volume V             or   Pressure P
Energy E             or   Temperature T
Most common combinations:
Microcanonical ensemble (NVE): Conserves the total energy. The entropy S has a maximum in the equilibrium state.
Canonical ensemble (NVT): Also called constant temperature molecular dynamics. Requires thermostats for exchanging energy. The Helmholtz free energy A has a minimum in equilibrium.
Isothermal-isobaric ensemble (NPT): Requires both a thermostat and a barostat; corresponds most closely to laboratory conditions. The Gibbs free energy G has a minimum in equilibrium.

9. Molecular dynamics
[Figure: potential energy surface, energy versus conformation.]
In molecular dynamics we can sample the parts of the potential energy surface that are accessible with the energy supplied to the system.

10. Integration
Integration algorithms should:
Be fast and require little memory.
Allow us to choose a long timestep.
Stay close to the exactly integrated trajectory.
Conserve momentum and energy.
Be time-reversible.
Be straightforward to implement.

11. Typical timescales
Chemistry and protein dynamics occur on a relatively slow timescale:
[Figure: timescale axis running through 10^-15 s (femto), 10^-12 s (pico), 10^-9 s (nano), 10^-6 s (micro), 10^-3 s (milli), 10^0 s (seconds). Events labeled: bond vibration, bond isomerization, water dynamics, helix forms, fast conformational change, slow conformational change. Annotations: MD step; long MD run; where we need to be; where we'd love to be.]
The MD timestep is limited by the highest frequency vibration in the system, typically to 1/10 of the period of that vibration. X-H bonds are typically the highest frequency vibration (about 3000 cm^-1, with a period of about 10 fs), so a typical timestep in classical MD is 1 fs.
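The arithmetic behind the 1 fs timestep can be checked directly. The short sketch below converts a wavenumber to a vibrational period via T = 1/(c ν̃); the function name is illustrative only.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s


def vibration_period_fs(wavenumber_cm):
    """Period in femtoseconds of a vibration given its wavenumber in cm^-1."""
    frequency_hz = C_CM_PER_S * wavenumber_cm
    return 1.0e15 / frequency_hz


period_fs = vibration_period_fs(3000.0)  # X-H stretch, roughly 11 fs
timestep_fs = period_fs / 10.0           # rule of thumb: 1/10 of the period
```

The result, a little over 1 fs, agrees with the conventional 1 fs timestep once rounded.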

12. Choosing a time step
Too short: computation needlessly slow.
Too long: errors result from approximations.
Just right: errors acceptable, maximum speed.

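13. Euler method
Taylor expansion for particle position and velocity at time t+Δt, truncated after the first-order term:
$$ \mathbf{r}(t+\Delta t) = \mathbf{r}(t) + \mathbf{v}(t)\,\Delta t $$
$$ \mathbf{v}(t+\Delta t) = \mathbf{v}(t) + \mathbf{a}(t)\,\Delta t $$
Recall that a comes from the forces, a = F/m.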

14. Euler method
Problems persist with this method:
First order method: the local error scales with the square of the timestep, and global errors are larger.
Not time-reversible.
Sensitive, easy to make unstable.
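The instability is easy to see on a toy problem. The sketch below applies the Euler update to a 1D harmonic oscillator with m = k = 1 (a test system chosen here for illustration, not one from the lecture). For this system the energy is multiplied by exactly (1 + Δt^2) every step, so it grows without bound.

```python
def euler_step(x, v, dt, k=1.0, m=1.0):
    """One explicit Euler step for a harmonic oscillator with force -k x."""
    a = -k * x / m
    return x + v * dt, v + a * dt


def energy(x, v, k=1.0, m=1.0):
    """Kinetic plus potential energy of the oscillator."""
    return 0.5 * m * v * v + 0.5 * k * x * x


dt, n_steps = 0.01, 1000
x, v = 1.0, 0.0
e0 = energy(x, v)
for _ in range(n_steps):
    x, v = euler_step(x, v, dt)
energy_ratio = energy(x, v) / e0  # grows as (1 + dt**2) ** n_steps
```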

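15. Leap-frog method
This method reduces some of the error present in the Euler method by calculating velocities at half-timestep offsets from the positions. It is a second order method.
Step 1: Solve for acceleration/forces, a(t) = F(t)/m.
Step 2: Update velocities:
$$ \mathbf{v}(t+\tfrac{1}{2}\Delta t) = \mathbf{v}(t-\tfrac{1}{2}\Delta t) + \mathbf{a}(t)\,\Delta t $$
Step 3: Update positions:
$$ \mathbf{r}(t+\Delta t) = \mathbf{r}(t) + \mathbf{v}(t+\tfrac{1}{2}\Delta t)\,\Delta t $$
Repeat.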
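A minimal leap-frog loop, assuming the standard update v(t+Δt/2) = v(t-Δt/2) + a(t)Δt followed by r(t+Δt) = r(t) + v(t+Δt/2)Δt. The 1D harmonic oscillator and the way the first half-step velocity is generated are illustrative choices.

```python
import math


def leapfrog(x, v_half, dt, n_steps, accel):
    """Leap-frog integration; v_half holds the velocity at t - dt/2."""
    for _ in range(n_steps):
        a = accel(x)             # step 1: acceleration from forces
        v_half = v_half + a * dt  # step 2: velocity to t + dt/2
        x = x + v_half * dt       # step 3: position to t + dt
    return x, v_half


def harmonic(x):
    """Acceleration for a unit-mass, unit-spring-constant oscillator."""
    return -x


dt, n_steps = 0.01, 628
x0, v0 = 1.0, 0.0
v_start = v0 - 0.5 * harmonic(x0) * dt  # estimate of v(-dt/2)
x_end, _ = leapfrog(x0, v_start, dt, n_steps, harmonic)
exact = math.cos(n_steps * dt)  # analytic position after ~one period
```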

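16. The Verlet algorithm
Taylor expansion for particle position at time t+Δt:
$$ \mathbf{r}(t+\Delta t) = \mathbf{r}(t) + \mathbf{v}(t)\,\Delta t + \tfrac{1}{2}\mathbf{a}(t)\,\Delta t^2 + \tfrac{1}{6}\mathbf{b}(t)\,\Delta t^3 + O(\Delta t^4) $$
Taylor expansion for particle position at time t-Δt:
$$ \mathbf{r}(t-\Delta t) = \mathbf{r}(t) - \mathbf{v}(t)\,\Delta t + \tfrac{1}{2}\mathbf{a}(t)\,\Delta t^2 - \tfrac{1}{6}\mathbf{b}(t)\,\Delta t^3 + O(\Delta t^4) $$
Add expressions:
$$ \mathbf{r}(t+\Delta t) + \mathbf{r}(t-\Delta t) = 2\mathbf{r}(t) + \mathbf{a}(t)\,\Delta t^2 + O(\Delta t^4) $$
Here v, a and b are the first, second and third time derivatives of r (b is the time derivative of a).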

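17. The Verlet algorithm
Positions evaluated:
$$ \mathbf{r}(t+\Delta t) = 2\mathbf{r}(t) - \mathbf{r}(t-\Delta t) + \mathbf{a}(t)\,\Delta t^2 $$
Approximation for the first timestep:
$$ \mathbf{r}(-\Delta t) \approx \mathbf{r}(0) - \mathbf{v}(0)\,\Delta t + \tfrac{1}{2}\mathbf{a}(0)\,\Delta t^2 $$
Acceleration is from the potential:
$$ \mathbf{a}(t) = -\frac{1}{m}\nabla V(\mathbf{r}(t)) $$
Advantages: Simple to program, conserves energy, and time-reversible.
Disadvantages: Differences between large numbers can lead to finite precision issues. Velocities have to be calculated from the difference between positions at t+Δt and t-Δt (velocity extension),
$$ \mathbf{v}(t) = \frac{\mathbf{r}(t+\Delta t) - \mathbf{r}(t-\Delta t)}{2\Delta t} $$
so instantaneous velocities and temperatures are not known until the new positions are available.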
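A sketch of the position Verlet recurrence r(t+Δt) = 2r(t) - r(t-Δt) + a(t)Δt^2 for a single coordinate. The second-order Taylor estimate used to start the recurrence and the harmonic test system are assumptions made for illustration.

```python
import math


def verlet(x0, v0, dt, n_steps, accel):
    """Position Verlet; returns the current and previous positions."""
    # Start-up: estimate x(-dt) from a second-order Taylor expansion.
    x_prev = x0 - v0 * dt + 0.5 * accel(x0) * dt * dt
    x = x0
    for _ in range(n_steps):
        x_next = 2.0 * x - x_prev + accel(x) * dt * dt
        x_prev, x = x, x_next
    return x, x_prev


def harmonic(x):
    """Acceleration for a unit-mass, unit-spring-constant oscillator."""
    return -x


dt, n_steps = 0.01, 628
x_end, x_before = verlet(1.0, 0.0, dt, n_steps, harmonic)
exact = math.cos(n_steps * dt)
```

Note that no velocity appears inside the loop, which is exactly the disadvantage listed above: velocities have to be recovered afterwards from position differences.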

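18. The Velocity Verlet algorithm
Regular Verlet has no explicit dependence on velocities, only on accelerations; it would be better to depend on velocity explicitly. This is solved with the Velocity Verlet algorithm.
Taylor expand position and velocity:
$$ \mathbf{r}(t+\Delta t) = \mathbf{r}(t) + \mathbf{v}(t)\,\Delta t + \tfrac{1}{2}\mathbf{a}(t)\,\Delta t^2 $$
$$ \mathbf{v}(t+\Delta t) = \mathbf{v}(t) + \dot{\mathbf{v}}(t)\,\Delta t + \tfrac{1}{2}\ddot{\mathbf{v}}(t)\,\Delta t^2 $$
Taylor expand the acceleration, then rearrange and multiply by Δt/2:
$$ \dot{\mathbf{v}}(t+\Delta t) = \dot{\mathbf{v}}(t) + \ddot{\mathbf{v}}(t)\,\Delta t \quad\Rightarrow\quad \tfrac{1}{2}\ddot{\mathbf{v}}(t)\,\Delta t^2 = \tfrac{\Delta t}{2}\left[\dot{\mathbf{v}}(t+\Delta t) - \dot{\mathbf{v}}(t)\right] $$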

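19. The Velocity Verlet algorithm
Substitute in the expression for the second derivative of the velocity:
$$ \mathbf{v}(t+\Delta t) = \mathbf{v}(t) + \dot{\mathbf{v}}(t)\,\Delta t + \tfrac{\Delta t}{2}\left[\dot{\mathbf{v}}(t+\Delta t) - \dot{\mathbf{v}}(t)\right] $$
We get this expression, then simplify using $\dot{\mathbf{v}} = \mathbf{a}$:
$$ \mathbf{v}(t+\Delta t) = \mathbf{v}(t) + \tfrac{\Delta t}{2}\left[\mathbf{a}(t) + \mathbf{a}(t+\Delta t)\right] $$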

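20. The Velocity Verlet procedure
Step 1: Evaluate new positions:
$$ \mathbf{r}(t+\Delta t) = \mathbf{r}(t) + \mathbf{v}(t)\,\Delta t + \tfrac{1}{2}\mathbf{a}(t)\,\Delta t^2 $$
Step 2: Evaluate forces (acceleration) at t+Δt:
$$ \mathbf{a}(t+\Delta t) = -\frac{1}{m}\nabla V(\mathbf{r}(t+\Delta t)) $$
Step 3: Evaluate new velocities:
$$ \mathbf{v}(t+\Delta t) = \mathbf{v}(t) + \tfrac{1}{2}\left[\mathbf{a}(t) + \mathbf{a}(t+\Delta t)\right]\Delta t $$
Repeat procedure.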
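The three-step procedure maps directly onto a loop. The sketch below is a minimal 1D version using the standard velocity Verlet updates; the harmonic oscillator (m = k = 1) is an illustrative test system. It also checks two properties one would hope for: near-conservation of energy and time reversibility.

```python
def velocity_verlet(x, v, dt, n_steps, accel):
    """Velocity Verlet: positions, then forces, then velocities."""
    a = accel(x)
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt  # step 1: new positions
        a_new = accel(x)                     # step 2: forces at t + dt
        v = v + 0.5 * (a + a_new) * dt       # step 3: new velocities
        a = a_new
    return x, v


def harmonic(x):
    """Acceleration for a unit-mass, unit-spring-constant oscillator."""
    return -x


dt, n_steps = 0.01, 10000
x1, v1 = velocity_verlet(1.0, 0.0, dt, n_steps, harmonic)
energy_end = 0.5 * v1 * v1 + 0.5 * x1 * x1  # initial energy is 0.5

# Time reversibility: flip the velocity and integrate the same number of steps.
x_back, v_back = velocity_verlet(x1, -v1, dt, n_steps, harmonic)
```

Only one force evaluation is needed per step, because the acceleration computed in step 2 is reused as a(t) in the next iteration.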

21. Predictor-corrector approach
1. Predict r, v, and a at time t+Δt using second order Taylor expansions.
2. Calculate forces (and accelerations) from the new positions r(t+Δt).
3. Calculate the difference between the actual accelerations (from the forces) and the predicted ones:
$$ \Delta\mathbf{a}(t+\Delta t) = \mathbf{a}^{c}(t+\Delta t) - \mathbf{a}^{p}(t+\Delta t) $$
4. Correct positions, velocities, and accelerations using Δa(t+Δt).
5. Repeat.

22. Updated formulas
Each predicted quantity is corrected by a term proportional to Δa(t+Δt). The coefficients are chosen to maximize the stability of the algorithm, e.g. the Gear predictor-corrector has c0 = 1/6, c1 = 5/6, c2 = 1, c3 = 1/3.

23. Pros and cons of predictor-corrector
Pros:
Positions and velocities are corrected to Δt^4.
Very accurate for small Δt.
Cons:
Not time-reversible.
Not symplectic (area/energy preserving).
Takes more time: two force evaluations per step.
High memory requirements (15N instead of 9N).

24. Use of constraints to increase the integration step
The SHAKE algorithm fixes X-H bonds and allows the timestep to increase from 1 fs to 2 fs.
Also, hydrogen mass repartitioning: take mass from neighboring atoms and increase the mass of hydrogen to ~4 amu, allowing timesteps of ~4 fs.
[Figure: SHAKE applied to a bond of length d: unconstrained update; project out forces along the bond; correct for rotational lengthening.]

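25. Lyapunov instability
Trajectories are sensitive to initial conditions!
The position of a particle at time t depends on the initial positions and momenta of all N particles plus the elapsed time:
$$ \mathbf{r}(t) = f\left[\mathbf{r}^N(0), \mathbf{p}^N(0); t\right] $$
Perturbing the initial conditions of the momentum:
$$ \mathbf{r}'(t) = f\left[\mathbf{r}^N(0), \mathbf{p}^N(0) + \boldsymbol{\epsilon}; t\right] $$
The difference diverges exponentially, where λ is the Lyapunov exponent:
$$ |\Delta\mathbf{r}(t)| = |\mathbf{r}(t) - \mathbf{r}'(t)| \sim \epsilon\, e^{\lambda t} $$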

26. Lyapunov instability
Example: two particles out of 1000 in a Lennard-Jones simulation have the x-component of their velocities changed by +10^-10 and -10^-10. Monitor the sum of the squares of the differences in positions of all particles:
$$ \sum_{i=1}^{N} \left|\mathbf{r}_i(t) - \mathbf{r}_i'(t)\right|^2 $$
It gets very large very quickly (after only about 1000 steps)!

27. Periodic boundary conditions
We can simulate the condensed phase with a limited number of particles if we use periodic boundary conditions. They are needed to eliminate surface effects.
Each particle interacts with the closest images of the other molecules, which requires r_cut < L/2.
A number of options in AMBER: cubic box, truncated octahedron, spherical cap.
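The "closest image" rule is usually implemented as the minimum image convention. The sketch below assumes an orthorhombic box with edge lengths given per axis; the function names are illustrative and not taken from AMBER.

```python
import math


def minimum_image(ri, rj, box):
    """Displacement ri - rj wrapped to the nearest periodic image."""
    return [d - length * round(d / length)
            for d, length in zip((a - b for a, b in zip(ri, rj)), box)]


def pbc_distance(ri, rj, box):
    """Distance between two particles under the minimum image convention."""
    return math.sqrt(sum(c * c for c in minimum_image(ri, rj, box)))


box = [10.0, 10.0, 10.0]
# Two atoms near opposite faces are actually 1.0 apart through the boundary.
dist = pbc_distance([9.5, 0.0, 0.0], [0.5, 0.0, 0.0], box)
```

The wrapped displacement is only guaranteed to pick the unique closest image when the cutoff is below half the box length, which is the r_cut < L/2 condition.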

28. Periodic boundary conditions
van der Waals interactions are usually treated with a finite distance cutoff. Cutoff approaches exist that are better than abrupt truncation. [Figure: cutoff schemes, panels a and b.]
Ewald summation treats long range electrostatics accurately and efficiently using a real space summation (short range) and a reciprocal space summation (long range in real space, but short range in inverse space), so it converges quickly.
Particle Mesh Ewald uses FFTs and scales as O(N log N).
Choose a large enough simulation cell to avoid contact between periodic images, e.g. protein-protein interactions.
Interaction cutoffs need to be no more than half the shortest box dimension.
The simulation cell needs to be neutralized with counter-ions.

29. Speeding up MD calculations
Lookup tables: pre-compute interaction energies at various distances and interpolate to get the value.
Neighbor lists: keep lists of atoms to calculate interactions for, then only update the list when atoms have moved a certain distance (about every 10-20 timesteps for liquids, infrequently for solids). Storage becomes an issue for very large systems.
Cell-index method: discretize the simulation cell into sub-cells. Search only the sub-cells within a certain distance (e.g. nearest neighbors).
Multiple timestep dynamics (e.g. Berne's RESPA method): evaluate and update forces due to different interactions on different timescales. Long range interactions like electrostatics get updated most slowly; bonded terms get updated most quickly.
Rigid bonds/mass repartitioning (covered earlier).
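As an illustration of the neighbor list idea, the sketch below builds a pair list out to r_cut plus a "skin" and rebuilds it only once some atom has moved more than half the skin. The skin criterion and function names are assumptions for this example, and periodic boundaries are left out to keep it short.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def build_neighbor_list(positions, r_cut, skin):
    """All pairs (i, j), i < j, closer than r_cut + skin (no PBC)."""
    r_list2 = (r_cut + skin) ** 2
    n = len(positions)
    return [(i, j)
            for i in range(n - 1)
            for j in range(i + 1, n)
            if squared_distance(positions[i], positions[j]) < r_list2]


def needs_rebuild(positions, ref_positions, skin):
    """True once any atom has moved more than skin / 2 since the last build."""
    max_d2 = max(squared_distance(p, q) for p, q in zip(positions, ref_positions))
    return max_d2 > (0.5 * skin) ** 2


ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
pairs = build_neighbor_list(ref, r_cut=1.5, skin=0.5)
```

The build here is still O(N^2); in practice it would be combined with the cell-index method so that only nearby sub-cells are searched.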

30. Temperature in MD
The equipartition theorem relates temperature to the average kinetic energy of the system. The instantaneous temperature is:
$$ T(t) = \frac{1}{k_B N_f} \sum_{i=1}^{N} m_i v_i^2(t) $$
where N_f is the number of degrees of freedom.
Thermostats may be used to control temperature (e.g. in NPT and NVT ensembles).
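A small helper for the instantaneous temperature, assuming the usual equipartition form T = Σ m v^2 / (N_f k_B). Reduced units (k_B = 1) and the default of 3N degrees of freedom are assumptions; constraints or removed center-of-mass motion would lower N_f.

```python
def instantaneous_temperature(masses, velocities, n_dof=None, k_b=1.0):
    """Temperature from the kinetic energy via equipartition."""
    if n_dof is None:
        n_dof = 3 * len(masses)  # no constraints assumed
    twice_ke = sum(m * sum(c * c for c in v) for m, v in zip(masses, velocities))
    return twice_ke / (n_dof * k_b)


# One particle of mass 2 moving at (1, 1, 1): sum of m v^2 is 6, N_f is 3.
t_inst = instantaneous_temperature([2.0], [[1.0, 1.0, 1.0]])
```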

31. Berendsen thermostat
A form of velocity rescaling with weak coupling to an external bath. Velocities get multiplied by a proportionality factor (λ) to move the temperature (T) closer to the set point (T0).
Proportionality factor:
$$ \lambda = \sqrt{1 + \frac{\Delta t}{\tau}\left(\frac{T_0}{T} - 1\right)} $$
Revised equations of motion:
$$ m_i \frac{d\mathbf{v}_i}{dt} = \mathbf{F}_i + \frac{m_i}{2\tau}\left(\frac{T_0}{T} - 1\right)\mathbf{v}_i $$
Typically τ = 0.1-0.4 ps. If τ is the same as the timestep, this is simply velocity rescaling.
Suppresses fluctuations in kinetic energy, so it does not truly produce the canonical ensemble.
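The scaling step can be sketched as follows, assuming the standard Berendsen factor λ = sqrt(1 + (Δt/τ)(T0/T - 1)). Function and argument names are illustrative.

```python
import math


def berendsen_lambda(t_inst, t_target, dt, tau):
    """Velocity scaling factor for weak coupling to a heat bath."""
    return math.sqrt(1.0 + (dt / tau) * (t_target / t_inst - 1.0))


def rescale(velocities, lam):
    """Multiply every velocity component by the scaling factor."""
    return [[lam * c for c in v] for v in velocities]


# With tau equal to the timestep, one step brings T exactly to T0.
lam_full = berendsen_lambda(200.0, 300.0, dt=0.001, tau=0.001)
# With tau = 0.1 ps and dt = 1 fs, the correction per step is small.
lam_weak = berendsen_lambda(200.0, 300.0, dt=0.001, tau=0.1)
```

Because the temperature scales with the square of the velocities, the temperature after rescaling is λ^2 T, which makes the limiting case τ = Δt easy to verify.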

32. Andersen thermostat
Each atom at each integration step is subject to a small, random probability of collision with a heat bath. This is a stochastic process.
Probability of a collision event, for collision frequency ν:
$$ P(t) = \nu\, e^{-\nu t} $$
For small timesteps, $P = \nu\,\Delta t$, and each particle is assigned a random number between 0 and 1. If that number is smaller than $\nu\,\Delta t$, then the momentum of the particle is reset. The new momentum is drawn from a Gaussian distribution corresponding to the set point temperature.
Correctly samples NVT. Cannot be used to sample time-dependent properties, e.g. diffusion or hydrogen bond lifetimes.
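A sketch of the collision step, assuming a collision probability of νΔt per atom per step and new velocities drawn component-wise from a Gaussian with variance k_B T / m. Names and reduced units (k_B = 1) are illustrative.

```python
import math
import random


def andersen_collisions(masses, velocities, temperature, nu, dt, rng, k_b=1.0):
    """Reset velocities of randomly selected atoms; returns the collision count."""
    n_collisions = 0
    for i, m in enumerate(masses):
        if rng.random() < nu * dt:  # collision with the heat bath
            sigma = math.sqrt(k_b * temperature / m)
            velocities[i] = [rng.gauss(0.0, sigma) for _ in range(3)]
            n_collisions += 1
    return n_collisions


rng = random.Random(42)
masses = [1.0, 1.0, 4.0]
vel = [[0.1, 0.2, 0.3] for _ in masses]
no_hits = andersen_collisions(masses, vel, 1.0, nu=0.0, dt=0.001, rng=rng)
all_hits = andersen_collisions(masses, vel, 1.0, nu=1000.0, dt=0.001, rng=rng)
```

The two calls show the limits: with ν = 0 nothing is touched, and with νΔt = 1 every atom is re-randomized each step, which destroys the dynamics entirely.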

33. Langevin dynamics
In Langevin dynamics, all particles experience a random force from particles outside the simulation as well as a friction force that lowers velocities. The friction force and random force are related in a way that guarantees NVT statistics.
$$ m_i \mathbf{a}_i = -\nabla_i V(\mathbf{r}) - \gamma m_i \mathbf{v}_i + \sqrt{2\gamma k_B T m_i}\,\mathbf{R}_i(t) $$
First term: standard force. Second term: friction force with coefficient γ. Third term: random force with random number R(t), related to the friction force through γ.
Recommended values for γ are around 2-5 ps^-1.
Langevin is susceptible to synchronization artifacts, so it's important to use a random seed when initializing velocities.
In some cases, Langevin can allow for longer time steps.
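One simple way to discretize the Langevin equation is an Euler-Maruyama style update, sketched below for a single coordinate. This integrator is a choice made here for brevity, not the scheme used by any particular MD package; production codes typically use more careful splittings. Reduced units (k_B = 1) are assumed.

```python
import math
import random


def langevin_step(x, v, force, mass, gamma, temperature, dt, rng, k_b=1.0):
    """One Euler-Maruyama step with friction and a Gaussian random kick."""
    # Random velocity increment whose variance balances the friction.
    kick = math.sqrt(2.0 * gamma * k_b * temperature * dt / mass) * rng.gauss(0.0, 1.0)
    v_new = v + (force(x) / mass - gamma * v) * dt + kick
    x_new = x + v_new * dt
    return x_new, v_new


def free_particle(x):
    """No conservative force, so only friction and noise act."""
    return 0.0


rng = random.Random(7)
# At T = 0 the noise vanishes and the velocity just decays by (1 - gamma * dt).
x1, v1 = langevin_step(0.0, 1.0, free_particle, 1.0, 2.0, 0.0, 0.01, rng)
```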

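34. Nosé-Hoover thermostat
Extended system method: introduce an additional artificial degree of freedom s with an artificial mass Q:
$$ \mathcal{L} = \sum_i \frac{m_i}{2} s^2 \dot{\mathbf{r}}_i^2 - V(\mathbf{r}) + \frac{Q}{2}\dot{s}^2 - g k_B T_0 \ln s $$
Here s stretches the timescale, Q is the artificial mass, and the last two terms are the kinetic energy and potential energy of the heat bath degree of freedom (s); g is set by the number of degrees of freedom.
The microcanonical ensemble is sampled in the extended system variables, but fluctuations of s result in heat transfer between system and bath, so the canonical ensemble is sampled in the real system.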

35. Thermostat review
Thermostat                      Description                                         True NVT?  Stochastic?
Velocity rescaling/Berendsen    KE (velocities) revised to produce desired T        No         No
Nosé-Hoover                     Extra degrees of freedom act as thermal reservoir   Yes        No
Langevin                        Noise and friction give correct T                   Yes        Yes
Andersen                        Momenta re-randomized occasionally                  Yes        Yes

36. Pressure in MD
The Clausius virial equation is used to obtain the pressure from a molecular dynamics system:
$$ P = \frac{N k_B T}{V} + \frac{1}{3V}\left\langle \sum_{i=1}^{N} \mathbf{r}_i \cdot \mathbf{F}_i \right\rangle $$
where r_i is the position of particle i and F_i is the force on it.
Barostats may be used to control pressure (e.g. in the NPT ensemble).
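A sketch of the instantaneous virial pressure, assuming the form P = N k_B T / V + (1/3V) Σ r_i · F_i in reduced units (k_B = 1). Names are illustrative. Under periodic boundary conditions the virial is normally accumulated pair by pair with minimum-image separations rather than from absolute positions; that refinement is omitted here.

```python
def virial_pressure(positions, forces, volume, temperature, k_b=1.0):
    """Ideal-gas term plus the virial term from positions and forces."""
    n = len(positions)
    virial = sum(sum(r * f for r, f in zip(pos, frc))
                 for pos, frc in zip(positions, forces))
    return n * k_b * temperature / volume + virial / (3.0 * volume)


pos = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
zero = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
repulsive = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]  # atoms pushed apart

p_ideal = virial_pressure(pos, zero, volume=3.0, temperature=1.5)
p_repulsive = virial_pressure(pos, repulsive, volume=3.0, temperature=1.5)
```

Repulsive forces give a positive virial and raise the pressure above the ideal-gas value, while attractive forces lower it.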

37. Berendsen barostat
Used in the Amber code for NPT dynamics. Does not strictly sample from the NPT ensemble.
Positions and volume are rescaled:
$$ \mathbf{r}_i \rightarrow \mu\,\mathbf{r}_i, \qquad V \rightarrow \mu^3 V $$
Scaling factor:
$$ \mu = \left[1 - \frac{\beta\,\Delta t}{\tau}\left(P_0 - P\right)\right]^{1/3} $$
where P0 is the target pressure and P is the instantaneous pressure, τ is the pressure coupling time (typically 1-5 ps), and β is the isothermal compressibility (44.6x10^-6 bar^-1 for water).
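The box scaling factor can be sketched as below, assuming the standard Berendsen form μ = [1 - (βΔt/τ)(P0 - P)]^(1/3). Argument names are illustrative; pressures are in bar and times in ps to match the values quoted for water.

```python
def berendsen_mu(p_inst, p_target, dt, tau_p, beta):
    """Coordinate scaling factor for weak pressure coupling."""
    return (1.0 - (beta * dt / tau_p) * (p_target - p_inst)) ** (1.0 / 3.0)


BETA_WATER = 44.6e-6  # isothermal compressibility of water, bar^-1

mu_equal = berendsen_mu(1.0, 1.0, dt=0.002, tau_p=1.0, beta=BETA_WATER)
mu_high = berendsen_mu(101.0, 1.0, dt=0.002, tau_p=1.0, beta=BETA_WATER)
```

When the instantaneous pressure is above the target, μ is slightly greater than 1 and the box expands; the change per step is tiny because βΔt/τ is small.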

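38. Properties from MD runs
Autocorrelation functions: Autocorrelation functions (ACFs) can be defined and calculated for any particle quantity (e.g. v_i) or any system quantity (e.g. U, T, P, ρ):
$$ C(t) = \frac{\langle A(0)\,A(t) \rangle}{\langle A(0)\,A(0) \rangle} $$
The ACF starts at 1 and usually decays exponentially with time.
Diffusion coefficient, from the velocity autocorrelation function:
$$ D = \frac{1}{3}\int_0^{\infty} \langle \mathbf{v}_i(0) \cdot \mathbf{v}_i(t) \rangle\, dt $$
[Figure: curves for a solid and a liquid plotted against t (ps).]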
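A normalized autocorrelation function for a scalar time series can be sketched as follows. Averaging over all available time origins and subtracting the mean before correlating are choices made here; for quantities with zero mean, such as velocities, the subtraction changes nothing.

```python
def autocorrelation(series, max_lag):
    """Normalized ACF of the fluctuations, averaged over time origins."""
    n = len(series)
    mean = sum(series) / n
    fluct = [x - mean for x in series]
    c0 = sum(f * f for f in fluct) / n
    acf = []
    for lag in range(max_lag + 1):
        c = sum(fluct[i] * fluct[i + lag] for i in range(n - lag)) / (n - lag)
        acf.append(c / c0)
    return acf


# A perfectly alternating signal is anti-correlated at lag 1.
signal = [1.0, -1.0] * 50
acf = autocorrelation(signal, max_lag=2)
```

Statistics get worse at long lags because fewer time origins contribute, so max_lag is normally kept well below the trajectory length.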

39. Properties from MD runs
Radial distribution function:
[Figure: g(r) versus separation r, with the value 1.0 marked on the g(r) axis; schematic of a shell of radius R and thickness ΔR around a particle.]
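A common way to estimate g(r) from a single configuration is to histogram pair distances and divide by the count expected for an ideal gas at the same density. The sketch below assumes a cubic periodic box and r_max no larger than half the box length; the names and normalization convention are illustrative.

```python
import math


def radial_distribution(positions, box_length, dr, r_max):
    """Histogram estimate of g(r) for one configuration in a cubic box."""
    n = len(positions)
    n_bins = int(r_max / dr)
    counts = [0] * n_bins
    for i in range(n - 1):
        for j in range(i + 1, n):
            d2 = 0.0
            for a, b in zip(positions[i], positions[j]):
                d = a - b
                d -= box_length * round(d / box_length)  # minimum image
                d2 += d * d
            k = int(math.sqrt(d2) / dr)
            if k < n_bins:
                counts[k] += 2  # each pair counts for both atoms
    density = n / box_length ** 3
    g = []
    for k, c in enumerate(counts):
        shell = 4.0 / 3.0 * math.pi * ((k + 1) ** 3 - k ** 3) * dr ** 3
        g.append(c / (n * density * shell))
    return g


g = radial_distribution([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], 10.0, 0.5, 5.0)
```

In a production run the histogram would be accumulated over many frames before normalizing, so that g(r) approaches 1 at large separations.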

40. Summary
Well-equilibrated molecular dynamics gives us access to thermodynamic properties.
We need to choose the right ensemble, thermostat/barostat, simulation cell, timestep, cutoffs, and force fields for the job.
Direct, unbiased molecular dynamics is limited to sampling the part of the potential energy surface we've given it enough energy to reach, and by the timescale accessible with the timestep we've selected.
Hydrogens (flexible or rigid) are the limiting factor in describing molecular dynamics of organic systems.
Adaptive sampling approaches are required to efficiently sample rare events: higher energy portions of the potential energy surface and slower processes.

41. Survey! bit.ly/lec5
