Multiscale implementation of infinite-swap replica ... · Multiscale implementation of infinite-swap replica exchange molecular dynamics Tang-Qing Yua, Jianfeng Lub,c,d, Cameron F

Multiscale implementation of infinite-swap replicaexchange molecular dynamicsTang-Qing Yua, Jianfeng Lub,c,d, Cameron F. Abramse, and Eric Vanden-Eijndena,1

aCourant Institute of Mathematical Sciences, New York University, New York, NY 10012; bDepartment of Mathematics, Duke University, Durham, NC 27708;cDepartment of Physics, Duke University, Durham, NC 27708; dDepartment of Chemistry, Duke University, Durham, NC 27708; and eDepartment of Chemicaland Biological Engineering, Drexel University, Philadelphia, PA 19104

Edited by Michael L. Klein, Temple University, Philadelphia, PA, and approved August 9, 2016 (received for review March 28, 2016)

Replica exchange molecular dynamics (REMD) is a popular methodto accelerate conformational sampling of complex molecular systems.The idea is to run several replicas of the system in parallel at differenttemperatures that are swapped periodically. These swaps are typicallyattempted every fewMD steps and accepted or rejected according to aMetropolis–Hastings criterion. This guarantees that the joint distribu-tion of the composite system of replicas is the normalized sum of thesymmetrized product of the canonical distributions of these replicas atthe different temperatures. Here we propose a different implementa-tion of REMD in which (i) the swaps obey a continuous-time Markovjump process implemented via Gillespie’s stochastic simulation algo-rithm (SSA), which also samples exactly the aforementioned joint dis-tribution and has the advantage of being rejection free, and (ii) thisREMD-SSA is combined with the heterogeneous multiscale method toaccelerate the rate of the swaps and reach the so-called infinite-swaplimit that is known to optimize sampling efficiency. Themethod is easyto implement and can be trivially parallelized. Here we illustrate itsaccuracy and efficiency on the examples of alanine dipeptide in vac-uum and C-terminal β-hairpin of protein G in explicit solvent. In thislatter example, our results indicate that the landscape of the protein isa triple funnel with two folded structures and one misfolded structurethat are stabilized by H-bonds.

importance sampling | REMD | SSA | HMM | protein-folding

All-atom molecular dynamics (MD) simulations are increasinglyused as a tool to study the structure and function of large

biomolecules and other complex molecular systems from first prin-ciples. By allowing us to scrutinize these systems at spatiotemporalresolutions that are typically beyond experimental reach, MD sim-ulations are changing the way we understand chemical and biologicalprocesses, and their impact in areas such as chemical engineering,synthetic biology, drug design, and so on is bound to increase in thecoming years (1–3). However, exploiting the full potential of MDsimulations remains a major challenge, primarily due to approxi-mations in all-atom force fields and the limitation of accessibletimescales available to standard MD. Force fields continue to beimproved (4–7), and hardware developments involving special-pur-pose high-performance computers (8), high-performance graphicsprocessing units (9), or massively parallel simulations (10) also per-mit us to enlarge the scope of MD simulations. However, it has beenargued (11) that algorithm developments will be the key to accessbiologically relevant timescales with MD. In particular, there is agreat need for methods capable of accelerating the exploration andsampling of the conformational space of complex molecular systems.Among the various techniques that have been introduced to

this end in recent years, replica exchange MD (REMD) (12)remains one of the most popular. The reasons for this success aresimple: REMD does not require much prior information on thesystem (in particular, it does not require one to prescribe col-lective variables beforehand), it is fairly versatile, and it is easy toparallelize. The basic idea behind REMD is to introduce severaladditional copies, or replicas, of the system that have temperatureshigher than the one(s) of interest and are therefore expected toexplore its conformational space faster—control parameters otherthan the temperature can also be used, as long as adjusting their

values permits accelerated sampling. These replicas are thenevolved in parallel using standard MD, and their temperatures areperiodically swapped. If the swaps are performed in a thermody-namically consistent way, the correct canonical averages at any ofthese temperatures can be sampled from the pieces of the replicatrajectories during which they feel this temperature. In the stan-dard implementation of REMD, this is done by attempting swapsat a predefined frequency and accepting or rejecting them accordingto a Metropolis–Hastings criterion. This guarantees that the jointdistribution of the replicas is the symmetrized product of their ca-nonical distributions at the different temperatures. The frequency ofswap attempts is an input parameter in the method, and recentsimulations argue that higher frequencies generally provide bettersampling (13–17). This was justified rigorously through largedeviation theory (18). How to implement high-frequency swap-attempt REMD is not obvious, however, because increasing theattempt frequency of the swap also increases the computationalcost, sometimes with little gain because this also leads to morerejections. One way to get around this difficulty is to take the so-called infinite-swap limit analytically and simulate the resultinglimiting equations. This solution was first advocated in ref. 19 andlater revisited in ref. 20. Unfortunately, this solution cannot beimplemented directly if the number of temperatures is large, be-cause the limiting equation involves coefficients whose calculationrequires computing sum over all possible permutations of thetemperatures, and if there are N temperatures, there are of courseN! such permutations. Further approximations are therefore re-quired in practice (19, 20).In this paper, we propose a different solution to this problem

that involves no approximation. It is based on the observationthat REMD can be reformulated in such a way that the tem-perature swaps satisfy a continuous-time Markov jump process(MJP) rather than a discrete-time Markov chain. This MJP can

Significance

Efficiently sampling the conformational space of complex mo-lecular systems is a difficult problem that frequently arises in thecontext of molecular dynamics (MD) simulations. Here we pre-sent a modification of replica exchange MD (REMD) that is assimple to implement and parallelize as standard REMD but isshown to perform better in terms of sampling efficiency. Themethod is used here to investigate the multifunnel structure ofthe C-terminal β-hairpin of protein G in explicit solvent, which todate has required much longer REMD simulations to produceconverged predictions of conformational statistics. In particular,we identify a potentially important folded β structure previouslyundetected by other importance sampling methods.

Author contributions: T.-Q.Y., J.L., and E.V.-E. designed research; T.-Q.Y., J.L., C.F.A., and E.V.-E.performed research; J.L. and E.V.-E. contributed new reagents/analytic tools; T.-Q.Y., C.F.A.,and E.V.-E. analyzed data; and T.-Q.Y., J.L., C.F.A., and E.V.-E. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.1To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1605089113/-/DCSupplemental.

11744–11749 | PNAS | October 18, 2016 | vol. 113 | no. 42 www.pnas.org/cgi/doi/10.1073/pnas.1605089113

Dow

nloa

ded

by g

uest

on

Oct

ober

5, 2

020

http://crossmark.crossref.org/dialog/?doi=10.1073/pnas.1605089113&domain=pdf

mailto:[email protected]

http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1605089113/-/DCSupplemental

http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1605089113/-/DCSupplemental

www.pnas.org/cgi/doi/10.1073/pnas.1605089113

be simulated via Gillespie’s stochastic simulation algorithm (SSA)(21), which is rejection-free: Given the current assignments of thetemperatures across the replicas, the method computes directly thetime at which the next swap occurs and proceeds with the MD untilthat time—this is in contrast with standard REMD, in which theswaps are proposed (and rejected or accepted) at fixed time lags.This REMD-SSA can then be combined with multiscale simulationsschemes such as the heterogeneous multiscale method (HMM)(22–24) to effectively compute at the infinite-swap limit. Here, wediscuss the theoretical and computational aspects of this refor-mulation of REMD involving a multiscale SSA, termed REMD-MSSA for short, and apply it to compute free-energy surfaces(FESs) of alanine dipeptide (AD) and a 16-residue β-hairpin fromprotein G. In both applications, we observe that REMD-MSSA ismore efficient than standard REMD: In particular, the rapid con-vergence of the free energy calculation in the context of β-hairpinallows one to identify a β structure.

MethodologyREMD-SSA. Let us propose a reformulation of REMD such that (i) the tem-perature swaps are replaced by swaps in factors multiplying the forces actingon the replicas and (ii) these swaps occur via a continuous-time MJP—thegeneralization to other control parameters is straightforward and is con-sidered briefly in Using Parameters Other than Temperature. To this end, it isuseful to take as a starting point the probability distribution that a REMDsimulation is designed to sample. Suppose we use N replicas with positionsdenoted collectively as X = fx1;⋯; xNg and let β1 > β2⋯> βN be the N inversetemperatures that are being swapped over these replicas. Denote alsoρiðxÞ= Z−1

βie−βiVðxÞ the canonical distribution at inverse temperature βi over the

atomic potential VðxÞ (with Zβi =Re−βiVðxÞ dx). Then REMD samples the sym-

metrized equilibrium probability density (18, 19):

ϱðXÞ= 1N!

Xσ

ρσð1Þðx1Þ⋯ρσðNÞðxNÞ, [1]

where the sum is taken over all of the permutation σ of the indices f1;⋯;Ng[with σðiÞ denoting the index onto which i is mapped by the permutation σ]. Thesymmetrized density in Eq. 1 can be thought of as the marginal density on thepositions X alone of the following joint distribution for X and the permutation σ:

ϱðσ;XÞ= 1N!

ρσð1Þðx1Þ⋯ρσðNÞðxNÞ. [2]

Performing temperature swaps is equivalent to evolving the permutation σconcurrently with the replica configurations X in a way that is consistentwith Eq. 2. In standard REMD this is done by proposing a new permutationσ′≠ σ after a fixed time lag and accepting or rejecting it according to aMetropolis–Hastings criterion. However, it is easy to modify the method andmake both X and σ continuous-time Markov processes in which the updatesof σ occur at random times. Introducing the symmetrized (mixture) potential

VðX; σÞ=−β−1 log ϱðσ;XÞ= β−1XNi=1

βσðiÞVðxiÞ+ cst, [3]

where we used β≡ β1 as reference temperature, and noting that ∇xjVðX; σÞ=β−1βσðjÞ∇VðxjÞ, this amounts to imposing that:

i) the replica positions evolve via standard MD (using, e.g., Langevin’sthermostat with friction coefficient γ) over the potential (Eq. 3),

_xj =m−1pj ;

_pj =−β−1βσðjÞ∇V�xj�− γpj +

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2γmβ−1

p ηj ;

[4]

where ηj is white noise with mean zero and covariance ÆηjðtÞηTk ðt′Þæ=δj;kδðt − t′ÞId, and

ii) the permutation σ is updated via the continuous-time MJP with rateqσ;σ′ðXÞ with σ′≠ σ given by

qσ;σ′ðXÞ= νaσ;σ′e−12 βðVðX;σ′Þ−VðX;σÞÞ, [5]

where ν> 0 is a constant that controls the swapping rate (the higher ν,the more frequently the jumps occur), and the symmetric matrix aσ;σ′ = aσ′;σindicates whether the permutation σ′ is accessible from σ, that is, aσ;σ′ = 1 ifwe allow the jump from σ to σ′ and aσ;σ′ = 0 otherwise—below we will

restrict ourselves to transposition moves in which two random indices areswapped and for which the total number of accessible states σ′ from σ isNðN− 1Þ=2.

These dynamics can also be reformulated as follows. Given that σðtÞ= σ attime t, the probability that it jumps out of σ for the first time in the in-finitesimal time interval ½t′; t′+dt� with t′> t and reaches σ′≠ σ is given by

exp

−Xσ′′

Z t′

tqσ;σ′′ðXðsÞÞds

!qσ;σ′ðXðt′ÞÞdt, [6]

where XðsÞ is the solution to Eq. 4 for s∈ ½t; t′� with σðtÞ= σ fixed. Thisreformulation of the process can be used to design straightforward variantsof Gillespie’s SSA to simulate it—for this reason we will refer to it as REMDwith SSA, or REMD-SSA for short. How to implement REMD-SSA with a finiteν is explained in Supporting Information, and how to reach the limit ν→∞ isdealt with below. In both cases, these implementations are rejection-freeand akin to kinetic Monte-Carlo schemes.

For any swapping rate ν, the dynamics described in Eqs. 4 and 5 satisfiesdetailed balance with respect to Eq. 2 and therefore has Eq. 2 as its equi-librium distribution (see Supporting Information for details). As a result, theexpectation of an observable A at any temperature 1=βj can be calculated as

ÆAæj =Z

AðxÞρjðxÞdx

=Z X

σ

XNi=1

AðxiÞ1j=σðiÞ!ϱðσ;XÞdX

≈1T

Z T

0

XNi=1

AðxiðtÞÞ1j=σði;tÞ dt,

[7]

where 1j=σðiÞ = 1 if j= σðiÞ and 1j=σðiÞ = 0 otherwise, and similarly for 1j=σði, tÞ:Here σði, tÞ denotes the index onto which i is mapped at time t by the time-dependent permutation σðtÞ.

Infinite-Swap REMD. It can be shown that the sampling efficiency of REMD-SSA increases monotonically with ν (Supporting Information). This suggestsone should operate the scheme in the infinite-swap limit ν→∞, and so let usconsider briefly this limit next—how to operate within it in practice will bedescribed in the next section. As ν→∞, the permutation σ evolves infinitelyfaster than X, meaning that σ is always at equilibrium conditional on thecurrent value of XðtÞ, and XðtÞ only feels the average effect of σ. In otherwords, the dynamics of X is captured by the limiting equation

_x j =m−1pj ,

_pj =−RjðXÞ∇V�xj�− γpj +

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2γmβ−1

pηj .

[8]

Here

RjðXÞ= β−1Xσ

βσ−1ðjÞωX ðσÞ, [9]

where σ−1ðjÞ denotes the index mapped onto j by the permutation σ, is theaveraged rescaling parameter of the force, with the average taken withrespect to the equilibrium distribution of σ given X:

ωX ðσÞd e−βVðX;σÞPσ′e−βVðX;σ′Þ

=ϱðσ;XÞPσ′ϱðσ′;XÞ

. [10]

We note that Eq. 8 is exactly the infinite-swap REMD (ISREMD) formulated inref. 20.

The equilibrium distribution sampled by the limiting equations (Eq. 8) isthe mixed distribution ϱðXÞ in Eq. 1. Therefore, the canonical average of A atβj can be estimated by

ÆAæj =Z

AðxÞρjðxÞdx

=Z XN

i=1

AðxiÞηi;jðXÞϱðXÞdX

≈1T

Z T

0

XNi=1

AðxiðtÞÞηi;jðXðtÞÞdt,

[11]

where

ηi;jðXÞ=Xσ

1j=σðiÞωX ðσÞ [12]

is the probability that the ith replica is at the jth temperature conditional onthe replica positions being at X.

Yu et al. PNAS | October 18, 2016 | vol. 113 | no. 42 | 11745

CHEM

ISTR

YBIOPH

YSICSAND

COMPU

TATIONALBIOLO

GY

Dow

nloa

ded

by g

uest

on

Oct

ober

5, 2

020

http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1605089113/-/DCSupplemental/pnas.201605089SI.pdf?targetid=nameddest=STXT




In ISREMD, the permutation σ is no longer a dynamical variable: Weonly need to evolve XðtÞ according to Eq. 8. Unfortunately, this involvescalculating the factor RjðXÞ defined in Eq. 9, and the cost of these calcu-lations increases very rapidly with the number of replicas N, because thenumber of permutations grows as a factorial of N. On top of this, to es-timate averages we also need to compute ηi,jðXÞ, which is costly too. Oneway to address this problem is to introduce additional approximations,for example, partial swapping (18, 19). Next, we propose an alternativestrategy that permits us to reach the infinite-swap limit while workingdirectly with the original dynamics defined in Eqs. 4 and 5, without in-troducing additional approximations.

Implementation: REMD with Multiscale SSA. To simulate the process defined byEqs. 4 and 5 at large ν, that is, in a regime where the evolution of slowvariables X is effectively captured by the limiting equation (Eq. 8), we will usea variant of the HMM (22–24). HMM’s basic idea is to compute on-the-fly thecoefficients in the limiting equation (Eq. 8) for the slow variables X, as theyare needed to evolve these variables, via short bursts of simulation of the fastprocess σ—these simulations are performed in what is called the microsolverin HMM, whereas the routine used to evolve the slow variables is referred toas the macrosolver; the scheme is made complete by an estimator specifyinghow the data generated in the microsolver are used within the macrosolver(i.e., how the unknown coefficients in the limiting equation for X are beingcomputed). In the present context, the HMM scheme we will use worksas follows.

Choose a time step Δt appropriate for the simulation via MD of thelimiting equation (Eq. 8) and pick a frequency ν such that ν � 1=Δt (forexample ν= 102=Δt − 103=Δt). Start the simulation with X0 =Xð0Þ andσ0 = σð0Þ, and set t0 = 0. Then for k≥0, use the following three-step iterationalgorithm:

i) Microsolver: Evolve σk via SSA from tk to tk+1dtk +Δt using the rate inEq. 5, that is: Set σk;0 = σk, tk;0 = tk, and for l≥ 1, do:

(a) Compute the lag to the next reaction via

τl =−ln rqσk;l−1

, [13]

where r is a random number uniformly picked in the interval ð0,1Þ andqσ =

Pσ′≠σ

qσ≠σ′ðXkÞ;

(b) pick σk,l with probability

pσk;l =qσk;l−1 ;σk;lðXkÞ

qσk;l−1

. [14]

(c) Set tk;l = tk;l−1 + τl and repeat till the first L such that tk;L > tk +Δt;then set σk+1 = σk;L and reset τL = tk +Δt − tk;L−1.

ii) Estimator: Given the trajectory of σ, estimate ηi,jðXkÞ and RjðXkÞ via

η̂i,jðXkÞ= 1Δt

Z tk+Δt

tk

1j=σði;sÞds

=1Δt

XLl=1

1j=σk;l ðiÞτl ;

[15]

and

R̂jðXkÞ= β−1Xi

βiΔt

Z tk+Δt

tk

1j=σði, sÞds

= β−1Xi

βi η̂i;jðXkÞ.[16]

iii) Macrosolver: Evolve Xk to Xk+1 using one time-step of size Δt in theMD integrator for Eq. 4 with RjðXkÞ replaced by the factor R̂jðXkÞ calcu-lated in the estimator. Then repeat the three steps above for each timestep k.

We will refer to this scheme as REMD with multiscale SSA, or REMD-MSSAfor short. It gives an accurate approximation of the solution to the limitingequation (Eq. 8) if ν � 1=Δt. Indeed, in this regime the sampled R̂jðXÞ andη̂i,jðXÞ will be close to their averaged values in Eqs. 9 and 12. We also notethat a different implementation of the microsolver, based on a Metropolis–Hastings algorithm can also be used; this implementation is discussed in

Supporting Information. It is closer to what is done in standard REMD, ex-cept that many swaps are attempted at each MD step instead of one swapevery few MD steps (i.e., this implementation still permits one to effectivelycompute in the infinite-swap limit).

Like standard REMD, REMD-MSSA is easily parallelizable. For example, theMD simulations of each replica can be performed on different nodes with anadditional node used to evolve σ. At each MD step, the energies VðxjÞ of theN replicas need to be communicated to the node evolving σ, and this nodesends back to those evolving Xk the N values of R̂jðXkÞ. The calculation ofaverages can be handled separately and requires the N values of AðxjÞ andthe NðN−1Þ values of η̂i,jðXkÞ.

A version REMD-MSSA has been implemented into Desmond (v3.4.0.2)(25) and used in the numerical examples presented next. To benchmarkthe method on a simple example, we also tested REMD-MSSA on a 1Dasymmetric double-well potential. These results are reported in Figs. S1–S3and show that REMD-MSAA combined with HMM permits one to reach theinfinite-swap limit and is more efficient than standard REMD.

Numerical ResultsAD in Vacuum. As a first test of REMD-MSSA, we studied blockedAD (ACE-ALA-CBX) in vacuum under the CHARMM forcefield with CMAP corrections (26, 27). The free energy profilealong the two dihedral angles ϕ and ψ for this system has beenmapped (e.g., in ref. 28). We performed tests of REMD-MSSAusing either 4 or 12 replicas, using the values of ν specified insimulation details, and running the simulations for 50 ns. Wecompared these results with those from an ISREMD simulation of50 ns with four replicas (in which there are only 24 permutations)using the analytic weights in ref. 10—such a calculation withISREMD is not feasible with 12 replicas because the number ofpermutations (≈ 4.8× 108) becomes too large to manage. Finally,we compared the results with those from a standard REMD with12 replicas and swapping rate of 0.5 ps−1.The reconstructed free energy profiles at 300 K are shown in Fig.

1A. The profiles calculated using REMD-MSSA with 4 and 12replicas (Fig. 1A, Left, Upper and Lower) and ISREMD with fourreplicas (Fig. 1A, Upper Right) are in close agreement. Theseprofiles also match those obtained with standard REMD (Fig. 1A,Lower Right) and they are in close agreement with those estimatedby other enhanced sampling methods under the same force field,including those calculated with 100 ns of metadynamics and 250-nsaccelerated MD (28). In particular, we correctly identified the fourknown local minima [using the same terminology as in Mackerellet al. (27)]: C5 located at ð−154,160Þ, C7eq at ð−81,66Þ, C7ax atð80, − 63Þ, and αR at ð−99,12Þ.To assess the sampling efficiency of our scheme, we first looked

at the times the temperatures of each replicas take to make around trip between their highest and lowest values; equivalently,this amounts to monitoring the values of σði, tÞ versus time foreach replica index i. It has been argued (16, 29–31) that theshorter these round-trip times, the faster the scheme will converge.A similar conclusion was reached in ref. 17, using a quantityclosely related to round-trip times: the lifetime, which gives theaverage time a temperature stays at a given value before beingupdated to another. In the context of ISREMD, the round-triptimes (and the lifetime) should be zero because this methodoperates in the infinite-swap limit: This means that we can use thistest to quantify how close REMD-MSSA is from this limit. In Fig.1B we show a typical trajectory σði, tÞ from REMD-MSSA and onefrom standard REMD (with swap rate at 0.5 ps−1) for the case of12 replicas (the full set of trajectories can be found in Fig. S4). Itcan be seen that the round-trip times are indeed much shorter inREMD-MSSA than in standard REMD. Similarly, the lifetime inREMD-SSA is roughly 10 as =10−17 s compared with 102 ps=10−10s for standard REMD. Another more stringent measure ofefficiency is the flatness of the temperature distributions at eachreplica, or equivalently that of σði, tÞ for each replica index i. In-deed, it is known (30, 31) that these distributions should be uni-form at equilibrium [i.e., for each i σði, tÞ should spend the sameproportion of time at every index value j]. Unlike the round-triptimes, which can be short even if the replicas have not equilibratedyet, these distributions can only be flat if the replicas are themselves

11746 | www.pnas.org/cgi/doi/10.1073/pnas.1605089113 Yu et al.

Dow

nloa

ded

by g

uest

on

Oct

ober

5, 2

020


http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1605089113/-/DCSupplemental/pnas.201605089SI.pdf?targetid=nameddest=SF1




at equilibrium. A representative distribution from REMD-MSSAand one from standard REMD, both calculated via time averagingof σði, tÞ over 0.8 ns of simulations, are shown in Fig. 1C (the full setof distributions can also be found in Fig. S5). As can be seen, thedistribution from REMD-MSSA is almost uniform and significantlyflatter than that from standard REMD, indicating that the formerhas converged after 0.8 ns, whereas the latter has not.

Folding of Protein G β-Hairpin in Explicit Solvent.As a second test weconsidered a β-hairpin peptide in explicit water, whose folding be-havior resembles that of larger protein, and which has been in-vestigated by many techniques, including standard REMD (32–37). Specifically, we studied the C-terminal fragment of the Igbinding domain B1 of protein G [Protein Data Bank (PDB) IDcode 2gb1]. This capped peptide sequence contains 16 residues,1-Ace-GEWTYDDATKTFTVTE-NMe-16, with 256 atoms.This β-hairpin is known as a hard-to-fold protein. In our REMD-MSSA simulations, the initial structures for the replicas are chosenfrom an unfolded ensemble that is generated from a high-tem-perature MD run. We then run REMD-MSSA and check how fastfolded state emerges and reasonable free energy profiles in variousorder parameters are obtained. In Fig. S6, we show traces of twoorder parameters, namely the number of β-strand H-bonds NH(defined in Eq. S18) and the rmsd of the Cα s from the nativestructure (PDB ID code 2gb1), obtained by monitoring the evo-lution for a few representative replicas in the REMD-MSSA

simulation. These traces indicate that our REMD-MSSA simula-tion experienced several folding/unfolding events within 50 ns. Thisis a significant speed-up compared with a bare MD simulation, inwhich folding event are expected to take place every 500 ns to 1 μs(38). We used the REMD-MSSA simulation data to generate FESalong the two order parameters, β-strand H-bonds NH and back-bone radius of gyration RG, from a 100-ns REMD-MSSA run (Fig.2A and Fig. S7). This FES captures the main features present inFES obtained from previous longer REMD simulations (32, 34, 39,40). Specifically, we observe basins corresponding to conformationsthat form no β-strand H-bonds, form one or two H-bonds aspartially folded β-sheet, and fully β-sheet with more than threeH-bonds. We also compare this FES with the one from 120-nsstandard REMD starting from the same initial structures (Fig. 2B).It can be seen that the folded basins are populated in REMD-MSSA whereas only a few samples are seen in standard REMD.The FES from REMD-MSSA is in better agreement with previouslonger simulation studies, indicating that REMD-MSSA can give amore converged FES than standard REMD within a 100-ns rundue to higher sampling efficiency.To test convergence, in Fig. 2 C and D we show the trajectories

and the distributions of σði, tÞ for one representative replica, forboth REMD-MSSA and standard REMD (more representativetrajectories and distributions are shown in Figs. S8 and S9). Theround-trip times and the lifetime observed in REMD-MSSA areagain shorter than those in REMD with swap rate 1 ps−1, withvalues similar to those reported for AD, and the temperature dis-tributions are also a significantly flatter in REMD-MSSA than inREMD. To measure how flat these distributions are across all ofthe replicas, we calculated the relative entropy of each, that is,SðpiÞ=

PjpiðjÞlogpiðjÞ=prefðjÞ, where piðjÞ is the distribution σði, tÞ

and prefðjÞ= 1=60 is the target uniform distribution: If piðjÞ= prefðjÞ,SðpiÞ= 0, otherwise it is positive, and the higher the value ofSðpiÞ= 0, the less flat piðjÞ is. These relative entropies are shownafter ordering from smallest to largest in Fig. 2E: They are signifi-cantly smaller for REMD-MSSA than for standard REMD, in-dicative of faster mixing by the former than by the latter.Remarkably, careful investigation of the FES in Fig. 2 leads us

to find a folded state that is sampled in REMD-MSSA but not instandard REMD. For reference, the native state’s structure andH-bond registry are shown in Fig. 3A; the native state is char-acterized by the β-strand H-bond pairs (E14,T1), (T12,T3), and(D10,T5). We detect three metastable states whose representativestructures are depicted in Fig. 3 as insets and labeled as βN, β2, andM. The respective backbone H-bond contact maps of these statesare shown in Fig. 3B. From these H-bond registries, we recognizeβN as the native state, whereas β2 represents a distinct, nonnativeβ-hairpin state that is not sampled in our 100 ns of standardREMD, and M a misfolded state. Fig. 3C shows the FES at 270 Kin the space of RG and NH, with states βN, β2, and M indicated. Inthis space, the three states are not well-distinguished, indicatingthat the (RG, NH) space is likely not optimal for understandingthe conformational distribution of the folded states. We show inFig. 3D the FES at 270 K in the space of two new collectivevariables, dCMAP

1 and dCMAP2 , respectively, built from all samples in

the REMD-MSSA at 270 K for which NH > 2. Here, dCMAP1

measures the L1 distance from state β2 and dCMAP2 from state βN

(see Supporting Information for their definition). In this space, thethree states are easily distinguishable. To the best of our knowl-edge, the β-sheet state we observe here (β2) has never been de-tected in any other simulations of this peptide. To verify that β2 ismetastable, we launched a 220-ns standard MD simulation at270 K from a β2 sample. The state remained stable for the durationof that simulation, as can be seen from the driftless traces of severalcollective variables that feature the conformational structure(Fig. 3E and Fig. S10). We also calculated the average potentialenergy difference between β2 and βN to be 1.4± 7.7 kcal/mol,which indicates they are energetically indistinguishable.Even though state β2 looks similar to βN, its H-bond registry is

different, as evidenced by the O–H contact maps in Fig. 3B. Theirdistinction is revealed in the (dCMAP

1 ,dCMAP2 ) space. To access β2 from

A

B C

Fig. 1. (A) Free energy surfaces along the two dihedral angles ϕ and ψ forAD in vacuum at 300 K. (Upper Left) Fifty-nanosecond REMD-MSSA simu-lation with four replicas. (Upper Right) Fifty-nanosecond ISREMD simula-tion with four replicas using the analytical weights calculated from Eq. 10.(Lower Left) Fifty-nanosecond REMD-MSSA simulation with 12 replicas.(Lower Right) Fifty-nanosecond standard REMD simulation with 12 replicasand a swapping rate of 0.5 ps−1. All plots have 60× 60 bins and filtered bystandard Gaussian kernel. Same level sets are used in the contour plots.(B) Trajectories σði, tÞ of one replica. (C) Distributions of σði, tÞ for the samereplica after 0.8 ns of simulation. In B and C, the orange curve is from REMD-MSSA and the blue one is from standard REMD, both with 12 replicas.


CHEM

ISTR

YBIOPH

YSICSAND

COMPU

TATIONALBIOLO

GY

Dow

nloa

ded

by g

uest

on

Oct

ober

5, 2

020



http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1605089113/-/DCSupplemental/pnas.201605089SI.pdf?targetid=nameddest=eqs18






βN, the amino-terminal strand must flip 180° so that its outside Oand H can point inside to form a hydrogen bond with H and O onthe carboxyl-terminal strand (C-strand). This implies that either statemust unfold first to break all H-bonds to get a chance to fold into theother. We indeed observe that β2 is populated predominantly bytransitions from the unfolded pool rather than from the βN pool.Another difference between the two folded states is the turnstructure. βN has a large π-turn (41) composed of D10, D9, A8,T7, and K6, whereas β2 has a small γ-turn (42) only involving D9,A8, and T7. Previous studies proposed folding of this β-hairpinuses either a “zipper” mechanism (36, 43–46) where hydrogenbonds form sequentially from the turn, or a “hydrophobic col-lapse” in which the folded state arises from a collapsed globule(32, 33, 35, 39, 47, 48). It is not clear from our simulation resultswhich of these is preferred, but the γ-turn of β2, being so tight, maybe unlikely to form before a few H-bonds first lock in a registrybetween the two strands. We therefore suggest a more detailedmechanistic study based on string method (49, 50) or transitionpath sampling (51) in the future. In addition to two folded β-sheetstates, we observed one extremely stable misfolded state, labeledM. Its contact map and conformation are shown in Fig. 3. Thismisfolded state is stabilized by three β-strand hydrogen bonds andtwo salt bridges between the K6 ammonium group and the car-boxylates on D10 and D9. Neither folded state displays any suchside-chain salt bridging, and we therefore suggest that state Mserves as a kinetic trap that slows the acquisition of authenticβ-sheet structure, either in the native or secondary conformation.The picture that arises in light of these results is of a funnel-like

landscape for the folding thermodynamics of this β-hairpin that hasat least two distinct and deep minima and a misfolded kinetic trap.

Concluding RemarksWe have proposed a multiscale variant of REMD in which theswaps are performed via SSA, which effectively permits one toreach the infinite-swap limit known to optimize sampling efficiency.This REMD-MSSA is as simple to implement and parallelize asstandard REMD and can be used with any set of temperatures,including sophisticated choices like those advocated, for example,in ref. 29 that would most likely improve the efficiency even more.It can also be used with parameters other than temperature (Figs.S11 and S12). Here its usefulness and efficiency were benchmarkedon test cases of various complexity. In particular, in the context offolding the G β-hairpin, our REMD-MSSA method was shown tooutperform standard REMD and other enhanced sampling tech-niques, as evidenced by the fact that it allowed us to identify afolded β structure previously undetected by these other methods.These results indicate that REMD-MSSA can be an efficient andaccurate alternative to existing enhanced sampling methods andshould be useful in a broad variety of MD applications.

Simulation DetailsAD. We used REMD-MSSA runs with temperatures in geo-metric progression between 300 K and 1,200 K; other, perhapsmore efficient, choices of temperatures could be used as well(29), but this one proved sufficient to our purpose. We used atime step of 2 fs for the Langevin dynamics and chose γ = 5ps−1.The bond lengths between hydrogen and heavy atoms were keptconstant via M-SHAKE (52). We choose ν to obtain about 104jumps of σ on average between two consecutive MD steps. Thisled to using ν= 10000=Δt with 4 replicas and ν= 500=Δt with 12replicas—νmust be larger with 4 rather than 12 replicas becausethe energy gaps between the replicas are larger in the formercase, implying that the jump rate in Eq. 5 is smaller. In the

A

C

E

B

D

Fig. 2. Free energy surfaces of β-hairpin at 270 K along the two order pa-rameters β-strand HB number and backbone radius of gyration RG (A) from120 ns of standard REMD at swapping rate 1 ps−1 and (B) from 100 ns of REMD-MSSA. (C) Trajectories of σði, tÞ for a given replica i. (D) Distributions of σði, tÞ forthis replica, from a 36-ns-long run. (E) Relative entropy of the distributions ofσði, tÞ (in an ascending order). In C–E the orange curve is for REMD-MSSA and theblue one for standard REMD.

Fig. 3. (A) Native β-hairpin structure with H-bond registry indicated. (B) H–Ocontact map of for βN, β2, and M; in each map, the horizontal axis indicates theresidue number of the amide H and the vertical axis the residue number of thecarbonyl O. (C) FES at 270 K in the space of number of β-strand H-bonds NH andthe backbone radius of gyration RG. (D) FES at 270 K in the space of two con-tact-map distances for conformations with more than two β-strand H-bonds.The insets in C and D are representative conformations of the native foldedstate, βN; the folded state β2; and a misfolded state, M. The FESs in C and D arebuilt from a 100-ns-long run of REMD-MSSA. (E) Traces of NH and RG from a220-ns-long bareMD simulation, where the blue is for β2 and the green is for βN.

11748 | www.pnas.org/cgi/doi/10.1073/pnas.1605089113 Yu et al.

Dow

nloa

ded

by g

uest

on

Oct

ober

5, 2

020





standard REMD simulation with 12 replicas under same tem-perature setting, we use the same MD parameters as above witha swapping rate 0.5 ps−1. All MD simulations were carried outwith Desmond (v3.4.0.2) (25).

β-Hairpin. We solvated the protein with 1,549 water molecules,neutralized with three Na+ ions, resulting in a total system size of4,906 atoms and box dimensions of 36.3 × 36.3 × 36.3 Å3. OPLS-AA is the force field for protein (53) and TIP3P for water (54).Long-range electrostatics are handled via particel-mesh Ewaldsummation with a 64× 64× 64 mesh. The cutoff for short-rangenonbonded interactions was set to 12 Å. The integration timestep is 2 fs with the bonds between hydrogen and heavy atomsconstrained via M-SHAKE (52). The system is equilibrated at270 K and 1 atm via 1-ns NPT Langevin dynamics then a 50-nsNVT simulation. REMD-MSSA simulations have 60 replicas

with temperatures ranging from 270 K to 690 K in a geometricprogression (55)—here, too, other choices of temperatures couldbe used (29). The initial structures for these replicas were takenfrom the 4-ns NVT simulation at 700 K. We used a friction co-efficient of 5 ps−1 in the Langevin thermostat and choose ν= 3=Δtto get about 1,000 temperature swaps per MD step. In the stan-dard REMD simulation with 60 replicas under the same tem-perature setting we use random-neighbor swapping at the rate of1 ps−1 and the sameMD parameters as above. All MD simulationswere carried out with Desmond (v3.4.0.2) (25).

ACKNOWLEDGMENTS. This work was supported by National Science Foun-dation Grant DMR-1207389 (to C.F.A., E.V.-E., and T.-Q.Y.), National Institutesof Health Grant GM100472, and National Science Foundation Grant DMS-1454939 (to J.L.). T.-Q.Y. acknowledges the technical support from ShenglongWang (New York University).

1. Giberti F, Salvalaglio M, Mazzotti M, Parrinello M (2015) Insight into the nucleation ofurea crystals from the melt. Chem Eng Sci 121:51–59.

2. Dror RO, et al. (2013) Structural basis for modulation of a G-protein-coupled receptorby allosteric drugs. Nature 503(7475):295–299.

3. Yu T-Q, Lapelosa M, Vanden-Eijnden E, Abrams CF (2015) Full kinetics of CO entry,internal diffusion, and exit in myoglobin from transition-path theory simulations.J Am Chem Soc 137(8):3041–3050.

4. Freddolino PL, Harrison CB, Liu Y, Schulten K (2010) Challenges in protein foldingsimulations: Timescale, representation, and analysis. Nat Phys 6(10):751–758.

5. Lindorff-Larsen K, et al. (2010) Improved side-chain torsion potentials for the Amberff99SB protein force field. Proteins 78(8):1950–1958.

6. Robertson MJ, Tirado-Rives J, Jorgensen WL (2015) Improved peptide and protein tor-sional energetics with the OPLSAA force field. J Chem Theory Comput 11(7):3499–3509.

7. Lopes P, Guvench O, MacKerell J, Alexander D (2015) Current status of protein forcefields for molecular dynamics simulations. Molecular Modeling of Proteins, Methodsin Molecular Biology, ed Kukol A (Springer, New York), Vol 1215, pp 47–71.

8. Shaw DE, et al. (2010) Atomic-level characterization of the structural dynamics ofproteins. Science 330(6002):341–346.

9. Stone JE, et al. (2007) Accelerating molecular modeling applications with graphicsprocessors. J Comput Chem 28(16):2618–2640.

10. Voelz VA, Bowman GR, Beauchamp K, Pande VS (2010) Molecular simulation of ab initioprotein folding for a millisecond folder NTL9(1-39). J Am Chem Soc 132(5):1526–1528.

11. Holdren JP, et al. (2010) Designing a digital future: Federally funded research anddevelopment in networking and information technology (The President’s Council ofAdvisors on Science and Technology, Washington, DC).

12. Sugita Y, Okamoto Y (1999) Replica-exchange molecular dynamics method for pro-tein folding. Chem Phys Lett 314(1–2):141–151.

13. Sindhikara D, Meng Y, Roitberg AE (2008) Exchange frequency in replica exchangemolecular dynamics. J Chem Phys 128(2):024103.

14. Abraham MJ, Gready JE (2008) Ensuring mixing efficiency of replica-exchange mo-lecular dynamics simulations. J Chem Theory Comput 4(7):1119–1128.

15. Rosta E, Hummer G (2009) Error and efficiency of replica exchange molecular dy-namics simulations. J Chem Phys 131(16):165102.

16. Sindhikara DJ, Emerson DJ, Roitberg AE (2010) Exchange often and properly in replicaexchange molecular dynamics. J Chem Theory Comput 6(9):2804–2808.

17. Zhang BW, et al. (2016) Simulating replica exchange: Markov state models, proposalschemes, and the infinite swapping limit. J Phys Chem B.

18. Dupuis P, Liu Y, Plattner N, Doll JD (2012) On the infinite swapping limit for paralleltempering. SIAM J Multiscale Model Simul 10:986–1022.

19. Plattner N, et al. (2011) An infinite swapping approach to the rare-event samplingproblem. J Chem Phys 135(13):134111.

20. Lu J, Vanden-Eijnden E (2013) Infinite swapping replica exchange molecular dynamicsleads to a simple simulation patch using mixture potentials. J Chem Phys 138(8):084105.

21. Gillespie DT (1977) Exact stochastic simulation of coupled chemical reactions. J PhysChem 81(25):2340–2361.

22. Weinan E, Engquist WEB (2003) The heterogeneous multiscale methods. CommunMath Sci 1(1):87–132.

23. Vanden-Eijnden E (2003) Numerical techniques for multi-scale dynamical systems withstochastic effects. Commun Math Sci 1(2):385–391.

24. Engquist WEB, Li X, Ren W, Vanden-Eijnden E (2007) Heterogeneous multiscalemethods: A review. Commun Comput Phys 2(3):367–450.

25. Bowers KJ, et al. (2006) Scalable algorithms for molecular dynamics simulations oncommodity clusters. Proceedings of the 2006 ACM/IEEE Conference on Super-computing (Assoc for Computing Machinery, New York).

26. MacKerell AD, et al. (1998) All-atom empirical potential for molecular modeling anddynamics studies of proteins. J Phys Chem B 102(18):3586–3616.

27. Mackerell AD, Jr, Feig M, Brooks CL, 3rd (2004) Extending the treatment of backboneenergetics in protein force fields: limitations of gas-phase quantum mechanics inreproducing protein conformational distributions in molecular dynamics simulations.J Comput Chem 25(11):1400–1415.

28. Wang Y, Harrison CB, Schulten K, McCammon JA (2011) Implementation of acceleratedmolecular dynamics in NAMD. Comput Sci Discov 4(1):015002.

29. Katzgraber HG, Trebst S, Huse Da, Troyer M (2006) Feedback-optimized parallel tem-pering Monte Carlo. J Stat Mech Theory Exp 2006, 10.1088/1742-5468/2006/03/P03018.

30. Doll JD, Plattner N, Freeman DL, Liu Y, Dupuis P (2012) Rare-event sampling: Occu-pation-based performance measures for parallel tempering and infinite swappingMonte Carlo methods. J Chem Phys 137(20):204112.

31. Doll JD, Dupuis P (2015) On performance measures for infinite swapping Monte Carlomethods. J Chem Phys 142(2):024111.

32. Zhou R, Berne BJ, Germain R (2001) The free energy landscape for beta hairpinfolding in explicit water. Proc Natl Acad Sci USA 98(26):14931–14936.

33. Nguyen PH, Stock G, Mittag E, Hu CK, Li MS (2005) Free energy landscape and foldingmechanism of a beta-hairpin in explicit water: A replica exchange molecular dynamicsstudy. Proteins 61(4):795–808.

34. Bussi G, Gervasio FL, Laio A, Parrinello M (2006) Free-energy landscape for betahairpin folding from combined parallel tempering and metadynamics. J Am Chem Soc128(41):13435–13441.

35. Best RB, Mittal J (2010) Balance between alpha and beta structures in ab initio proteinfolding. J Phys Chem B 114(26):8790–8798.

36. Best RB, Mittal J (2011) Free-energy landscape of the GB1 hairpin in all-atom explicitsolvent simulations with different force fields: Similarities and differences. Proteins79(4):1318–1328.

37. Hatch HW, Stillinger FH, Debenedetti PG (2014) Computational study of the stabilityof the miniprotein trp-cage, the GB1 β-hairpin, and the AK16 peptide, under negativepressure. J Phys Chem B 118(28):7761–7769.

38. Thukral L, Smith JC, Daidone I (2009) Common folding mechanism of a beta-hairpinpeptide via non-native turn formation revealed by unbiased molecular dynamicssimulations. J Am Chem Soc 131(50):18147–18152.

39. Bonomi M, Branduardi D, Gervasio FL, Parrinello M (2008) The unfolded ensembleand folding mechanism of the C-terminal GB1 beta-hairpin. J Am Chem Soc 130(42):13938–13944.

40. Ardevol A, Tribello GA, Ceriotti M, Parrinello M (2015) Probing the unfolded con-figurations of a β-hairpin using sketch-map. J Chem Theory Comput 11(3):1086–1093.

41. Rajashankar KR, Ramakumar S (1996) Pi-turns in proteins and peptides: Classification,conformation, occurrence, hydration and sequence. Protein Sci 5(5):932–946.

42. Némethy G, Printz MP (1972) The γ turn, a possible folded conformation of thepolypeptide chain. Comparison with the β turn. Macromolecules 5:755–758.

43. Muñoz V, Thompson PA, Hofrichter J, Eaton WA (1997) Folding dynamics andmechanism of beta-hairpin formation. Nature 390(6656):196–199.

44. Muñoz V, Henry ER, Hofrichter J, Eaton WA (1998) A statistical mechanical model forbeta-hairpin kinetics. Proc Natl Acad Sci USA 95(11):5872–5879.

45. Du D, Zhu Y, Huang CY, Gai F (2004) Understanding the key factors that control therate of beta-hairpin folding. Proc Natl Acad Sci USA 101(45):15915–15920.

46. Du D, Tucker MJ, Gai F (2006) Understanding the mechanism of beta-hairpin foldingvia phi-value analysis. Biochemistry 45(8):2668–2678.

47. Klimov DK, Thirumalai D (2000) Mechanisms and kinetics of beta-hairpin formation.Proc Natl Acad Sci USA 97(6):2544–2549.

48. Juraszek J, Bolhuis PG (2009) Effects of a mutation on the folding mechanism of abeta-hairpin. J Phys Chem B 113(50):16184–16196.

49. Weinan E, Ren W, Vanden-Eijnden E (2002) String method for the study of rareevents. Phys Rev B 66:52301.

50. Maragliano L, Fischer A, Vanden-Eijnden E, Ciccotti G (2006) String method in collectivevariables: Minimum free energy paths and isocommittor surfaces. J Chem Phys 125(2):24106.

51. Bolhuis PG, Chandler D, Dellago C, Geissler PL (2002) Transition path sampling: Throwingropes over rough mountain passes, in the dark. Annu Rev Phys Chem 53:291–318.

52. Lambrakos S, Boris J, Oran E, Chandrasekhar I, Nagumo M (1989) A modified shakealgorithm for maintaining rigid bonds in molecular dynamics simulations of largemolecules. J Comput Phys 85(2):473–486.

53. Jorgensen WL, Tirado-Rives J (1988) The OPLS [optimized potentials for liquid simu-lations] potential functions for proteins, energy minimizations for crystals of cyclicpeptides and crambin. J Am Chem Soc 110(6):1657–1666.

54. Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML (1983) Comparisonof simple potential functions for simulating liquid water. J Chem Phys 79:926–935.

55. Kofke DA (2002) On the acceptance probability of replica-exchange Monte Carlotrials. J Chem Phys 117(15):6911–6914.


CHEM

ISTR

YBIOPH

YSICSAND

COMPU

TATIONALBIOLO

GY

Dow

nloa

ded

by g

uest

on

Oct

ober

5, 2

020

Documents

Multiscale implementation of infinite-swap replica ... · Multiscale implementation of infinite-swap replica exchange molecular dynamics Tang-Qing Yua, Jianfeng Lub,c,d, Cameron F