
Computer Physics Communications 170 (2005) 175–185

www.elsevier.com/locate/cpc

Parallel implementation of molecular dynamics simulation for short-ranged interaction

Jong-Shinn Wu∗, Yu-Lin Hsu, Yun-Min Lee

Department of Mechanical Engineering, National Chiao-Tung University, Hsinchu 30050, Taiwan

Received 13 January 2005; received in revised form 17 March 2005; accepted 24 March 2005

Available online 13 June 2005

Abstract

A parallel molecular dynamics simulation method, designed for large-scale problems, employing dynamic spatial domain decomposition for short-ranged molecular interactions is proposed. In this parallel cellular molecular dynamics (PCMD) simulation method, the link-cell data structure is used to reduce the searching time required for forming the cut-off neighbor list as well as for domain decomposition, which utilizes the multi-level graph-partitioning technique. A simple threshold scheme (STS), in which workload imbalance is monitored and compared with some threshold value during the runtime, is proposed to decide the proper time for repartitioning the domain. The simulation code is implemented and tested on the memory-distributed parallel machine, e.g., PC-cluster system. Parallel performance is studied using approximately one million L-J atoms in the condensed, vaporized and supercritical states. Results show that fairly good parallel efficiency at 49 processors can be obtained for the condensed and supercritical states (∼60%), while it is comparably lower for the vaporized state (∼40%).
© 2005 Elsevier B.V. All rights reserved.

Keywords: Parallel molecular dynamics; Short-ranged interaction; Dynamic domain decomposition; Parallel efficiency; Simple threshold scheme


1. Introduction

Classical molecular dynamics (MD) has been widely used to simulate properties of liquids, solids, and molecules in several research disciplines, including material science [1–4], biological technology [5–7], and heat and mass transfer [8–10], among others. It becomes increasingly important due to the burgeoning of nanoscience and nanotechnology, in which the size is often too small to characterize experimentally or to simulate properly by continuum methods. In applying the MD method to model these nanoscale phenomena, two main issues need to be resolved. The first is how to choose a physically correct potential model to truly represent the inter-particle interaction. The second is how to reduce the often time-consuming computation to an acceptable level. In this paper, we focus on resolving the second issue by developing an efficient parallel molecular dynamics method.

* Corresponding author. Tel.: +886-3-573-1693; fax: +886-3-572-0634. E-mail address: [email protected] (J.-S. Wu).

0010-4655/$ – see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.cpc.2005.03.110


Molecular dynamics simulates the N-particle (atoms or molecules) system treating each particle as a point mass, and Newton's equations are integrated for all particles to compute the motion. The physics of the modeled system is contained in the potential energy functional from which the Newton's equation for each particle is derived, assuming a conservative system. Being averaged from the trajectories of the particles, various useful static and dynamic properties can thus be derived [11]. As mentioned previously, MD simulation is very time-consuming due to the large number of time steps and possibly large number of atoms required to complete a meaningful simulation. In liquids and solids, MD simulation is required to resolve the vibration of the atoms, which limits the time step to be on the order of femtoseconds. Many hundreds of thousands or even millions of time steps are needed to simulate a nanosecond on the "real" time scale. In addition, hundreds of thousands or millions of atoms are needed in the MD simulation, even for a system size on the nanometer scale.

In the past, there has been considerable effort (e.g., [12]) that concentrated on parallelizing MD simulation on memory-distributed machines by taking advantage of the inherent parallelism [13,14]. Generally, parallel implementation of the MD method can be divided into three categories: atom decomposition, force decomposition or domain decomposition among processors [12]. The atom decomposition method is generally suitable for small-scale problems. The force decomposition method is based on a block decomposition of the force matrix rather than the row-wise decomposition of the atom-decomposition method. It improves the O(N) scaling to O(N/√P). It generally performs much better than the atom decomposition method; however, there exist some disadvantages. First, the number of processors has to be the square of an integer. Second, load imbalance may become an issue. From previous experience [12], it is suitable for small- and intermediate-size problems.

In the spatially static domain decomposition method, simulation domains are physically divided and distributed among processors. This method so far represents the best parallel algorithm for large-scale problems in MD simulation with short-ranged interactions [12]; however, it only works well for a system in which the atoms move only a very short distance during simulation or possibly are distributed uniformly in space. MD simulation of solids represents one of the typical examples. In contrast, if the distribution of the atoms has much variation in configuration space, then the load imbalance among processors develops very quickly during simulation, which lowers the parallel performance. Thus, a parallel MD method capable of adaptive domain decomposition may represent a better solution for resolving this difficulty.

In the current study, we present a parallel algorithm for MD simulation, named parallel cellular molecular dynamics (PCMD), employing dynamic domain decomposition to address the issue of load imbalance among processors in the spatially static domain decomposition method. This parallel algorithm is tested using one million L-J atoms in the condensed, vaporized and supercritical states, which all are initialized from an FCC-arranged atomic structure. Our goal is to develop a parallel MD method that is scalable to very large problems and large numbers of processors on memory-distributed parallel machines, such as a PC-cluster system, which is accessible to researchers in general.

This paper is organized as follows. The MD method is described next, followed by a detailed description of the proposed parallel implementation. Results of parallel performance for the different test cases are then presented and discussed. Finally, the paper is summarized with some important conclusions.

2. Numerical method

2.1. Molecular dynamics simulation

Molecular dynamics simulation is used to solve the dynamics of the N-particle (atom or molecule) system by integrating Newton's equation of motion, as shown in Eqs. (1) and (2), for each particle to obtain the position of the particle as a function of time,

m_i \frac{d\vec{v}_i}{dt} = \sum_j F_2(\vec{r}_i, \vec{r}_j) + \sum_j \sum_k F_3(\vec{r}_i, \vec{r}_j, \vec{r}_k) + \cdots,   (1)

\frac{d\vec{r}_i}{dt} = \vec{v}_i,   (2)

where m_i is the mass of atom i, \vec{r}_i and \vec{v}_i are its position and velocity vectors, respectively, F_2 is a force term describing pair-wise interactions between atoms, F_3 is a force term describing three-body interactions, and many-body interactions can be added if needed.
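For illustration only (this is not the authors' code), the following minimal Python sketch integrates Eqs. (1) and (2) for a purely pair-wise L-J force with a simple explicit update; the function names, reduced units, and the use of Python itself are assumptions made for the example.

```python
import numpy as np

def lj_pair_force(r_ij, sigma=1.0, epsilon=1.0):
    """L-J force on atom i due to atom j, with r_ij = r_i - r_j (reduced units)."""
    r2 = np.dot(r_ij, r_ij)
    sr6 = (sigma * sigma / r2) ** 3
    coef = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
    return coef * r_ij

def step(pos, vel, mass, dt):
    """One explicit step of Eqs. (1)-(2): accumulate pair forces, update v, then r."""
    force = np.zeros_like(pos)
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            f = lj_pair_force(pos[i] - pos[j])
            force[i] += f            # action ...
            force[j] -= f            # ... and reaction
    vel += dt * force / mass[:, None]    # m_i dv_i/dt = sum_j F_2(r_i, r_j)
    pos += dt * vel                      # dr_i/dt = v_i
    return pos, vel
```

The double loop over all pairs is exactly the O(N²) cost that the neighbor-list and link-cell techniques described in the following subsections are designed to avoid.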

The force on each atom is the spatial derivative of the potential energy, which is generally written as a function of the position of the atom itself and the other atoms.

In practice, only one or a few terms in Eq. (1) are kept to simplify the problem. They are constructed from either fitting some properties to experimental data or a quantum computation. Thus, classical MD is intrinsically cheaper in computation as compared with ab initio electronic structure calculations, which require solving Schrödinger's equation at each time step.

2.1.1. Interaction potential model

Potential energy can also be categorized as either a short- or long-ranged interaction in nature. The former (e.g., the L-J potential) only considers the interaction between the atoms geometrically nearby the interested atom, while the latter (e.g., the Coulomb potential in ionic solids or biological systems) needs to consider the interaction far away from the atom under consideration. Because of the simplicity of the short-ranged interaction, it has been applied extensively in many MD simulations in the past [15], especially for liquids and solids. In contrast, long-ranged interaction models are not commonly used in classical MD simulations except for polarized molecules, such as water [16]. In this paper, we are only interested in dealing with classical MD using short-ranged interactions. The L-J potential will be used throughout the current study unless otherwise specified. The proposed method can also be applied to other potential interactions if they are short-ranged. Strategies employed for the short-ranged interaction in the current study are introduced next.

2.1.2. Concept of neighbor list

In principle, all potentials on each atom resulting from all other atoms have to be taken into account when computing the force—this scales as O(N²). In practice, each atom stores an array containing the list (or neighbor list) [11] of atoms located within some cutoff distance (r_c = 2.5σ). Note that σ is the zero-potential distance of the L-J potential. Basically, each neighbor list has to be updated at each time step, which is very time-consuming. Thus, the radius of this neighbor list is often extended to r_c + δ, where δ = 0.3σ, to reduce the frequency of updating. With this choice, we only have to update each neighbor list every 8–10 time steps. This greatly reduces the frequency of the neighbor atom search. However, building the neighbor list by searching all the atoms in the system is also very time-consuming.
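A minimal sketch of the Verlet-list idea described above, written in Python for illustration; the cutoff r_c = 2.5σ and skin δ = 0.3σ follow the values quoted in the text, while the function name, the brute-force O(N²) construction, and the neglect of periodic boundaries are simplifications assumed here.

```python
import numpy as np

RC = 2.5      # cut-off radius r_c in units of sigma
DELTA = 0.3   # extra "skin" so the list stays valid for several time steps

def build_neighbor_list(pos):
    """For every atom, store the indices of atoms closer than r_c + delta."""
    n = len(pos)
    r_list2 = (RC + DELTA) ** 2
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = pos[i] - pos[j]
            if np.dot(d, d) < r_list2:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors

# With the 0.3*sigma skin, the list only needs rebuilding every 8-10 steps
# (the interval used in the paper) instead of at every step.
```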

2.1.3. Concept of link-cell + concept of neighbor list

The link-cell data structure provides a means of organizing information (positions and velocities) of atoms into a form that avoids most of the unnecessary work and thus reduces the computational cost to O(N). The general idea of the link cell is to divide the simulation domain into a lattice of small cells having edge length exceeding r_c. In the current study, we take the cell-edge length as r_c + δ, which coincides with the designated radius of the neighbor list. Thus, atoms in a cell only interact with atoms either in the same cell or in cells nearby. Details of implementation can be found in Rapaport [11] and only a brief description is provided here. During the update of each neighbor list for each atom, we only search the atoms in the other 26 neighboring cells and the cell where the atom itself is located, rather than searching through all the atoms in the computational domain. This also dramatically reduces the time required to build up the neighbor list. The price we have to pay is to add an array to store the atoms in each cell and to update this array each time step, which is straightforward in a cell structure having equal edge length for all cells. Past results show that this combination represents the most efficient technique nowadays in classical molecular dynamics simulation [11].
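The link-cell search can be sketched as follows (again an illustrative Python fragment, not the authors' implementation); atoms are binned into cells of edge r_c + δ and the neighbor search for an atom is restricted to its own cell plus the 26 adjacent cells, with periodic wrap-around omitted for brevity.

```python
import numpy as np
from collections import defaultdict

def build_cells(pos, cell_edge):
    """Bin atoms into link-cells of edge length r_c + delta; cost is O(N)."""
    index = (pos // cell_edge).astype(int)        # integer cell coordinates per atom
    cells = defaultdict(list)
    for i, idx in enumerate(map(tuple, index)):
        cells[idx].append(i)
    return cells, index

def candidate_neighbors(i, cells, index):
    """Only the atom's own cell and the 26 adjacent cells are searched."""
    cx, cy, cz = index[i]
    candidates = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                candidates.extend(cells.get((cx + dx, cy + dy, cz + dz), ()))
    candidates.remove(i)     # exclude the atom itself
    return candidates
```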

2.2. Parallel molecular dynamics simulation

In the current study, we focus on developing a parallel MD method using dynamic domain decomposition by taking advantage of the existing link-cells mentioned earlier. In this proposed method, the cells are used not only to reduce the cost of building up the neighbor list, but also to serve as the basic partitioning units. A similar idea has been applied in the parallel implementation of the direct simulation Monte Carlo (DSMC) method (e.g., [19]), which is a particle simulation technique often used in rarefied gas dynamics. Note that in the following IPB stands for interprocessor boundary. General procedures (Fig. 1) in sequence include:


Fig. 1. Proposed flow chart for parallel molecular dynamics simulation using dynamic domain decomposition.


1. Initialize the positions and velocities of all atoms and equally distribute the atoms among processors;

2. Check if load balancing is required. If required, first repartition the domain, then communicate cell/atom data between processors, renumber the local cell and atom numbers, and update the neighbor list for each atom due to the data migration;


3. Receive positions and velocities of other atoms in the neighbor list for all cells near the IPB;

4. Compute the force on all atoms;

5. Send force data to other atoms in the neighbor list for all cells near the IPB;

6. Integrate the acceleration to update positions and velocities for all atoms;

7. Apply boundary conditions to correct the particle positions if necessary;

8. Check if the preset total runtime is exceeded. If exceeded, then output the data and stop the simulation. If not, check if it is necessary to rebuild the neighbor list of all atoms using the most recent atom information;

9. If it is necessary to rebuild the neighbor list (N = 8 in the current study), then communicate atom data near the IPB and repeat steps 2–8. If not necessary, then repeat steps 3–8.

Note that the subroutines included in the shaded region of Fig. 1 represent the load-balancing module. In the above procedure, in addition to the necessary communication when atoms cross the IPB and of atom/cell data near the IPB, there are two more important steps in the proposed parallel MD method as compared with the serial MD implementation. One is how to repartition the domain effectively and the other is the decision policy for repartitioning. These two steps are described next, respectively.

2.2.1. Repartitioning scheme

Under the framework of graph theory, the centers of the link-cells are considered as the vertices and the lines connecting them are considered as edges. Each vertex and edge can be assigned a weight for the purpose of partitioning. Graph partitioning has been found very useful for unstructured meshes in the computing community in the past. In the current study, a parallel multilevel graph-partitioning runtime library, PMETIS [18], is used as the repartitioning tool in our PCMD code. Thus, the data structure of the link-cells is reconfigured as unstructured for graph-partitioning purposes. The multilevel graph-partitioning scheme uses a multilevel implementation that matches and combines pairs of adjacent vertices to define a new graph and recursively iterates this procedure until the graph size falls under some threshold. The coarsest graph is then partitioned and the partition is successively refined on all the graphs, starting with the coarsest and ending with the original. At each level, the final partition of the coarser graph is used to give the initial partition for the next finer level. A corresponding parallel version, PMETIS [18], uses an iterative optimization technique known as relative gain optimization, which both balances the workload and attempts to minimize the inter-processor communication overhead.

This parallel multilevel graph partitioner runs in the single program multiple data (SPMD) paradigm with message passing, in the expectation that the underlying mesh will do the same. Each processor is assigned to a physical sub-domain and stores a double-linked list of the vertices within that sub-domain. However, each processor also maintains a "halo" of neighboring vertices in other sub-domains. For the serial version, the migration of vertices simply involves transferring data from one linked list to another. In the parallel version, this process is far more complicated than just migrating vertices. The newly created halo vertices must be packed into messages as well, sent off to the destination processor, unpacked, and the pointer-based data structure recreated there. This provides a possible solution to the problem of adaptive load balancing [18].
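The input handed to the partitioner can be pictured as a weighted cell graph: each link-cell is a vertex whose weight is its atom count (Section 2.2.2), and edges connect adjacent cells. The sketch below builds such a graph in CSR-style adjacency arrays, the kind of structure METIS-family libraries consume; it is an illustration under those assumptions and does not call PMETIS itself.

```python
import numpy as np

def cell_graph(atom_count):
    """Build the weighted cell graph used for partitioning.

    atom_count: array of shape (nx, ny, nz) holding the number of atoms in each
    link-cell. Vertices are cells (weight = atom count) and edges join face-adjacent
    cells; the result is returned as CSR-style (xadj, adjncy, vwgt) arrays.
    Periodic neighbors are omitted for brevity.
    """
    nx, ny, nz = atom_count.shape

    def vid(i, j, k):                         # linear vertex id of cell (i, j, k)
        return (i * ny + j) * nz + k

    xadj, adjncy, vwgt = [0], [], []
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                vwgt.append(int(atom_count[i, j, k]))
                for di, dj, dk in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                                   (0, -1, 0), (0, 0, 1), (0, 0, -1)):
                    ni, nj, nk = i + di, j + dj, k + dk
                    if 0 <= ni < nx and 0 <= nj < ny and 0 <= nk < nz:
                        adjncy.append(vid(ni, nj, nk))
                xadj.append(len(adjncy))
    return np.array(xadj), np.array(adjncy), np.array(vwgt)
```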

2.2.2. Decision policy for repartitioning

MD represents a typical dynamic (or adaptive) irregular problem, i.e., the workload distribution is known only at runtime and can change dramatically as the simulation proceeds, leading to a high degree of load imbalance among the processors. This load-changing situation is even more obvious in simulating liquids or gases. Thus, the partitioning runtime library, PMETIS [18], described above is used to repartition the mesh based on some sort of decision policy. In direct simulation Monte Carlo (DSMC) simulation [17], it has been shown that a decision policy based on stop at rise (SAR) [19] works well for improving the parallel performance. However, from our preliminary study, it does not work very well in MD simulation since the domain repartition becomes too frequent and, thus, too costly in practice. In addition, the data locality of the MD simulation using link-cells is lower than that of the DSMC simulation, which only considers collisions (interactions) among atoms within the same cell. Instead, a simple threshold-like decision policy, termed the "simple threshold scheme" (STS), is designed to decide the proper time to repartition.


This scheme simply asks for domain repartitioning if the workload in some processor is detected to be over the specified threshold (e.g., ±20% of the average workload). The number of atoms in each link-cell is used as the weighting for graph partitioning. Right after the repartition, communication between geometrically neighboring processors is required to transfer the cell numbers and particle data to the destination processor, followed by renumbering of the cell and particle data into the local numbering of the destination processor. Since all processors know the geometrical information of all cells, renumbering of the received cells in a specific processor is done simply by adding up sequentially the local cell numbers for the new cells (as shown in the shaded region in Fig. 1). Similarly, the cells sent out to other processors are simply removed sequentially by copying the information of the final cell onto the memory of the sent cells. This decision policy for repartitioning the domain is inherently advantageous in that no prior knowledge of the evolution of the problem is necessary to determine the repartitioning interval, and the repartitioning can be expected to follow the dynamics of the problem.
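The STS decision itself reduces to one comparison per load-balance check. A minimal sketch, assuming the workload of a processor is measured by its local atom count and that the global total is already known (in the parallel code this would come from an MPI reduction):

```python
def needs_repartition(local_atoms, total_atoms, n_procs, threshold=0.20):
    """Simple threshold scheme (STS): repartition when a processor's workload
    deviates from the average by more than the threshold (+/-20% here)."""
    average = total_atoms / n_procs
    return abs(local_atoms - average) > threshold * average

# Example: 25 processors and one million atoms give an average of 40,000 atoms
# per processor; a rank holding 52,000 atoms (30% over) triggers repartitioning.
print(needs_repartition(52_000, 1_000_000, 25))   # True
```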

The current PCMD code is implemented on a 64-bit Itanium PC-cluster system running the Linux OS at the National Center for High-performance Computing in Taiwan (64 nodes, dual processors and 4 GB RAM per node). The standard message-passing interface (MPI) is used for data communication. It is thus expected that the current PCMD code should be highly portable among memory-distributed parallel machines running Linux (or equivalent) operating systems.

3. Results and discussions

3.1. Problem descriptions and simulation conditions

Three test problems, including L-J(12,6) atoms in the condensed, vaporized and supercritical states, are chosen to test the current parallel implementation of the molecular dynamics simulation. Related simulation conditions are summarized in Table 1. Densities (ρ* = Nσ³/V) are all taken to be 0.7 for convenience, while temperatures (T* = kT/ε) are varied to represent the different states.


Table 1
Simulation conditions for the three different cases (condensed, vaporized and supercritical states)

                Temp. (T*)   Density (ρ*)   No. of link-cells
Condensed         0.7          0.7          75 × 75 × 75
Vaporized         1.1          0.7          75 × 75 × 75
Supercritical     0.7          0.7          39 × 39 × 39

Note that the number of cells in Table 1 represents the number of link-cells used in the simulation. Note also that the simulation volume is generally much larger (∼10 times) than the initial volume of the FCC-arranged atomic structure near the system center, except for the case of the supercritical state, whose simulation volume is approximately the same as the initial FCC volume. Each atom of an initially FCC-arranged atomic structure is given random velocities based on the desired temperature, and the simulation is run for 10⁵ time steps. Rescaling the kinetic energy of the system during runtime enforces the desired temperature of the simulation system for all three test cases. The simulation time step is 0.005 of the dimensionless MD time scale (or about 0.005 × 10 fs). The current test problems represent a more severe test for the parallel implementation of the MD method due to the rapidly changing workload among processors during the simulation. For example, it is rather difficult to compute the condensed state efficiently in parallel if a conventional parallel paradigm such as static domain decomposition is used. All results presented below were obtained using 25 processors, unless otherwise specified. In addition, periodic boundary conditions are used to simplify the analysis in the current study.
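The cell counts in Table 1 can be cross-checked from the reduced-unit definitions given above (ρ* = Nσ³/V) together with the link-cell edge r_c + δ = 2.8σ from Section 2.1.3. The short sketch below does this arithmetic for an assumed count of exactly one million atoms at ρ* = 0.7 and recovers roughly 40 cells per side, consistent with the 39 × 39 × 39 grid of the supercritical case, whose box matches the initial FCC volume.

```python
# Cross-check of Table 1 in reduced units (sigma = 1); N = 10**6 is assumed,
# the paper only says "approximately one million" atoms.
N = 1_000_000
rho_star = 0.7          # rho* = N * sigma**3 / V
cell_edge = 2.5 + 0.3   # link-cell edge r_c + delta (Section 2.1.3)

volume = N / rho_star                 # V in units of sigma**3
box_edge = volume ** (1.0 / 3.0)      # ~112.6 sigma
cells_per_side = box_edge / cell_edge
print(f"{box_edge:.1f} sigma, {cells_per_side:.1f} cells per side")
# ~40 cells per side, close to the 39 x 39 x 39 grid of the supercritical case;
# the condensed and vaporized cases use a much larger box (75 x 75 x 75 cells).
```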

3.2. Snapshots of MD evolution

Snapshots of the atomic distribution at the final time step (10⁵ time steps) of the L-J atoms (∼1 million atoms) for the condensed, vaporized and supercritical states are illustrated, respectively, in Figs. 2(b)–(d). Note that Fig. 2(a) shows the initial FCC atomic structure and the simulation volume (75 × 75 × 75 cells) for both the condensed and vaporized states. For the supercritical state, the size of the initial FCC structure is the same as in the other two cases, but with a much smaller simulation volume (39 × 39 × 39 cells). It is clear that the initial and final atomic structures differ to a large extent for all three cases considered. This renders the static spatial domain decomposition, which is often used for large-scale MD simulation, very inefficient or even useless.


Fig. 2. Atomic structures of the three test cases at 10⁵ time steps: (a) initial FCC structure for the condensed and vaporized states; (b) final condensed state; (c) final vaporized state; (d) final supercritical state.


In the case of the condensed state (Fig. 2(b)), a spherical liquid droplet (∼25 nm in diameter) is formed at the center of the simulation system, around which comparatively few vaporized atoms spread, due to the lower preset temperature (0.7). In the case of the vaporized state (Fig. 2(c)), the atoms spread rather randomly and uniformly in the simulation domain due to the higher preset temperature (1.1). In the case of the supercritical state (Fig. 2(d)), several irregular cavities with vaporized atoms inside are present in the computational domain, in which the state is neither liquid nor gas in essence. Note that the spatial distribution of the cavities does not remain unchanged during the simulation.

3.3. Evolution of domain decomposition

Figs. 3(a)–(b), (c)–(d), and (e)–(f) show the domain decompositions (at 500Δt and at the final step) for the condensed, vaporized and supercritical states, respectively, on the system surface and on some special cross sections cutting through the system center. By comparing these figures, we can find that the domain decomposition changes to a large extent from the initial to the final state, except for the case of the supercritical state, in which the initial FCC structure almost occupies the whole simulation volume. In Fig. 3(b) (condensed state), the domain size near the droplet at the system center is very small as compared with the domain size in other regions, since the atoms are clustered near the system center. In contrast, the distribution of the domain size in Fig. 3(d) (vaporized state) is relatively uniform over the simulation volume, since the atoms spread randomly in the computational domain. Fig. 3(f) shows the domain decomposition of the supercritical state, in which the distribution of domains is similar to the vaporized state (Fig. 3(d)) but with a more ordered structure. The above arguments about the evolution of the domain decomposition in the test cases, along with the time-dependent distribution of the number of atoms in each processor, can explain the parallel performance obtained in the current study, which is introduced next.


Fig. 3. Evolution of the domain decomposition of the simulation domain on the system surface and on special cross sections for the different atomic final states (25 processors): (a) at 500 steps (condensed); (b) final (condensed); (c) at 500 steps (vaporized); (d) final (vaporized); (e) at 500 steps (supercritical); (f) final (supercritical).


3.4. Distribution of time history of atom numbers

Figs. 4(a)–(c) show the distribution of the number of atoms in each processor as a function of the number of simulation time steps for the condensed, vaporized and supercritical states, respectively, using 25 processors, along with the upper/lower thresholds of the number of atoms (±20%, dotted lines, in the current study). In these figures, we do not intend to identify any specific processor used in the simulation. We are only interested in showing the general trend of the effectiveness of the dynamic domain decomposition in a general sense.


Fig. 4. Distribution of the number of atoms in each processor as a function of simulation time steps (25 processors): (a) condensed state; (b) vaporized state; (c) supercritical state.


In addition, we can roughly identify how frequent the repartitioning is from these figures by observing the abrupt changes of the number of atoms. In general, the repartitioning in PCMD is rather effective in evenly redistributing the number of atoms in each processor, or approximately, the workload among processors. However, the load balancing in some (relatively few) processors is not so effective in bringing the number of atoms up to the averaged value, due to too many atoms in a cell (e.g., Figs. 4(a) and (c)). The situation is much better for the vaporized-state case, as shown in Fig. 4(b), where the repartitioning can keep the number of atoms roughly within the range of the upper and lower threshold values. Another reason for this ineffective domain decomposition may be related to the multilevel graph-partitioning tool (PMETIS [18]) used in the current study, which optimizes the partition heuristically rather than optimally. Note that a more severe criterion for the threshold values can result in better load balance among processors, but computational inefficiency can possibly result from the frequent repartitioning of the domain. Another important observation from Figs. 4(a)–(c) is that the vaporized-state case (Fig. 4(b)) requires the most frequent repartitioning during the simulation due to the fast-moving atoms, while much less repartitioning is required for the supercritical-state case (Fig. 4(c)).


Fig. 5. Parallel speedup as a function of the number of processors for the three different test cases (condensed, vaporized and supercritical states).


Thus, better load balancing resulting from effective domain decomposition does not necessarily lead to a better parallel speed-up, since the repartitioning is computationally expensive. The above observation has a profound influence on the parallel speed-up, which is introduced next.

3.5. Parallel performance

Fig. 5 shows the corresponding parallel speed-up for all three test cases (up to 49 processors) in the current study, along with the ideal speed-up (dotted line). Using 49 processors, the parallel speed-up is 32 (∼65% parallel efficiency) for the supercritical-state case and 28 (∼57% parallel efficiency) for the condensed-state case, while it is 17 (or ∼35% parallel efficiency) for the vaporized-state case. Note that the above parallel speed-up/efficiency values are all computed assuming a speed-up of two for two processors. The lower efficiency of the vaporized-state case is attributed to the too frequent repartitioning of the domain, although slightly better load balancing is observed than in the other two cases (Fig. 4(b)). This could further degrade the speed-up due to the rapid increase of the frequent communication required for repartitioning the domain, in addition to the large number of processors, which in turn complicates and slows down the network communication. This can be shown by the increase of the speed-up to 21 (49 processors) if we do not repartition the domain after the thermal equilibration period (∼30 000 time steps). Using the current parallel implementation of the MD code, approximately 70–80% parallel efficiency can be achieved with 25 processors, which may be accessible to most researchers using a practical PC-cluster system.
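For reference, the quoted efficiencies follow from dividing the speed-up by the processor count (the usual definition, assumed here since the paper does not state it explicitly):

```python
# Parallel efficiency = speed-up / number of processors (assumed definition).
for state, speedup in (("supercritical", 32), ("condensed", 28), ("vaporized", 17)):
    print(f"{state}: {speedup / 49:.0%}")
# supercritical: 65%, condensed: 57%, vaporized: 35% -- matching the values above.
```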

4. Conclusions

In the current study, a parallel molecular dynamics simulation for short-ranged interactions using dynamic domain decomposition is developed for large-scale problems on memory-distributed PC-cluster systems, using MPI as the communication protocol. In the method, a multi-level graph-partitioning scheme is used to dynamically re-decompose the computational domain based on a simple threshold scheme (STS), which is expected to keep the number of atoms in each processor within the range of the threshold values.


Parallel performance of the current parallel MD method is studied using three different test cases, including the condensed, vaporized and supercritical states, using approximately one million L-J atoms, which all are initialized from an FCC atomic structure. Results show that fairly good parallel efficiency in the range of 40–65% using 49 processors can be achieved for the three test cases, compared with the low efficiency obtained if static domain decomposition or other methods are employed. Application of this parallel MD code to compute the dynamical processes involved in droplet evaporation and collisions is currently in progress and will be reported in the near future.

Acknowledgements

This investigation was supported by the National Science Council, Grant No. 93-2212-E-009-015. The authors would also like to express their sincere thanks for the computing resources provided by the National Center for High-speed Computing of the National Science Council of Taiwan. Also, sincere thanks go to Prof. Karypis at the University of Minnesota for generously providing the partitioning library, PMETIS.

References

[1] R. Komanduri, N. Chandrasekaran, L.M. Raff, Effect of tool geometry in nanometric cutting: a molecular dynamics simulation approach, Wear 219 (1998) 84–97.
[2] R. Komanduri, N. Chandrasekaran, L.M. Raff, MD simulation of indentation and scratching of single crystal aluminum, Wear 240 (2000) 113–143.
[3] T.H. Fang, C.I. Weng, Three-dimensional molecular dynamics analysis of processing using a pin tool on the atomic scale, Nanotechnology 11 (2000) 148–153.
[4] K. Cheng, X. Luo, R. Ward, et al., Modeling and simulation of the tool wear in nanometric cutting, Wear 255 (2003) 1427–1432.
[5] Y.M. Pan, T.J. Hou, M.J. Ji, et al., MD simulations of PTP 1B–inhibitor complex, Acta Chim. Sinica 62 (2004) 148–152.
[6] L.J. Smith, H.J.C. Berendsen, W.F. van Gunsteren, Computer simulation of urea–water mixtures: a test of force field parameters for use in biomolecular simulation, J. Phys. Chem. B 108 (2004) 1065–1071.
[7] P.J. Connolly, A.S. Stern, C.J. Turner, et al., Molecular dynamics of the long neurotoxin LSIII, Biochemistry 42 (2003) 14443–14451.
[8] L. Consolini, S.K. Aggarwal, S. Murad, A molecular dynamics simulation of droplet evaporation, Int. J. Heat Mass Transfer 46 (2003) 3179–3188.
[9] L.N. Long, M.M. Micci, B.C. Wong, Molecular dynamics simulations of droplet evaporation, Comput. Phys. Comm. 96 (1996) 167–172.
[10] J.H. Walther, P. Koumoutsakos, Molecular dynamics simulations of nanodroplet evaporation, J. Heat Transfer 123 (2001) 741–748.
[11] D.C. Rapaport, The Art of Molecular Dynamics Simulation, Cambridge University Press, UK, 1995.
[12] S. Plimpton, Fast parallel algorithms for short-range molecular dynamics, J. Comput. Phys. 117 (1995) 1–19.
[13] B.M. Boghosian, Computational physics on the connection machine, Comput. Phys. (1990).
[14] G.C. Fox, M.A. Johnson, G.A. Lyzenga, S.W. Otto, J.K. Salmon, D.W. Walker, Solving Problems on Concurrent Processors, vol. 1, Prentice-Hall, Englewood Cliffs, NJ, 1988.
[15] S. Matsunaga, Structural study on liquid Au–Cs alloys by computer simulations, J. Phys. Soc. Japan 69 (2000) 1712–1716.
[16] D. DiCola, A. Deriu, M. Sampoli, et al., Proton dynamics in supercooled water by molecular dynamics simulations and quasielastic neutron scattering, J. Chem. Phys. 104 (11) (1996) 4223–4232.
[17] J.-S. Wu, K.-C. Tseng, Concurrent DSMC method using dynamic domain decomposition, in: Proc. 23rd Internat. Symp. on Rarefied Gas Dynamics, Whistler Conference Centre, Whistler, British Columbia, July 2002, pp. 20–25.
[18] G. Karypis, K. Schloegel, V. Kumar, ParMetis, University of Minnesota, Department of Computer Science, September 1998.
[19] D.M. Nicol, J.H. Saltz, et al., Dynamic remapping of parallel computations with varying resource demands, IEEE Trans. Comput. 37 (1988) 1073–1087.