
GPUs for Molecular Dynamics (MD) Simulations:
a user's perspective

Eduardo M. Bringa & Emmanuel Millán

CONICET / Instituto de Ciencias Básicas, Universidad Nacional de Cuyo, Mendoza
(ebringa@yahoo.com)
Universidad Nacional del Comahue, Neuquén

March 2012

Collaborators:
C. Ruestes, C. García Garino (UN Cuyo)
A. Higginbotham (Oxford)


Outline

• Introduction
• HOOMD-BLUE
• LAMMPS
• GPULAMMPS
• Performance:
  a) single/mixed/double precision
  b) CPU/GPU neighbor lists
  c) load balancing: static & dynamic
• Examples
• Summary


Many MD codes can now use GPU acceleration

AMBER (Assisted Model Building with Energy Refinement): http://ambermd.org/gpus/
Ross Walker (keynote). MPI for several GPUs/cores (TIP3P, PME, ~10^6 atoms max, Tesla C2070).

HOOMD-Blue (Highly Optimized Object-oriented Many-particle Dynamics):
http://codeblue.umich.edu/hoomd-blue/index.html OMP for several GPUs in a single board.

LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator):
http://lammps.sandia.gov MPI for several GPUs/cores (LJ 12, ~10^7 atoms max, Tesla C2070).

GPULAMMPS: http://code.google.com/p/gpulammps/ CUDA + OpenCL.

DL_POLY: http://www.cse.scitech.ac.uk/ccg/software/DL_POLY/ F90+MPI, CUDA+OpenMP port.

GROMACS: http://www.gromacs.org/Downloads/Installation_Instructions/Gromacs_on_GPUs
Uses OpenMM libs (https://simtk.org/home/openmm). No parallelization, ~10^6 atoms max.

NAMD ("Not another" MD): http://www.ks.uiuc.edu/Research/namd/ GPU/CPU clusters.

VMD (Visual MD): http://www.ks.uiuc.edu/Research/vmd/

GTC 2010 archive, videos and pdf's: http://www.nvidia.com/object/gtc2010-presentation-archive.html#md

1,000,000+ atom Satellite Tobacco Mosaic Virus
Freddolino et al., Structure 14, 437-449 (2006). Many more...


Highly Optimized Object-oriented Many-particle Dynamics - HOOMD-Blue
http://codeblue.umich.edu/hoomd-blue/index.html

• "General purpose molecular dynamics simulations fully implemented on graphics processing units", Joshua A. Anderson, Chris D. Lorenz and Alex Travesset, Journal of Computational Physics 227 (2008) 5342-5359.

• "Molecular Dynamics on Graphic Processing Units: HOOMD to the Rescue", Joshua A. Anderson and Alex Travesset, Computing in Science & Engineering 10(6) (2008).

http://www.nvidia.com/object/hoomd_on_tesla.html

HOOMD-Blue 0.9.2, 64K particles (time steps per second, TPS):

Host                         GPU                 Polymer TPS   LJ liquid TPS
AMD Athlon II X4 2.8 GHz     GTX 480             602.28        706.35
AMD Opteron 2356 2.3 GHz     S2050 (1/4)         496.07        560.74
Intel Core2 Q9300 2.50 GHz   GTX 460             383.65        432.13
AMD Opteron 2356 2.3 GHz     Tesla S1070 (1/4)   264.06        301.89


Dump file formats: HOOMD's XML input format, MOL2, DCD, PDB

Pair potentials - cutoff ("smooth" option):
CGCMM
DPD (dissipative particle dynamics)
EAM (embedded atom method)
Gaussian
Lennard-Jones
Morse
User-specified (tabulated)
Shifted Lennard-Jones
Yukawa

Pair potentials - long range:
Electrostatics via PPPM

Bond potentials:
FENE
Harmonic

Angle potentials:
Harmonic
CGCMM

Dihedral/Improper potentials:
Harmonic

Wall potentials:
Lennard-Jones

CUDA / CPU
OMP for multiple cores or multiple GPUs/core
Single / double precision
Integrators: NVE, NPT, NVT, Brownian dynamics (NVT)
Energy minimization: FIRE

Other features:
• Supports Linux, Windows and Mac OS X
• Simple and powerful Python script interface for defining simulations
• Performs 2D and 3D simulations
• Advanced built-in initial configuration generators
• Human readable XML input files
• Space-filling curve particle reordering to increase performance
• Extensible object-oriented design: additional features may be added in new classes contained in plugins
• Simulations can be visualized in real-time using VMD's IMD interface
• Real time analysis can be run at a non-linear rate if desired
• Quantities such as temperature, pressure and box size can be varied smoothly over a run
• Flexible selection of particles for integration allows freezing some particles in place, and many other use-cases
• Only reduced units

Highly Optimized Object-oriented Many-particle Dynamics - HOOMD-Blue
http://codeblue.umich.edu/hoomd-blue/index.html


Python script for LJ run: test.hoomd

from hoomd_script import *

# create 100 random particles of name A
init.create_random(N=100, phi_p=0.01, name='A')

# specify Lennard-Jones interactions between particle pairs
lj = pair.lj(r_cut=3.0)
lj.pair_coeff.set('A', 'A', epsilon=1.0, sigma=1.0)

# integrate at constant temperature
all = group.all()
integrate.mode_standard(dt=0.005)
integrate.nvt(group=all, T=1.2, tau=0.5)

# run 10,000 time steps
run(10e3)

Run:
$ hoomd test.hoomd

Output:
HOOMD-blue 0.9.0
Compiled: Wed Oct 28 06:58:46 EDT 2009
Copyright 2008, 2009 Ames Laboratory, Iowa State University and the Regents of the University of Michigan
-----
http://codeblue.umich.edu/hoomd-blue
This code is the implementation of the algorithms discussed in:
Joshua A. Anderson, Chris D. Lorenz, and Alex Travesset - "General Purpose Molecular Dynamics Fully Implemented on Graphics Processing Units", Journal of Computational Physics 227 (2008) 5342-5359
-----
test.hoomd:004 | init.create_random(N=100, phi_p=0.01, name='A')
test.hoomd:007 | lj = pair.lj(r_cut=3.0)
test.hoomd:008 | lj.pair_coeff.set('A', 'A', epsilon=1.0, sigma=1.0)
test.hoomd:011 | all = group.all()
Group "all" created containing 100 particles
test.hoomd:012 | integrate.mode_standard(dt=0.005)
test.hoomd:013 | integrate.nvt(group=all, T=1.2, tau=0.5)
test.hoomd:016 | run(10e3)
starting run
Time 00:00:00 | Step 10000 / 10000 | TPS 3541.79 | ETA 00:00:00
Average TPS: 3540.5
---------
Neighborlist stats:
370 normal updates / 100 forced updates / 0 dangerous updates
n_neigh_min: 0 / n_neigh_max: 10 / n_neigh_avg: 2.41
bins_min: 0 / bins_max: 6 / bins_avg: 1.5625
run complete

Highly Optimized Object-oriented Many-particle Dynamics - HOOMD-Blue
http://codeblue.umich.edu/hoomd-blue/doc-trunk/page_quick_start.html


Outline

• Introduction
• HOOMD-BLUE
• LAMMPS
• GPULAMMPS
• Performance:
  a) single/mixed/double precision
  b) CPU/GPU neighbor lists
  c) load balancing: static & dynamic
• Examples
• Summary


LAMMPS (http://lammps.sandia.gov) on GPUs

• GPU version: update from Brown's site (now 04/29/2011); this will update files in the lib/gpu and src/GPU directories: http://users.nccs.gov/~wb8/gpu/download.htm
• Need the CUDA GPU driver and the CUDA toolkit, but the CUDA SDK is not needed.
• Sample compilation (compile the gpu library, add files to the main src dir, then compile the whole code):

cd ~/lammps/lib/gpu
emacs Makefile.linux (compute capability 1.3/2.0, single/mix/double precision)
make -f Makefile.linux (to obtain libgpu.a)
cd ../../src
emacs MAKE/Makefile.linux (here need to change options and paths)
make yes-asphere (for granular materials)
make yes-manybody (for EAM in GPULAMMPS)
make yes-kspace (for electrostatics; needs fftw 2.x)
make yes-gpu (to copy src/GPU files to src)
make linux (to obtain the executable)


GPULAMMPS

• http://code.google.com/p/gpulammps/
• http://code.google.com/p/gpulammps/wiki/Lammps_cuda
• http://code.google.com/p/gpulammps/wiki/Lammps_cuda_UI_Features

Novel, most up-to-date GPU code; GNU GPL v2 license.
Main developers: Paul Crozier (Sandia), Mike Brown, Arnold Tharrington, Scott Hampton (Oak Ridge), Axel Kohlmeyer (Temple), Christian Trott, Lars Winterfeld (Ilmenau, Germany), Duncan Poole, Peng Wang (Nvidia), etc.
Non-members can download a read-only working copy anonymously over HTTP (SVN).
Quite detailed Wiki pages (installation & features). E-mail list: gpulammps@sandia.gov

• Many more features than LAMMPS, for instance an EAM potential implementation.
• GPULAMMPS features are gradually incorporated into the main LAMMPS distribution.
• Likely to become a package like USER-CUDA within LAMMPS.
• The OpenCL version is also expanded compared to the LAMMPS OpenCL version.


Schematic code structure

LAMMPS/GPULAMMPS lib/gpu (double precision support, CUDA >= 1.3), src/GPU:
• Atom & pair style kernels.

GERYON LIB API:
• Geryon library (set of header files) for managing the CUDA Runtime, CUDA Driver and OpenCL APIs: vector and matrix containers, data copy and casting, data I/O, classes for kernel compilation and execution, OpenCL support.
• http://users.nccs.gov/~wb8/geryon/index.htm

cudpp_mini lib:
• Reduced version of cudpp (http://code.google.com/p/cudpp), the CUDA Data Parallel Primitives lib: radixsort, scan, vector max, etc.

src/USER-CUDA in GPULAMMPS:
• Atom & pair styles, fixes.
• Pair styles: lj, lj96, gayberne & cmm; in testing stage at 12-04-2011: buck, eam, gran, morse, pppm.
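Why a scan primitive appears in this dependency list deserves one concrete illustration: an exclusive prefix sum over per-atom neighbor counts gives each atom its write offset into a single packed neighbor array, so all atoms can fill their neighbor lists independently (in parallel on a GPU). The NumPy sketch below only illustrates that idea; pack_neighbors and its toy data are invented here and are not code from libgpu or cudpp_mini.

import numpy as np

# Exclusive prefix sum ("scan") of per-atom neighbor counts -> write offsets
# into one packed neighbor array, so every atom can fill its own slice
# independently.  Toy illustration only.
def pack_neighbors(counts, pairs):
    counts = np.asarray(counts, dtype=np.int64)
    offsets = np.zeros(counts.size + 1, dtype=np.int64)
    np.cumsum(counts, out=offsets[1:])          # the scan step (done on the GPU by cudpp)
    packed = np.empty(offsets[-1], dtype=np.int64)
    cursor = offsets[:-1].copy()                # next free slot for each atom
    for i, j in pairs:                          # on a GPU this fill runs in parallel
        packed[cursor[i]] = j
        cursor[i] += 1
    return offsets, packed

counts = [2, 0, 3, 1]                           # neighbors found for atoms 0..3
pairs = [(0, 2), (0, 3), (2, 0), (2, 1), (2, 3), (3, 2)]
offsets, packed = pack_neighbors(counts, pairs)
print(offsets)                                  # [0 2 2 5 6]
print(packed)                                   # neighbors of atom i: packed[offsets[i]:offsets[i+1]]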


GPULAMMPS-CUDA Compilation
http://code.google.com/p/gpulammps/wiki/Installation

USER-CUDA provides (more details about the features of LAMMPS CUDA):
• 26 pair forces
• long range coulomb with pppm/cuda
• nve/cuda, nvt/cuda, npt/cuda, nve/sphere/cuda
• several more important fixes and computes

Installation (more details about the installation of LAMMPS CUDA):
• Make sure you can compile LAMMPS without packages with: make YOUR-Machinefile
• Insert your path to CUDA in src/USER-CUDA/Makefile.common
• Install the standard packages with: make yes-standard
• Install USER-CUDA with: make yes-USER-CUDA
• Go to src/USER-CUDA and compile the USER-CUDA library with: make Options
• Go to src and compile LAMMPS with: make YOUR-Machinefile Options
• IMPORTANT: use the same options for the library and for LAMMPS.


Features in GPULAMMPS-CUDA
http://code.google.com/p/gpulammps/wiki/Lammps_cuda_DI_Features

Run styles:
verlet/cuda

Forces:
born/coul/long/cuda, buck/coul/cut/cuda, buck/coul/long/cuda, buck/cuda, cg/cmm/coul/cut/cuda, cg/cmm/coul/debye/cuda, cg/cmm/coul/long/cuda, cg/cmm/cuda, eam/cuda, eam/alloy/cuda, eam/fs/cuda, gran/hooke/cuda, lj/charmm/coul/charmm/implicit/cuda, lj/charmm/coul/charmm/cuda, lj/charmm/coul/long/cuda, lj/cut/coul/cut/cuda, lj/cut/coul/debye/cuda, lj/cut/coul/long/cuda, lj/cut/cuda, lj/expand/cuda, lj/gromacs/coul/gromacs/cuda, lj/gromacs/cuda, lj/smooth/cuda, lj96/cut/cuda, morse/cuda, morse/coul/long/cuda, pppm/cuda

Fixes:
npt/cuda, nve/cuda, nvt/cuda, nve/sphere/cuda, enforce2d/cuda, temp/berendsen/cuda, temp/rescale/cuda, addforce/cuda, setforce/cuda, aveforce/cuda, shake/cuda, gravity/cuda, freeze/cuda

Computes:
temp/cuda, temp/partial/cuda, pressure/cuda, pe/cuda

Atom styles:
atomic/cuda, charge/cuda, full/cuda, granular/cuda


Outline

• Introduction
• HOOMD-BLUE
• LAMMPS
• GPULAMMPS
• Performance:
  a) single/mixed/double precision
  b) CPU/GPU neighbor lists
  c) load balancing: static & dynamic
• Examples
• Summary


GPULAMMPS Serial Performance

Benchmarks: http://code.google.com/p/gpulammps/wiki/SingleGPUBenchmarkResults
One slot of a Tesla S1070-400; Intel Xeon E5520 4-core CPUs (2.27 GHz); HOOMD version 0.9.

GB: Gay-Berne; LJ: Lennard-Jones
PEG: coarse grain (CGCMM); SDS: CGCMM with long-range electrostatics


GPULAMMPS Parallel Performance

• Benchmarks: http://users.nccs.gov/~wb8/gpu/yona.htm
• Also: http://sites.google.com/site/akohlmey/software/lammps-benchmarks

864K atoms LJ liquid, reduced density = 0.8442, NVE, rcut = 2.5σ, 5000 steps. Speedup ~ 3-4.

Rhodopsin protein in solvated lipid bilayer; CHARMM force field, long-range Coulombics via PPPM, SHAKE constraints. Counter-ions and a reduced amount of water. 32K atoms, 1000 timesteps, LJ rcut = 1 nm, neighbor skin of 1.0 σ, NPT. Speedup ~ 1.7-3.

Gay-Berne ellipsoids, 125K atoms, NVE, rcut = 7σ, 1000 steps. Speedup ~ 9-11.

Yona cluster:
15 nodes, 2x 6-core AMD Opteron 2435 (2.6 GHz) & 2 Tesla C2050 GPUs.
3 GB GDDR5 memory, 448 cores (1.15 GHz), memory bandwidth 144 GB/s.
GPUs are connected on PCIe x16 gen 2.0 slots, ECC support enabled.
Mellanox MT26428 QDR InfiniBand interconnect.


Single vs Double Precision (lib compile option)

Force kernels can be compiled to use single, double, or mixed precision.

The drawback of double precision for memory-bound kernels is that twice as many bytes must be fetched for cutoff evaluation.

A potential solution is to use mixed precision: the positions are stored in single precision, but accumulation and storage of forces, torques, energies and virials is performed in double precision.

Because this memory access occurs outside the loop, the performance penalty for mixed precision is very small.

"Implementing molecular dynamics on hybrid high performance computers – short range forces", W. Michael Brown, Peng Wang, Steven J. Plimpton and Arnold N. Tharrington, Comp. Phys. Comm. 182 (2011) 898–911.
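As a rough, self-contained illustration of the three precision modes (not the actual GPU-package kernels): the NumPy sketch below stores positions in float32 and accumulates a toy LJ energy in float32, float64, or the mixed combination, on an invented simple-cubic lattice. Comparing the three totals shows the kind of drift single precision introduces, while mixed stays close to full double.

import numpy as np

# Toy LJ energy sum in three precision modes: "single" (float32 positions and
# float32 accumulation), "mixed" (float32 positions, float64 accumulation),
# and "double" (float64 everywhere).  Simple cubic lattice, no PBC.
g, a, rcut = 8, 1.2, 2.5
pos_d = np.array([(i, j, k) for i in range(g) for j in range(g) for k in range(g)],
                 dtype=np.float64) * a
pos_s = pos_d.astype(np.float32)                 # what a single/mixed kernel fetches

def lj_sum(pos, accum_dtype):
    total = accum_dtype(0.0)
    for i in range(len(pos) - 1):
        d = pos[i + 1:] - pos[i]                 # displacements to remaining atoms
        r2 = np.sum(d * d, axis=1)
        inv6 = (1.0 / r2[r2 < rcut * rcut]) ** 3
        total = total + np.sum(4.0 * (inv6 * inv6 - inv6), dtype=accum_dtype)
    return total

print("double:", lj_sum(pos_d, np.float64))      # reference
print("single:", lj_sum(pos_s, np.float32))      # cheapest, least accurate
print("mixed :", lj_sum(pos_s, np.float64))      # float32 fetches, float64 accumulation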


Benchmarks for different precision modes
http://users.nccs.gov/~wb8/gpu/kid_precision.htm

256,000 atoms LJ liquid, reduced density = 0.8442, NVE, rcut = 2.5σ, 5000 steps.

Rhodopsin protein in solvated lipid bilayer; CHARMM force field, long-range Coulombics via PPPM, SHAKE constraints. Counter-ions and a reduced amount of water to make a 32K atom system, replicated 2x2x2 to create the box: 256,000 atoms, 1000 timesteps, LJ rcut = 1 nm, neighbor skin of 1.0 σ, NPT.

• Single precision is OK for many runs, but use it at your peril: Colberg & Höfling, Comp. Phys. Comm. 182 (2011) 1120–1129.
• Mixed precision (single for positions and double for forces) is nearly as fast as single precision.
• Double precision is still cheaper than the CPU.


Review on LAMMPS GPU implementation

Discussion of several important issues in porting a large molecular dynamics code for use on parallel hybrid machines.

Objectives:
a) Minimize the amount of code that must be ported for efficient acceleration.
b) Utilize the available processing power from both multi-core CPUs and accelerators.

Presents results on a parallel test cluster containing 32 Fermi GPUs and 180 CPU cores.

"Implementing molecular dynamics on hybrid high performance computers – short range forces", W. Michael Brown, Peng Wang, Steven J. Plimpton and Arnold N. Tharrington, Comp. Phys. Comm. 182 (2011) 898–911.


Parallel and CPU/GPU Decomposition

Multiple MPI processes (CPU cores) can share a single accelerator (GPU).

The user can choose a fixed load balance between CPU & GPU for the calculation of short range forces.

Dynamic load balancing can also be chosen for the GPU force calculation.

The neighbor list build can be carried out on the GPU or on the CPU.

Time integration represents only a small computational cost, and it is carried out on the CPU.
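A hedged sketch of the first point, that several MPI ranks can share one accelerator: the helper below is hypothetical (not LAMMPS code) and simply assigns the ranks on a node to devices round-robin, so 4 processes and 2 GPUs gives two ranks per device.

# Round-robin assignment of the MPI ranks on a node to its GPUs.
# Hypothetical helper for illustration; LAMMPS/GPULAMMPS does its own mapping.
def device_for_rank(local_rank, gpus_per_node):
    if gpus_per_node < 1:
        raise ValueError("need at least one GPU per node")
    return local_rank % gpus_per_node

for rank in range(4):                            # 4 MPI processes per node
    print("rank", rank, "-> GPU", device_for_rank(rank, gpus_per_node=2))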


Fixed or Dynamic Load Balancing

Fixed load balancing: set the CPU core to accelerator ratio, and set the fraction of particles that will have forces calculated by the accelerator.

Consider a job run with 4 MPI processes on a node with 2 accelerator devices and the fraction set to 0.7. At each timestep, each MPI process will place data transfer of positions, kernel execution of forces, and data transfer of forces into the device (GPU) queue for 70% of the particles.

At the same time data is being transferred and forces are being calculated on the GPU, the MPI process will perform force calculations on the CPU.

Ideal fraction: CPU time = GPU time for data transfer and kernel execution. Dynamic balancing recalculates the optimal fraction from CPU/GPU timings at some timestep interval (a sketch of this update follows below).

"Implementing molecular dynamics on hybrid high performance computers – short range forces", W. Michael Brown, Peng Wang, Steven J. Plimpton and Arnold N. Tharrington, Comp. Phys. Comm. 182 (2011) 898–911.
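A minimal sketch of the dynamic variant described above, assuming only what the slide states: CPU and GPU force-stage times are measured at some timestep interval and the particle fraction is moved toward the value that would make them equal. The function name, damping factor, and bounds are invented for illustration and are not the actual GPU-package code.

# Nudge the GPU particle fraction f so that measured CPU and GPU force times
# come out roughly equal.  Illustration only, not the LAMMPS GPU package.
def update_split(f, t_cpu, t_gpu, relax=0.5, f_min=0.05, f_max=0.95):
    rate_cpu = t_cpu / (1.0 - f)                 # cost per unit fraction on the CPU
    rate_gpu = t_gpu / f                         # cost per unit fraction on the GPU
    f_ideal = rate_cpu / (rate_cpu + rate_gpu)   # makes f*rate_gpu == (1-f)*rate_cpu
    f_new = (1.0 - relax) * f + relax * f_ideal  # damped update
    return min(max(f_new, f_min), f_max)

# Example: with f = 0.7 the CPU took 8 ms for its 30% of the particles and the
# GPU 3 ms for its 70%, so the fraction should grow (here to about 0.78).
print(update_split(0.7, t_cpu=8e-3, t_gpu=3e-3))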


Benchmarks for load balancing

The performance impact resulting from splitting the force calculation between the host and device will depend on the CPU core to device ratio and the relative rates of force calculation on the host and device.

Legend: processes per node (ppn); dynamic load balancing (LB); neighboring performed on the GPU (GPU-N).

LJ, N = 864,000.

"Implementing molecular dynamics on hybrid high performance computers – short range forces", W. Michael Brown, Peng Wang, Steven J. Plimpton and Arnold N. Tharrington, Comp. Phys. Comm. 182 (2011) 898–911.


More benchmarks ...

Strong scaling benchmark using LJ with a cutoff of 2.5σ and N = 864K atoms; single node.

"Implementing molecular dynamics on hybrid high performance computers – short range forces", W. Michael Brown, Peng Wang, Steven J. Plimpton and Arnold N. Tharrington, Comp. Phys. Comm. 182 (2011) 898–911.


LAMMPS OpenCL vs CUDA

Single node. Code compiled with CUDA and with OpenCL, N = 256K, with neighboring performed on the GPU. Time normalized by the time required to complete the simulation loop with CUDA.

"Implementing molecular dynamics on hybrid high performance computers – short range forces", W. Michael Brown, Peng Wang, Steven J. Plimpton and Arnold N. Tharrington, Comp. Phys. Comm. 182 (2011) 898–911.


CPU versus CPU-GPU Speedups

"Implementing molecular dynamics on hybrid high performance computers – short range forces", W. Michael Brown, Peng Wang, Steven J. Plimpton and Arnold N. Tharrington, Comp. Phys. Comm. 182 (2011) 898–911.


Scaling with "old" GPU: shock wave

"GP-GPU Processing of Molecular Dynamics Simulations", E. Millán Kujtiuk, E.M. Bringa, A. Higginbotham, C. García Garino, Proceedings of HPC 2010, High-Performance Computing Symposium, pp. 3234-3248.


8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 229

Outline

bull Introduction

bull HOOMD-BLUE

bull LAMMPS

bull GPULAMMPS

bull Performancea) singlemixeddouble precision

b) CPUGPU neighbor lists

c) Load balancing static amp dynamic

bull Examples

bull Summary

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 329

Many MD codes can now use GPU acceleration

AMBER ( Assisted Model Building with Energy Refinement ) httpambermdorggpus

Ross Walker (keynote) MPI for several GPUscores TIP3P PME ~106 atoms max Tesla C2070)

HOOMD-Blue ( Highly Optimized Object-oriented Many-particle Dynamics)

httpcodeblueumicheduhoomd-blueindexhtml OMP for several GPUs in single board

LAMMPS ( Large-scale AtomicMolecular Massively Parallel Simulator )

httplammpssandiagov MPI ofr several GPUscores (LJ 12 ~107 atoms max Tesla C2070)

GPULAMMPS httpcodegooglecompgpulammps CUDA + OpenCL

DL_POLY

httpwwwcsescitechacukccgsoftwareDL_POLY F90+MPI CUDA+OpenMP port

GROMACS httpwwwgromacsorgDownloadsInstallation_InstructionsGromacs_on_GPUs

Uses OpenMM libs (httpssimtkorghomeopenmm) No paralelization ~106

atoms max

NAMD (ldquo Not anotherrdquo MD) httpwwwksuiuceduResearchnamd GPUCPU clusters

VMD (Visual MD) httpwwwksuiuceduResearchvmd

GTC 2010 Archive videos and pdfrsquos httpwwwnvidiacomobjectgtc2010-presentation-archivehtmlmd

1000000+ atom Satellite Tobacco Mosaic Virus

Freddolino et al Structure 14437-449 2006Many more

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 429

Highly Optimized Object-oriented Many-particle Dynamics -HOOMD-Blue

httpcodeblueumicheduhoomd-blueindexhtml

bullGeneral purpose molecular dynamics simulations

fully implemented on graphics processing units

Joshua A Anderson Chris D Lorenz and AlexTravesset Journal of Computational Physics 227

(2008) 5342-5359

bull Molecular Dynamics on Graphic Processing Units

HOOMD to the Rescue Joshua A Anderson and

Alex Travesset Computing in Science amp Engineering

10(6) (2008)

httpwwwnvidiacomobjecthoomd_on_teslahtml

Host GPUPolymerTPS

LJ

liquidTPS

Amd

Athlon II

X428GHz

GTX

480 60228 70635

AMDOpteron

2356 23

GHz

S2050

(14)49607 56074

Intel

Core2Q9300

250GHz

GTX460

38365 43213

AMD

Opteron2356 23GHz

Tesla

S1070(14)

26406 30189

983112983119983119983117983108983085983106983148983157983141 983088983086983097983086983090

(64K particles)

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 529

Dump file formats HOOMDs XML input format

MOL2 DCD PDB

Pair Potentials - cutoff (ldquosmoothrdquo option)

CGCMM

DPD (dissipative particle dynamics)EAM (embedded atom method)Gaussian

Lennard-Jones

MorseUser-specified (tabulated)

Shifted Lennard-JonesYukawa

Pair Potentials - long range

Electrostatics via PPPM

Bond Potentials

FENE

HarmonicAngle Potentials

Harmonic

CGCMM

DihedralImproper Potentials

Harmonic

Wall PotentialsLennard-Jones

CUDACPU

OMP for multiple cores O multiple GPUscore

Single double precision

Integrators NVE NPT NVT Brownian dynamics NVT

Energy minimization FIREOther features

bullSupports Linux Windows and Mac OS X

bullSimple and powerful Python script interface for defining

simulationsbullPerforms 2D and 3D simulations

bullAdvanced built-in initial configuration generatorsbullHuman readable XML input files

bullSpace-filling curve particle reordering to increase

performance

bullExtensible object-oriented design Additional features may

be added in new classes contained in plugins

bullSimulations can be visualized in real-time using VMDs IMDinterface

bullReal time analysis can be run at a non-linear rate if desired

bullQuantities such as temperature pressure and box size can bevaried smoothly over a run

bullFlexible selection of particles for integration allows freezing

some particles in place and many other use-casesbullOnly reduced units

Highly Optimized Object-oriented Many-particle Dynamics -HOOMD-Blue

httpcodeblueumicheduhoomd-blueindexhtml

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 629

Python script for LJ run testhoomd

from hoomd_script import

create 100 random particles of name A

initcreate_random(N=100 phi_p=001

name=A)

specify Lennard-Jones interactions

between particle pairs

lj = pairlj(r_cut=30)

ljpair_coeffset(A A epsilon=10

sigma=10)

integrate at constant temperature

all = groupall()

integratemode_standard(dt=0005)

integratenvt(group=all T=12 tau=05)

run 10000 time steps

run(10e3)

Run

$ hoomd testhoomd

OutputHOOMD-blue 090

Compiled Wed Oct 28 065846 EDT 2009

Copyright 2008 2009 Ames Laboratory Iowa State University and

the Regents of the University of Michigan -----

httpcodeblueumicheduhoomd-blue

This code is the implementation of the algorithms discussed in

Joshua A Anderson Chris D Lorenz and Alex Travesset - General

Purpose Molecular Dynamics Fully Implemented on Graphics

Processing Units Journal of Computational Physics 227 (2008)

5342-5359 -----

testhoomd004 | initcreate_random(N=100 phi_p=001 name=A)testhoomd007 | lj = pairlj(r_cut=30)

testhoomd008 | ljpair_coeffset(A A epsilon=10 sigma=10)

testhoomd011 | all = groupall() Group all created containing 100

particles testhoomd012 | integratemode_standard(dt=0005)

testhoomd013 | integratenvt(group=all T=12 tau=05)

testhoomd016 | run(10e3)starting run Time 000000 | Step 10000 10000 | TPS 354179 |

ETA 000000 Average TPS 35405 --------- --

Neighborlist stats

370 normal updates 100 forced updates 0 dangerous updates

n_neigh_min 0 n_neigh_max 10 n_neigh_avg 241

bins_min 0 bins_max 6 bins_avg 15625

run complete

Highly Optimized Object-oriented Many-particle Dynamics -HOOMD-Blue

httpcodeblueumicheduhoomd-bluedoc-trunkpage_quick_starthtml

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 729

Outline

bull Introduction

bull HOOMD-BLUE

bull LAMMPS

bull GPULAMMPS

bull Performancea) singlemixeddouble precision

b) CPUGPU neighbor lists

c) Load balancing static amp dynamic

bull Examples

bull Summary

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 829

LAMMPS (httplammpssandiagov ) on GPUs

bullGPU version update from Brownrsquos site (now 04292011) will update files in the

libgpu and srcGPU directories) httpusersnccsgov~wb8gpudownloadhtm

bullNeed CUDA GPU driver and CUDA toolkit but CUDA SDK is not needed

bullSample compilation (compile gpu library add files to main src dir then compile

whole code)

cd ~lammpslibgpu

emacs Makefilelinux (compute capability 1320 singlemixdouble precision)make -f Makefilelinux (to obtain libgpua)

cd src

emacs MAKEMakefilelinux (here need to change options and paths)

make yes-asphere (for granular materials)

Make yes-manybody (for EAM in GPULAMMPS)

make yes-kspace (for electrostatics needs fftw2x)

make yes-gpu (to copy srcgpu files to src)

make linux (to obtain executable)

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 929

GPULAMMPS

bull httpcodegooglecompgpulammps

bull httpcodegooglecompgpulammpswikiLammps_cuda

bull httpcodegooglecompgpulammpswikiLammps_cuda_UI_Features

Novel most up to date GPU code GNU GPL v2 license

Main developers Paul Crozier (Sandia) Mike Brown Arnold Tharrington Scott

Hampton (Oak Ridge) Axel Kohlmeyer (Temple) Christian Trott Lars Winterfeld

(Ilmenau Germany) Duncan Poole Peng Wang (Nvidia) etc

Non-members can download read-only working copy anonymously over HTTP (SVN)

Quite detailed Wiki pages (installation amp features) E-mail list gpulammpssandiagov

bull Many more features than LAMMPS for instance EAM potential implementation

bull GPULAMMPS features are gradually incorporated into the main LAMMPS

distribution

bull Likely package like USER-CUDA within LAMMPS

bull OpenCL version also expanded compared to LAMMPS OpenCL version

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1029

Schematic code structure

GERYON LIB API cudpp_mini Lib

srcGPU

LAMMPSGPULAMMPSlibgpu (double

precision support

CUDA gt=13)

Atom amp pair styles kernels

Geryon library (set of header

files) for managing the CUDA

Runtime CUDA Driver and

OpenCL APIs vector and matrix

containers data copy and

casting data IOclasses for

kernel compilation and

execution OpenCL support

httpusersnccsgov~wb8gery

onindexhtm

CUDPP reduced version

of cudpp

(httpcodegooglecomp

cudpp)

CUDA Data Parallel

Primitives lib

radixsort scan vector

max etc

Atom amp pair styles fixes

srcUSER-CUDA

in GPULAMMPS

Pair styles lj lj96

gayberne amp cmm testing

stage at 12-04-2011 buck

eam gran morse pppm

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1129

GPULAMMPS-CUDA Compilation

httpcodegooglecompgpulammpswikiInstallation

USER-CUDAProvides (more details about the features of LAMMPS CUDA)

26 pair forceslong range coulomb with pppmcuda

nvecuda nvtcuda nptcuda nvespherecudaseveral more important fixes and computes

Installation (more details about the installaton of LAMMPS CUDA)

Make sure you can compile LAMMPS without packages withmake YOUR-MachinefileInsert your path to CUDA in srcUSER-CUDAMakefilecommon

Install the standard packages with make yes-standardInstall USER-CUDA with make yes-USER-CUDA

Go to srcUSER-CUDA and compile the USER-CUDA library with make OptionsGo to src and compile LAMMPS with make YOUR-Machinefile OptionsIMPORTANT use the same options for the library and LAMMPS

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1229

Features in GPULAMMPS-CUDA

httpcodegooglecompgpulammpswikiLammps_cuda_DI_Features

Run styles

verletcuda

Forces

borncoullongcuda buckcoulcutcudabuckcoullongcuda buckcuda cgcmmcoulcutcuda

cgcmmcouldebyecuda cgcmmcoullongcuda cgcmmcuda eamcuda eamalloycuda eamfscuda

granhookecuda ljcharmmcoulcharmmimplicitcuda ljcharmmcoulcharmmcuda ljcharmmcoullongcudaljcutcoulcutcuda ljcutcouldebyecuda ljcutcoullongcuda ljcutcuda ljexpandcuda

ljgromacscoulgromacscuda ljgromacscuda ljsmoothcuda lj96cutcuda morsecuda morsecoullongcuda

pppmcuda

Fixes

nptcuda nvecuda nvtcuda nvespherecuda enforce2dcuda tempberendsencuda temprescalecuda

addforcecuda setforcecuda aveforcecuda shakecuda gravitycuda freezecuda

Computes

tempcuda temppartialcuda pressurecuda pecuda

Atom styles

atomiccuda chargecuda fullcuda granularcuda

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1329

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1429

Outline

bull Introduction

bull HOOMD-BLUE

bull LAMMPS

bull GPULAMMPS

bull Performance

a) singlemixeddouble precision

b) CPUGPU neighbor lists

c) Load balancing static amp dynamic

bull Examples

bull Summary

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1529

GPULAMMPS Serial Performance

BenchmarkshttpcodegooglecompgpulammpswikiSingleGPUBenchmarkResultsOne slot of Tesla S1070-400 Intel Xeon E5520 4-core cpus (227GHz) HOOMDversion 09

GB Gay-BerneLJ Lennard-Jones

PEG Coarse grain (CGCMM)SDS CGCMM with long range electrostatics

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1629

GPULAMMPS Parallel Performance

bull Benchmarks httpusersnccsgov~wb8gpuyonahtm

bull Also httpsitesgooglecomsiteakohlmeysoftwarelammps-benchmarks

864K atoms LJ liquid reduceddensity=08442 NVE rcut= 25σ 5000stepsSpeedup ~ 3-4

Rhodopsin protein in solvated lipidbilayer CHARMM force field long-rangeCoulombics via PPPMSHAKEconstraints Counter-ions and reducedamount of water32K atoms 1000 timesteps LJ rcut= 1

nm neighbor skin of 10 σ NPTSpeedup ~ 17-3

Gay-Berne ellipsoids125 K atoms NVE rcut= 7σ 1000 stepsSpeedup ~ 9-11

Yona cluster

15 Nodes 2x6-core AMD Opteron 2435 (26GHz) amp 2 Tesla C2050 GPUs

3GB GDDR5 memory 448 cores (115GHz) memory bandwidth 144GBs

GPUs are connected on PCIx16 gen 20 slots ECC support enabled

Mellanox MT26428 QDR InfiniBand interconnect

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1729

Single vs Double Precision (lib compile option)

Force kernels can be compiled to use single double or mixed

precision

The drawback of double precision for memory-bound kernelsis that twice as many bytes must be fetched for cutoff

evaluation

A potential solution is to use mixed precision In this case thepositions are stored in single precision but accumulation and

storage of forces torques energies and virials is performed in

double precision

Because this memory access occurs outside the loop the

performance penalty for mixed precision is very small

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N TharringtonComp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1829

Benchmarks for different precision modes

httpusersnccsgov~wb8gpukid_precisionhtm

256000 atoms LJ liquid reduced density=08442 NVErcut= 25σ 5000 steps

Rhodopsin protein in solvated lipid bilayer CHARMM force fieldlong-range Coulombics via PPPMSHAKE constraintsCounter-ions and reduced amount of water to make a 32K atom

system replicated 2x2x2 to create box256000 atoms 1000 timesteps LJ rcut= 1 nm neighbor skin of10 σ NPT

bull Single precision OK for many runs but use at your peril Colberg amp Houmlfling Comp Phys Comm 182 (2011) 1120ndash1129

bull Mixed precision (single for positions and double for forces) nearly as fast as single precision

bull Double precision still cheaper than CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1929

Review on LAMMPS GPU implementation

Discussion of several important issues in porting a large

molecular dynamics code for use on parallel hybrid machines

Objectives

a) Minimize the amount of code that must be ported forefficient acceleration

b) Utilize the available processing power from both multi-core

CPUs and accelerators

Presents results on a parallel test cluster containing 32 Fermi

GPUs and 180 CPU cores

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2029

Parallel and CPUGPU Decomposition

Multiple MPI processes (CPU cores) can share single

accelerator (GPU)

User can choose fixed load balance between CPU amp GPU for

the calculation of short range forces

Dynamic load balancing can also be chosen GPU force calculation

Neighbor list can be carried out in GPU or CPU

Time integration represents only small computational cost and

it is carried out in CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2129

Fixed or Dynamic Load balancing

Fixed load balancing setting the CPU core to accelerator ratio and by

setting the fraction of particles that will have forces calculated by the

accelerator

Consider a job run with 4 MPI processes on a node with 2 accelerator

devices and the fraction set to 07 At each timestep each MPI process will

place data transfer of positions kernel execution of forces and datatransfer of forces into the device (GPU) queue for 70 of the particles

At the same time data is being transferred and forces are being calculated

on the GPU the MPI process will perform force calculations on the CPU

Ideal fraction CPU time = GPU time for data transfer and kernel

execution dynamic balancing with calculation of optimal fraction based

on CPUGPU timing at some timestep interval

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2229

Benchmarks for load balancing

The performance impact

resulting from splitting the

force calculation between

the host and device will

depend on the CPU core to

device ratio and the relative

rates of force calculation on

the host and device

Processes per node (ppn)

Dynamic Load Balancing

(LB) Neighboring performed on

the GPU (GPU-N)

LJ N=864000

Implementing molecular dynamics on hybrid high

performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and

Arnold N Tharrington Comp Phys Comm 182 (2011)

898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2329

More benchmarks hellip

Strong scaling benchmark usingLJ cutoff of 25 and N=864 K LJ Single node

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2429

LAMMPS OpenCL vs CUDA

Single node Code compiled with CUDA and OpenCL N=256K with neighboring

performed on the GPU Time normalized by the time required to completesimulation loop with CUDA

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2529

CPU versus CPU-GPU Speedups

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2629

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2729

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2829

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2929

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 329

Many MD codes can now use GPU acceleration

AMBER ( Assisted Model Building with Energy Refinement ) httpambermdorggpus

Ross Walker (keynote) MPI for several GPUscores TIP3P PME ~106 atoms max Tesla C2070)

HOOMD-Blue ( Highly Optimized Object-oriented Many-particle Dynamics)

httpcodeblueumicheduhoomd-blueindexhtml OMP for several GPUs in single board

LAMMPS ( Large-scale AtomicMolecular Massively Parallel Simulator )

httplammpssandiagov MPI ofr several GPUscores (LJ 12 ~107 atoms max Tesla C2070)

GPULAMMPS httpcodegooglecompgpulammps CUDA + OpenCL

DL_POLY

httpwwwcsescitechacukccgsoftwareDL_POLY F90+MPI CUDA+OpenMP port

GROMACS httpwwwgromacsorgDownloadsInstallation_InstructionsGromacs_on_GPUs

Uses OpenMM libs (httpssimtkorghomeopenmm) No paralelization ~106

atoms max

NAMD (ldquo Not anotherrdquo MD) httpwwwksuiuceduResearchnamd GPUCPU clusters

VMD (Visual MD) httpwwwksuiuceduResearchvmd

GTC 2010 Archive videos and pdfrsquos httpwwwnvidiacomobjectgtc2010-presentation-archivehtmlmd

1000000+ atom Satellite Tobacco Mosaic Virus

Freddolino et al Structure 14437-449 2006Many more

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 429

Highly Optimized Object-oriented Many-particle Dynamics -HOOMD-Blue

httpcodeblueumicheduhoomd-blueindexhtml

bullGeneral purpose molecular dynamics simulations

fully implemented on graphics processing units

Joshua A Anderson Chris D Lorenz and AlexTravesset Journal of Computational Physics 227

(2008) 5342-5359

bull Molecular Dynamics on Graphic Processing Units

HOOMD to the Rescue Joshua A Anderson and

Alex Travesset Computing in Science amp Engineering

10(6) (2008)

httpwwwnvidiacomobjecthoomd_on_teslahtml

Host GPUPolymerTPS

LJ

liquidTPS

Amd

Athlon II

X428GHz

GTX

480 60228 70635

AMDOpteron

2356 23

GHz

S2050

(14)49607 56074

Intel

Core2Q9300

250GHz

GTX460

38365 43213

AMD

Opteron2356 23GHz

Tesla

S1070(14)

26406 30189

983112983119983119983117983108983085983106983148983157983141 983088983086983097983086983090

(64K particles)

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 529

Dump file formats HOOMDs XML input format

MOL2 DCD PDB

Pair Potentials - cutoff (ldquosmoothrdquo option)

CGCMM

DPD (dissipative particle dynamics)EAM (embedded atom method)Gaussian

Lennard-Jones

MorseUser-specified (tabulated)

Shifted Lennard-JonesYukawa

Pair Potentials - long range

Electrostatics via PPPM

Bond Potentials

FENE

HarmonicAngle Potentials

Harmonic

CGCMM

DihedralImproper Potentials

Harmonic

Wall PotentialsLennard-Jones

CUDACPU

OMP for multiple cores O multiple GPUscore

Single double precision

Integrators NVE NPT NVT Brownian dynamics NVT

Energy minimization FIREOther features

bullSupports Linux Windows and Mac OS X

bullSimple and powerful Python script interface for defining

simulationsbullPerforms 2D and 3D simulations

bullAdvanced built-in initial configuration generatorsbullHuman readable XML input files

bullSpace-filling curve particle reordering to increase

performance

bullExtensible object-oriented design Additional features may

be added in new classes contained in plugins

bullSimulations can be visualized in real-time using VMDs IMDinterface

bullReal time analysis can be run at a non-linear rate if desired

bullQuantities such as temperature pressure and box size can bevaried smoothly over a run

bullFlexible selection of particles for integration allows freezing

some particles in place and many other use-casesbullOnly reduced units

Highly Optimized Object-oriented Many-particle Dynamics -HOOMD-Blue

httpcodeblueumicheduhoomd-blueindexhtml

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 629

Python script for LJ run testhoomd

from hoomd_script import

create 100 random particles of name A

initcreate_random(N=100 phi_p=001

name=A)

specify Lennard-Jones interactions

between particle pairs

lj = pairlj(r_cut=30)

ljpair_coeffset(A A epsilon=10

sigma=10)

integrate at constant temperature

all = groupall()

integratemode_standard(dt=0005)

integratenvt(group=all T=12 tau=05)

run 10000 time steps

run(10e3)

Run

$ hoomd testhoomd

OutputHOOMD-blue 090

Compiled Wed Oct 28 065846 EDT 2009

Copyright 2008 2009 Ames Laboratory Iowa State University and

the Regents of the University of Michigan -----

httpcodeblueumicheduhoomd-blue

This code is the implementation of the algorithms discussed in

Joshua A Anderson Chris D Lorenz and Alex Travesset - General

Purpose Molecular Dynamics Fully Implemented on Graphics

Processing Units Journal of Computational Physics 227 (2008)

5342-5359 -----

testhoomd004 | initcreate_random(N=100 phi_p=001 name=A)testhoomd007 | lj = pairlj(r_cut=30)

testhoomd008 | ljpair_coeffset(A A epsilon=10 sigma=10)

testhoomd011 | all = groupall() Group all created containing 100

particles testhoomd012 | integratemode_standard(dt=0005)

testhoomd013 | integratenvt(group=all T=12 tau=05)

testhoomd016 | run(10e3)starting run Time 000000 | Step 10000 10000 | TPS 354179 |

ETA 000000 Average TPS 35405 --------- --

Neighborlist stats

370 normal updates 100 forced updates 0 dangerous updates

n_neigh_min 0 n_neigh_max 10 n_neigh_avg 241

bins_min 0 bins_max 6 bins_avg 15625

run complete

Highly Optimized Object-oriented Many-particle Dynamics -HOOMD-Blue

httpcodeblueumicheduhoomd-bluedoc-trunkpage_quick_starthtml

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 729

Outline

bull Introduction

bull HOOMD-BLUE

bull LAMMPS

bull GPULAMMPS

bull Performancea) singlemixeddouble precision

b) CPUGPU neighbor lists

c) Load balancing static amp dynamic

bull Examples

bull Summary

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 829

LAMMPS (httplammpssandiagov ) on GPUs

bullGPU version update from Brownrsquos site (now 04292011) will update files in the

libgpu and srcGPU directories) httpusersnccsgov~wb8gpudownloadhtm

bullNeed CUDA GPU driver and CUDA toolkit but CUDA SDK is not needed

bullSample compilation (compile gpu library add files to main src dir then compile

whole code)

cd ~lammpslibgpu

emacs Makefilelinux (compute capability 1320 singlemixdouble precision)make -f Makefilelinux (to obtain libgpua)

cd src

emacs MAKEMakefilelinux (here need to change options and paths)

make yes-asphere (for granular materials)

Make yes-manybody (for EAM in GPULAMMPS)

make yes-kspace (for electrostatics needs fftw2x)

make yes-gpu (to copy srcgpu files to src)

make linux (to obtain executable)

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 929

GPULAMMPS

bull httpcodegooglecompgpulammps

bull httpcodegooglecompgpulammpswikiLammps_cuda

bull httpcodegooglecompgpulammpswikiLammps_cuda_UI_Features

Novel most up to date GPU code GNU GPL v2 license

Main developers Paul Crozier (Sandia) Mike Brown Arnold Tharrington Scott

Hampton (Oak Ridge) Axel Kohlmeyer (Temple) Christian Trott Lars Winterfeld

(Ilmenau Germany) Duncan Poole Peng Wang (Nvidia) etc

Non-members can download read-only working copy anonymously over HTTP (SVN)

Quite detailed Wiki pages (installation amp features) E-mail list gpulammpssandiagov

bull Many more features than LAMMPS for instance EAM potential implementation

bull GPULAMMPS features are gradually incorporated into the main LAMMPS

distribution

bull Likely package like USER-CUDA within LAMMPS

bull OpenCL version also expanded compared to LAMMPS OpenCL version

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1029

Schematic code structure

GERYON LIB API cudpp_mini Lib

srcGPU

LAMMPSGPULAMMPSlibgpu (double

precision support

CUDA gt=13)

Atom amp pair styles kernels

Geryon library (set of header

files) for managing the CUDA

Runtime CUDA Driver and

OpenCL APIs vector and matrix

containers data copy and

casting data IOclasses for

kernel compilation and

execution OpenCL support

httpusersnccsgov~wb8gery

onindexhtm

CUDPP reduced version

of cudpp

(httpcodegooglecomp

cudpp)

CUDA Data Parallel

Primitives lib

radixsort scan vector

max etc

Atom amp pair styles fixes

srcUSER-CUDA

in GPULAMMPS

Pair styles lj lj96

gayberne amp cmm testing

stage at 12-04-2011 buck

eam gran morse pppm

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1129

GPULAMMPS-CUDA Compilation

httpcodegooglecompgpulammpswikiInstallation

USER-CUDAProvides (more details about the features of LAMMPS CUDA)

26 pair forceslong range coulomb with pppmcuda

nvecuda nvtcuda nptcuda nvespherecudaseveral more important fixes and computes

Installation (more details about the installaton of LAMMPS CUDA)

Make sure you can compile LAMMPS without packages withmake YOUR-MachinefileInsert your path to CUDA in srcUSER-CUDAMakefilecommon

Install the standard packages with make yes-standardInstall USER-CUDA with make yes-USER-CUDA

Go to srcUSER-CUDA and compile the USER-CUDA library with make OptionsGo to src and compile LAMMPS with make YOUR-Machinefile OptionsIMPORTANT use the same options for the library and LAMMPS

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1229

Features in GPULAMMPS-CUDA

httpcodegooglecompgpulammpswikiLammps_cuda_DI_Features

Run styles

verletcuda

Forces

borncoullongcuda buckcoulcutcudabuckcoullongcuda buckcuda cgcmmcoulcutcuda

cgcmmcouldebyecuda cgcmmcoullongcuda cgcmmcuda eamcuda eamalloycuda eamfscuda

granhookecuda ljcharmmcoulcharmmimplicitcuda ljcharmmcoulcharmmcuda ljcharmmcoullongcudaljcutcoulcutcuda ljcutcouldebyecuda ljcutcoullongcuda ljcutcuda ljexpandcuda

ljgromacscoulgromacscuda ljgromacscuda ljsmoothcuda lj96cutcuda morsecuda morsecoullongcuda

pppmcuda

Fixes

nptcuda nvecuda nvtcuda nvespherecuda enforce2dcuda tempberendsencuda temprescalecuda

addforcecuda setforcecuda aveforcecuda shakecuda gravitycuda freezecuda

Computes

tempcuda temppartialcuda pressurecuda pecuda

Atom styles

atomiccuda chargecuda fullcuda granularcuda

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1329

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1429

Outline

bull Introduction

bull HOOMD-BLUE

bull LAMMPS

bull GPULAMMPS

bull Performance

a) singlemixeddouble precision

b) CPUGPU neighbor lists

c) Load balancing static amp dynamic

bull Examples

bull Summary

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1529

GPULAMMPS Serial Performance

BenchmarkshttpcodegooglecompgpulammpswikiSingleGPUBenchmarkResultsOne slot of Tesla S1070-400 Intel Xeon E5520 4-core cpus (227GHz) HOOMDversion 09

GB Gay-BerneLJ Lennard-Jones

PEG Coarse grain (CGCMM)SDS CGCMM with long range electrostatics

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1629

GPULAMMPS Parallel Performance

bull Benchmarks httpusersnccsgov~wb8gpuyonahtm

bull Also httpsitesgooglecomsiteakohlmeysoftwarelammps-benchmarks

864K atoms LJ liquid reduceddensity=08442 NVE rcut= 25σ 5000stepsSpeedup ~ 3-4

Rhodopsin protein in solvated lipidbilayer CHARMM force field long-rangeCoulombics via PPPMSHAKEconstraints Counter-ions and reducedamount of water32K atoms 1000 timesteps LJ rcut= 1

nm neighbor skin of 10 σ NPTSpeedup ~ 17-3

Gay-Berne ellipsoids125 K atoms NVE rcut= 7σ 1000 stepsSpeedup ~ 9-11

Yona cluster

15 Nodes 2x6-core AMD Opteron 2435 (26GHz) amp 2 Tesla C2050 GPUs

3GB GDDR5 memory 448 cores (115GHz) memory bandwidth 144GBs

GPUs are connected on PCIx16 gen 20 slots ECC support enabled

Mellanox MT26428 QDR InfiniBand interconnect

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1729

Single vs Double Precision (lib compile option)

Force kernels can be compiled to use single double or mixed

precision

The drawback of double precision for memory-bound kernelsis that twice as many bytes must be fetched for cutoff

evaluation

A potential solution is to use mixed precision In this case thepositions are stored in single precision but accumulation and

storage of forces torques energies and virials is performed in

double precision

Because this memory access occurs outside the loop the

performance penalty for mixed precision is very small

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N TharringtonComp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1829

Benchmarks for different precision modes

httpusersnccsgov~wb8gpukid_precisionhtm

256000 atoms LJ liquid reduced density=08442 NVErcut= 25σ 5000 steps

Rhodopsin protein in solvated lipid bilayer CHARMM force fieldlong-range Coulombics via PPPMSHAKE constraintsCounter-ions and reduced amount of water to make a 32K atom

system replicated 2x2x2 to create box256000 atoms 1000 timesteps LJ rcut= 1 nm neighbor skin of10 σ NPT

bull Single precision OK for many runs but use at your peril Colberg amp Houmlfling Comp Phys Comm 182 (2011) 1120ndash1129

bull Mixed precision (single for positions and double for forces) nearly as fast as single precision

bull Double precision still cheaper than CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1929

Review on LAMMPS GPU implementation

Discussion of several important issues in porting a large

molecular dynamics code for use on parallel hybrid machines

Objectives

a) Minimize the amount of code that must be ported forefficient acceleration

b) Utilize the available processing power from both multi-core

CPUs and accelerators

Presents results on a parallel test cluster containing 32 Fermi

GPUs and 180 CPU cores

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2029

Parallel and CPUGPU Decomposition

Multiple MPI processes (CPU cores) can share single

accelerator (GPU)

User can choose fixed load balance between CPU amp GPU for

the calculation of short range forces

Dynamic load balancing can also be chosen GPU force calculation

Neighbor list can be carried out in GPU or CPU

Time integration represents only small computational cost and

it is carried out in CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2129

Fixed or Dynamic Load balancing

Fixed load balancing setting the CPU core to accelerator ratio and by

setting the fraction of particles that will have forces calculated by the

accelerator

Consider a job run with 4 MPI processes on a node with 2 accelerator

devices and the fraction set to 07 At each timestep each MPI process will

place data transfer of positions kernel execution of forces and datatransfer of forces into the device (GPU) queue for 70 of the particles

At the same time data is being transferred and forces are being calculated

on the GPU the MPI process will perform force calculations on the CPU

Ideal fraction CPU time = GPU time for data transfer and kernel

execution dynamic balancing with calculation of optimal fraction based

on CPUGPU timing at some timestep interval

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2229

Benchmarks for load balancing

The performance impact

resulting from splitting the

force calculation between

the host and device will

depend on the CPU core to

device ratio and the relative

rates of force calculation on

the host and device

Processes per node (ppn)

Dynamic Load Balancing

(LB) Neighboring performed on

the GPU (GPU-N)

LJ N=864000

Implementing molecular dynamics on hybrid high

performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and

Arnold N Tharrington Comp Phys Comm 182 (2011)

898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2329

More benchmarks hellip

Strong scaling benchmark usingLJ cutoff of 25 and N=864 K LJ Single node

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2429

LAMMPS OpenCL vs CUDA

Single node Code compiled with CUDA and OpenCL N=256K with neighboring

performed on the GPU Time normalized by the time required to completesimulation loop with CUDA

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2529

CPU versus CPU-GPU Speedups

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911


Scaling with "old" GPU: shock wave

"GP-GPU Processing of Molecular Dynamics Simulations", E. Millán Kujtiuk, E.M. Bringa, A. Higginbotham, C. García
Garino, Proceedings of HPC 2010, High-Performance Computing Symposium, pp. 3234-3248.


8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 429

Highly Optimized Object-oriented Many-particle Dynamics -HOOMD-Blue

httpcodeblueumicheduhoomd-blueindexhtml

bullGeneral purpose molecular dynamics simulations

fully implemented on graphics processing units

Joshua A Anderson Chris D Lorenz and AlexTravesset Journal of Computational Physics 227

(2008) 5342-5359

bull Molecular Dynamics on Graphic Processing Units

HOOMD to the Rescue Joshua A Anderson and

Alex Travesset Computing in Science amp Engineering

10(6) (2008)

httpwwwnvidiacomobjecthoomd_on_teslahtml

Host GPUPolymerTPS

LJ

liquidTPS

Amd

Athlon II

X428GHz

GTX

480 60228 70635

AMDOpteron

2356 23

GHz

S2050

(14)49607 56074

Intel

Core2Q9300

250GHz

GTX460

38365 43213

AMD

Opteron2356 23GHz

Tesla

S1070(14)

26406 30189

983112983119983119983117983108983085983106983148983157983141 983088983086983097983086983090

(64K particles)

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 529

Dump file formats HOOMDs XML input format

MOL2 DCD PDB

Pair Potentials - cutoff (ldquosmoothrdquo option)

CGCMM

DPD (dissipative particle dynamics)EAM (embedded atom method)Gaussian

Lennard-Jones

MorseUser-specified (tabulated)

Shifted Lennard-JonesYukawa

Pair Potentials - long range

Electrostatics via PPPM

Bond Potentials

FENE

HarmonicAngle Potentials

Harmonic

CGCMM

DihedralImproper Potentials

Harmonic

Wall PotentialsLennard-Jones

CUDACPU

OMP for multiple cores O multiple GPUscore

Single double precision

Integrators NVE NPT NVT Brownian dynamics NVT

Energy minimization FIREOther features

bullSupports Linux Windows and Mac OS X

bullSimple and powerful Python script interface for defining

simulationsbullPerforms 2D and 3D simulations

bullAdvanced built-in initial configuration generatorsbullHuman readable XML input files

bullSpace-filling curve particle reordering to increase

performance

bullExtensible object-oriented design Additional features may

be added in new classes contained in plugins

bullSimulations can be visualized in real-time using VMDs IMDinterface

bullReal time analysis can be run at a non-linear rate if desired

bullQuantities such as temperature pressure and box size can bevaried smoothly over a run

bullFlexible selection of particles for integration allows freezing

some particles in place and many other use-casesbullOnly reduced units

Highly Optimized Object-oriented Many-particle Dynamics -HOOMD-Blue

httpcodeblueumicheduhoomd-blueindexhtml

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 629

Python script for LJ run testhoomd

from hoomd_script import

create 100 random particles of name A

initcreate_random(N=100 phi_p=001

name=A)

specify Lennard-Jones interactions

between particle pairs

lj = pairlj(r_cut=30)

ljpair_coeffset(A A epsilon=10

sigma=10)

integrate at constant temperature

all = groupall()

integratemode_standard(dt=0005)

integratenvt(group=all T=12 tau=05)

run 10000 time steps

run(10e3)

Run

$ hoomd testhoomd

OutputHOOMD-blue 090

Compiled Wed Oct 28 065846 EDT 2009

Copyright 2008 2009 Ames Laboratory Iowa State University and

the Regents of the University of Michigan -----

httpcodeblueumicheduhoomd-blue

This code is the implementation of the algorithms discussed in

Joshua A Anderson Chris D Lorenz and Alex Travesset - General

Purpose Molecular Dynamics Fully Implemented on Graphics

Processing Units Journal of Computational Physics 227 (2008)

5342-5359 -----

testhoomd004 | initcreate_random(N=100 phi_p=001 name=A)testhoomd007 | lj = pairlj(r_cut=30)

testhoomd008 | ljpair_coeffset(A A epsilon=10 sigma=10)

testhoomd011 | all = groupall() Group all created containing 100

particles testhoomd012 | integratemode_standard(dt=0005)

testhoomd013 | integratenvt(group=all T=12 tau=05)

testhoomd016 | run(10e3)starting run Time 000000 | Step 10000 10000 | TPS 354179 |

ETA 000000 Average TPS 35405 --------- --

Neighborlist stats

370 normal updates 100 forced updates 0 dangerous updates

n_neigh_min 0 n_neigh_max 10 n_neigh_avg 241

bins_min 0 bins_max 6 bins_avg 15625

run complete

Highly Optimized Object-oriented Many-particle Dynamics -HOOMD-Blue

httpcodeblueumicheduhoomd-bluedoc-trunkpage_quick_starthtml

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 729

Outline

bull Introduction

bull HOOMD-BLUE

bull LAMMPS

bull GPULAMMPS

bull Performancea) singlemixeddouble precision

b) CPUGPU neighbor lists

c) Load balancing static amp dynamic

bull Examples

bull Summary

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 829

LAMMPS (httplammpssandiagov ) on GPUs

bullGPU version update from Brownrsquos site (now 04292011) will update files in the

libgpu and srcGPU directories) httpusersnccsgov~wb8gpudownloadhtm

bullNeed CUDA GPU driver and CUDA toolkit but CUDA SDK is not needed

bullSample compilation (compile gpu library add files to main src dir then compile

whole code)

cd ~lammpslibgpu

emacs Makefilelinux (compute capability 1320 singlemixdouble precision)make -f Makefilelinux (to obtain libgpua)

cd src

emacs MAKEMakefilelinux (here need to change options and paths)

make yes-asphere (for granular materials)

Make yes-manybody (for EAM in GPULAMMPS)

make yes-kspace (for electrostatics needs fftw2x)

make yes-gpu (to copy srcgpu files to src)

make linux (to obtain executable)

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 929

GPULAMMPS

bull httpcodegooglecompgpulammps

bull httpcodegooglecompgpulammpswikiLammps_cuda

bull httpcodegooglecompgpulammpswikiLammps_cuda_UI_Features

Novel most up to date GPU code GNU GPL v2 license

Main developers Paul Crozier (Sandia) Mike Brown Arnold Tharrington Scott

Hampton (Oak Ridge) Axel Kohlmeyer (Temple) Christian Trott Lars Winterfeld

(Ilmenau Germany) Duncan Poole Peng Wang (Nvidia) etc

Non-members can download read-only working copy anonymously over HTTP (SVN)

Quite detailed Wiki pages (installation amp features) E-mail list gpulammpssandiagov

bull Many more features than LAMMPS for instance EAM potential implementation

bull GPULAMMPS features are gradually incorporated into the main LAMMPS

distribution

bull Likely package like USER-CUDA within LAMMPS

bull OpenCL version also expanded compared to LAMMPS OpenCL version

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1029

Schematic code structure

GERYON LIB API cudpp_mini Lib

srcGPU

LAMMPSGPULAMMPSlibgpu (double

precision support

CUDA gt=13)

Atom amp pair styles kernels

Geryon library (set of header

files) for managing the CUDA

Runtime CUDA Driver and

OpenCL APIs vector and matrix

containers data copy and

casting data IOclasses for

kernel compilation and

execution OpenCL support

httpusersnccsgov~wb8gery

onindexhtm

CUDPP reduced version

of cudpp

(httpcodegooglecomp

cudpp)

CUDA Data Parallel

Primitives lib

radixsort scan vector

max etc

Atom amp pair styles fixes

srcUSER-CUDA

in GPULAMMPS

Pair styles lj lj96

gayberne amp cmm testing

stage at 12-04-2011 buck

eam gran morse pppm

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1129

GPULAMMPS-CUDA Compilation

httpcodegooglecompgpulammpswikiInstallation

USER-CUDAProvides (more details about the features of LAMMPS CUDA)

26 pair forceslong range coulomb with pppmcuda

nvecuda nvtcuda nptcuda nvespherecudaseveral more important fixes and computes

Installation (more details about the installaton of LAMMPS CUDA)

Make sure you can compile LAMMPS without packages withmake YOUR-MachinefileInsert your path to CUDA in srcUSER-CUDAMakefilecommon

Install the standard packages with make yes-standardInstall USER-CUDA with make yes-USER-CUDA

Go to srcUSER-CUDA and compile the USER-CUDA library with make OptionsGo to src and compile LAMMPS with make YOUR-Machinefile OptionsIMPORTANT use the same options for the library and LAMMPS

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1229

Features in GPULAMMPS-CUDA

httpcodegooglecompgpulammpswikiLammps_cuda_DI_Features

Run styles

verletcuda

Forces

borncoullongcuda buckcoulcutcudabuckcoullongcuda buckcuda cgcmmcoulcutcuda

cgcmmcouldebyecuda cgcmmcoullongcuda cgcmmcuda eamcuda eamalloycuda eamfscuda

granhookecuda ljcharmmcoulcharmmimplicitcuda ljcharmmcoulcharmmcuda ljcharmmcoullongcudaljcutcoulcutcuda ljcutcouldebyecuda ljcutcoullongcuda ljcutcuda ljexpandcuda

ljgromacscoulgromacscuda ljgromacscuda ljsmoothcuda lj96cutcuda morsecuda morsecoullongcuda

pppmcuda

Fixes

nptcuda nvecuda nvtcuda nvespherecuda enforce2dcuda tempberendsencuda temprescalecuda

addforcecuda setforcecuda aveforcecuda shakecuda gravitycuda freezecuda

Computes

tempcuda temppartialcuda pressurecuda pecuda

Atom styles

atomiccuda chargecuda fullcuda granularcuda

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1329

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1429

Outline

bull Introduction

bull HOOMD-BLUE

bull LAMMPS

bull GPULAMMPS

bull Performance

a) singlemixeddouble precision

b) CPUGPU neighbor lists

c) Load balancing static amp dynamic

bull Examples

bull Summary

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1529

GPULAMMPS Serial Performance

BenchmarkshttpcodegooglecompgpulammpswikiSingleGPUBenchmarkResultsOne slot of Tesla S1070-400 Intel Xeon E5520 4-core cpus (227GHz) HOOMDversion 09

GB Gay-BerneLJ Lennard-Jones

PEG Coarse grain (CGCMM)SDS CGCMM with long range electrostatics

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1629

GPULAMMPS Parallel Performance

bull Benchmarks httpusersnccsgov~wb8gpuyonahtm

bull Also httpsitesgooglecomsiteakohlmeysoftwarelammps-benchmarks

864K atoms LJ liquid reduceddensity=08442 NVE rcut= 25σ 5000stepsSpeedup ~ 3-4

Rhodopsin protein in solvated lipidbilayer CHARMM force field long-rangeCoulombics via PPPMSHAKEconstraints Counter-ions and reducedamount of water32K atoms 1000 timesteps LJ rcut= 1

nm neighbor skin of 10 σ NPTSpeedup ~ 17-3

Gay-Berne ellipsoids125 K atoms NVE rcut= 7σ 1000 stepsSpeedup ~ 9-11

Yona cluster

15 Nodes 2x6-core AMD Opteron 2435 (26GHz) amp 2 Tesla C2050 GPUs

3GB GDDR5 memory 448 cores (115GHz) memory bandwidth 144GBs

GPUs are connected on PCIx16 gen 20 slots ECC support enabled

Mellanox MT26428 QDR InfiniBand interconnect

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1729

Single vs Double Precision (lib compile option)

Force kernels can be compiled to use single double or mixed

precision

The drawback of double precision for memory-bound kernelsis that twice as many bytes must be fetched for cutoff

evaluation

A potential solution is to use mixed precision In this case thepositions are stored in single precision but accumulation and

storage of forces torques energies and virials is performed in

double precision

Because this memory access occurs outside the loop the

performance penalty for mixed precision is very small

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N TharringtonComp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1829

Benchmarks for different precision modes

httpusersnccsgov~wb8gpukid_precisionhtm

256000 atoms LJ liquid reduced density=08442 NVErcut= 25σ 5000 steps

Rhodopsin protein in solvated lipid bilayer CHARMM force fieldlong-range Coulombics via PPPMSHAKE constraintsCounter-ions and reduced amount of water to make a 32K atom

system replicated 2x2x2 to create box256000 atoms 1000 timesteps LJ rcut= 1 nm neighbor skin of10 σ NPT

bull Single precision OK for many runs but use at your peril Colberg amp Houmlfling Comp Phys Comm 182 (2011) 1120ndash1129

bull Mixed precision (single for positions and double for forces) nearly as fast as single precision

bull Double precision still cheaper than CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1929

Review on LAMMPS GPU implementation

Discussion of several important issues in porting a large

molecular dynamics code for use on parallel hybrid machines

Objectives

a) Minimize the amount of code that must be ported forefficient acceleration

b) Utilize the available processing power from both multi-core

CPUs and accelerators

Presents results on a parallel test cluster containing 32 Fermi

GPUs and 180 CPU cores

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2029

Parallel and CPUGPU Decomposition

Multiple MPI processes (CPU cores) can share single

accelerator (GPU)

User can choose fixed load balance between CPU amp GPU for

the calculation of short range forces

Dynamic load balancing can also be chosen GPU force calculation

Neighbor list can be carried out in GPU or CPU

Time integration represents only small computational cost and

it is carried out in CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2129

Fixed or Dynamic Load balancing

Fixed load balancing setting the CPU core to accelerator ratio and by

setting the fraction of particles that will have forces calculated by the

accelerator

Consider a job run with 4 MPI processes on a node with 2 accelerator

devices and the fraction set to 07 At each timestep each MPI process will

place data transfer of positions kernel execution of forces and datatransfer of forces into the device (GPU) queue for 70 of the particles

At the same time data is being transferred and forces are being calculated

on the GPU the MPI process will perform force calculations on the CPU

Ideal fraction CPU time = GPU time for data transfer and kernel

execution dynamic balancing with calculation of optimal fraction based

on CPUGPU timing at some timestep interval

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2229

Benchmarks for load balancing

The performance impact

resulting from splitting the

force calculation between

the host and device will

depend on the CPU core to

device ratio and the relative

rates of force calculation on

the host and device

Processes per node (ppn)

Dynamic Load Balancing

(LB) Neighboring performed on

the GPU (GPU-N)

LJ N=864000

Implementing molecular dynamics on hybrid high

performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and

Arnold N Tharrington Comp Phys Comm 182 (2011)

898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2329

More benchmarks hellip

Strong scaling benchmark usingLJ cutoff of 25 and N=864 K LJ Single node

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2429

LAMMPS OpenCL vs CUDA

Single node Code compiled with CUDA and OpenCL N=256K with neighboring

performed on the GPU Time normalized by the time required to completesimulation loop with CUDA

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2529

CPU versus CPU-GPU Speedups

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2629

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2729

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2829

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2929

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 529

Dump file formats HOOMDs XML input format

MOL2 DCD PDB

Pair Potentials - cutoff (ldquosmoothrdquo option)

CGCMM

DPD (dissipative particle dynamics)EAM (embedded atom method)Gaussian

Lennard-Jones

MorseUser-specified (tabulated)

Shifted Lennard-JonesYukawa

Pair Potentials - long range

Electrostatics via PPPM

Bond Potentials

FENE

HarmonicAngle Potentials

Harmonic

CGCMM

DihedralImproper Potentials

Harmonic

Wall PotentialsLennard-Jones

CUDACPU

OMP for multiple cores O multiple GPUscore

Single double precision

Integrators NVE NPT NVT Brownian dynamics NVT

Energy minimization FIREOther features

bullSupports Linux Windows and Mac OS X

bullSimple and powerful Python script interface for defining

simulationsbullPerforms 2D and 3D simulations

bullAdvanced built-in initial configuration generatorsbullHuman readable XML input files

bullSpace-filling curve particle reordering to increase

performance

bullExtensible object-oriented design Additional features may

be added in new classes contained in plugins

bullSimulations can be visualized in real-time using VMDs IMDinterface

bullReal time analysis can be run at a non-linear rate if desired

bullQuantities such as temperature pressure and box size can bevaried smoothly over a run

bullFlexible selection of particles for integration allows freezing

some particles in place and many other use-casesbullOnly reduced units

Highly Optimized Object-oriented Many-particle Dynamics -HOOMD-Blue

httpcodeblueumicheduhoomd-blueindexhtml

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 629

Python script for LJ run testhoomd

from hoomd_script import

create 100 random particles of name A

initcreate_random(N=100 phi_p=001

name=A)

specify Lennard-Jones interactions

between particle pairs

lj = pairlj(r_cut=30)

ljpair_coeffset(A A epsilon=10

sigma=10)

integrate at constant temperature

all = groupall()

integratemode_standard(dt=0005)

integratenvt(group=all T=12 tau=05)

run 10000 time steps

run(10e3)

Run

$ hoomd testhoomd

OutputHOOMD-blue 090

Compiled Wed Oct 28 065846 EDT 2009

Copyright 2008 2009 Ames Laboratory Iowa State University and

the Regents of the University of Michigan -----

httpcodeblueumicheduhoomd-blue

This code is the implementation of the algorithms discussed in

Joshua A Anderson Chris D Lorenz and Alex Travesset - General

Purpose Molecular Dynamics Fully Implemented on Graphics

Processing Units Journal of Computational Physics 227 (2008)

5342-5359 -----

testhoomd004 | initcreate_random(N=100 phi_p=001 name=A)testhoomd007 | lj = pairlj(r_cut=30)

testhoomd008 | ljpair_coeffset(A A epsilon=10 sigma=10)

testhoomd011 | all = groupall() Group all created containing 100

particles testhoomd012 | integratemode_standard(dt=0005)

testhoomd013 | integratenvt(group=all T=12 tau=05)

testhoomd016 | run(10e3)starting run Time 000000 | Step 10000 10000 | TPS 354179 |

ETA 000000 Average TPS 35405 --------- --

Neighborlist stats

370 normal updates 100 forced updates 0 dangerous updates

n_neigh_min 0 n_neigh_max 10 n_neigh_avg 241

bins_min 0 bins_max 6 bins_avg 15625

run complete

Highly Optimized Object-oriented Many-particle Dynamics -HOOMD-Blue

httpcodeblueumicheduhoomd-bluedoc-trunkpage_quick_starthtml

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 729

Outline

bull Introduction

bull HOOMD-BLUE

bull LAMMPS

bull GPULAMMPS

bull Performancea) singlemixeddouble precision

b) CPUGPU neighbor lists

c) Load balancing static amp dynamic

bull Examples

bull Summary

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 829

LAMMPS (httplammpssandiagov ) on GPUs

bullGPU version update from Brownrsquos site (now 04292011) will update files in the

libgpu and srcGPU directories) httpusersnccsgov~wb8gpudownloadhtm

bullNeed CUDA GPU driver and CUDA toolkit but CUDA SDK is not needed

bullSample compilation (compile gpu library add files to main src dir then compile

whole code)

cd ~lammpslibgpu

emacs Makefilelinux (compute capability 1320 singlemixdouble precision)make -f Makefilelinux (to obtain libgpua)

cd src

emacs MAKEMakefilelinux (here need to change options and paths)

make yes-asphere (for granular materials)

Make yes-manybody (for EAM in GPULAMMPS)

make yes-kspace (for electrostatics needs fftw2x)

make yes-gpu (to copy srcgpu files to src)

make linux (to obtain executable)

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 929

GPULAMMPS

bull httpcodegooglecompgpulammps

bull httpcodegooglecompgpulammpswikiLammps_cuda

bull httpcodegooglecompgpulammpswikiLammps_cuda_UI_Features

Novel most up to date GPU code GNU GPL v2 license

Main developers Paul Crozier (Sandia) Mike Brown Arnold Tharrington Scott

Hampton (Oak Ridge) Axel Kohlmeyer (Temple) Christian Trott Lars Winterfeld

(Ilmenau Germany) Duncan Poole Peng Wang (Nvidia) etc

Non-members can download read-only working copy anonymously over HTTP (SVN)

Quite detailed Wiki pages (installation amp features) E-mail list gpulammpssandiagov

bull Many more features than LAMMPS for instance EAM potential implementation

bull GPULAMMPS features are gradually incorporated into the main LAMMPS

distribution

bull Likely package like USER-CUDA within LAMMPS

bull OpenCL version also expanded compared to LAMMPS OpenCL version

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1029

Schematic code structure

GERYON LIB API cudpp_mini Lib

srcGPU

LAMMPSGPULAMMPSlibgpu (double

precision support

CUDA gt=13)

Atom amp pair styles kernels

Geryon library (set of header

files) for managing the CUDA

Runtime CUDA Driver and

OpenCL APIs vector and matrix

containers data copy and

casting data IOclasses for

kernel compilation and

execution OpenCL support

httpusersnccsgov~wb8gery

onindexhtm

CUDPP reduced version

of cudpp

(httpcodegooglecomp

cudpp)

CUDA Data Parallel

Primitives lib

radixsort scan vector

max etc

Atom amp pair styles fixes

srcUSER-CUDA

in GPULAMMPS

Pair styles lj lj96

gayberne amp cmm testing

stage at 12-04-2011 buck

eam gran morse pppm

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1129

GPULAMMPS-CUDA Compilation

httpcodegooglecompgpulammpswikiInstallation

USER-CUDAProvides (more details about the features of LAMMPS CUDA)

26 pair forceslong range coulomb with pppmcuda

nvecuda nvtcuda nptcuda nvespherecudaseveral more important fixes and computes

Installation (more details about the installaton of LAMMPS CUDA)

Make sure you can compile LAMMPS without packages withmake YOUR-MachinefileInsert your path to CUDA in srcUSER-CUDAMakefilecommon

Install the standard packages with make yes-standardInstall USER-CUDA with make yes-USER-CUDA

Go to srcUSER-CUDA and compile the USER-CUDA library with make OptionsGo to src and compile LAMMPS with make YOUR-Machinefile OptionsIMPORTANT use the same options for the library and LAMMPS

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1229

Features in GPULAMMPS-CUDA

httpcodegooglecompgpulammpswikiLammps_cuda_DI_Features

Run styles

verletcuda

Forces

borncoullongcuda buckcoulcutcudabuckcoullongcuda buckcuda cgcmmcoulcutcuda

cgcmmcouldebyecuda cgcmmcoullongcuda cgcmmcuda eamcuda eamalloycuda eamfscuda

granhookecuda ljcharmmcoulcharmmimplicitcuda ljcharmmcoulcharmmcuda ljcharmmcoullongcudaljcutcoulcutcuda ljcutcouldebyecuda ljcutcoullongcuda ljcutcuda ljexpandcuda

ljgromacscoulgromacscuda ljgromacscuda ljsmoothcuda lj96cutcuda morsecuda morsecoullongcuda

pppmcuda

Fixes

nptcuda nvecuda nvtcuda nvespherecuda enforce2dcuda tempberendsencuda temprescalecuda

addforcecuda setforcecuda aveforcecuda shakecuda gravitycuda freezecuda

Computes

tempcuda temppartialcuda pressurecuda pecuda

Atom styles

atomiccuda chargecuda fullcuda granularcuda

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1329

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1429

Outline

bull Introduction

bull HOOMD-BLUE

bull LAMMPS

bull GPULAMMPS

bull Performance

a) singlemixeddouble precision

b) CPUGPU neighbor lists

c) Load balancing static amp dynamic

bull Examples

bull Summary

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1529

GPULAMMPS Serial Performance

BenchmarkshttpcodegooglecompgpulammpswikiSingleGPUBenchmarkResultsOne slot of Tesla S1070-400 Intel Xeon E5520 4-core cpus (227GHz) HOOMDversion 09

GB Gay-BerneLJ Lennard-Jones

PEG Coarse grain (CGCMM)SDS CGCMM with long range electrostatics

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1629

GPULAMMPS Parallel Performance

bull Benchmarks httpusersnccsgov~wb8gpuyonahtm

bull Also httpsitesgooglecomsiteakohlmeysoftwarelammps-benchmarks

864K atoms LJ liquid reduceddensity=08442 NVE rcut= 25σ 5000stepsSpeedup ~ 3-4

Rhodopsin protein in solvated lipidbilayer CHARMM force field long-rangeCoulombics via PPPMSHAKEconstraints Counter-ions and reducedamount of water32K atoms 1000 timesteps LJ rcut= 1

nm neighbor skin of 10 σ NPTSpeedup ~ 17-3

Gay-Berne ellipsoids125 K atoms NVE rcut= 7σ 1000 stepsSpeedup ~ 9-11

Yona cluster

15 Nodes 2x6-core AMD Opteron 2435 (26GHz) amp 2 Tesla C2050 GPUs

3GB GDDR5 memory 448 cores (115GHz) memory bandwidth 144GBs

GPUs are connected on PCIx16 gen 20 slots ECC support enabled

Mellanox MT26428 QDR InfiniBand interconnect

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1729

Single vs Double Precision (lib compile option)

Force kernels can be compiled to use single double or mixed

precision

The drawback of double precision for memory-bound kernelsis that twice as many bytes must be fetched for cutoff

evaluation

A potential solution is to use mixed precision In this case thepositions are stored in single precision but accumulation and

storage of forces torques energies and virials is performed in

double precision

Because this memory access occurs outside the loop the

performance penalty for mixed precision is very small

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N TharringtonComp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1829

Benchmarks for different precision modes

httpusersnccsgov~wb8gpukid_precisionhtm

256000 atoms LJ liquid reduced density=08442 NVErcut= 25σ 5000 steps

Rhodopsin protein in solvated lipid bilayer CHARMM force fieldlong-range Coulombics via PPPMSHAKE constraintsCounter-ions and reduced amount of water to make a 32K atom

system replicated 2x2x2 to create box256000 atoms 1000 timesteps LJ rcut= 1 nm neighbor skin of10 σ NPT

bull Single precision OK for many runs but use at your peril Colberg amp Houmlfling Comp Phys Comm 182 (2011) 1120ndash1129

bull Mixed precision (single for positions and double for forces) nearly as fast as single precision

bull Double precision still cheaper than CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1929

Review on LAMMPS GPU implementation

Discussion of several important issues in porting a large

molecular dynamics code for use on parallel hybrid machines

Objectives

a) Minimize the amount of code that must be ported forefficient acceleration

b) Utilize the available processing power from both multi-core

CPUs and accelerators

Presents results on a parallel test cluster containing 32 Fermi

GPUs and 180 CPU cores

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2029

Parallel and CPUGPU Decomposition

Multiple MPI processes (CPU cores) can share single

accelerator (GPU)

User can choose fixed load balance between CPU amp GPU for

the calculation of short range forces

Dynamic load balancing can also be chosen GPU force calculation

Neighbor list can be carried out in GPU or CPU

Time integration represents only small computational cost and

it is carried out in CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2129

Fixed or Dynamic Load balancing

Fixed load balancing setting the CPU core to accelerator ratio and by

setting the fraction of particles that will have forces calculated by the

accelerator

Consider a job run with 4 MPI processes on a node with 2 accelerator

devices and the fraction set to 07 At each timestep each MPI process will

place data transfer of positions kernel execution of forces and datatransfer of forces into the device (GPU) queue for 70 of the particles

At the same time data is being transferred and forces are being calculated

on the GPU the MPI process will perform force calculations on the CPU

Ideal fraction CPU time = GPU time for data transfer and kernel

execution dynamic balancing with calculation of optimal fraction based

on CPUGPU timing at some timestep interval

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2229

Benchmarks for load balancing

The performance impact

resulting from splitting the

force calculation between

the host and device will

depend on the CPU core to

device ratio and the relative

rates of force calculation on

the host and device

Processes per node (ppn)

Dynamic Load Balancing

(LB) Neighboring performed on

the GPU (GPU-N)

LJ N=864000

Implementing molecular dynamics on hybrid high

performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and

Arnold N Tharrington Comp Phys Comm 182 (2011)

898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2329

More benchmarks hellip

Strong scaling benchmark usingLJ cutoff of 25 and N=864 K LJ Single node

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2429

LAMMPS OpenCL vs CUDA

Single node Code compiled with CUDA and OpenCL N=256K with neighboring

performed on the GPU Time normalized by the time required to completesimulation loop with CUDA

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2529

CPU versus CPU-GPU Speedups

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2629

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2729

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2829

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2929

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 629

Python script for LJ run testhoomd

from hoomd_script import

create 100 random particles of name A

initcreate_random(N=100 phi_p=001

name=A)

specify Lennard-Jones interactions

between particle pairs

lj = pairlj(r_cut=30)

ljpair_coeffset(A A epsilon=10

sigma=10)

integrate at constant temperature

all = groupall()

integratemode_standard(dt=0005)

integratenvt(group=all T=12 tau=05)

run 10000 time steps

run(10e3)

Run

$ hoomd testhoomd

OutputHOOMD-blue 090

Compiled Wed Oct 28 065846 EDT 2009

Copyright 2008 2009 Ames Laboratory Iowa State University and

the Regents of the University of Michigan -----

httpcodeblueumicheduhoomd-blue

This code is the implementation of the algorithms discussed in

Joshua A Anderson Chris D Lorenz and Alex Travesset - General

Purpose Molecular Dynamics Fully Implemented on Graphics

Processing Units Journal of Computational Physics 227 (2008)

5342-5359 -----

testhoomd004 | initcreate_random(N=100 phi_p=001 name=A)testhoomd007 | lj = pairlj(r_cut=30)

testhoomd008 | ljpair_coeffset(A A epsilon=10 sigma=10)

testhoomd011 | all = groupall() Group all created containing 100

particles testhoomd012 | integratemode_standard(dt=0005)

testhoomd013 | integratenvt(group=all T=12 tau=05)

testhoomd016 | run(10e3)starting run Time 000000 | Step 10000 10000 | TPS 354179 |

ETA 000000 Average TPS 35405 --------- --

Neighborlist stats

370 normal updates 100 forced updates 0 dangerous updates

n_neigh_min 0 n_neigh_max 10 n_neigh_avg 241

bins_min 0 bins_max 6 bins_avg 15625

run complete

Highly Optimized Object-oriented Many-particle Dynamics -HOOMD-Blue

httpcodeblueumicheduhoomd-bluedoc-trunkpage_quick_starthtml

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 729

Outline

bull Introduction

bull HOOMD-BLUE

bull LAMMPS

bull GPULAMMPS

bull Performancea) singlemixeddouble precision

b) CPUGPU neighbor lists

c) Load balancing static amp dynamic

bull Examples

bull Summary

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 829

LAMMPS (httplammpssandiagov ) on GPUs

bullGPU version update from Brownrsquos site (now 04292011) will update files in the

libgpu and srcGPU directories) httpusersnccsgov~wb8gpudownloadhtm

bullNeed CUDA GPU driver and CUDA toolkit but CUDA SDK is not needed

bullSample compilation (compile gpu library add files to main src dir then compile

whole code)

cd ~lammpslibgpu

emacs Makefilelinux (compute capability 1320 singlemixdouble precision)make -f Makefilelinux (to obtain libgpua)

cd src

emacs MAKEMakefilelinux (here need to change options and paths)

make yes-asphere (for granular materials)

Make yes-manybody (for EAM in GPULAMMPS)

make yes-kspace (for electrostatics needs fftw2x)

make yes-gpu (to copy srcgpu files to src)

make linux (to obtain executable)

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 929

GPULAMMPS

bull httpcodegooglecompgpulammps

bull httpcodegooglecompgpulammpswikiLammps_cuda

bull httpcodegooglecompgpulammpswikiLammps_cuda_UI_Features

Novel most up to date GPU code GNU GPL v2 license

Main developers Paul Crozier (Sandia) Mike Brown Arnold Tharrington Scott

Hampton (Oak Ridge) Axel Kohlmeyer (Temple) Christian Trott Lars Winterfeld

(Ilmenau Germany) Duncan Poole Peng Wang (Nvidia) etc

Non-members can download read-only working copy anonymously over HTTP (SVN)

Quite detailed Wiki pages (installation amp features) E-mail list gpulammpssandiagov

bull Many more features than LAMMPS for instance EAM potential implementation

bull GPULAMMPS features are gradually incorporated into the main LAMMPS

distribution

bull Likely package like USER-CUDA within LAMMPS

bull OpenCL version also expanded compared to LAMMPS OpenCL version

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1029

Schematic code structure

GERYON LIB API cudpp_mini Lib

srcGPU

LAMMPSGPULAMMPSlibgpu (double

precision support

CUDA gt=13)

Atom amp pair styles kernels

Geryon library (set of header

files) for managing the CUDA

Runtime CUDA Driver and

OpenCL APIs vector and matrix

containers data copy and

casting data IOclasses for

kernel compilation and

execution OpenCL support

httpusersnccsgov~wb8gery

onindexhtm

CUDPP reduced version

of cudpp

(httpcodegooglecomp

cudpp)

CUDA Data Parallel

Primitives lib

radixsort scan vector

max etc

Atom amp pair styles fixes

srcUSER-CUDA

in GPULAMMPS

Pair styles lj lj96

gayberne amp cmm testing

stage at 12-04-2011 buck

eam gran morse pppm

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1129

GPULAMMPS-CUDA Compilation

httpcodegooglecompgpulammpswikiInstallation

USER-CUDAProvides (more details about the features of LAMMPS CUDA)

26 pair forceslong range coulomb with pppmcuda

nvecuda nvtcuda nptcuda nvespherecudaseveral more important fixes and computes

Installation (more details about the installaton of LAMMPS CUDA)

Make sure you can compile LAMMPS without packages withmake YOUR-MachinefileInsert your path to CUDA in srcUSER-CUDAMakefilecommon

Install the standard packages with make yes-standardInstall USER-CUDA with make yes-USER-CUDA

Go to srcUSER-CUDA and compile the USER-CUDA library with make OptionsGo to src and compile LAMMPS with make YOUR-Machinefile OptionsIMPORTANT use the same options for the library and LAMMPS

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1229

Features in GPULAMMPS-CUDA

httpcodegooglecompgpulammpswikiLammps_cuda_DI_Features

Run styles

verletcuda

Forces

borncoullongcuda buckcoulcutcudabuckcoullongcuda buckcuda cgcmmcoulcutcuda

cgcmmcouldebyecuda cgcmmcoullongcuda cgcmmcuda eamcuda eamalloycuda eamfscuda

granhookecuda ljcharmmcoulcharmmimplicitcuda ljcharmmcoulcharmmcuda ljcharmmcoullongcudaljcutcoulcutcuda ljcutcouldebyecuda ljcutcoullongcuda ljcutcuda ljexpandcuda

ljgromacscoulgromacscuda ljgromacscuda ljsmoothcuda lj96cutcuda morsecuda morsecoullongcuda

pppmcuda

Fixes

nptcuda nvecuda nvtcuda nvespherecuda enforce2dcuda tempberendsencuda temprescalecuda

addforcecuda setforcecuda aveforcecuda shakecuda gravitycuda freezecuda

Computes

tempcuda temppartialcuda pressurecuda pecuda

Atom styles

atomiccuda chargecuda fullcuda granularcuda

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1329

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1429

Outline

bull Introduction

bull HOOMD-BLUE

bull LAMMPS

bull GPULAMMPS

bull Performance

a) singlemixeddouble precision

b) CPUGPU neighbor lists

c) Load balancing static amp dynamic

bull Examples

bull Summary

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1529

GPULAMMPS Serial Performance

BenchmarkshttpcodegooglecompgpulammpswikiSingleGPUBenchmarkResultsOne slot of Tesla S1070-400 Intel Xeon E5520 4-core cpus (227GHz) HOOMDversion 09

GB Gay-BerneLJ Lennard-Jones

PEG Coarse grain (CGCMM)SDS CGCMM with long range electrostatics

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1629

GPULAMMPS Parallel Performance

bull Benchmarks httpusersnccsgov~wb8gpuyonahtm

bull Also httpsitesgooglecomsiteakohlmeysoftwarelammps-benchmarks

864K atoms LJ liquid reduceddensity=08442 NVE rcut= 25σ 5000stepsSpeedup ~ 3-4

Rhodopsin protein in solvated lipidbilayer CHARMM force field long-rangeCoulombics via PPPMSHAKEconstraints Counter-ions and reducedamount of water32K atoms 1000 timesteps LJ rcut= 1

nm neighbor skin of 10 σ NPTSpeedup ~ 17-3

Gay-Berne ellipsoids125 K atoms NVE rcut= 7σ 1000 stepsSpeedup ~ 9-11

Yona cluster

15 Nodes 2x6-core AMD Opteron 2435 (26GHz) amp 2 Tesla C2050 GPUs

3GB GDDR5 memory 448 cores (115GHz) memory bandwidth 144GBs

GPUs are connected on PCIx16 gen 20 slots ECC support enabled

Mellanox MT26428 QDR InfiniBand interconnect

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1729

Single vs Double Precision (lib compile option)

Force kernels can be compiled to use single double or mixed

precision

The drawback of double precision for memory-bound kernelsis that twice as many bytes must be fetched for cutoff

evaluation

A potential solution is to use mixed precision In this case thepositions are stored in single precision but accumulation and

storage of forces torques energies and virials is performed in

double precision

Because this memory access occurs outside the loop the

performance penalty for mixed precision is very small

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N TharringtonComp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1829

Benchmarks for different precision modes

httpusersnccsgov~wb8gpukid_precisionhtm

256000 atoms LJ liquid reduced density=08442 NVErcut= 25σ 5000 steps

Rhodopsin protein in solvated lipid bilayer CHARMM force fieldlong-range Coulombics via PPPMSHAKE constraintsCounter-ions and reduced amount of water to make a 32K atom

system replicated 2x2x2 to create box256000 atoms 1000 timesteps LJ rcut= 1 nm neighbor skin of10 σ NPT

bull Single precision OK for many runs but use at your peril Colberg amp Houmlfling Comp Phys Comm 182 (2011) 1120ndash1129

bull Mixed precision (single for positions and double for forces) nearly as fast as single precision

bull Double precision still cheaper than CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1929

Review on LAMMPS GPU implementation

Discussion of several important issues in porting a large

molecular dynamics code for use on parallel hybrid machines

Objectives

a) Minimize the amount of code that must be ported forefficient acceleration

b) Utilize the available processing power from both multi-core

CPUs and accelerators

Presents results on a parallel test cluster containing 32 Fermi

GPUs and 180 CPU cores

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2029

Parallel and CPUGPU Decomposition

Multiple MPI processes (CPU cores) can share single

accelerator (GPU)

User can choose fixed load balance between CPU amp GPU for

the calculation of short range forces

Dynamic load balancing can also be chosen GPU force calculation

Neighbor list can be carried out in GPU or CPU

Time integration represents only small computational cost and

it is carried out in CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2129

Fixed or Dynamic Load balancing

Fixed load balancing setting the CPU core to accelerator ratio and by

setting the fraction of particles that will have forces calculated by the

accelerator

Consider a job run with 4 MPI processes on a node with 2 accelerator

devices and the fraction set to 07 At each timestep each MPI process will

place data transfer of positions kernel execution of forces and datatransfer of forces into the device (GPU) queue for 70 of the particles

At the same time data is being transferred and forces are being calculated

on the GPU the MPI process will perform force calculations on the CPU

Ideal fraction CPU time = GPU time for data transfer and kernel

execution dynamic balancing with calculation of optimal fraction based

on CPUGPU timing at some timestep interval

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2229

Benchmarks for load balancing

The performance impact

resulting from splitting the

force calculation between

the host and device will

depend on the CPU core to

device ratio and the relative

rates of force calculation on

the host and device

Processes per node (ppn)

Dynamic Load Balancing

(LB) Neighboring performed on

the GPU (GPU-N)

LJ N=864000

Implementing molecular dynamics on hybrid high

performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and

Arnold N Tharrington Comp Phys Comm 182 (2011)

898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2329

More benchmarks hellip

Strong scaling benchmark usingLJ cutoff of 25 and N=864 K LJ Single node

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2429

LAMMPS OpenCL vs CUDA

Single node Code compiled with CUDA and OpenCL N=256K with neighboring

performed on the GPU Time normalized by the time required to completesimulation loop with CUDA

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2529

CPU versus CPU-GPU Speedups

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2629

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2729

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2829

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2929

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 729

Outline

bull Introduction

bull HOOMD-BLUE

bull LAMMPS

bull GPULAMMPS

bull Performancea) singlemixeddouble precision

b) CPUGPU neighbor lists

c) Load balancing static amp dynamic

bull Examples

bull Summary

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 829

LAMMPS (httplammpssandiagov ) on GPUs

bullGPU version update from Brownrsquos site (now 04292011) will update files in the

libgpu and srcGPU directories) httpusersnccsgov~wb8gpudownloadhtm

bullNeed CUDA GPU driver and CUDA toolkit but CUDA SDK is not needed

bullSample compilation (compile gpu library add files to main src dir then compile

whole code)

cd ~lammpslibgpu

emacs Makefilelinux (compute capability 1320 singlemixdouble precision)make -f Makefilelinux (to obtain libgpua)

cd src

emacs MAKEMakefilelinux (here need to change options and paths)

make yes-asphere (for granular materials)

Make yes-manybody (for EAM in GPULAMMPS)

make yes-kspace (for electrostatics needs fftw2x)

make yes-gpu (to copy srcgpu files to src)

make linux (to obtain executable)

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 929

GPULAMMPS

bull httpcodegooglecompgpulammps

bull httpcodegooglecompgpulammpswikiLammps_cuda

bull httpcodegooglecompgpulammpswikiLammps_cuda_UI_Features

Novel most up to date GPU code GNU GPL v2 license

Main developers Paul Crozier (Sandia) Mike Brown Arnold Tharrington Scott

Hampton (Oak Ridge) Axel Kohlmeyer (Temple) Christian Trott Lars Winterfeld

(Ilmenau Germany) Duncan Poole Peng Wang (Nvidia) etc

Non-members can download read-only working copy anonymously over HTTP (SVN)

Quite detailed Wiki pages (installation amp features) E-mail list gpulammpssandiagov

bull Many more features than LAMMPS for instance EAM potential implementation

bull GPULAMMPS features are gradually incorporated into the main LAMMPS

distribution

bull Likely package like USER-CUDA within LAMMPS

bull OpenCL version also expanded compared to LAMMPS OpenCL version

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1029

Schematic code structure

GERYON LIB API cudpp_mini Lib

srcGPU

LAMMPSGPULAMMPSlibgpu (double

precision support

CUDA gt=13)

Atom amp pair styles kernels

Geryon library (set of header

files) for managing the CUDA

Runtime CUDA Driver and

OpenCL APIs vector and matrix

containers data copy and

casting data IOclasses for

kernel compilation and

execution OpenCL support

httpusersnccsgov~wb8gery

onindexhtm

CUDPP reduced version

of cudpp

(httpcodegooglecomp

cudpp)

CUDA Data Parallel

Primitives lib

radixsort scan vector

max etc

Atom amp pair styles fixes

srcUSER-CUDA

in GPULAMMPS

Pair styles lj lj96

gayberne amp cmm testing

stage at 12-04-2011 buck

eam gran morse pppm

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1129

GPULAMMPS-CUDA Compilation

httpcodegooglecompgpulammpswikiInstallation

USER-CUDAProvides (more details about the features of LAMMPS CUDA)

26 pair forceslong range coulomb with pppmcuda

nvecuda nvtcuda nptcuda nvespherecudaseveral more important fixes and computes

Installation (more details about the installaton of LAMMPS CUDA)

Make sure you can compile LAMMPS without packages withmake YOUR-MachinefileInsert your path to CUDA in srcUSER-CUDAMakefilecommon

Install the standard packages with make yes-standardInstall USER-CUDA with make yes-USER-CUDA

Go to srcUSER-CUDA and compile the USER-CUDA library with make OptionsGo to src and compile LAMMPS with make YOUR-Machinefile OptionsIMPORTANT use the same options for the library and LAMMPS

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1229

Features in GPULAMMPS-CUDA

httpcodegooglecompgpulammpswikiLammps_cuda_DI_Features

Run styles

verletcuda

Forces

borncoullongcuda buckcoulcutcudabuckcoullongcuda buckcuda cgcmmcoulcutcuda

cgcmmcouldebyecuda cgcmmcoullongcuda cgcmmcuda eamcuda eamalloycuda eamfscuda

granhookecuda ljcharmmcoulcharmmimplicitcuda ljcharmmcoulcharmmcuda ljcharmmcoullongcudaljcutcoulcutcuda ljcutcouldebyecuda ljcutcoullongcuda ljcutcuda ljexpandcuda

ljgromacscoulgromacscuda ljgromacscuda ljsmoothcuda lj96cutcuda morsecuda morsecoullongcuda

pppmcuda

Fixes

nptcuda nvecuda nvtcuda nvespherecuda enforce2dcuda tempberendsencuda temprescalecuda

addforcecuda setforcecuda aveforcecuda shakecuda gravitycuda freezecuda

Computes

tempcuda temppartialcuda pressurecuda pecuda

Atom styles

atomiccuda chargecuda fullcuda granularcuda

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1329

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1429

Outline

bull Introduction

bull HOOMD-BLUE

bull LAMMPS

bull GPULAMMPS

bull Performance

a) singlemixeddouble precision

b) CPUGPU neighbor lists

c) Load balancing static amp dynamic

bull Examples

bull Summary

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1529

GPULAMMPS Serial Performance

BenchmarkshttpcodegooglecompgpulammpswikiSingleGPUBenchmarkResultsOne slot of Tesla S1070-400 Intel Xeon E5520 4-core cpus (227GHz) HOOMDversion 09

GB Gay-BerneLJ Lennard-Jones

PEG Coarse grain (CGCMM)SDS CGCMM with long range electrostatics

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1629

GPULAMMPS Parallel Performance

bull Benchmarks httpusersnccsgov~wb8gpuyonahtm

bull Also httpsitesgooglecomsiteakohlmeysoftwarelammps-benchmarks

864K atoms LJ liquid reduceddensity=08442 NVE rcut= 25σ 5000stepsSpeedup ~ 3-4

Rhodopsin protein in solvated lipidbilayer CHARMM force field long-rangeCoulombics via PPPMSHAKEconstraints Counter-ions and reducedamount of water32K atoms 1000 timesteps LJ rcut= 1

nm neighbor skin of 10 σ NPTSpeedup ~ 17-3

Gay-Berne ellipsoids125 K atoms NVE rcut= 7σ 1000 stepsSpeedup ~ 9-11

Yona cluster

15 Nodes 2x6-core AMD Opteron 2435 (26GHz) amp 2 Tesla C2050 GPUs

3GB GDDR5 memory 448 cores (115GHz) memory bandwidth 144GBs

GPUs are connected on PCIx16 gen 20 slots ECC support enabled

Mellanox MT26428 QDR InfiniBand interconnect

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1729

Single vs Double Precision (lib compile option)

Force kernels can be compiled to use single double or mixed

precision

The drawback of double precision for memory-bound kernelsis that twice as many bytes must be fetched for cutoff

evaluation

A potential solution is to use mixed precision In this case thepositions are stored in single precision but accumulation and

storage of forces torques energies and virials is performed in

double precision

Because this memory access occurs outside the loop the

performance penalty for mixed precision is very small

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N TharringtonComp Phys Comm 182 (2011) 898ndash911


Benchmarks for different precision modes

http://users.nccs.gov/~wb8/gpu/kid_precision.htm

256,000 atoms, LJ liquid, reduced density = 0.8442, NVE, rcut = 2.5σ, 5000 steps.

Rhodopsin protein in solvated lipid bilayer: CHARMM force field, long-range Coulombics via PPPM, SHAKE constraints, counter-ions and reduced amount of water to make a 32K-atom system, replicated 2x2x2 to create the box. 256,000 atoms, 1000 timesteps, LJ rcut = 1 nm, neighbor skin of 1.0 σ, NPT.

• Single precision OK for many runs, but use at your peril: Colberg & Höfling, Comp. Phys. Comm. 182 (2011) 1120–1129.
• Mixed precision (single for positions and double for forces) is nearly as fast as single precision.
• Double precision is still cheaper than the CPU.


Review on LAMMPS GPU implementation

Discussion of several important issues in porting a large molecular dynamics code for use on parallel hybrid machines.

Objectives:
a) Minimize the amount of code that must be ported for efficient acceleration.
b) Utilize the available processing power from both multi-core CPUs and accelerators.

Presents results on a parallel test cluster containing 32 Fermi GPUs and 180 CPU cores.

"Implementing molecular dynamics on hybrid high performance computers – short range forces", W. Michael Brown, Peng Wang, Steven J. Plimpton, and Arnold N. Tharrington, Comp. Phys. Comm. 182 (2011) 898–911.


Parallel and CPU/GPU Decomposition

Multiple MPI processes (CPU cores) can share a single accelerator (GPU).

The user can choose a fixed load balance between CPU & GPU for the calculation of short-range forces; dynamic load balancing of the GPU force calculation can also be chosen.

The neighbor list build can be carried out on the GPU or on the CPU.

Time integration represents only a small computational cost and is carried out on the CPU.


Fixed or Dynamic Load Balancing

Fixed load balancing: set by the CPU core to accelerator ratio and by setting the fraction of particles that will have forces calculated by the accelerator.

Consider a job run with 4 MPI processes on a node with 2 accelerator devices and the fraction set to 0.7. At each timestep, each MPI process will place data transfer of positions, kernel execution of forces, and data transfer of forces into the device (GPU) queue for 70% of the particles. At the same time data is being transferred and forces are being calculated on the GPU, the MPI process will perform force calculations on the CPU.

Ideal fraction: CPU time = GPU time for data transfer and kernel execution. Dynamic balancing computes the optimal fraction from CPU/GPU timings at some timestep interval.

"Implementing molecular dynamics on hybrid high performance computers – short range forces", W. Michael Brown, Peng Wang, Steven J. Plimpton, and Arnold N. Tharrington, Comp. Phys. Comm. 182 (2011) 898–911.
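In input-script terms, this split is selected with the package gpu command of that era; the lines below follow the historical "package gpu mode first last split" syntax and should be checked against the documentation of your version:

# 4 MPI ranks sharing GPUs 0 and 1 on one node
#   package gpu force/neigh 0 1 0.7    -> neighbor build on the GPU, fixed 70% split
#   package gpu force       0 1 -1     -> neighbor build on the CPU, dynamic balancing
mpirun -np 4 ./lmp_gpu -in in.lj_gpu     # lmp_gpu and in.lj_gpu are placeholder names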


Benchmarks for load balancing

The performance impact resulting from splitting the force calculation between the host and device will depend on the CPU core to device ratio and the relative rates of force calculation on the host and device.

Processes per node (ppn). Dynamic Load Balancing (LB). Neighboring performed on the GPU (GPU-N). LJ, N = 864,000.

"Implementing molecular dynamics on hybrid high performance computers – short range forces", W. Michael Brown, Peng Wang, Steven J. Plimpton, and Arnold N. Tharrington, Comp. Phys. Comm. 182 (2011) 898–911.


More benchmarks …

Strong scaling benchmark using LJ, cutoff of 2.5σ and N = 864K. Single node.

"Implementing molecular dynamics on hybrid high performance computers – short range forces", W. Michael Brown, Peng Wang, Steven J. Plimpton, and Arnold N. Tharrington, Comp. Phys. Comm. 182 (2011) 898–911.


LAMMPS OpenCL vs CUDA

Single node. Code compiled with CUDA and OpenCL, N = 256K, with neighboring performed on the GPU. Time normalized by the time required to complete the simulation loop with CUDA.

"Implementing molecular dynamics on hybrid high performance computers – short range forces", W. Michael Brown, Peng Wang, Steven J. Plimpton, and Arnold N. Tharrington, Comp. Phys. Comm. 182 (2011) 898–911.


CPU versus CPU-GPU Speedups

"Implementing molecular dynamics on hybrid high performance computers – short range forces", W. Michael Brown, Peng Wang, Steven J. Plimpton, and Arnold N. Tharrington, Comp. Phys. Comm. 182 (2011) 898–911.


Scaling with "old" GPU: shock wave

"GP-GPU Processing of Molecular Dynamics Simulations", E. Millán Kujtiuk, E.M. Bringa, A. Higginbotham, C. García Garino, Proceedings of HPC 2010, High-Performance Computing Symposium, pp. 3234-3248.

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2929

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 829

LAMMPS (httplammpssandiagov ) on GPUs

bullGPU version update from Brownrsquos site (now 04292011) will update files in the

libgpu and srcGPU directories) httpusersnccsgov~wb8gpudownloadhtm

bullNeed CUDA GPU driver and CUDA toolkit but CUDA SDK is not needed

bullSample compilation (compile gpu library add files to main src dir then compile

whole code)

cd ~lammpslibgpu

emacs Makefilelinux (compute capability 1320 singlemixdouble precision)make -f Makefilelinux (to obtain libgpua)

cd src

emacs MAKEMakefilelinux (here need to change options and paths)

make yes-asphere (for granular materials)

Make yes-manybody (for EAM in GPULAMMPS)

make yes-kspace (for electrostatics needs fftw2x)

make yes-gpu (to copy srcgpu files to src)

make linux (to obtain executable)

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 929

GPULAMMPS

bull httpcodegooglecompgpulammps

bull httpcodegooglecompgpulammpswikiLammps_cuda

bull httpcodegooglecompgpulammpswikiLammps_cuda_UI_Features

Novel most up to date GPU code GNU GPL v2 license

Main developers Paul Crozier (Sandia) Mike Brown Arnold Tharrington Scott

Hampton (Oak Ridge) Axel Kohlmeyer (Temple) Christian Trott Lars Winterfeld

(Ilmenau Germany) Duncan Poole Peng Wang (Nvidia) etc

Non-members can download read-only working copy anonymously over HTTP (SVN)

Quite detailed Wiki pages (installation amp features) E-mail list gpulammpssandiagov

bull Many more features than LAMMPS for instance EAM potential implementation

bull GPULAMMPS features are gradually incorporated into the main LAMMPS

distribution

bull Likely package like USER-CUDA within LAMMPS

bull OpenCL version also expanded compared to LAMMPS OpenCL version

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1029

Schematic code structure

GERYON LIB API cudpp_mini Lib

srcGPU

LAMMPSGPULAMMPSlibgpu (double

precision support

CUDA gt=13)

Atom amp pair styles kernels

Geryon library (set of header

files) for managing the CUDA

Runtime CUDA Driver and

OpenCL APIs vector and matrix

containers data copy and

casting data IOclasses for

kernel compilation and

execution OpenCL support

httpusersnccsgov~wb8gery

onindexhtm

CUDPP reduced version

of cudpp

(httpcodegooglecomp

cudpp)

CUDA Data Parallel

Primitives lib

radixsort scan vector

max etc

Atom amp pair styles fixes

srcUSER-CUDA

in GPULAMMPS

Pair styles lj lj96

gayberne amp cmm testing

stage at 12-04-2011 buck

eam gran morse pppm

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1129

GPULAMMPS-CUDA Compilation

httpcodegooglecompgpulammpswikiInstallation

USER-CUDAProvides (more details about the features of LAMMPS CUDA)

26 pair forceslong range coulomb with pppmcuda

nvecuda nvtcuda nptcuda nvespherecudaseveral more important fixes and computes

Installation (more details about the installaton of LAMMPS CUDA)

Make sure you can compile LAMMPS without packages withmake YOUR-MachinefileInsert your path to CUDA in srcUSER-CUDAMakefilecommon

Install the standard packages with make yes-standardInstall USER-CUDA with make yes-USER-CUDA

Go to srcUSER-CUDA and compile the USER-CUDA library with make OptionsGo to src and compile LAMMPS with make YOUR-Machinefile OptionsIMPORTANT use the same options for the library and LAMMPS

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1229

Features in GPULAMMPS-CUDA

httpcodegooglecompgpulammpswikiLammps_cuda_DI_Features

Run styles

verletcuda

Forces

borncoullongcuda buckcoulcutcudabuckcoullongcuda buckcuda cgcmmcoulcutcuda

cgcmmcouldebyecuda cgcmmcoullongcuda cgcmmcuda eamcuda eamalloycuda eamfscuda

granhookecuda ljcharmmcoulcharmmimplicitcuda ljcharmmcoulcharmmcuda ljcharmmcoullongcudaljcutcoulcutcuda ljcutcouldebyecuda ljcutcoullongcuda ljcutcuda ljexpandcuda

ljgromacscoulgromacscuda ljgromacscuda ljsmoothcuda lj96cutcuda morsecuda morsecoullongcuda

pppmcuda

Fixes

nptcuda nvecuda nvtcuda nvespherecuda enforce2dcuda tempberendsencuda temprescalecuda

addforcecuda setforcecuda aveforcecuda shakecuda gravitycuda freezecuda

Computes

tempcuda temppartialcuda pressurecuda pecuda

Atom styles

atomiccuda chargecuda fullcuda granularcuda

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1329

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1429

Outline

bull Introduction

bull HOOMD-BLUE

bull LAMMPS

bull GPULAMMPS

bull Performance

a) singlemixeddouble precision

b) CPUGPU neighbor lists

c) Load balancing static amp dynamic

bull Examples

bull Summary

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1529

GPULAMMPS Serial Performance

BenchmarkshttpcodegooglecompgpulammpswikiSingleGPUBenchmarkResultsOne slot of Tesla S1070-400 Intel Xeon E5520 4-core cpus (227GHz) HOOMDversion 09

GB Gay-BerneLJ Lennard-Jones

PEG Coarse grain (CGCMM)SDS CGCMM with long range electrostatics

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1629

GPULAMMPS Parallel Performance

bull Benchmarks httpusersnccsgov~wb8gpuyonahtm

bull Also httpsitesgooglecomsiteakohlmeysoftwarelammps-benchmarks

864K atoms LJ liquid reduceddensity=08442 NVE rcut= 25σ 5000stepsSpeedup ~ 3-4

Rhodopsin protein in solvated lipidbilayer CHARMM force field long-rangeCoulombics via PPPMSHAKEconstraints Counter-ions and reducedamount of water32K atoms 1000 timesteps LJ rcut= 1

nm neighbor skin of 10 σ NPTSpeedup ~ 17-3

Gay-Berne ellipsoids125 K atoms NVE rcut= 7σ 1000 stepsSpeedup ~ 9-11

Yona cluster

15 Nodes 2x6-core AMD Opteron 2435 (26GHz) amp 2 Tesla C2050 GPUs

3GB GDDR5 memory 448 cores (115GHz) memory bandwidth 144GBs

GPUs are connected on PCIx16 gen 20 slots ECC support enabled

Mellanox MT26428 QDR InfiniBand interconnect

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1729

Single vs Double Precision (lib compile option)

Force kernels can be compiled to use single double or mixed

precision

The drawback of double precision for memory-bound kernelsis that twice as many bytes must be fetched for cutoff

evaluation

A potential solution is to use mixed precision In this case thepositions are stored in single precision but accumulation and

storage of forces torques energies and virials is performed in

double precision

Because this memory access occurs outside the loop the

performance penalty for mixed precision is very small

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N TharringtonComp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1829

Benchmarks for different precision modes

httpusersnccsgov~wb8gpukid_precisionhtm

256000 atoms LJ liquid reduced density=08442 NVErcut= 25σ 5000 steps

Rhodopsin protein in solvated lipid bilayer CHARMM force fieldlong-range Coulombics via PPPMSHAKE constraintsCounter-ions and reduced amount of water to make a 32K atom

system replicated 2x2x2 to create box256000 atoms 1000 timesteps LJ rcut= 1 nm neighbor skin of10 σ NPT

bull Single precision OK for many runs but use at your peril Colberg amp Houmlfling Comp Phys Comm 182 (2011) 1120ndash1129

bull Mixed precision (single for positions and double for forces) nearly as fast as single precision

bull Double precision still cheaper than CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1929

Review on LAMMPS GPU implementation

Discussion of several important issues in porting a large

molecular dynamics code for use on parallel hybrid machines

Objectives

a) Minimize the amount of code that must be ported forefficient acceleration

b) Utilize the available processing power from both multi-core

CPUs and accelerators

Presents results on a parallel test cluster containing 32 Fermi

GPUs and 180 CPU cores

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2029

Parallel and CPUGPU Decomposition

Multiple MPI processes (CPU cores) can share single

accelerator (GPU)

User can choose fixed load balance between CPU amp GPU for

the calculation of short range forces

Dynamic load balancing can also be chosen GPU force calculation

Neighbor list can be carried out in GPU or CPU

Time integration represents only small computational cost and

it is carried out in CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2129

Fixed or Dynamic Load balancing

Fixed load balancing setting the CPU core to accelerator ratio and by

setting the fraction of particles that will have forces calculated by the

accelerator

Consider a job run with 4 MPI processes on a node with 2 accelerator

devices and the fraction set to 07 At each timestep each MPI process will

place data transfer of positions kernel execution of forces and datatransfer of forces into the device (GPU) queue for 70 of the particles

At the same time data is being transferred and forces are being calculated

on the GPU the MPI process will perform force calculations on the CPU

Ideal fraction CPU time = GPU time for data transfer and kernel

execution dynamic balancing with calculation of optimal fraction based

on CPUGPU timing at some timestep interval

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2229

Benchmarks for load balancing

The performance impact

resulting from splitting the

force calculation between

the host and device will

depend on the CPU core to

device ratio and the relative

rates of force calculation on

the host and device

Processes per node (ppn)

Dynamic Load Balancing

(LB) Neighboring performed on

the GPU (GPU-N)

LJ N=864000

Implementing molecular dynamics on hybrid high

performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and

Arnold N Tharrington Comp Phys Comm 182 (2011)

898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2329

More benchmarks hellip

Strong scaling benchmark usingLJ cutoff of 25 and N=864 K LJ Single node

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2429

LAMMPS OpenCL vs CUDA

Single node Code compiled with CUDA and OpenCL N=256K with neighboring

performed on the GPU Time normalized by the time required to completesimulation loop with CUDA

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2529

CPU versus CPU-GPU Speedups

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2629

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2729

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2829

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2929

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 929

GPULAMMPS

bull httpcodegooglecompgpulammps

bull httpcodegooglecompgpulammpswikiLammps_cuda

bull httpcodegooglecompgpulammpswikiLammps_cuda_UI_Features

Novel most up to date GPU code GNU GPL v2 license

Main developers Paul Crozier (Sandia) Mike Brown Arnold Tharrington Scott

Hampton (Oak Ridge) Axel Kohlmeyer (Temple) Christian Trott Lars Winterfeld

(Ilmenau Germany) Duncan Poole Peng Wang (Nvidia) etc

Non-members can download read-only working copy anonymously over HTTP (SVN)

Quite detailed Wiki pages (installation amp features) E-mail list gpulammpssandiagov

bull Many more features than LAMMPS for instance EAM potential implementation

bull GPULAMMPS features are gradually incorporated into the main LAMMPS

distribution

bull Likely package like USER-CUDA within LAMMPS

bull OpenCL version also expanded compared to LAMMPS OpenCL version

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1029

Schematic code structure

GERYON LIB API cudpp_mini Lib

srcGPU

LAMMPSGPULAMMPSlibgpu (double

precision support

CUDA gt=13)

Atom amp pair styles kernels

Geryon library (set of header

files) for managing the CUDA

Runtime CUDA Driver and

OpenCL APIs vector and matrix

containers data copy and

casting data IOclasses for

kernel compilation and

execution OpenCL support

httpusersnccsgov~wb8gery

onindexhtm

CUDPP reduced version

of cudpp

(httpcodegooglecomp

cudpp)

CUDA Data Parallel

Primitives lib

radixsort scan vector

max etc

Atom amp pair styles fixes

srcUSER-CUDA

in GPULAMMPS

Pair styles lj lj96

gayberne amp cmm testing

stage at 12-04-2011 buck

eam gran morse pppm

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1129

GPULAMMPS-CUDA Compilation

httpcodegooglecompgpulammpswikiInstallation

USER-CUDAProvides (more details about the features of LAMMPS CUDA)

26 pair forceslong range coulomb with pppmcuda

nvecuda nvtcuda nptcuda nvespherecudaseveral more important fixes and computes

Installation (more details about the installaton of LAMMPS CUDA)

Make sure you can compile LAMMPS without packages withmake YOUR-MachinefileInsert your path to CUDA in srcUSER-CUDAMakefilecommon

Install the standard packages with make yes-standardInstall USER-CUDA with make yes-USER-CUDA

Go to srcUSER-CUDA and compile the USER-CUDA library with make OptionsGo to src and compile LAMMPS with make YOUR-Machinefile OptionsIMPORTANT use the same options for the library and LAMMPS

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1229

Features in GPULAMMPS-CUDA

httpcodegooglecompgpulammpswikiLammps_cuda_DI_Features

Run styles

verletcuda

Forces

borncoullongcuda buckcoulcutcudabuckcoullongcuda buckcuda cgcmmcoulcutcuda

cgcmmcouldebyecuda cgcmmcoullongcuda cgcmmcuda eamcuda eamalloycuda eamfscuda

granhookecuda ljcharmmcoulcharmmimplicitcuda ljcharmmcoulcharmmcuda ljcharmmcoullongcudaljcutcoulcutcuda ljcutcouldebyecuda ljcutcoullongcuda ljcutcuda ljexpandcuda

ljgromacscoulgromacscuda ljgromacscuda ljsmoothcuda lj96cutcuda morsecuda morsecoullongcuda

pppmcuda

Fixes

nptcuda nvecuda nvtcuda nvespherecuda enforce2dcuda tempberendsencuda temprescalecuda

addforcecuda setforcecuda aveforcecuda shakecuda gravitycuda freezecuda

Computes

tempcuda temppartialcuda pressurecuda pecuda

Atom styles

atomiccuda chargecuda fullcuda granularcuda

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1329

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1429

Outline

bull Introduction

bull HOOMD-BLUE

bull LAMMPS

bull GPULAMMPS

bull Performance

a) singlemixeddouble precision

b) CPUGPU neighbor lists

c) Load balancing static amp dynamic

bull Examples

bull Summary

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1529

GPULAMMPS Serial Performance

BenchmarkshttpcodegooglecompgpulammpswikiSingleGPUBenchmarkResultsOne slot of Tesla S1070-400 Intel Xeon E5520 4-core cpus (227GHz) HOOMDversion 09

GB Gay-BerneLJ Lennard-Jones

PEG Coarse grain (CGCMM)SDS CGCMM with long range electrostatics

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1629

GPULAMMPS Parallel Performance

bull Benchmarks httpusersnccsgov~wb8gpuyonahtm

bull Also httpsitesgooglecomsiteakohlmeysoftwarelammps-benchmarks

864K atoms LJ liquid reduceddensity=08442 NVE rcut= 25σ 5000stepsSpeedup ~ 3-4

Rhodopsin protein in solvated lipidbilayer CHARMM force field long-rangeCoulombics via PPPMSHAKEconstraints Counter-ions and reducedamount of water32K atoms 1000 timesteps LJ rcut= 1

nm neighbor skin of 10 σ NPTSpeedup ~ 17-3

Gay-Berne ellipsoids125 K atoms NVE rcut= 7σ 1000 stepsSpeedup ~ 9-11

Yona cluster

15 Nodes 2x6-core AMD Opteron 2435 (26GHz) amp 2 Tesla C2050 GPUs

3GB GDDR5 memory 448 cores (115GHz) memory bandwidth 144GBs

GPUs are connected on PCIx16 gen 20 slots ECC support enabled

Mellanox MT26428 QDR InfiniBand interconnect

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1729

Single vs Double Precision (lib compile option)

Force kernels can be compiled to use single double or mixed

precision

The drawback of double precision for memory-bound kernelsis that twice as many bytes must be fetched for cutoff

evaluation

A potential solution is to use mixed precision In this case thepositions are stored in single precision but accumulation and

storage of forces torques energies and virials is performed in

double precision

Because this memory access occurs outside the loop the

performance penalty for mixed precision is very small

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N TharringtonComp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1829

Benchmarks for different precision modes

httpusersnccsgov~wb8gpukid_precisionhtm

256000 atoms LJ liquid reduced density=08442 NVErcut= 25σ 5000 steps

Rhodopsin protein in solvated lipid bilayer CHARMM force fieldlong-range Coulombics via PPPMSHAKE constraintsCounter-ions and reduced amount of water to make a 32K atom

system replicated 2x2x2 to create box256000 atoms 1000 timesteps LJ rcut= 1 nm neighbor skin of10 σ NPT

bull Single precision OK for many runs but use at your peril Colberg amp Houmlfling Comp Phys Comm 182 (2011) 1120ndash1129

bull Mixed precision (single for positions and double for forces) nearly as fast as single precision

bull Double precision still cheaper than CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1929

Review on LAMMPS GPU implementation

Discussion of several important issues in porting a large

molecular dynamics code for use on parallel hybrid machines

Objectives

a) Minimize the amount of code that must be ported forefficient acceleration

b) Utilize the available processing power from both multi-core

CPUs and accelerators

Presents results on a parallel test cluster containing 32 Fermi

GPUs and 180 CPU cores

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2029

Parallel and CPUGPU Decomposition

Multiple MPI processes (CPU cores) can share single

accelerator (GPU)

User can choose fixed load balance between CPU amp GPU for

the calculation of short range forces

Dynamic load balancing can also be chosen GPU force calculation

Neighbor list can be carried out in GPU or CPU

Time integration represents only small computational cost and

it is carried out in CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2129

Fixed or Dynamic Load balancing

Fixed load balancing setting the CPU core to accelerator ratio and by

setting the fraction of particles that will have forces calculated by the

accelerator

Consider a job run with 4 MPI processes on a node with 2 accelerator

devices and the fraction set to 07 At each timestep each MPI process will

place data transfer of positions kernel execution of forces and datatransfer of forces into the device (GPU) queue for 70 of the particles

At the same time data is being transferred and forces are being calculated

on the GPU the MPI process will perform force calculations on the CPU

Ideal fraction CPU time = GPU time for data transfer and kernel

execution dynamic balancing with calculation of optimal fraction based

on CPUGPU timing at some timestep interval

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2229

Benchmarks for load balancing

The performance impact

resulting from splitting the

force calculation between

the host and device will

depend on the CPU core to

device ratio and the relative

rates of force calculation on

the host and device

Processes per node (ppn)

Dynamic Load Balancing

(LB) Neighboring performed on

the GPU (GPU-N)

LJ N=864000

Implementing molecular dynamics on hybrid high

performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and

Arnold N Tharrington Comp Phys Comm 182 (2011)

898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2329

More benchmarks hellip

Strong scaling benchmark usingLJ cutoff of 25 and N=864 K LJ Single node

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2429

LAMMPS OpenCL vs CUDA

Single node Code compiled with CUDA and OpenCL N=256K with neighboring

performed on the GPU Time normalized by the time required to completesimulation loop with CUDA

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2529

CPU versus CPU-GPU Speedups

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2629

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2729

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2829

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2929

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1029

Schematic code structure

GERYON LIB API cudpp_mini Lib

srcGPU

LAMMPSGPULAMMPSlibgpu (double

precision support

CUDA gt=13)

Atom amp pair styles kernels

Geryon library (set of header

files) for managing the CUDA

Runtime CUDA Driver and

OpenCL APIs vector and matrix

containers data copy and

casting data IOclasses for

kernel compilation and

execution OpenCL support

httpusersnccsgov~wb8gery

onindexhtm

CUDPP reduced version

of cudpp

(httpcodegooglecomp

cudpp)

CUDA Data Parallel

Primitives lib

radixsort scan vector

max etc

Atom amp pair styles fixes

srcUSER-CUDA

in GPULAMMPS

Pair styles lj lj96

gayberne amp cmm testing

stage at 12-04-2011 buck

eam gran morse pppm

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1129

GPULAMMPS-CUDA Compilation

httpcodegooglecompgpulammpswikiInstallation

USER-CUDAProvides (more details about the features of LAMMPS CUDA)

26 pair forceslong range coulomb with pppmcuda

nvecuda nvtcuda nptcuda nvespherecudaseveral more important fixes and computes

Installation (more details about the installaton of LAMMPS CUDA)

Make sure you can compile LAMMPS without packages withmake YOUR-MachinefileInsert your path to CUDA in srcUSER-CUDAMakefilecommon

Install the standard packages with make yes-standardInstall USER-CUDA with make yes-USER-CUDA

Go to srcUSER-CUDA and compile the USER-CUDA library with make OptionsGo to src and compile LAMMPS with make YOUR-Machinefile OptionsIMPORTANT use the same options for the library and LAMMPS

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1229

Features in GPULAMMPS-CUDA

httpcodegooglecompgpulammpswikiLammps_cuda_DI_Features

Run styles

verletcuda

Forces

borncoullongcuda buckcoulcutcudabuckcoullongcuda buckcuda cgcmmcoulcutcuda

cgcmmcouldebyecuda cgcmmcoullongcuda cgcmmcuda eamcuda eamalloycuda eamfscuda

granhookecuda ljcharmmcoulcharmmimplicitcuda ljcharmmcoulcharmmcuda ljcharmmcoullongcudaljcutcoulcutcuda ljcutcouldebyecuda ljcutcoullongcuda ljcutcuda ljexpandcuda

ljgromacscoulgromacscuda ljgromacscuda ljsmoothcuda lj96cutcuda morsecuda morsecoullongcuda

pppmcuda

Fixes

nptcuda nvecuda nvtcuda nvespherecuda enforce2dcuda tempberendsencuda temprescalecuda

addforcecuda setforcecuda aveforcecuda shakecuda gravitycuda freezecuda

Computes

tempcuda temppartialcuda pressurecuda pecuda

Atom styles

atomiccuda chargecuda fullcuda granularcuda

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1329

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1429

Outline

bull Introduction

bull HOOMD-BLUE

bull LAMMPS

bull GPULAMMPS

bull Performance

a) singlemixeddouble precision

b) CPUGPU neighbor lists

c) Load balancing static amp dynamic

bull Examples

bull Summary

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1529

GPULAMMPS Serial Performance

BenchmarkshttpcodegooglecompgpulammpswikiSingleGPUBenchmarkResultsOne slot of Tesla S1070-400 Intel Xeon E5520 4-core cpus (227GHz) HOOMDversion 09

GB Gay-BerneLJ Lennard-Jones

PEG Coarse grain (CGCMM)SDS CGCMM with long range electrostatics

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1629

GPULAMMPS Parallel Performance

bull Benchmarks httpusersnccsgov~wb8gpuyonahtm

bull Also httpsitesgooglecomsiteakohlmeysoftwarelammps-benchmarks

864K atoms LJ liquid reduceddensity=08442 NVE rcut= 25σ 5000stepsSpeedup ~ 3-4

Rhodopsin protein in solvated lipidbilayer CHARMM force field long-rangeCoulombics via PPPMSHAKEconstraints Counter-ions and reducedamount of water32K atoms 1000 timesteps LJ rcut= 1

nm neighbor skin of 10 σ NPTSpeedup ~ 17-3

Gay-Berne ellipsoids125 K atoms NVE rcut= 7σ 1000 stepsSpeedup ~ 9-11

Yona cluster

15 Nodes 2x6-core AMD Opteron 2435 (26GHz) amp 2 Tesla C2050 GPUs

3GB GDDR5 memory 448 cores (115GHz) memory bandwidth 144GBs

GPUs are connected on PCIx16 gen 20 slots ECC support enabled

Mellanox MT26428 QDR InfiniBand interconnect

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1729

Single vs Double Precision (lib compile option)

Force kernels can be compiled to use single double or mixed

precision

The drawback of double precision for memory-bound kernelsis that twice as many bytes must be fetched for cutoff

evaluation

A potential solution is to use mixed precision In this case thepositions are stored in single precision but accumulation and

storage of forces torques energies and virials is performed in

double precision

Because this memory access occurs outside the loop the

performance penalty for mixed precision is very small

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N TharringtonComp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1829

Benchmarks for different precision modes

httpusersnccsgov~wb8gpukid_precisionhtm

256000 atoms LJ liquid reduced density=08442 NVErcut= 25σ 5000 steps

Rhodopsin protein in solvated lipid bilayer CHARMM force fieldlong-range Coulombics via PPPMSHAKE constraintsCounter-ions and reduced amount of water to make a 32K atom

system replicated 2x2x2 to create box256000 atoms 1000 timesteps LJ rcut= 1 nm neighbor skin of10 σ NPT

bull Single precision OK for many runs but use at your peril Colberg amp Houmlfling Comp Phys Comm 182 (2011) 1120ndash1129

bull Mixed precision (single for positions and double for forces) nearly as fast as single precision

bull Double precision still cheaper than CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1929

Review on LAMMPS GPU implementation

Discussion of several important issues in porting a large

molecular dynamics code for use on parallel hybrid machines

Objectives

a) Minimize the amount of code that must be ported forefficient acceleration

b) Utilize the available processing power from both multi-core

CPUs and accelerators

Presents results on a parallel test cluster containing 32 Fermi

GPUs and 180 CPU cores

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2029

Parallel and CPUGPU Decomposition

Multiple MPI processes (CPU cores) can share single

accelerator (GPU)

User can choose fixed load balance between CPU amp GPU for

the calculation of short range forces

Dynamic load balancing can also be chosen GPU force calculation

Neighbor list can be carried out in GPU or CPU

Time integration represents only small computational cost and

it is carried out in CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2129

Fixed or Dynamic Load balancing

Fixed load balancing setting the CPU core to accelerator ratio and by

setting the fraction of particles that will have forces calculated by the

accelerator

Consider a job run with 4 MPI processes on a node with 2 accelerator

devices and the fraction set to 07 At each timestep each MPI process will

place data transfer of positions kernel execution of forces and datatransfer of forces into the device (GPU) queue for 70 of the particles

At the same time data is being transferred and forces are being calculated

on the GPU the MPI process will perform force calculations on the CPU

Ideal fraction CPU time = GPU time for data transfer and kernel

execution dynamic balancing with calculation of optimal fraction based

on CPUGPU timing at some timestep interval

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2229

Benchmarks for load balancing

The performance impact

resulting from splitting the

force calculation between

the host and device will

depend on the CPU core to

device ratio and the relative

rates of force calculation on

the host and device

Processes per node (ppn)

Dynamic Load Balancing

(LB) Neighboring performed on

the GPU (GPU-N)

LJ N=864000

Implementing molecular dynamics on hybrid high

performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and

Arnold N Tharrington Comp Phys Comm 182 (2011)

898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2329

More benchmarks hellip

Strong scaling benchmark usingLJ cutoff of 25 and N=864 K LJ Single node

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2429

LAMMPS OpenCL vs CUDA

Single node Code compiled with CUDA and OpenCL N=256K with neighboring

performed on the GPU Time normalized by the time required to completesimulation loop with CUDA

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2529

CPU versus CPU-GPU Speedups

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2629

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2729

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2829

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2929

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1129

GPULAMMPS-CUDA Compilation

httpcodegooglecompgpulammpswikiInstallation

USER-CUDAProvides (more details about the features of LAMMPS CUDA)

26 pair forceslong range coulomb with pppmcuda

nvecuda nvtcuda nptcuda nvespherecudaseveral more important fixes and computes

Installation (more details about the installaton of LAMMPS CUDA)

Make sure you can compile LAMMPS without packages withmake YOUR-MachinefileInsert your path to CUDA in srcUSER-CUDAMakefilecommon

Install the standard packages with make yes-standardInstall USER-CUDA with make yes-USER-CUDA

Go to srcUSER-CUDA and compile the USER-CUDA library with make OptionsGo to src and compile LAMMPS with make YOUR-Machinefile OptionsIMPORTANT use the same options for the library and LAMMPS

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1229

Features in GPULAMMPS-CUDA

httpcodegooglecompgpulammpswikiLammps_cuda_DI_Features

Run styles

verletcuda

Forces

borncoullongcuda buckcoulcutcudabuckcoullongcuda buckcuda cgcmmcoulcutcuda

cgcmmcouldebyecuda cgcmmcoullongcuda cgcmmcuda eamcuda eamalloycuda eamfscuda

granhookecuda ljcharmmcoulcharmmimplicitcuda ljcharmmcoulcharmmcuda ljcharmmcoullongcudaljcutcoulcutcuda ljcutcouldebyecuda ljcutcoullongcuda ljcutcuda ljexpandcuda

ljgromacscoulgromacscuda ljgromacscuda ljsmoothcuda lj96cutcuda morsecuda morsecoullongcuda

pppmcuda

Fixes

nptcuda nvecuda nvtcuda nvespherecuda enforce2dcuda tempberendsencuda temprescalecuda

addforcecuda setforcecuda aveforcecuda shakecuda gravitycuda freezecuda

Computes

tempcuda temppartialcuda pressurecuda pecuda

Atom styles

atomiccuda chargecuda fullcuda granularcuda

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1329

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1429

Outline

bull Introduction

bull HOOMD-BLUE

bull LAMMPS

bull GPULAMMPS

bull Performance

a) singlemixeddouble precision

b) CPUGPU neighbor lists

c) Load balancing static amp dynamic

bull Examples

bull Summary

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1529

GPULAMMPS Serial Performance

BenchmarkshttpcodegooglecompgpulammpswikiSingleGPUBenchmarkResultsOne slot of Tesla S1070-400 Intel Xeon E5520 4-core cpus (227GHz) HOOMDversion 09

GB Gay-BerneLJ Lennard-Jones

PEG Coarse grain (CGCMM)SDS CGCMM with long range electrostatics

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1629

GPULAMMPS Parallel Performance

bull Benchmarks httpusersnccsgov~wb8gpuyonahtm

bull Also httpsitesgooglecomsiteakohlmeysoftwarelammps-benchmarks

864K atoms LJ liquid reduceddensity=08442 NVE rcut= 25σ 5000stepsSpeedup ~ 3-4

Rhodopsin protein in solvated lipidbilayer CHARMM force field long-rangeCoulombics via PPPMSHAKEconstraints Counter-ions and reducedamount of water32K atoms 1000 timesteps LJ rcut= 1

nm neighbor skin of 10 σ NPTSpeedup ~ 17-3

Gay-Berne ellipsoids125 K atoms NVE rcut= 7σ 1000 stepsSpeedup ~ 9-11

Yona cluster

15 Nodes 2x6-core AMD Opteron 2435 (26GHz) amp 2 Tesla C2050 GPUs

3GB GDDR5 memory 448 cores (115GHz) memory bandwidth 144GBs

GPUs are connected on PCIx16 gen 20 slots ECC support enabled

Mellanox MT26428 QDR InfiniBand interconnect

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1729

Single vs Double Precision (lib compile option)

Force kernels can be compiled to use single double or mixed

precision

The drawback of double precision for memory-bound kernelsis that twice as many bytes must be fetched for cutoff

evaluation

A potential solution is to use mixed precision In this case thepositions are stored in single precision but accumulation and

storage of forces torques energies and virials is performed in

double precision

Because this memory access occurs outside the loop the

performance penalty for mixed precision is very small

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N TharringtonComp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1829

Benchmarks for different precision modes

httpusersnccsgov~wb8gpukid_precisionhtm

256000 atoms LJ liquid reduced density=08442 NVErcut= 25σ 5000 steps

Rhodopsin protein in solvated lipid bilayer CHARMM force fieldlong-range Coulombics via PPPMSHAKE constraintsCounter-ions and reduced amount of water to make a 32K atom

system replicated 2x2x2 to create box256000 atoms 1000 timesteps LJ rcut= 1 nm neighbor skin of10 σ NPT

bull Single precision OK for many runs but use at your peril Colberg amp Houmlfling Comp Phys Comm 182 (2011) 1120ndash1129

bull Mixed precision (single for positions and double for forces) nearly as fast as single precision

bull Double precision still cheaper than CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1929

Review on LAMMPS GPU implementation

Discussion of several important issues in porting a large

molecular dynamics code for use on parallel hybrid machines

Objectives

a) Minimize the amount of code that must be ported forefficient acceleration

b) Utilize the available processing power from both multi-core

CPUs and accelerators

Presents results on a parallel test cluster containing 32 Fermi

GPUs and 180 CPU cores

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2029

Parallel and CPUGPU Decomposition

Multiple MPI processes (CPU cores) can share single

accelerator (GPU)

User can choose fixed load balance between CPU amp GPU for

the calculation of short range forces

Dynamic load balancing can also be chosen GPU force calculation

Neighbor list can be carried out in GPU or CPU

Time integration represents only small computational cost and

it is carried out in CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2129

Fixed or Dynamic Load balancing

Fixed load balancing setting the CPU core to accelerator ratio and by

setting the fraction of particles that will have forces calculated by the

accelerator

Consider a job run with 4 MPI processes on a node with 2 accelerator

devices and the fraction set to 07 At each timestep each MPI process will

place data transfer of positions kernel execution of forces and datatransfer of forces into the device (GPU) queue for 70 of the particles

At the same time data is being transferred and forces are being calculated

on the GPU the MPI process will perform force calculations on the CPU

Ideal fraction CPU time = GPU time for data transfer and kernel

execution dynamic balancing with calculation of optimal fraction based

on CPUGPU timing at some timestep interval

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2229

Benchmarks for load balancing

The performance impact

resulting from splitting the

force calculation between

the host and device will

depend on the CPU core to

device ratio and the relative

rates of force calculation on

the host and device

Processes per node (ppn)

Dynamic Load Balancing

(LB) Neighboring performed on

the GPU (GPU-N)

LJ N=864000

Implementing molecular dynamics on hybrid high

performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and

Arnold N Tharrington Comp Phys Comm 182 (2011)

898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2329

More benchmarks hellip

Strong scaling benchmark usingLJ cutoff of 25 and N=864 K LJ Single node

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2429

LAMMPS OpenCL vs CUDA

Single node Code compiled with CUDA and OpenCL N=256K with neighboring

performed on the GPU Time normalized by the time required to completesimulation loop with CUDA

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2529

CPU versus CPU-GPU Speedups

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2629

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2729

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2829

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2929

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1229

Features in GPULAMMPS-CUDA

httpcodegooglecompgpulammpswikiLammps_cuda_DI_Features

Run styles

verletcuda

Forces

borncoullongcuda buckcoulcutcudabuckcoullongcuda buckcuda cgcmmcoulcutcuda

cgcmmcouldebyecuda cgcmmcoullongcuda cgcmmcuda eamcuda eamalloycuda eamfscuda

granhookecuda ljcharmmcoulcharmmimplicitcuda ljcharmmcoulcharmmcuda ljcharmmcoullongcudaljcutcoulcutcuda ljcutcouldebyecuda ljcutcoullongcuda ljcutcuda ljexpandcuda

ljgromacscoulgromacscuda ljgromacscuda ljsmoothcuda lj96cutcuda morsecuda morsecoullongcuda

pppmcuda

Fixes

nptcuda nvecuda nvtcuda nvespherecuda enforce2dcuda tempberendsencuda temprescalecuda

addforcecuda setforcecuda aveforcecuda shakecuda gravitycuda freezecuda

Computes

tempcuda temppartialcuda pressurecuda pecuda

Atom styles

atomiccuda chargecuda fullcuda granularcuda

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1329

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1429

Outline

bull Introduction

bull HOOMD-BLUE

bull LAMMPS

bull GPULAMMPS

bull Performance

a) singlemixeddouble precision

b) CPUGPU neighbor lists

c) Load balancing static amp dynamic

bull Examples

bull Summary

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1529

GPULAMMPS Serial Performance

BenchmarkshttpcodegooglecompgpulammpswikiSingleGPUBenchmarkResultsOne slot of Tesla S1070-400 Intel Xeon E5520 4-core cpus (227GHz) HOOMDversion 09

GB Gay-BerneLJ Lennard-Jones

PEG Coarse grain (CGCMM)SDS CGCMM with long range electrostatics

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1629

GPULAMMPS Parallel Performance

bull Benchmarks httpusersnccsgov~wb8gpuyonahtm

bull Also httpsitesgooglecomsiteakohlmeysoftwarelammps-benchmarks

864K atoms LJ liquid reduceddensity=08442 NVE rcut= 25σ 5000stepsSpeedup ~ 3-4

Rhodopsin protein in solvated lipidbilayer CHARMM force field long-rangeCoulombics via PPPMSHAKEconstraints Counter-ions and reducedamount of water32K atoms 1000 timesteps LJ rcut= 1

nm neighbor skin of 10 σ NPTSpeedup ~ 17-3

Gay-Berne ellipsoids125 K atoms NVE rcut= 7σ 1000 stepsSpeedup ~ 9-11

Yona cluster

15 Nodes 2x6-core AMD Opteron 2435 (26GHz) amp 2 Tesla C2050 GPUs

3GB GDDR5 memory 448 cores (115GHz) memory bandwidth 144GBs

GPUs are connected on PCIx16 gen 20 slots ECC support enabled

Mellanox MT26428 QDR InfiniBand interconnect

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1729

Single vs Double Precision (lib compile option)

Force kernels can be compiled to use single double or mixed

precision

The drawback of double precision for memory-bound kernelsis that twice as many bytes must be fetched for cutoff

evaluation

A potential solution is to use mixed precision In this case thepositions are stored in single precision but accumulation and

storage of forces torques energies and virials is performed in

double precision

Because this memory access occurs outside the loop the

performance penalty for mixed precision is very small

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N TharringtonComp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1829

Benchmarks for different precision modes

httpusersnccsgov~wb8gpukid_precisionhtm

256000 atoms LJ liquid reduced density=08442 NVErcut= 25σ 5000 steps

Rhodopsin protein in solvated lipid bilayer CHARMM force fieldlong-range Coulombics via PPPMSHAKE constraintsCounter-ions and reduced amount of water to make a 32K atom

system replicated 2x2x2 to create box256000 atoms 1000 timesteps LJ rcut= 1 nm neighbor skin of10 σ NPT

bull Single precision OK for many runs but use at your peril Colberg amp Houmlfling Comp Phys Comm 182 (2011) 1120ndash1129

bull Mixed precision (single for positions and double for forces) nearly as fast as single precision

bull Double precision still cheaper than CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1929

Review on LAMMPS GPU implementation

Discussion of several important issues in porting a large

molecular dynamics code for use on parallel hybrid machines

Objectives

a) Minimize the amount of code that must be ported forefficient acceleration

b) Utilize the available processing power from both multi-core

CPUs and accelerators

Presents results on a parallel test cluster containing 32 Fermi

GPUs and 180 CPU cores

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911


Parallel and CPU/GPU Decomposition

Multiple MPI processes (CPU cores) can share a single accelerator (GPU).

The user can choose a fixed load balance between CPU & GPU for the calculation of short-range forces.

Dynamic load balancing can also be chosen for the GPU force calculation.

The neighbor list build can be carried out on the GPU or on the CPU.

Time integration represents only a small computational cost and is carried out on the CPU. A configuration sketch is given below.
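
A minimal sketch of how this decomposition is expressed in a GPU-package input of that generation follows. The package-command syntax changed between LAMMPS versions, so treat the exact arguments as assumptions to be checked against the manual of your build; the intent is only to show several MPI ranks sharing one GPU and the choice of CPU- vs GPU-built neighbor lists.

# run with several MPI ranks per node sharing GPU 0, e.g.:
#   mpirun -np 4 ./lmp_gpu -in in.lj_gpu     (binary and input names are placeholders)

package    gpu force/neigh 0 0 1.0   # neighbor lists built on the GPU, all pair work on GPU 0
# package  gpu force 0 0 1.0         # alternative: keep the neighbor build on the CPU

pair_style lj/cut/gpu 2.5            # GPU-accelerated variant of lj/cut
pair_coeff 1 1 1.0 1.0 2.5

fix        1 all nve                 # time integration stays on the CPU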


Fixed or Dynamic Load balancing

Fixed load balancing: done by setting the CPU core to accelerator ratio and by setting the fraction of particles that will have forces calculated by the accelerator.

Consider a job run with 4 MPI processes on a node with 2 accelerator devices and the fraction set to 0.7. At each timestep, each MPI process will place data transfer of positions, kernel execution of forces, and data transfer of forces into the device (GPU) queue for 70% of the particles.

At the same time that data is being transferred and forces are being calculated on the GPU, the MPI process will perform force calculations on the CPU.

Ideal fraction: CPU time = GPU time for data transfer and kernel execution. Dynamic balancing: the optimal fraction is recalculated from CPU/GPU timings at some timestep interval.

Implementing molecular dynamics on hybrid high performance computers – short range forces. W. Michael Brown, Peng Wang, Steven J. Plimpton, and Arnold N. Tharrington, Comp. Phys. Comm. 182 (2011) 898–911.
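
The split fraction and the dynamic option in the example above map onto the GPU package's setup command. The sketch below again assumes the 2011/2012-era syntax (package gpu mode first last split); check the arguments against the documentation of your LAMMPS build.

# 4 MPI ranks per node, GPUs 0-1, fixed split: 70% of each rank's particles on the GPU
package gpu force/neigh 0 1 0.7

# dynamic load balancing: a negative split lets the code re-tune the GPU fraction
# from measured CPU/GPU times at a timestep interval
# package gpu force/neigh 0 1 -1.0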


Benchmarks for load balancing

The performance impact resulting from splitting the force calculation between the host and device will depend on the CPU core to device ratio and the relative rates of force calculation on the host and device.

[Figure: LJ, N = 864,000; curves labeled by processes per node (ppn), dynamic load balancing (LB), and neighboring performed on the GPU (GPU-N).]

Implementing molecular dynamics on hybrid high performance computers – short range forces. W. Michael Brown, Peng Wang, Steven J. Plimpton, and Arnold N. Tharrington, Comp. Phys. Comm. 182 (2011) 898–911.


More benchmarks …

Strong scaling benchmark using LJ, cutoff of 2.5, and N = 864K. Single node.

Implementing molecular dynamics on hybrid high performance computers – short range forces. W. Michael Brown, Peng Wang, Steven J. Plimpton, and Arnold N. Tharrington, Comp. Phys. Comm. 182 (2011) 898–911.


LAMMPS OpenCL vs CUDA

Single node. Code compiled with CUDA and OpenCL, N = 256K, with neighboring performed on the GPU. Times normalized by the time required to complete the simulation loop with CUDA.

Implementing molecular dynamics on hybrid high performance computers – short range forces. W. Michael Brown, Peng Wang, Steven J. Plimpton, and Arnold N. Tharrington, Comp. Phys. Comm. 182 (2011) 898–911.


CPU versus CPU-GPU Speedups

Implementing molecular dynamics on hybrid high performance computers – short range forces. W. Michael Brown, Peng Wang, Steven J. Plimpton, and Arnold N. Tharrington, Comp. Phys. Comm. 182 (2011) 898–911.


Scaling with an “old” GPU: shock wave

“GP-GPU Processing of Molecular Dynamics Simulations”, E. Millán Kujtiuk, E.M. Bringa, A. Higginbotham, C. García Garino, Proceedings of HPC 2010, High-Performance Computing Symposium, pp. 3234-3248.

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2929

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1329

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1429

Outline

bull Introduction

bull HOOMD-BLUE

bull LAMMPS

bull GPULAMMPS

bull Performance

a) singlemixeddouble precision

b) CPUGPU neighbor lists

c) Load balancing static amp dynamic

bull Examples

bull Summary

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1529

GPULAMMPS Serial Performance

BenchmarkshttpcodegooglecompgpulammpswikiSingleGPUBenchmarkResultsOne slot of Tesla S1070-400 Intel Xeon E5520 4-core cpus (227GHz) HOOMDversion 09

GB Gay-BerneLJ Lennard-Jones

PEG Coarse grain (CGCMM)SDS CGCMM with long range electrostatics

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1629

GPULAMMPS Parallel Performance

bull Benchmarks httpusersnccsgov~wb8gpuyonahtm

bull Also httpsitesgooglecomsiteakohlmeysoftwarelammps-benchmarks

864K atoms LJ liquid reduceddensity=08442 NVE rcut= 25σ 5000stepsSpeedup ~ 3-4

Rhodopsin protein in solvated lipidbilayer CHARMM force field long-rangeCoulombics via PPPMSHAKEconstraints Counter-ions and reducedamount of water32K atoms 1000 timesteps LJ rcut= 1

nm neighbor skin of 10 σ NPTSpeedup ~ 17-3

Gay-Berne ellipsoids125 K atoms NVE rcut= 7σ 1000 stepsSpeedup ~ 9-11

Yona cluster

15 Nodes 2x6-core AMD Opteron 2435 (26GHz) amp 2 Tesla C2050 GPUs

3GB GDDR5 memory 448 cores (115GHz) memory bandwidth 144GBs

GPUs are connected on PCIx16 gen 20 slots ECC support enabled

Mellanox MT26428 QDR InfiniBand interconnect

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1729

Single vs Double Precision (lib compile option)

Force kernels can be compiled to use single double or mixed

precision

The drawback of double precision for memory-bound kernelsis that twice as many bytes must be fetched for cutoff

evaluation

A potential solution is to use mixed precision In this case thepositions are stored in single precision but accumulation and

storage of forces torques energies and virials is performed in

double precision

Because this memory access occurs outside the loop the

performance penalty for mixed precision is very small

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N TharringtonComp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1829

Benchmarks for different precision modes

httpusersnccsgov~wb8gpukid_precisionhtm

256000 atoms LJ liquid reduced density=08442 NVErcut= 25σ 5000 steps

Rhodopsin protein in solvated lipid bilayer CHARMM force fieldlong-range Coulombics via PPPMSHAKE constraintsCounter-ions and reduced amount of water to make a 32K atom

system replicated 2x2x2 to create box256000 atoms 1000 timesteps LJ rcut= 1 nm neighbor skin of10 σ NPT

bull Single precision OK for many runs but use at your peril Colberg amp Houmlfling Comp Phys Comm 182 (2011) 1120ndash1129

bull Mixed precision (single for positions and double for forces) nearly as fast as single precision

bull Double precision still cheaper than CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1929

Review on LAMMPS GPU implementation

Discussion of several important issues in porting a large

molecular dynamics code for use on parallel hybrid machines

Objectives

a) Minimize the amount of code that must be ported forefficient acceleration

b) Utilize the available processing power from both multi-core

CPUs and accelerators

Presents results on a parallel test cluster containing 32 Fermi

GPUs and 180 CPU cores

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2029

Parallel and CPUGPU Decomposition

Multiple MPI processes (CPU cores) can share single

accelerator (GPU)

User can choose fixed load balance between CPU amp GPU for

the calculation of short range forces

Dynamic load balancing can also be chosen GPU force calculation

Neighbor list can be carried out in GPU or CPU

Time integration represents only small computational cost and

it is carried out in CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2129

Fixed or Dynamic Load balancing

Fixed load balancing setting the CPU core to accelerator ratio and by

setting the fraction of particles that will have forces calculated by the

accelerator

Consider a job run with 4 MPI processes on a node with 2 accelerator

devices and the fraction set to 07 At each timestep each MPI process will

place data transfer of positions kernel execution of forces and datatransfer of forces into the device (GPU) queue for 70 of the particles

At the same time data is being transferred and forces are being calculated

on the GPU the MPI process will perform force calculations on the CPU

Ideal fraction CPU time = GPU time for data transfer and kernel

execution dynamic balancing with calculation of optimal fraction based

on CPUGPU timing at some timestep interval

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2229

Benchmarks for load balancing

The performance impact

resulting from splitting the

force calculation between

the host and device will

depend on the CPU core to

device ratio and the relative

rates of force calculation on

the host and device

Processes per node (ppn)

Dynamic Load Balancing

(LB) Neighboring performed on

the GPU (GPU-N)

LJ N=864000

Implementing molecular dynamics on hybrid high

performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and

Arnold N Tharrington Comp Phys Comm 182 (2011)

898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2329

More benchmarks hellip

Strong scaling benchmark usingLJ cutoff of 25 and N=864 K LJ Single node

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2429

LAMMPS OpenCL vs CUDA

Single node Code compiled with CUDA and OpenCL N=256K with neighboring

performed on the GPU Time normalized by the time required to completesimulation loop with CUDA

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2529

CPU versus CPU-GPU Speedups

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2629

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2729

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2829

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2929

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1429

Outline

bull Introduction

bull HOOMD-BLUE

bull LAMMPS

bull GPULAMMPS

bull Performance

a) singlemixeddouble precision

b) CPUGPU neighbor lists

c) Load balancing static amp dynamic

bull Examples

bull Summary

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1529

GPULAMMPS Serial Performance

BenchmarkshttpcodegooglecompgpulammpswikiSingleGPUBenchmarkResultsOne slot of Tesla S1070-400 Intel Xeon E5520 4-core cpus (227GHz) HOOMDversion 09

GB Gay-BerneLJ Lennard-Jones

PEG Coarse grain (CGCMM)SDS CGCMM with long range electrostatics

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1629

GPULAMMPS Parallel Performance

bull Benchmarks httpusersnccsgov~wb8gpuyonahtm

bull Also httpsitesgooglecomsiteakohlmeysoftwarelammps-benchmarks

864K atoms LJ liquid reduceddensity=08442 NVE rcut= 25σ 5000stepsSpeedup ~ 3-4

Rhodopsin protein in solvated lipidbilayer CHARMM force field long-rangeCoulombics via PPPMSHAKEconstraints Counter-ions and reducedamount of water32K atoms 1000 timesteps LJ rcut= 1

nm neighbor skin of 10 σ NPTSpeedup ~ 17-3

Gay-Berne ellipsoids125 K atoms NVE rcut= 7σ 1000 stepsSpeedup ~ 9-11

Yona cluster

15 Nodes 2x6-core AMD Opteron 2435 (26GHz) amp 2 Tesla C2050 GPUs

3GB GDDR5 memory 448 cores (115GHz) memory bandwidth 144GBs

GPUs are connected on PCIx16 gen 20 slots ECC support enabled

Mellanox MT26428 QDR InfiniBand interconnect

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1729

Single vs Double Precision (lib compile option)

Force kernels can be compiled to use single double or mixed

precision

The drawback of double precision for memory-bound kernelsis that twice as many bytes must be fetched for cutoff

evaluation

A potential solution is to use mixed precision In this case thepositions are stored in single precision but accumulation and

storage of forces torques energies and virials is performed in

double precision

Because this memory access occurs outside the loop the

performance penalty for mixed precision is very small

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N TharringtonComp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1829

Benchmarks for different precision modes

httpusersnccsgov~wb8gpukid_precisionhtm

256000 atoms LJ liquid reduced density=08442 NVErcut= 25σ 5000 steps

Rhodopsin protein in solvated lipid bilayer CHARMM force fieldlong-range Coulombics via PPPMSHAKE constraintsCounter-ions and reduced amount of water to make a 32K atom

system replicated 2x2x2 to create box256000 atoms 1000 timesteps LJ rcut= 1 nm neighbor skin of10 σ NPT

bull Single precision OK for many runs but use at your peril Colberg amp Houmlfling Comp Phys Comm 182 (2011) 1120ndash1129

bull Mixed precision (single for positions and double for forces) nearly as fast as single precision

bull Double precision still cheaper than CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1929

Review on LAMMPS GPU implementation

Discussion of several important issues in porting a large

molecular dynamics code for use on parallel hybrid machines

Objectives

a) Minimize the amount of code that must be ported forefficient acceleration

b) Utilize the available processing power from both multi-core

CPUs and accelerators

Presents results on a parallel test cluster containing 32 Fermi

GPUs and 180 CPU cores

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2029

Parallel and CPUGPU Decomposition

Multiple MPI processes (CPU cores) can share single

accelerator (GPU)

User can choose fixed load balance between CPU amp GPU for

the calculation of short range forces

Dynamic load balancing can also be chosen GPU force calculation

Neighbor list can be carried out in GPU or CPU

Time integration represents only small computational cost and

it is carried out in CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2129

Fixed or Dynamic Load balancing

Fixed load balancing setting the CPU core to accelerator ratio and by

setting the fraction of particles that will have forces calculated by the

accelerator

Consider a job run with 4 MPI processes on a node with 2 accelerator

devices and the fraction set to 07 At each timestep each MPI process will

place data transfer of positions kernel execution of forces and datatransfer of forces into the device (GPU) queue for 70 of the particles

At the same time data is being transferred and forces are being calculated

on the GPU the MPI process will perform force calculations on the CPU

Ideal fraction CPU time = GPU time for data transfer and kernel

execution dynamic balancing with calculation of optimal fraction based

on CPUGPU timing at some timestep interval

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2229

Benchmarks for load balancing

The performance impact

resulting from splitting the

force calculation between

the host and device will

depend on the CPU core to

device ratio and the relative

rates of force calculation on

the host and device

Processes per node (ppn)

Dynamic Load Balancing

(LB) Neighboring performed on

the GPU (GPU-N)

LJ N=864000

Implementing molecular dynamics on hybrid high

performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and

Arnold N Tharrington Comp Phys Comm 182 (2011)

898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2329

More benchmarks hellip

Strong scaling benchmark usingLJ cutoff of 25 and N=864 K LJ Single node

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2429

LAMMPS OpenCL vs CUDA

Single node Code compiled with CUDA and OpenCL N=256K with neighboring

performed on the GPU Time normalized by the time required to completesimulation loop with CUDA

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2529

CPU versus CPU-GPU Speedups

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2629

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2729

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2829

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2929

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1529

GPULAMMPS Serial Performance

BenchmarkshttpcodegooglecompgpulammpswikiSingleGPUBenchmarkResultsOne slot of Tesla S1070-400 Intel Xeon E5520 4-core cpus (227GHz) HOOMDversion 09

GB Gay-BerneLJ Lennard-Jones

PEG Coarse grain (CGCMM)SDS CGCMM with long range electrostatics

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1629

GPULAMMPS Parallel Performance

bull Benchmarks httpusersnccsgov~wb8gpuyonahtm

bull Also httpsitesgooglecomsiteakohlmeysoftwarelammps-benchmarks

864K atoms LJ liquid reduceddensity=08442 NVE rcut= 25σ 5000stepsSpeedup ~ 3-4

Rhodopsin protein in solvated lipidbilayer CHARMM force field long-rangeCoulombics via PPPMSHAKEconstraints Counter-ions and reducedamount of water32K atoms 1000 timesteps LJ rcut= 1

nm neighbor skin of 10 σ NPTSpeedup ~ 17-3

Gay-Berne ellipsoids125 K atoms NVE rcut= 7σ 1000 stepsSpeedup ~ 9-11

Yona cluster

15 Nodes 2x6-core AMD Opteron 2435 (26GHz) amp 2 Tesla C2050 GPUs

3GB GDDR5 memory 448 cores (115GHz) memory bandwidth 144GBs

GPUs are connected on PCIx16 gen 20 slots ECC support enabled

Mellanox MT26428 QDR InfiniBand interconnect

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1729

Single vs Double Precision (lib compile option)

Force kernels can be compiled to use single double or mixed

precision

The drawback of double precision for memory-bound kernelsis that twice as many bytes must be fetched for cutoff

evaluation

A potential solution is to use mixed precision In this case thepositions are stored in single precision but accumulation and

storage of forces torques energies and virials is performed in

double precision

Because this memory access occurs outside the loop the

performance penalty for mixed precision is very small

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N TharringtonComp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1829

Benchmarks for different precision modes

httpusersnccsgov~wb8gpukid_precisionhtm

256000 atoms LJ liquid reduced density=08442 NVErcut= 25σ 5000 steps

Rhodopsin protein in solvated lipid bilayer CHARMM force fieldlong-range Coulombics via PPPMSHAKE constraintsCounter-ions and reduced amount of water to make a 32K atom

system replicated 2x2x2 to create box256000 atoms 1000 timesteps LJ rcut= 1 nm neighbor skin of10 σ NPT

bull Single precision OK for many runs but use at your peril Colberg amp Houmlfling Comp Phys Comm 182 (2011) 1120ndash1129

bull Mixed precision (single for positions and double for forces) nearly as fast as single precision

bull Double precision still cheaper than CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1929

Review on LAMMPS GPU implementation

Discussion of several important issues in porting a large

molecular dynamics code for use on parallel hybrid machines

Objectives

a) Minimize the amount of code that must be ported forefficient acceleration

b) Utilize the available processing power from both multi-core

CPUs and accelerators

Presents results on a parallel test cluster containing 32 Fermi

GPUs and 180 CPU cores

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2029

Parallel and CPUGPU Decomposition

Multiple MPI processes (CPU cores) can share single

accelerator (GPU)

User can choose fixed load balance between CPU amp GPU for

the calculation of short range forces

Dynamic load balancing can also be chosen GPU force calculation

Neighbor list can be carried out in GPU or CPU

Time integration represents only small computational cost and

it is carried out in CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2129

Fixed or Dynamic Load balancing

Fixed load balancing setting the CPU core to accelerator ratio and by

setting the fraction of particles that will have forces calculated by the

accelerator

Consider a job run with 4 MPI processes on a node with 2 accelerator

devices and the fraction set to 07 At each timestep each MPI process will

place data transfer of positions kernel execution of forces and datatransfer of forces into the device (GPU) queue for 70 of the particles

At the same time data is being transferred and forces are being calculated

on the GPU the MPI process will perform force calculations on the CPU

Ideal fraction CPU time = GPU time for data transfer and kernel

execution dynamic balancing with calculation of optimal fraction based

on CPUGPU timing at some timestep interval

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2229

Benchmarks for load balancing

The performance impact

resulting from splitting the

force calculation between

the host and device will

depend on the CPU core to

device ratio and the relative

rates of force calculation on

the host and device

Processes per node (ppn)

Dynamic Load Balancing

(LB) Neighboring performed on

the GPU (GPU-N)

LJ N=864000

Implementing molecular dynamics on hybrid high

performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and

Arnold N Tharrington Comp Phys Comm 182 (2011)

898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2329

More benchmarks hellip

Strong scaling benchmark usingLJ cutoff of 25 and N=864 K LJ Single node

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2429

LAMMPS OpenCL vs CUDA

Single node Code compiled with CUDA and OpenCL N=256K with neighboring

performed on the GPU Time normalized by the time required to completesimulation loop with CUDA

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2529

CPU versus CPU-GPU Speedups

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2629

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2729

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2829

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2929

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1629

GPULAMMPS Parallel Performance

bull Benchmarks httpusersnccsgov~wb8gpuyonahtm

bull Also httpsitesgooglecomsiteakohlmeysoftwarelammps-benchmarks

864K atoms LJ liquid reduceddensity=08442 NVE rcut= 25σ 5000stepsSpeedup ~ 3-4

Rhodopsin protein in solvated lipidbilayer CHARMM force field long-rangeCoulombics via PPPMSHAKEconstraints Counter-ions and reducedamount of water32K atoms 1000 timesteps LJ rcut= 1

nm neighbor skin of 10 σ NPTSpeedup ~ 17-3

Gay-Berne ellipsoids125 K atoms NVE rcut= 7σ 1000 stepsSpeedup ~ 9-11

Yona cluster

15 Nodes 2x6-core AMD Opteron 2435 (26GHz) amp 2 Tesla C2050 GPUs

3GB GDDR5 memory 448 cores (115GHz) memory bandwidth 144GBs

GPUs are connected on PCIx16 gen 20 slots ECC support enabled

Mellanox MT26428 QDR InfiniBand interconnect

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1729

Single vs Double Precision (lib compile option)

Force kernels can be compiled to use single double or mixed

precision

The drawback of double precision for memory-bound kernelsis that twice as many bytes must be fetched for cutoff

evaluation

A potential solution is to use mixed precision In this case thepositions are stored in single precision but accumulation and

storage of forces torques energies and virials is performed in

double precision

Because this memory access occurs outside the loop the

performance penalty for mixed precision is very small

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N TharringtonComp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1829

Benchmarks for different precision modes

httpusersnccsgov~wb8gpukid_precisionhtm

256000 atoms LJ liquid reduced density=08442 NVErcut= 25σ 5000 steps

Rhodopsin protein in solvated lipid bilayer CHARMM force fieldlong-range Coulombics via PPPMSHAKE constraintsCounter-ions and reduced amount of water to make a 32K atom

system replicated 2x2x2 to create box256000 atoms 1000 timesteps LJ rcut= 1 nm neighbor skin of10 σ NPT

bull Single precision OK for many runs but use at your peril Colberg amp Houmlfling Comp Phys Comm 182 (2011) 1120ndash1129

bull Mixed precision (single for positions and double for forces) nearly as fast as single precision

bull Double precision still cheaper than CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1929

Review on LAMMPS GPU implementation

Discussion of several important issues in porting a large

molecular dynamics code for use on parallel hybrid machines

Objectives

a) Minimize the amount of code that must be ported forefficient acceleration

b) Utilize the available processing power from both multi-core

CPUs and accelerators

Presents results on a parallel test cluster containing 32 Fermi

GPUs and 180 CPU cores

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2029

Parallel and CPUGPU Decomposition

Multiple MPI processes (CPU cores) can share single

accelerator (GPU)

User can choose fixed load balance between CPU amp GPU for

the calculation of short range forces

Dynamic load balancing can also be chosen GPU force calculation

Neighbor list can be carried out in GPU or CPU

Time integration represents only small computational cost and

it is carried out in CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2129

Fixed or Dynamic Load balancing

Fixed load balancing setting the CPU core to accelerator ratio and by

setting the fraction of particles that will have forces calculated by the

accelerator

Consider a job run with 4 MPI processes on a node with 2 accelerator

devices and the fraction set to 07 At each timestep each MPI process will

place data transfer of positions kernel execution of forces and datatransfer of forces into the device (GPU) queue for 70 of the particles

At the same time data is being transferred and forces are being calculated

on the GPU the MPI process will perform force calculations on the CPU

Ideal fraction CPU time = GPU time for data transfer and kernel

execution dynamic balancing with calculation of optimal fraction based

on CPUGPU timing at some timestep interval

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2229

Benchmarks for load balancing

The performance impact

resulting from splitting the

force calculation between

the host and device will

depend on the CPU core to

device ratio and the relative

rates of force calculation on

the host and device

Processes per node (ppn)

Dynamic Load Balancing

(LB) Neighboring performed on

the GPU (GPU-N)

LJ N=864000

Implementing molecular dynamics on hybrid high

performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and

Arnold N Tharrington Comp Phys Comm 182 (2011)

898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2329

More benchmarks hellip

Strong scaling benchmark usingLJ cutoff of 25 and N=864 K LJ Single node

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2429

LAMMPS OpenCL vs CUDA

Single node Code compiled with CUDA and OpenCL N=256K with neighboring

performed on the GPU Time normalized by the time required to completesimulation loop with CUDA

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2529

CPU versus CPU-GPU Speedups

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2629

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2729

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2829

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2929

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1729

Single vs Double Precision (lib compile option)

Force kernels can be compiled to use single double or mixed

precision

The drawback of double precision for memory-bound kernelsis that twice as many bytes must be fetched for cutoff

evaluation

A potential solution is to use mixed precision In this case thepositions are stored in single precision but accumulation and

storage of forces torques energies and virials is performed in

double precision

Because this memory access occurs outside the loop the

performance penalty for mixed precision is very small

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N TharringtonComp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1829

Benchmarks for different precision modes

httpusersnccsgov~wb8gpukid_precisionhtm

256000 atoms LJ liquid reduced density=08442 NVErcut= 25σ 5000 steps

Rhodopsin protein in solvated lipid bilayer CHARMM force fieldlong-range Coulombics via PPPMSHAKE constraintsCounter-ions and reduced amount of water to make a 32K atom

system replicated 2x2x2 to create box256000 atoms 1000 timesteps LJ rcut= 1 nm neighbor skin of10 σ NPT

bull Single precision OK for many runs but use at your peril Colberg amp Houmlfling Comp Phys Comm 182 (2011) 1120ndash1129

bull Mixed precision (single for positions and double for forces) nearly as fast as single precision

bull Double precision still cheaper than CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1929

Review on LAMMPS GPU implementation

Discussion of several important issues in porting a large

molecular dynamics code for use on parallel hybrid machines

Objectives

a) Minimize the amount of code that must be ported forefficient acceleration

b) Utilize the available processing power from both multi-core

CPUs and accelerators

Presents results on a parallel test cluster containing 32 Fermi

GPUs and 180 CPU cores

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2029

Parallel and CPUGPU Decomposition

Multiple MPI processes (CPU cores) can share single

accelerator (GPU)

User can choose fixed load balance between CPU amp GPU for

the calculation of short range forces

Dynamic load balancing can also be chosen GPU force calculation

Neighbor list can be carried out in GPU or CPU

Time integration represents only small computational cost and

it is carried out in CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2129

Fixed or Dynamic Load balancing

Fixed load balancing setting the CPU core to accelerator ratio and by

setting the fraction of particles that will have forces calculated by the

accelerator

Consider a job run with 4 MPI processes on a node with 2 accelerator

devices and the fraction set to 07 At each timestep each MPI process will

place data transfer of positions kernel execution of forces and datatransfer of forces into the device (GPU) queue for 70 of the particles

At the same time data is being transferred and forces are being calculated

on the GPU the MPI process will perform force calculations on the CPU

Ideal fraction CPU time = GPU time for data transfer and kernel

execution dynamic balancing with calculation of optimal fraction based

on CPUGPU timing at some timestep interval

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2229

Benchmarks for load balancing

The performance impact

resulting from splitting the

force calculation between

the host and device will

depend on the CPU core to

device ratio and the relative

rates of force calculation on

the host and device

Processes per node (ppn)

Dynamic Load Balancing

(LB) Neighboring performed on

the GPU (GPU-N)

LJ N=864000

Implementing molecular dynamics on hybrid high

performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and

Arnold N Tharrington Comp Phys Comm 182 (2011)

898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2329

More benchmarks hellip

Strong scaling benchmark usingLJ cutoff of 25 and N=864 K LJ Single node

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2429

LAMMPS OpenCL vs CUDA

Single node Code compiled with CUDA and OpenCL N=256K with neighboring

performed on the GPU Time normalized by the time required to completesimulation loop with CUDA

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2529

CPU versus CPU-GPU Speedups

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2629

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2729

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2829

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2929

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1829

Benchmarks for different precision modes

httpusersnccsgov~wb8gpukid_precisionhtm

256000 atoms LJ liquid reduced density=08442 NVErcut= 25σ 5000 steps

Rhodopsin protein in solvated lipid bilayer CHARMM force fieldlong-range Coulombics via PPPMSHAKE constraintsCounter-ions and reduced amount of water to make a 32K atom

system replicated 2x2x2 to create box256000 atoms 1000 timesteps LJ rcut= 1 nm neighbor skin of10 σ NPT

bull Single precision OK for many runs but use at your peril Colberg amp Houmlfling Comp Phys Comm 182 (2011) 1120ndash1129

bull Mixed precision (single for positions and double for forces) nearly as fast as single precision

bull Double precision still cheaper than CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1929

Review on LAMMPS GPU implementation

Discussion of several important issues in porting a large

molecular dynamics code for use on parallel hybrid machines

Objectives

a) Minimize the amount of code that must be ported forefficient acceleration

b) Utilize the available processing power from both multi-core

CPUs and accelerators

Presents results on a parallel test cluster containing 32 Fermi

GPUs and 180 CPU cores

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2029

Parallel and CPUGPU Decomposition

Multiple MPI processes (CPU cores) can share single

accelerator (GPU)

User can choose fixed load balance between CPU amp GPU for

the calculation of short range forces

Dynamic load balancing can also be chosen GPU force calculation

Neighbor list can be carried out in GPU or CPU

Time integration represents only small computational cost and

it is carried out in CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2129

Fixed or Dynamic Load balancing

Fixed load balancing setting the CPU core to accelerator ratio and by

setting the fraction of particles that will have forces calculated by the

accelerator

Consider a job run with 4 MPI processes on a node with 2 accelerator

devices and the fraction set to 07 At each timestep each MPI process will

place data transfer of positions kernel execution of forces and datatransfer of forces into the device (GPU) queue for 70 of the particles

At the same time data is being transferred and forces are being calculated

on the GPU the MPI process will perform force calculations on the CPU

Ideal fraction CPU time = GPU time for data transfer and kernel

execution dynamic balancing with calculation of optimal fraction based

on CPUGPU timing at some timestep interval

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2229

Benchmarks for load balancing

The performance impact

resulting from splitting the

force calculation between

the host and device will

depend on the CPU core to

device ratio and the relative

rates of force calculation on

the host and device

Processes per node (ppn)

Dynamic Load Balancing

(LB) Neighboring performed on

the GPU (GPU-N)

LJ N=864000

Implementing molecular dynamics on hybrid high

performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and

Arnold N Tharrington Comp Phys Comm 182 (2011)

898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2329

More benchmarks hellip

Strong scaling benchmark usingLJ cutoff of 25 and N=864 K LJ Single node

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2429

LAMMPS OpenCL vs CUDA

Single node Code compiled with CUDA and OpenCL N=256K with neighboring

performed on the GPU Time normalized by the time required to completesimulation loop with CUDA

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2529

CPU versus CPU-GPU Speedups

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2629

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2729

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2829

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2929

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 1929

Review on LAMMPS GPU implementation

Discussion of several important issues in porting a large

molecular dynamics code for use on parallel hybrid machines

Objectives

a) Minimize the amount of code that must be ported forefficient acceleration

b) Utilize the available processing power from both multi-core

CPUs and accelerators

Presents results on a parallel test cluster containing 32 Fermi

GPUs and 180 CPU cores

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2029

Parallel and CPUGPU Decomposition

Multiple MPI processes (CPU cores) can share single

accelerator (GPU)

User can choose fixed load balance between CPU amp GPU for

the calculation of short range forces

Dynamic load balancing can also be chosen GPU force calculation

Neighbor list can be carried out in GPU or CPU

Time integration represents only small computational cost and

it is carried out in CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2129

Fixed or Dynamic Load balancing

Fixed load balancing setting the CPU core to accelerator ratio and by

setting the fraction of particles that will have forces calculated by the

accelerator

Consider a job run with 4 MPI processes on a node with 2 accelerator

devices and the fraction set to 07 At each timestep each MPI process will

place data transfer of positions kernel execution of forces and datatransfer of forces into the device (GPU) queue for 70 of the particles

At the same time data is being transferred and forces are being calculated

on the GPU the MPI process will perform force calculations on the CPU

Ideal fraction CPU time = GPU time for data transfer and kernel

execution dynamic balancing with calculation of optimal fraction based

on CPUGPU timing at some timestep interval

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2229

Benchmarks for load balancing

The performance impact

resulting from splitting the

force calculation between

the host and device will

depend on the CPU core to

device ratio and the relative

rates of force calculation on

the host and device

Processes per node (ppn)

Dynamic Load Balancing

(LB) Neighboring performed on

the GPU (GPU-N)

LJ N=864000

Implementing molecular dynamics on hybrid high

performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and

Arnold N Tharrington Comp Phys Comm 182 (2011)

898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2329

More benchmarks hellip

Strong scaling benchmark usingLJ cutoff of 25 and N=864 K LJ Single node

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2429

LAMMPS OpenCL vs CUDA

Single node Code compiled with CUDA and OpenCL N=256K with neighboring

performed on the GPU Time normalized by the time required to completesimulation loop with CUDA

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2529

CPU versus CPU-GPU Speedups

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2629

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2729

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2829

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2929

Scaling with ldquooldrdquo GPU shock wave

ldquoGP-GPU Processing of Molecular Dynamics Simulationsrdquo E Millaacuten Kujtiuk EM Bringa A Higginbotham C Garciacutea

Garino Proceedings of HPC 2010 High-Performance Computing Symposium pp 3234-3248

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2029

Parallel and CPUGPU Decomposition

Multiple MPI processes (CPU cores) can share single

accelerator (GPU)

User can choose fixed load balance between CPU amp GPU for

the calculation of short range forces

Dynamic load balancing can also be chosen GPU force calculation

Neighbor list can be carried out in GPU or CPU

Time integration represents only small computational cost and

it is carried out in CPU

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2129

Fixed or Dynamic Load balancing

Fixed load balancing setting the CPU core to accelerator ratio and by

setting the fraction of particles that will have forces calculated by the

accelerator

Consider a job run with 4 MPI processes on a node with 2 accelerator

devices and the fraction set to 07 At each timestep each MPI process will

place data transfer of positions kernel execution of forces and datatransfer of forces into the device (GPU) queue for 70 of the particles

At the same time data is being transferred and forces are being calculated

on the GPU the MPI process will perform force calculations on the CPU

Ideal fraction CPU time = GPU time for data transfer and kernel

execution dynamic balancing with calculation of optimal fraction based

on CPUGPU timing at some timestep interval

Implementing molecular dynamics on hybrid high performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and Arnold N Tharrington Comp Phys Comm 182 (2011) 898ndash911

8162019 MD Bringa Comahue 2012 4 GPU

httpslidepdfcomreaderfullmd-bringa-comahue-2012-4-gpu 2229

Benchmarks for load balancing

The performance impact

resulting from splitting the

force calculation between

the host and device will

depend on the CPU core to

device ratio and the relative

rates of force calculation on

the host and device

Processes per node (ppn)

Dynamic Load Balancing

(LB) Neighboring performed on

the GPU (GPU-N)

LJ N=864000

Implementing molecular dynamics on hybrid high

performance computers ndash short range forces

W Michael Brown Peng Wang Steven J Plimpton and

Arnold N Tharrington Comp Phys Comm 182 (2011)

898ndash911


More benchmarks …

Strong scaling benchmark using an LJ cutoff of 2.5 and N = 864K (LJ, single node).

Implementing molecular dynamics on hybrid high performance computers – short range forces

W. Michael Brown, Peng Wang, Steven J. Plimpton, and Arnold N. Tharrington, Comp. Phys. Comm. 182 (2011) 898–911.


LAMMPS OpenCL vs CUDA

Single node. Code compiled with CUDA and OpenCL, N = 256K, with neighboring performed on the GPU. Times are normalized by the time required to complete the simulation loop with CUDA (a sketch of how one kernel source can target both runtimes follows below).

Implementing molecular dynamics on hybrid high performance computers – short range forces

W. Michael Brown, Peng Wang, Steven J. Plimpton, and Arnold N. Tharrington, Comp. Phys. Comm. 182 (2011) 898–911.
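The CUDA and OpenCL versions compared here are built from essentially the same kernel source. A common way to achieve that (a generic sketch of the technique, not the Geryon headers that the LAMMPS gpu library uses for this purpose) is a thin macro layer that maps the few keywords that differ between the two languages:

// Sketch: one kernel body compiled either as CUDA or as OpenCL C.
// Generic illustration only, not the headers shipped with LAMMPS.
#ifdef USE_OPENCL
  #define KERNEL     __kernel
  #define GLOBAL_MEM __global
  #define GLOBAL_ID  get_global_id(0)
#else   // CUDA
  #define KERNEL     extern "C" __global__
  #define GLOBAL_MEM
  #define GLOBAL_ID  (blockIdx.x * blockDim.x + threadIdx.x)
#endif

// The same source then expresses the (placeholder) force kernel once:
KERNEL void scale_forces(GLOBAL_MEM float* fx, const float s, const int n) {
  const int i = GLOBAL_ID;
  if (i < n) fx[i] *= s;   // stand-in for the short-range force body
}

Compiled with nvcc this is a CUDA kernel; passed as a source string to the OpenCL compiler with -DUSE_OPENCL it becomes an OpenCL kernel, so the physics lives in one place and only the host-side launch code differs per runtime.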
