NWChem software development with Global Arrays

Edoardo Aprà
High Performance Computational Chemistry
Environmental Molecular Sciences Laboratory
Pacific Northwest National Laboratory
Richland, WA 99352

SPSCICOMP2000, 8/15/2000
Outline
Global Arrays Toolkit overview
NWChem overview
Performance
Usage of Global Arrays in NWChem
Global Arrays programming interface
• Single, shared data structure
• Physically distributed data
• Shared-memory programming in the context of distributed dense arrays
• Combines the better features of shared memory and message passing:
  • explicit control of data distribution and locality, as in MPI codes
  • ease of use of shared memory
• Focus on the hierarchical memory of modern computers (NUMA)
• For highest performance, requires porting to each individual platform
Global Arrays Overview
• A substantial capability extension to the message-passing model
• Compatible with MPI and standard parallel libraries:
  ScaLAPACK, SUMMA, PeIGS, PETSc, CUMULVS; BLAS interface
• Forms part of a bigger system for NUMA programming:
  • extensions for computational grid environments: Mirrored Arrays
  • natural parallel I/O extensions: Disk Resident Arrays
• Originated in a High Performance Computational Chemistry project
• Used by 5 major chemistry codes, as well as in financial futures forecasting, astrophysics, and computer graphics
Global Array Model of Computations
[Diagram: each process copies a patch of the shared object into its local memory with 1-sided communication, computes/updates on the local copy, then copies the result back to the shared object, again with 1-sided communication.]
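The get/compute/put idiom above can be sketched in a few lines. This is a toy simulation, not the real GA API: the class and method names are invented stand-ins for the one-sided get/accumulate operations, and the "processes" are just loop iterations.

```python
import numpy as np

# Toy stand-in for a global array: each "process" gets a patch into
# local memory, computes on it, and accumulates the result back.
class ToyGlobalArray:
    def __init__(self, n):
        self.data = np.zeros((n, n))          # the shared object

    def get(self, rows):                      # like ga_get: copy patch to local memory
        return self.data[rows, :].copy()

    def acc(self, rows, local, alpha=1.0):    # like ga_acc: accumulate patch back
        self.data[rows, :] += alpha * local

g = ToyGlobalArray(4)
for p in range(2):                            # two "processes", two rows each
    rows = slice(2 * p, 2 * p + 2)
    patch = g.get(rows)                       # copy to local memory
    patch += p + 1                            # compute/update locally
    g.acc(rows, patch)                        # copy back to shared object

assert g.data[0, 0] == 1.0 and g.data[3, 3] == 2.0
```

In the real model the two transfers are one-sided: the owner of the data does not participate in the get or the accumulate.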
ARMCI: First Portable 1-Sided Communication Library
• The core communication capabilities of GA were generalized, extended, and made standalone
• Approach:
  • simple progress rules
  • less restrictive than MPI-2 (includes 1-sided communication)
  • low-level
  • high-performance
  • optimized for noncontiguous data transfers (array sections, scatter/gather)
  • implemented using whatever mechanisms work best on a given platform:
    active messages, threads, sockets, shared memory, remote memory copy
• Used by GA as its run-time system and contributed to other projects:
  Adlib (U. Syracuse), Padre/Overture (LLNL), DDI (Ames)
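The noncontiguous-transfer point is worth a concrete picture. A 2-D array section is a strided region of memory, and strided primitives move it row by row as contiguous chunks rather than element by element. The sketch below is an illustration of that layout, not the ARMCI API; the function name is invented.

```python
# Sketch: pack a rows x cols section of a row-major array (leading
# dimension ld, starting at flat index `first`) into a contiguous
# buffer, one contiguous row copy at a time.
def pack_section(flat, ld, first, rows, cols):
    out = []
    for i in range(rows):
        start = first + i * ld
        out.extend(flat[start:start + cols])   # one contiguous copy per row
    return out

a = [10 * i + j for i in range(4) for j in range(4)]   # 4x4 array, row-major
buf = pack_section(a, ld=4, first=4 * 1 + 1, rows=2, cols=2)  # 2x2 at (1,1)
assert buf == [11, 12, 21, 22]
```

Doing this with `rows` copies instead of `rows * cols` per-element transfers is the kind of optimization the slide refers to.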
Intelligent Protocols in ARMCI
On the IBM SP with SMP nodes, ARMCI exploits:
• cluster locality information
• Active Messages
• remote memory copy
• shared memory within the SMP node
• threads
[Chart: bandwidth (MB/s) vs. message size (bytes) for ARMCI, LAPI remote, and LAPI SMP.]
[Diagram: processes P0-P3 on an SMP node use memcpy through a logically partitioned shared-memory segment; puts to remote nodes go through LAPI, the adapter, and the switch.]
Advanced Networks
• Exciting research and development opportunities
• High performance: <10 μs latency, 100-200 MB/s bandwidth
• Low cost
• Flexible
• The traditionally closed interfaces are finally open:
  protocols such as GM (Myrinet), VIA (Giganet), and Elan (Quadrics) offer enough capability and performance to support not only MPI but also more advanced models
• H/W support for 1-sided communication
• NIC support: more of the high-level protocols pushed down to h/w
• Opportunities for optimization, e.g., fast collective operations
High Performance Parallel I/O Models
• Disk Resident Arrays: array objects on disk (RI-SCF, RI-MP2)
• Exclusive Access Files: private files per processor (semidirect SCF and MP2)
• Shared Files: "shared memory on disk" (MRCI)
• Distant I/O: one-sided communication to disk
• ELIO device library: the portability layer between the application layer and the filesystem layer (filesystems A, B, C) and hardware
• A system of hints allows performance tuning to match application characteristics
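The Disk Resident Arrays idea, an array that lives on disk and moves only sections into memory on demand, has a loose everyday analogue in a memory-mapped file. The sketch below uses `numpy.memmap` to illustrate the access pattern; it is an analogy, not the DRA API.

```python
import os
import tempfile
import numpy as np

# A "disk resident" 1000x1000 array: the file on disk holds the data,
# and only the sections we touch are brought into memory.
path = os.path.join(tempfile.mkdtemp(), "dra.dat")
on_disk = np.memmap(path, dtype=np.float64, mode="w+", shape=(1000, 1000))
on_disk[10:12, 0:3] = 7.0            # write one small section
on_disk.flush()

view = np.memmap(path, dtype=np.float64, mode="r", shape=(1000, 1000))
section = np.array(view[10:12, 0:3])  # read the section back into memory
assert section.sum() == 42.0
```

The real DRA layer adds collective semantics and the hint system described above, so reads and writes of array sections can be tuned to the application's access pattern.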
Forms of Parallel I/O in Chemistry Apps
• collective to a shared file
• independent to private files
• independent to a shared file
GA Interface to PeIGS 3.0
(Solution of real symmetric generalized and standard eigensystem problems)
• Full eigensolution performed on a 966x966 matrix
• Unique features not available elsewhere:
  • inverse iteration using the Dhillon-Fann-Parlett parallel algorithm (fastest uniprocessor performance and good parallel scaling)
  • guaranteed orthonormal eigenvectors in the presence of large clusters of degenerate eigenvalues
  • packed storage
  • smaller scratch-space requirements
[Chart: time vs. number of processors (0-128) for the Eigenvectors, Householder, Backtransform, and Total phases.]
Why NWChem Was Developed
Developed as part of the construction of the Environmental Molecular Sciences Laboratory (EMSL)
Envisioned to be used as an integrated component in solving DOE’s Grand Challenge environmental restoration problems
Designed and developed to be a highly efficient and portable MPP computational chemistry package
Provides computational chemistry solutions that are scalable with respect to chemical system size as well as MPP hardware size
NWChem Software Development Cycle
[Diagram: new theory development feeds the cycle.]
• Theory (method development), with research including work in:
  • theory innovation
  • O(N) reduction
  • improved MPP algorithms
  • etc.
• Algorithm development
• Prototyping (implementation)
• Requirements definition and analysis phase
• Preliminary design phase
• Detailed design phase
• Implementation phase
• Acceptance test phase
• Computational chemistry models
NWChem Architecture
[Diagram: layered architecture.]
• Generic tasks: energy, structure, ...
• Molecular calculation modules: SCF energy, gradient, ...; DFT energy, gradient, ...; MD, NMR, solvation, ...; optimize, dynamics, ...
• Molecular modeling toolkit: run-time database, Integral API, Geometry Object, Basis Set Object, PeIGS, ...
• Molecular software development toolkit: Global Arrays, Memory Allocator, Parallel IO
Program Modules
• All major functions exist as independent modules
• Modules communicate only through the database and files
• No shared common blocks
• The only argument to a module is the database
• Modules have well-defined actions
• Modules can call other modules
[Diagram: modules (Input, SCF, DFT, MP2, CCSD, Stepper, Driver, nwArgos) exchange basis, geometry, energy, filenames, status, theory, and operation entries through the database.]
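The module contract above can be sketched in a few lines. All names here are hypothetical placeholders, not the NWChem API: the point is only that every module takes the run-time database as its sole argument and communicates through it.

```python
# Toy sketch of the module/database contract: no shared state, no
# extra arguments; results flow through database entries.
def scf_module(db):
    n_elec = db["geometry:n_electrons"]   # read input from the database
    db["scf:energy"] = -0.5 * n_elec      # placeholder "calculation"; write result back

def driver_module(db):
    scf_module(db)                        # modules can call other modules,
    return db["scf:energy"]               # still passing only the database

db = {"geometry:n_electrons": 10}         # the run-time database
energy = driver_module(db)
assert energy == -5.0
```

Because modules touch only the database, any module can be swapped or rerun without hidden coupling through common blocks.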
NWChem is supported on at least these platforms, and is readily ported to essentially all sequential and parallel computers:
• IBM SP
• IBM workstations
• CRAY T3
• SGI SMP systems
• Fujitsu VX/VPP
• SUN and other homogeneous workstation networks
• x86-based workstations running Linux, including laptops
• x86-based workstations running NT or Win98
• Tru64 and Linux Alpha servers (including the SC series)
MD Benchmarks
[Three charts: speedup vs. number of nodes, each against the linear-scaling line.]
• Myoglobin in water, 10,914 atoms, Rc = 1.6 nm: CRAY T3E-900, IBM SP2, iPSC/860 (CHARMM, Brooks et al.); annotations 0.36 / 108, 0.94 / 128, 0.17 / 108
• Octanol, 216,000 atoms, Rc = 2.4 nm: CRAY T3E-900; annotation 0.58 / 1000
• Dichloroethane-water interface, 100,369 atoms, Rc = 1.8 nm: IBM SP2, CRAY T3E-900; annotations 1.0 / 250, 1.6 / 250
Myoglobin
[Chart: speedup vs. number of nodes for CRAY T3E-900, IBM SP2, and iPSC/860 (CHARMM, Brooks et al.), against linear scaling; annotations 0.36 / 108, 0.94 / 128, 0.17 / 108.]
Wall-clock times and scaling obtained for a 1000-step NWChem classical molecular dynamics simulation of myoglobin in water, using the AMBER force field and a 1.6 nm cutoff radius. The system size is 10,914 atoms.
SCF: Parallel Performance
• 105 atoms, 1343 functions, 362 electrons
• Dunning augmented cc-pVDZ basis set
• IBM Poughkeepsie vendor rack: 160 MHz nodes, 512 MB/node, 3 GB disk/node, 15 MB/sec/node sustained read bandwidth
• Wall time: 37 hours down to 5.7 hours
[Chart: speedup (CPU and wall) and total disk space (GB) vs. number of processors (0-240).]
MP2 Gradient Parallel Scaling
• KC8O4H16: C/O/H aug-cc-pVDZ, K Ahlrichs PVDZ; 458 functions, 114 electrons
• 120 MHz P2SC, 128 MB memory/node, 2 GB disk/node, TB3 switch
• Global Arrays, TCGMSG+LAPI
• Wall time: 2.9 hours down to 0.98 hours
[Chart: wall time vs. number of processors (50-200) for the Total, Forw-tran, Make-t, Lai, Back-tran, Non-sep, CPHF, Sep, and Fock phases.]
DFT Benchmarks
• Si8O7H18: 347 basis functions, LDA wavefunction
DFT Benchmarks
• Si28O67H30: 1687 basis functions, LDA wavefunction
GA Operations
The primitive operations invoked collectively by all processes:
• create a distributed array, controlling alignment and distribution
• destroy an array
• synchronize
Primitive operations in MIMD style:
• fetch, store, and accumulate into a rectangular patch of a global array
• gather and scatter
• atomic read-and-increment
BLAS-like data-parallel operations on sections of, or entire, global arrays:
• vector operations, including dot product, scale, and add
• matrix operations, including symmetrize, transpose, and multiplication
GA sample program

      status = ga_create(mt_dbl, n, n, 'matrix', n, 0, g_a)  ! create three n x n
      status = ga_create(mt_dbl, n, n, 'matrix', n, 0, g_b)  ! double-precision
      status = ga_create(mt_dbl, n, n, 'matrix', n, 0, g_c)  ! global arrays
      ...
      call ga_dgemm('N', 'N', n, n, n, 1d0, g_a, g_b, 0d0, g_c)  ! C = A * B
      call ga_print(g_c)                                     ! print the result
      call ga_diag_std(g_c, g_a, evals)                      ! standard eigensolve
      status = ga_destroy(g_a)                               ! free the array
The SCF Algorithm in a Nutshell

Density:      rho(r) = sum_{mu,nu} D_{mu,nu} chi_mu(r) chi_nu(r)

Fock matrix:  F_{mu,nu} = h_{mu,nu} + sum_{lambda,sigma} D_{lambda,sigma} [ (mu nu | lambda sigma) - 1/2 (mu lambda | nu sigma) ]
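The cycle implied here, build the density from the occupied orbitals, build the Fock matrix from the density, diagonalize, repeat until self-consistent, can be sketched with a toy model. The 2x2 "one-electron Hamiltonian" and the two-electron term below are invented for illustration and do not correspond to a real chemical system or to NWChem's Fock build.

```python
import numpy as np

h = np.array([[-1.0, -0.2],
              [-0.2, -0.5]])                 # toy one-electron Hamiltonian

def fock(D):
    # Toy two-electron contribution: F = h + 0.5 * tr(D) * I
    return h + 0.5 * np.trace(D) * np.eye(2)

D = np.zeros((2, 2))                          # start from an empty density
for _ in range(50):                           # SCF iterations
    e, C = np.linalg.eigh(fock(D))            # diagonalize the Fock matrix
    C_occ = C[:, :1]                          # one doubly occupied orbital
    D_new = 2.0 * C_occ @ C_occ.T             # closed-shell density D = 2 C C^T
    if np.abs(D_new - D).max() < 1e-10:       # self-consistent?
        break
    D = D_new

assert np.isclose(np.trace(D), 2.0)           # trace counts the two electrons
```

In the parallel SCF, both D and F are global arrays, and each process builds the Fock-matrix contributions for the integral blocks it fetches, accumulating them back with one-sided operations.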
Acknowledgments
Jarek Nieplocha
The NWChem developers team
NWChem Developers

HPCC Developers
Staff
• Dr. Edo Apra
• Dr. Eric Bylaska
• Dr. Michel Dupuis
• Dr. George Fann
• Dr. Robert Harrison
• Dr. Rick Kendall
• Dr. Jeff Nichols
• Dr. T. P. Straatsma
• Dr. Theresa Windus
Research Fellows
• Dr. Ken Dyall
• Prof. Eric Glendenning
• Dr. Benny Johnson
• Prof. Joop van Lenthe
• Dr. Krzysztof Wolinski
Post-Doctoral Fellows
• Dr. James Anchell
• Dr. Dave Bernholdt
• Dr. Piotr Borowski
• Dr. Terry Clark
• Dr. Holger Dachsel
• Dr. Miles Deegan
• Dr. Bert de Jong
• Dr. Herbert Fruchtl
• Dr. Ramzi Kutteh
• Dr. Xiping Long
• Dr. Baoqi Meng
• Dr. Gianni Sandrone
• Dr. Mark Stave
• Dr. Hugh Taylor
• Dr. Adrian Wong
• Dr. Zhiyong Zhang
Non-HPCC Developers
Staff
• Dr. Dave Elwood
• Dr. Maciej Gutowski
• Dr. Anthony Hess
• Dr. John Jaffe
• Dr. Rik Littlefield
• Dr. Jarek Nieplocha
• Dr. Matt Rosing
• Dr. Greg Thomas
Post-Doctoral Fellows
• Dr. Zijing Lin
• Dr. Rika Kobayashi
• Dr. Jialin Ju
Graduate Students
• Dr. Daryl Clerc