
NWChem software development with Global Arrays


Page 1: NWChem software development with Global Arrays

NWChem software development with Global Arrays

Edoardo Aprà
High Performance Computational Chemistry
Environmental Molecular Sciences Laboratory
Pacific Northwest National Laboratory
Richland, WA 99352

SPSCICOMP2000, 8/15/2000

Page 2: NWChem software development with Global Arrays


Outline

• Global Arrays Toolkit overview
• NWChem overview
• Performance
• Usage of Global Arrays in NWChem

Page 3: NWChem software development with Global Arrays


Global Arrays programming interface

• Single, shared data structure with physically distributed data
• Shared-memory programming in the context of distributed dense arrays
• Combines the better features of shared memory and message passing:
  - explicit control of data distribution and locality, as in MPI codes
  - shared-memory ease of use
• Focus on the hierarchical memory of modern computers (NUMA)
• For highest performance, requires porting to each individual platform

Page 4: NWChem software development with Global Arrays


Global Arrays Overview

• A substantial capability extension to the message-passing model
• Compatible with MPI and standard parallel libraries
  - ScaLAPACK, SUMMA, PeIGS, PETSc, CUMULVS
  - BLAS interface
• Forms part of a bigger system for NUMA programming
  - extensions for computational grid environments: Mirrored Arrays
  - natural parallel I/O extensions: Disk Resident Arrays
• Originated in a High Performance Computational Chemistry project
• Used by 5 major chemistry codes, as well as in financial futures forecasting, astrophysics, and computer graphics

Page 5: NWChem software development with Global Arrays


Global Array Model of Computations

[Figure: each process copies a patch of the Shared Object into its local memory via 1-sided communication, performs its compute/update on purely local data, and copies the result back to the Shared Object, again via 1-sided communication.]
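This get-compute-put cycle maps directly onto the GA interface. A minimal hedged sketch in Fortran (the patch bounds, buffer size, and the local routine process_patch are illustrative; a real code would derive the patch from ga_distribution or a task list):

c     g_a: handle of an existing 2-D global array;
c     (ilo:ihi, jlo:jhi): the patch this process works on
      integer g_a, ilo, ihi, jlo, jhi, ld
      double precision buf(100,100)
      ld = 100
c     copy a patch of the shared object into local memory (1-sided get)
      call ga_get(g_a, ilo, ihi, jlo, jhi, buf, ld)
c     compute/update using only local data (process_patch is hypothetical)
      call process_patch(buf, ihi-ilo+1, jhi-jlo+1)
c     copy the updated patch back to the shared object (1-sided put)
      call ga_put(g_a, ilo, ihi, jlo, jhi, buf, ld)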

Page 6: NWChem software development with Global Arrays


ARMCI: first portable 1-sided communication library

• The core communication capabilities of GA were generalized, extended, and made standalone
• Approach:
  - simple progress rules
  - less restrictive than MPI-2 (which includes 1-sided communication)
  - low-level and high-performance
  - optimized for noncontiguous data transfers (array sections, scatter/gather)
  - implemented using whatever mechanisms work best on a given platform: active messages, threads, sockets, shared memory, remote memory copy
• Used by GA as its run-time system and contributed to other projects
  - Adlib (U. Syracuse), Padre/Overture (LLNL), DDI (Ames)

Page 7: NWChem software development with Global Arrays


Intelligent Protocols in ARMCI

On IBM SP with SMP nodes, ARMCI exploits:
• cluster locality information
• Active Messages
• remote memory copy
• shared memory within an SMP node
• threads

[Figure: bandwidth (MB/s) vs. message size (1 byte to 1 MB) for ARMCI, LAPI remote, and LAPI SMP; diagrams contrasting a put through LAPI across the switch with a memcpy through a logically partitioned shared memory segment inside an SMP node.]

Page 8: NWChem software development with Global Arrays


Advanced networks

• Exciting research and development opportunities
• High performance: <10 μs latency, 100-200 MB/s bandwidth
• Low cost
• Flexible
• Finally, the traditionally closed interfaces are opening
  - Protocols: GM (Myrinet), VIA (Giganet), Elan (Quadrics) offer a lot of capability and performance, supporting not only MPI but also more advanced models
• H/W support for 1-sided communication
  - NIC support: more of the high-level protocols pushed down to h/w
• Opportunities for optimization, e.g., fast collective operations

Page 9: NWChem software development with Global Arrays


High Performance Parallel I/O Models

• Disk Resident Arrays: array objects on disk (RI-SCF, RI-MP2)
• Exclusive Access Files: private files per processor (semidirect SCF and MP2)
• Shared Files: "shared memory on disk" (MRCI)
• Distant I/O: one-sided communication to disk
• A system of hints allows performance tuning to match application characteristics

[Figure: the application layer (DRA, EAF, SF) sits on the ELIO device library (portability layer), which spans the filesystem layer (filesystems A, B, C) above the hardware.]
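As an illustration of the Disk Resident Arrays model, a hedged sketch using the DRA Fortran interface (the array name, file name, and dimensions are illustrative; calls follow the DRA documentation and assume dra_init has already been called):

      integer n, g_a, d_a, request, status
      n = 1000
c     g_a: handle of an existing n x n global array (setup not shown)
c     create a matching n x n double-precision array on disk
      status = dra_create(mt_dbl, n, n, 'big_matrix',
     &                    'big_matrix.dra', DRA_RW, n, n, d_a)
c     write the global array to the disk resident array (asynchronous)
      status = dra_write(g_a, d_a, request)
c     block until the disk I/O completes
      status = dra_wait(request)
      status = dra_close(d_a)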

Page 10: NWChem software development with Global Arrays


Forms of Parallel I/O in Chemistry Apps

• collective I/O to a shared file
• independent I/O to private files
• independent I/O to a shared file

Page 11: NWChem software development with Global Arrays


GA Interface to PeIGS 3.0
(solution of real symmetric generalized and standard eigensystem problems)

• Full eigensolution performed on a 966x966 matrix
• Unique features not available elsewhere:
  - inverse iteration using the Dhillon-Fann-Parlett parallel algorithm (fastest uniprocessor performance and good parallel scaling)
  - guaranteed orthonormal eigenvectors in the presence of large clusters of degenerate eigenvalues
  - packed storage
  - smaller scratch space requirements

[Figure: time vs. number of processors (up to 128) for the Householder, Eigenvectors, and Backtransform phases and the Total.]

Page 12: NWChem software development with Global Arrays


Why NWChem Was Developed

• Developed as part of the construction of the Environmental Molecular Sciences Laboratory (EMSL)
• Envisioned as an integrated component in solving DOE's Grand Challenge environmental restoration problems
• Designed and developed to be a highly efficient and portable MPP computational chemistry package
• Provides computational chemistry solutions that are scalable with respect to both chemical system size and MPP hardware size

Page 13: NWChem software development with Global Arrays


NWChem Software Development Cycle

[Figure: development cycle linking new theory development — theory (method development), algorithm development, and computational chemistry models — to the engineering phases: Requirements Definition and Analysis, Preliminary Design, Detailed Design, Prototyping (Implementation), Implementation, and Acceptance Test.]

Research including work in:
• theory innovation
• O(N) reduction
• improved MPP algorithms
• etc.

Page 14: NWChem software development with Global Arrays


NWChem Architecture

[Figure: layered architecture. Generic Tasks (energy, structure, ...) drive the Molecular Calculation Modules (SCF energy, gradient, ...; DFT energy, gradient, ...; MD, NMR, Solvation, ...; Optimize, Dynamics, ...). These are built on the Molecular Modeling Toolkit (run-time database, Integral API, Geometry Object, Basis Set Object, PeIGS, ...) and the Molecular Software Development Toolkit (Global Arrays, Memory Allocator, Parallel IO).]

Page 15: NWChem software development with Global Arrays


Program Modules

• All major functions exist as independent modules.
• Modules communicate only through the database and files.
• No shared common blocks.
• The only argument to a module is the database.
• Modules have well-defined actions.
• Modules can call other modules.

(A sketch of this convention follows the figure.)

[Figure: modules — Input, SCF, DFT, MP2, CCSD, Stepper, Driver, nwArgos, Basis, Geometry — communicating through the database, which holds entries such as Energy, Filenames, Status, Theory, Operation.]
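A hedged sketch of what the module convention looks like in practice; the module name and database key are hypothetical, while rtdb_put and the database-handle-only argument follow NWChem's run-time database interface:

      logical function mymodule(rtdb)
      implicit none
#include "mafdecls.fh"
#include "rtdb.fh"
c     rtdb: handle to the run-time database, the module's only argument
      integer rtdb
      double precision energy
c     read options from the database, perform the module's action ...
      energy = 0d0
c     ... and pass results back only through the database
      if (.not. rtdb_put(rtdb, 'mymodule:energy', mt_dbl, 1, energy))
     &     call errquit('mymodule: rtdb_put failed', 0)
      mymodule = .true.
      end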

Page 16: NWChem software development with Global Arrays


NWChem is supported on at least these platforms, and is readily ported to essentially all sequential and parallel computers:

• IBM SP
• IBM workstations
• CRAY T3
• SGI SMP systems
• Fujitsu VX/VPP
• SUN and other homogeneous workstation networks
• x86-based workstations running Linux, including laptops
• x86-based workstations running NT or Win98
• Tru64 and Linux Alpha servers (including the SC series)

Page 17: NWChem software development with Global Arrays


MD Benchmarks

[Figure: three speedup-vs-number-of-nodes panels, each against the linear ideal.
• Myoglobin in water, 10,914 atoms, Rc = 1.6 nm, up to 128 nodes: CRAY T3E-900, IBM SP2, and iPSC/860 (CHARMM, Brooks et al.); annotations 0.36/108, 0.94/128, 0.17/108.
• Octanol, 216,000 atoms, Rc = 2.4 nm, up to 1000 nodes: CRAY T3E-900; annotation 0.58/1000.
• Dichloroethane-water interface, 100,369 atoms, Rc = 1.8 nm, up to 250 nodes: IBM SP2 and CRAY T3E-900; annotations 1.0/250 and 1.6/250.]

Page 18: NWChem software development with Global Arrays


Myoglobin

Wall-clock times and scaling obtained for a 1000-step NWChem classical molecular dynamics simulation of myoglobin in water (10,914 atoms), using the AMBER force field and a 1.6 nm cutoff radius.

[Figure: speedup vs. number of nodes (up to 128) for CRAY T3E-900, IBM SP2, and iPSC/860 (CHARMM, Brooks et al.) against the linear ideal; annotations 0.36/108, 0.94/128, 0.17/108.]

Page 19: NWChem software development with Global Arrays


SCF: Parallel Performance

• 105 atoms, 1343 functions, 362 electrons
• Dunning augmented cc-pVDZ basis set
• IBM Poughkeepsie vendor rack: 160 MHz nodes, 512 MB/node, 3 GB disk/node, 15 MB/sec/node sustained read bandwidth
• Wall time reduced from 37 hours to 5.7 hours

[Figure: CPU and wall-clock speedup, and total disk space (GB), vs. number of processors (up to 240).]

Page 20: NWChem software development with Global Arrays


MP2 Gradient Parallel Scaling

• KC8O4H16; C/O/H: aug-cc-pVDZ, K: Ahlrichs PVDZ; 458 functions, 114 electrons
• 120 MHz P2SC nodes, 128 MB memory/node, 2 GB disk/node, TB3 switch
• Global Arrays, TCGMSG+LAPI
• Wall time reduced from 2.9 hours to 0.98 hours

[Figure: wall time (log scale, 100-10000) vs. number of processors (50-200) for the Total, Forw-tran, Make-t, Lai, Back-tran, Non-sep, CPHF, Sep, and Fock components.]

Page 21: NWChem software development with Global Arrays


DFT Benchmarks

• Si8O7H18, 347 basis functions, LDA wavefunction

Page 22: NWChem software development with Global Arrays


DFT Benchmarks

• Si28O67H30, 1687 basis functions, LDA wavefunction

Page 23: NWChem software development with Global Arrays


GA Operations

Primitive operations invoked collectively by all processes:
• create a distributed array, controlling alignment and distribution
• destroy an array
• synchronize

Primitive operations in MIMD style (see the sketch after this list):
• fetch, store, and accumulate into a rectangular patch of a global array
• gather and scatter
• atomic read-and-increment

BLAS-like data-parallel operations on sections of or entire global arrays:
• vector operations, including dot product, scale, add
• matrix operations, including symmetrize, transpose, multiplication
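A hedged sketch of the MIMD-style one-sided operations in the Fortran interface (the index arrays and the counter array g_cnt are illustrative):

      integer nv
      parameter (nv = 3)
      integer g_a, g_cnt, iv(nv), jv(nv), next
      double precision v(nv)
c     assume iv/jv already hold the target element indices
c     gather scattered elements: v(k) = g_a(iv(k), jv(k))
      call ga_gather(g_a, v, iv, jv, nv)
c     scatter values back: g_a(iv(k), jv(k)) = v(k)
      call ga_scatter(g_a, v, iv, jv, nv)
c     atomic read-and-increment of element (1,1) of an integer
c     global array g_cnt, e.g. a shared task counter
      next = ga_read_inc(g_cnt, 1, 1, 1)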

Page 24: NWChem software development with Global Arrays


GA sample program

      status = ga_create(mt_dbl, n, n, 'matrix', n, 0, g_a)
      status = ga_create(mt_dbl, n, n, 'matrix', n, 0, g_b)
      status = ga_create(mt_dbl, n, n, 'matrix', n, 0, g_c)
      ...
c     C = A * B (collective, data-parallel)
      call ga_dgemm('N', 'N', n, n, n, 1d0, g_a, g_b, 0d0, g_c)
      call ga_print(g_c)
c     standard eigenproblem of C: eigenvectors in g_a, eigenvalues in evals
      call ga_diag_std(g_c, g_a, evals)
      status = ga_destroy(g_a)

Page 25: NWChem software development with Global Arrays


The SCF Algorithm in a Nutshell

Density:

$$\rho(r) = \sum_{\mu\nu} D_{\mu\nu}\,\chi_\mu(r)\,\chi_\nu(r)$$

Fock matrix:

$$F_{\mu\nu} = \int \chi_\mu(r)\,\hat{F}[\rho(r)]\,\chi_\nu(r)\,dr$$
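The distributed Fock build is where Global Arrays earns its keep in the SCF module: each process claims integral tasks dynamically and touches remote density and Fock patches one-sidedly. A hedged sketch of this pattern (the task-to-patch mapping and the integral kernel fock_block are hypothetical stand-ins for the real code):

c     g_dens, g_fock: distributed density and Fock matrices;
c     g_cnt: shared task counter (integer global array)
      integer g_dens, g_fock, g_cnt
      integer itask, ilo, ihi, jlo, jhi, ld
      double precision d(100,100), f(100,100)
      ld = 100
c     dynamic load balancing: atomically claim the next integral task
      itask = ga_read_inc(g_cnt, 1, 1, 1)
c     ... map itask to a patch (ilo:ihi, jlo:jhi), omitted ...
c     fetch the density patch without involving its owner (1-sided)
      call ga_get(g_dens, ilo, ihi, jlo, jhi, d, ld)
c     contract this integral block with the density (hypothetical kernel)
      call fock_block(d, f, ilo, ihi, jlo, jhi)
c     accumulate the contribution: g_fock(patch) += 1d0 * f
      call ga_acc(g_fock, ilo, ihi, jlo, jhi, f, ld, 1d0)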


Page 27: NWChem software development with Global Arrays


Acknowledgments

Jarek Nieplocha

The NWChem developers team

Page 28: NWChem software development with Global Arrays


NWChem Developers

HPCC Developers

Staff

• Dr. Edo Apra

• Dr. Eric Bylaska

• Dr. Michel Dupuis

• Dr. George Fann

• Dr. Robert Harrison

• Dr. Rick Kendall

• Dr. Jeff Nichols

• Dr. T. P. Straatsma

• Dr. Theresa Windus

Research Fellows

• Dr. Ken Dyall

• Prof. Eric Glendening

• Dr. Benny Johnson

• Prof. Joop van Lenthe

• Dr. Krzysztof Wolinski

Post-Doctoral Fellows
• Dr. James Anchell
• Dr. Dave Bernholdt
• Dr. Piotr Borowski
• Dr. Terry Clark
• Dr. Holger Dachsel
• Dr. Miles Deegan
• Dr. Bert de Jong
• Dr. Herbert Fruchtl
• Dr. Ramzi Kutteh
• Dr. Xiping Long
• Dr. Baoqi Meng
• Dr. Gianni Sandrone
• Dr. Mark Stave
• Dr. Hugh Taylor
• Dr. Adrian Wong
• Dr. Zhiyong Zhang

Non-HPCC Developers

Staff
• Dr. Dave Elwood
• Dr. Maciej Gutowski
• Dr. Anthony Hess
• Dr. John Jaffe
• Dr. Rik Littlefield
• Dr. Jarek Nieplocha
• Dr. Matt Rosing
• Dr. Greg Thomas

Post-Doctoral Fellows
• Dr. Zijing Lin
• Dr. Rika Kobayashi
• Dr. Jialin Ju

Graduate Students
• Dr. Daryl Clerc