28
Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic Structure Calculations)

Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

  • View
    223

  • Download
    2

Embed Size (px)

Citation preview

Page 1: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Parallel Methods for Nano/Materials Science Applications

Andrew Canning

Computational Research Division LBNL & UC Davis, Applied Science Dept.

(Electronic Structure Calculations)

Page 2: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Outline

• Introduction to Nano/Materials science • Electronic Structure Calculations (DFT) • Parallelism for Plane Wave Approach • Code performance on High Performance

Parallel Computers • New Methods and Applications

Page 3: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

1991 Silicon surface reconstruction (7x7), Phys. Rev. (Stich, Payne, King-Smith, Lin, Clarke) Meiko I860, 64 processor Computing Surface (Brommer, Needels, Larson, Joannopoulos) Thinking Machines CM2, 16,384 bit processors

2006 1000 atom Molybdenum simulation with Qbox SC05 (Gordon Bell prize for peak performance). (F. Gygi et al. )

BlueGene/L, 207 Tflops on IBM BG/L (LLNL)

1998 FeMn alloys (exchange bias), Gordon Bell prize SC98 (Ujfalussy, Stocks, Canning, Y. Wang, Shelton et al.) Cray T3E, 1500 procs. first > 1 Tflop Simulation

Milestones in Parallel Calculations

2008 New Algorithm to Enable 400+ TFlop/s Sustained Performance in Simulations of Disorder Effects in High-Tc SC08 Gordon Bell prize (Thomas C. Schulthess et al.) Cray XT5 (ORNL), first > 1 Pflop Simulation Linear Scaling Divide-and-Conquer Electronic Structure Calculations (L-W Wang et. al.) SC08 Gordon Bell prize (algorithmic innovation )

Page 4: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Electronic Structure Calculations

• Accurate Quantum Mechanical treatment for the electrons• Each electron represented on grid or with some basis functions

(eg. Fourier components) • Compute Intensive: Each electron requires 1 million points/basis

(need 100s of electrons) • 70-80% NERSC Materials Science Computer Time

(first-principles electronic structure)

BaYCl:Ce excited state (new scintillator gamma radiation detector)

Page 5: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Motivation for Electronic Structure Calculations

• Most Materials Properties Only Understood at a fundamental level from accurate electronic structure calculations (Strength, Cohesion etc)

• Many Properties Purely Electronic eg. Optical Properties (Lasers)

• Complements Experiments • Computer Design Materials at the nanoscale

Page 6: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Materials Science Methods

• Many Body Quantum Mechanical Approach (Quantum Monte Carlo) 20-30 atoms

• Single Particle QM (Density Functional Theory) No free parameters. 100-1000 atoms

• Empirical QM Models eg. Tight Binding 1000-5000 atoms

• Empirical Classical Potential Methods thousand-million atoms

• Continuum Methods

Increasing #atoms

Single particle DFT methods largest user of supercomputer cycles of any scientific method in any discipline

Page 7: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Quantum Mechanics for molecules and crystals

Many Body Schrodinger Equation for electrons (position ri) in a molecule or crystals with nuclei charge Z and

position RI (natural units e = h = 1 etc.)

),..(),..(}||||

1

2

1{ 11

,,

2NN

Ii Iiji jii

i

rrErrRr

Z

rr

kinetic energy

electron-electron interaction

electron-nuclei interaction

• Solution of the above eigenfunction equation H=E (H is Hamiltonian) for the wavefunction gives complete information for the system.

• Observables take the form of operators eg. momentum p (classical form = mv) pop = - i h then p observed is given by the solution of pop=p

• Lowest eigenvalue pair (E0,) is the groundstate energy. Higher eigenpairs correspond to excited states of the system.

Page 8: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Ab initio Method: Density Functional Theory (Kohn 98 Nobel Prize)

),..(),..(}||||

1

2

1{ 11

,,

2NN

Ii Iiji jii

i

rrErrRr

Z

rr

)()(}||||

)(

2

1{ 2 rErV

Rr

Zrd

rr

riiiXC

I I

2

12 |),..(||)(|)( N

ii rrrr

Kohn Sham Equation (65): The many body ground state problem can be mapped onto a single particle non-linear problem with the same electron density and a different effective potential (cubic scaling).

Use Local Density Approximation (LDA) for )]([ rVXC (good for Si,C)

Many Body Schrodinger Equation (exact but exponential scaling )

= charge density

Page 9: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Selfconsistent Solution

)()()},(2

1{ 2 rErrV iiiin

Nii ,..,1}{

2|)(|)( rrN

ii

),( rVoutSelf

con

sis

ten

cy u

nti

l Vout =

Vin

N electronsN wave functionslowest N eigenfunctions

fix V(r,) linear problem

Page 10: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Choice of Basis for DFT(LDA)

Increasing basis size M

Gaussian FLAPW Fourier grid

Percentage of eigenpairs M/N

30% 2%Eigensolvers

Direct (scalapack) Iterative

Page 11: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Plane-wave Pseudopotential Method in DFT

)()())}((||||

)(

2

1{ 2 rErrV

Rr

Zrd

rr

rjjjXC

I I

Solve Kohn-Sham Equations self-consistently for electron wavefunctions within the Local Density Approximation

rkgi

g

jgkj ekCr ).(

, )()( 1. Plane-wave (Fourier) expansion for to energy cutoff |g| < rcut (sphere )

2. Replace “frozen” atom core (core electrons and nuclei) by a pseudopotential (pseudo-wavefunctions in core region allows rcut to be reduced without loss of accuracy)

Different parts of the Hamiltonian calculated in different spaces (Fourier and real) via 3d FFT

Page 12: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Computational Considerations

• H never computed explicitly (available through mat-vec product)• Matrix-vector product in NlogN steps (not N2) • Orthogonalization is expensive, Mat-vec is cheap • Matrix is dense (in Fourier or Real space) • Real space grid approximately double size of Fourier sphere• Each Selfconsistent step we have good guess for eigenvectors (from previous step) • Typically use stable CG based iterative methods or Davidson

FFT

)(2

1 2 ri )(rV

Page 13: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Parallelization (load balance, minimize communications)

Dataset: spheres of Fourier coefficients for each wavefunction (electron) 100s-1000s

N spheres of M points per sphere (M ~100N)

……….

Calculations (most costly):

1. Calculation of overlap matrices for orthogonalization, scales as MN2 (written in BLAS3)

2. Non-local part of pseudopotential calculation (ion-electron interaction) scales as MN2 (written in BLAS3) can also be done in real space

3. 3d FFTs to construct charge density in real space, scales as NMlogM

Load balancing calculations:

1., 2. Give the same number of points in sphere to each processor

3. Give the same number of columns of data to each processor

Conflicting conditions, np complete problem: bin (bag) packing problem

Page 14: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Parallelization (load balance, minimize communications)

• Consider sphere as one dimensional list of columns ordered according to length.• Give out columns to processors in descending order such that each column is given to the processor with the least number of points in the sphere.

3d FFT

Load Balancing Scheme:

3 processor example

Close to the same number of points and columns go to each processor

Communications:

• With this data distribution the BLAS3 operations in Fourier space have limited communications

• Main communications occur in 3d FFTs

Write specialized 3d FFTs for this data layout

Page 15: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Standard Parallel 3d FFT on NxNxN (x,y,z)

grid

1. Perform 1d FFTs on N2 x direction columns 2. Transpose (x,y,z) -> (y,z,x)

3. Perform 1d FFTs on N2 y direction columns

4. Transpose (y,z,x) -> (z,y,x)

5. Perform 1d FFTs on N2 z direction columns

6. Transpose (z,y,x) -> (x,y,z) optional

Scaling Issues (bandwidth and latency): computations/communications ~ N2NlogN/N3 = logN ~ O(10)

message size ~ (num. of processors)-2 for 1d layout

Page 16: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

– Works for any grid size on any number of processors

– Only non-zero elements communicated/calculated

– Most communication in global transpose (b) to (c) little communication (d) to (e)

– Many 3d FFTs done at the same time to avoid latency issues

– Much faster than vendor supplied 3d-FFT (no grid size limitations)

– Used in many codes (PARATEC, PETot, ESCAN, GIT code etc. )

Specialized 3d FFT for Electronic Structure Codes (Plane Waves/Fourier)

(a) (b)

(e)

(c)

(f)

(d)

FFT

FFT

FFT

Transp.

Transp.

Page 17: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Results: 3d-FFT 5123 grid on Cray XT4 (Franklin, NERSC, LBNL)

• Cray XT4 (NERSC) • Quad core proc. Opteron 2.3 GHz • 38,640 proc. cores, 9660 nodes

Procs. isen/rec isen/rec (1 core)

alltoallv isenrec-block = 40

alltoallv-block = 40

Weak scaling

Nprocs x 5122

128 0.6175 s 0.3350 0.5560

256 0.3752 s 0.1858 0.3033 0.3292 0.3013 0.1604

512 0.3417 s 0.1326 0.2007 0.1770 0.1609 0.1609

1024 6.3045 s 0.1221 0.1992 0.1913 0.0772 0.1711

2048 7.6232 s 6.2814 0.2767 0.5552 0.0477 0.1857

4096 9.4510 s 6.8414 0.4272 0.0439 0.2019

8192 0.3746 0.6888 0.0369 0.2415

Time for forward+reverse 3d FFT, 5123 grid uses FFTW 1d FFT libs

Page 18: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

PARATEC (PARAllel Total Energy Code)

• PARATEC performs first-principles quantum mechanical total energy calculation using pseudopotentials & plane wave basis set

• Written in F90 and MPI• Designed to run on large parallel

machines IBM SP etc. but also runs on PCs

• PARATEC uses all-band CG approach to obtain wavefunctions of electrons (blocks comms. Specialized 3dffts)

• Generally obtains high percentage of peak on different platforms (uses BLAS3 and 1d FFT libs)

• Developed with Louie and Cohen’s groups (UCB, LBNL)

Overall ratio calcs./communications ~ N (not logN)

Page 19: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

PARATEC: Code Details

• Code written in F90 and MPI (~50,000 lines) • 33% 3D FFT, 33% BLAS3, 33% Hand coded F90 • Global Communications in 3D FFT (Transpose)• Parallel 3D FFT handwritten, minimize comms. reduce

latency (written on top of vendor supplied 1D complex FFT )

Page 20: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

PARATEC: Performance

• Grid size 2503

• All architectures generally achieve high performance due to computational intensity of code (BLAS3, FFT)

• ES achieves highest overall performance : 5.5Tflop/s on 2048 procs (5.3 Tflops on XT4 on 2048 procs in single proc. node mode)

• FFT used for benchmark for NERSC procurements (run on up to 18K procs on Cray XT4, weak scaling )

• Vectorisation directives and multiple 1d FFTs required for NEC SX6

Developed with Louie and Cohen’s groups (UCB, LBNL), also work with L. Oliker, J Carter

Problem Proc

Bassi NERSC (IBM Power5)

Jaquard NERSC (Opteron)

Thunder (Itanium2)

Franklin NERSC (Cray XT4)

NEC ES (SX6)

IBM BG/L

Gflops/Proc

% peak

Gflops/Proc

% peak

Gflops/Proc

% peak

Gflops/Proc

% peak

Gflops/Proc

% peak

Gflops/Proc

% peak

488 AtomCdSe

QuantumDot

128 5.49 72%     2.8 51% 5.1 64%

256 5.52 73% 1.98 45% 2.6 47% 3.36 65% 5.0 62% 1.21 43%

512 5.13 67% 0.95 21% 2.4 44%   3.15 61% 4.4 55% 1.00  35%

1024 3.74 49% 1.8 32% 2.93 56% 3.6 46%

2048       2.37  46% 2.7 35%    

Page 21: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Application: Gamma Ray Detection ( Cerium activated Scintillators )

• Applications in cosmology, particle physics, medical, homeland security

• Scintillators: produce light in the optical range when exposed to gamma rays

• Brightest Scintillators are Cerium doped materials (lanthanum halides)

• Theoretical Screening of Candidate Materials

Project funded by the DHS, in collaboration with S. Derenzo, M. Weber, A. Chaudhry in Life Sciences (LBNL)

conduction

valence

5 d5 d

4 f4 f Ce states

Increasing energy

Page 22: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Scintillation process model (Ce activ.)

-ray excitation produces a hole and an e in the host material

The hole is captured by the dopant Ce site (Ce3+ -> Ce4+ )

Electron diffuses down through conduction band into 5d Ce

5d -> 4f transition on Ce site emits optical photon

conduction

valence

5 d

4 f

h

Key assumption: 4f and 5d states of Ce atom must be in the energy gap of the host crystal

F

Page 23: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Density of States Plot for Ce in LaBr3

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

Ce

de

ns

ity

of

sta

tes

EnergyFermi

ho

st c

on

du

ctio

n b

and

ho

st v

alen

ce b

and

5 d5 d4 f4 f

Single Ce atom in large unit cell

Page 24: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Criteria to determine bright Ce activated Scintillators

1. Size of host material bandgap: smaller the bandgap the more electron hole pairs but must accommodate Ce 4f,5d

2. Energy difference between the VBM and Ce 4f level: larger this value the less probable will be hole capture at the Ce3+ site (requires multiple or high-energy phonons)

3. Energy difference between the CBM and Ce 5d level: if 5d is too close thermal excitations may excite electron into CB

4. Level of localisation of the (Ce3+)* state on the Ce atom:

conduction

valence

5 d5 d

4 f4 f Ce states

Page 25: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Predictions for some new materials

Compound Bandgap EG (eV)

Ce 4f - VBM (eV)

(Ce3+)* localisation %

Ba2YCl7:Ce 4.05 0.87 64

BaY2Br8:Ce 2.94 0.45 39

BaY2I8:Ce 2.50 ~0.0 42

LaFS:Ce 3.27 0.71 45

La2Ca2O5:Ce 2.38 0.26 28

La2Mg2O5:Ce 2.80 0.1 12

Ba2YCl7:Ce made by experimentalists and found to be a very bright scintillator. Initial Patent Filing taken out for Material Ba2YCl7:Ce

Page 26: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

The Quantization Condition of Quantum-well The Quantization Condition of Quantum-well States in Cu/Co(100)States in Cu/Co(100)

Copper substrate

Cobalt layerCopper Wedge

0

54 Å

4 m

m sc

an

d

photon beam in

electronsout

• Theoretical investigation of Quantum Well states in Cu films using our codes (PARATEC, PEtot) to compare with experiments at the ALS (E. Rotenberg, Y.Z. Wu, Z.Q. Qiu)

• New computational methods for metallic systems used in the calculations.

•Lead to an understanding of surface effects on the Quantum Well States. Improves on simple Phase Accumulation Model used previously

-1.5

-1.0

-0.5

0.0

E-E

F (e

V)

20151050Cu thickness(ML)

Difference between theory and experiment improved by taking surface effects into account

QW states in Copper Wedge

Page 27: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Summary

• This research was funded by the DOE as part of the Modeling and Simulation in Nanoscience Initiative (MICS and BES) contract No DE-AC03-76SF00098

• Gamma ray detection funded by DHS

• Plane wave electronic structure methods can scale to the 1K-10K processor regime

• Require different parallelization methods for higher proc. count (state parallelism, k-point parallelism eg. Qbox code)

Page 28: Parallel Methods for Nano/Materials Science Applications Andrew Canning Computational Research Division LBNL & UC Davis, Applied Science Dept. (Electronic

Future Directions

• O(N) based methods (exploit locality) can give sparse matrix problem

• Excited state calculations • Transport Calculations (conductivity)

(dynamical properties)