Summary of O(N) DFT: the Conquest code

CCP6 Conference on Computational Physics,Gyeongju , Republic of Korea, August 2006

Order-N DFT calculations with the Conquest code

Michael J. Gillan1,2, T. Miyazaki3 and D. Bowler1,2

1Physics and Astronomy Department, University College London, U.K.

2London Centre for Nanotechnology, University College London, U.K.

3Computational Materials Science Center, NIMS, Tsukuba, Japan Summary of O(N) DFT: the Conquest code

Summary of practical operation of Conquest: linear scaling, parallel scaling

Validation against standard plane-wave codes

Work in progress on Ge/Si hut-clusters


Si/Ge nanostructures

Epitaxial deposition of Ge on the Si (001) surface. There is a strain mismatch of a few percent, and the strain is relieved by formation of missing-row trenches. Evolution of surface structure with increasing coverage...

Early 2 x N

Late 2 x N

M x N

‘Hut’ clusters


Why traditional methods scale as N2 or worse

• There are N Kohn-Sham orbitals

• Each Kohn-Sham orbital extends over the entire volume V, so information in each is proportional to N. Hence, amount of information to be calculated is proportional to N 2.

• In standard methods, we have to calculate all overlap integrals

• Hence, number of operations is proportional to N 3. Prefactor of N 3 is small, so practical dependence is N 2, except for very large systems.

( )n r

( )n r( )n r

( ) ( )mn m nS d r r r


Density matrix is key to O(N)

• But Etot in DFT can be expressed entirely in terms of , so we have a variational principle: minimize Etot with respect to density matrix, subject to conditions that : (i) is symmetric; (ii) is idempotent; (iii) gives the correct number of electrons. It is enough to require that be weakly idempotent: its eigenvalues lie between 0 and 1.

( , ') r r

( , ') r r( , ') r r

• Density matrix is defined in terms of Kohn-Sham eigenfunctions as:

with f n orbital occupation numbers: at T = 0, f n = 0 or 1. The operator is idempotent: its eigenvalues are 0 or 1 (it is a projector).

( , ') r r

( , ') r r

( , ') ( ) ( )n n nn

f r r r r

• Now as . So we get an upper bound to Etot if we minimize Etot with the additional constraint that for .

( , ') 0 r r ' r r( , ') 0 r r ' cR r r


Density matrix and localised orbitals

• In practice, we cannot work directly with , since it is a function of two vector variables.

( , ') r r

with localized orbitals that vanish outside ‘localization regions’. In practice, take the localization regions spherical, radius Rreg and centred on the atoms. Label specifies different localized orbitals on given atom i.

• Instead, express as:( , ') r r

,,

( , ') ( ) ( ')i i j ji j

K

r r r r

( )i r

• Matrix is the density matrix in the (non-orthogonal) representation of .

,i jK ( )i r

• Search for ground state by minimizing Etot with respect to and subject to idempotency and correct electron number.

( )i r ,i jK


Enforcing idempotency

• In orthogonal tight-binding, the equivalent problem is to minimize:

subject to weak idempotency of K, and fixed electron number. tot TrE KH

• The same scheme is applied in CONQUEST, but in a non-orthogonal version:

where

3 2K LSL LSLSL

, ( ) ( )i j i jS d r r r

• An effective method is that of Li, Nunes and Vanderbilt, in which:

with L the ‘auxiliary density matrix’. This works because of the properties of the polynomial

2 33 2K L L

2 33 2x x


Summary of CONQUEST

• Ground-state search is done as three nested loops:

1. inner: minimization with respect to auxiliary density matrix , with cut-off distance RL.

2. middle: self-consistency

3. outer: minimization with respect to with cut-off distance Rreg

• Minimization with respect to at fixed Kohn-Sham potential and fixed is equivalent to non-orthogonal tight-binding; done by a combination of McWeeny purification and Li-Nunes-Vanderbilt.

• Self-consistency: reduction of density residual to zero: done by ‘Guaranteed Reduction Pulay’.

• Localized orbitals represented by B-splines, or by pseudo-atomic orbitals; minimization by conjugate gradients.

• Written as a parallel code from an early stage.

i jL

i jL

i

i

i


Parallel operation

Main aspects of CONQUEST parallel coding:

• Formation of overlap and Hamiltonian matrix elements by integration over grid points: integration grid is divided ito ‘domains’, with one processor responsible for a domain.

• Matrix operations: atoms divided into ‘primary sets’, with one processor responsible for rows of matrices associated with one primary set.

• Fourier transformation (Hartree potential etc...) is done in parallel.


Practical linear scaling

Tests of convergence of total energy with respect to L-matrix cut-off RL and localization-region radius Rreg. Bulk silicon.


Practical parallel scaling

Parallel scaling: increasing number of processors, constant number of atoms per processor

O(N) scaling: constant number of processors, increasing number of atoms


Relaxation of Si (001) surface

Method Basis Bond length (A)

Bond angle

SIESTA/NSC SZ 2.50 15.9

CONQUEST/NSC SZ 2.50 14.5

CONQUEST/NSC O(N)

SZ 2.42 15.0

CONQUEST/SC B-spline 2.37 22.8

SIESTA/SC DZP 2.40 19.9

VASP PW 2.41 19.7

Comparison of relaxed structure of Si (001) surface from non-self-consistent and self-consistent SIESTA and CONQUEST calculations using different basis sets with plane-wave results (VASP). Quantities compared are bond length and tilt angle of surface dimer pair.


Structure and energetics of Ge/Si hut clusters

Energetics of hut clusters with O(N) electronic-structure calculations: aims of the investigation:

• The smallest stable hut-clusters contain ~10,000 atoms of Ge. Previous investigations have used continuum elasticity theory combined with DFT calculations. Energetics of faces and edges is important.

• Combination of DFT and elasticity theory is not easy, and it is not known how large the clusters must be for this approach to be valid.

• Our aim: to use the hierarchy of electronic-structure methods to test previous calculations. We are investigating the energetics using: tight-binding; non-self-consistent DFT tight-binding, and self-consistent DFT tight-binding. In the future, also full DFT with plane-wave accuracy.

The faces of the hut-clusters are Ge (105) surfaces


Ge(105) : surface energy by different methods

Ge(105): surface energy (LDA)

empirical TB : 74.3 meV/Å2

NSC-AITB : 76.5 meV/Å2

SC-AITB : 81.5 meV/Å2

full DFT(blip) : 74.8 meV/Å2

STATE(planewave): 70.0 meV/Å2

Ge/Si(105): surface energy (GGA)

STATE(12.25 Ry): 43.7 meV/Å2

STATE( 36 Ry ): 45.6 meV/Å2

by Hashimoto et al.


Ge(105): cutoff of (auxiliary) density matrix L

Ge(105) : NSC-AITB

RL(bohr)Etot(Ha/atom)

Fmax(Ha) Esurf(eV/A2)

15.4 -4.847635 0.0027 0.0801

20.4 -4.850438 0.0014 0.0753

25.4 -4.851420 0.0007 0.0752

30.4 -4.851811 0.0004 0.0755

diag -4.852048 0.0001 0.0765

RL=20.4 (or 25.4) bohr is enough.


Strategy for calculations on hut clusters

Si (001)

Ge: total energy per 1 Ge atom of hut cluster and its dependence on • size of hut cluster: a• spacings of hut clusters: b• thickness of Si substrate: d

Ge of 2xN Ge/Si(001) surface

d

Ge= (Etot-Esub)/(# of Ge hut atoms)

(empirical) TB DFT calculations

a

b

Ge hut cluster


Size: • 28x28 on 32x32, thickness 8 layer

– 22,746 atoms

• Calculations are done on the Earth Simulator using 64 nodes (512 processors, 1/10 of the ES)– DMM : ~ 10min.– Initial forces are already small.– Structure Optimisation within NSC-AITB level

174Å

48Å


Ge hut cluster v.s. 2xN on Si(001): Tight-binding O(N)

•Type 2 ‘hut’ is more stable than type 1 for small coverage. •crossing of (Ge) between 2xN and hut?

• substrate of hut cluster: p2x2 (without dimer vacancies)• thicker substrates will reduce (Ge) of larger hut clusters.

-> hut should be more stable

• Similar calculations by order-N DFT in progress....


Conquest O(N) DFT on Ge/Si hut-clusters

• Type 1 ‘hut’ is more stable than type2.

• Further calculations being done now.

• Self-consistent DFT has also been achieved for this size of system.

Minimal-basis, non-self-consistent DFT. Size of system: 22,746 atoms


Summary and continuation...

Conquest O(N) DFT has been tested on simple systems, and shows satisfactory agreement with calculations using pseudo-atomic orbitals (Siesta) and with plane-wave calculations.

For Ge (105) surface, O(N) DFT calculations with different density-matrix cut-off radius RL show rapid convergence with respect to RL. Full blip-function basis in Conquest (equivalent to plane waves) gives good agreement with plane-wave results for surface formation energy.

Ge/Si hut clusters: O(N) tight-binding calculations on systems of 20,000 – 30,000 atoms are straightforward. O(N) self-consistent DFT calculations on this size of system are feasible.

Preliminary results on Ge/Si hut-clusters suggest cross-over to stable hut-clusters at coverage of about 3 monolayers. However, much more calculation and analysis is still needed.


Credits

• National Institute for Materials Science, Tsukuba

• International Center for Young Scientists, NIMS, Tsukuba

• MEXT grant for Scientific Research in Priority Areas “Development of New Quantum Simulators and Quantum Design” (No. 17064017)

• Earth Simulator Center, Yokohama

• UK Engineering and Physical Sciences Research Council

• Royal Society

Documents

Summary of O(N) DFT: the Conquest code