CCP6 Conference on Computational Physics, Gyeongju, Republic of Korea, August 2006
Order-N DFT calculations with the Conquest code
Michael J. Gillan1,2, T. Miyazaki3 and D. Bowler1,2
1Physics and Astronomy Department, University College London, U.K.
2London Centre for Nanotechnology, University College London, U.K.
3Computational Materials Science Center, NIMS, Tsukuba, Japan
Summary of O(N) DFT: the Conquest code
Summary of practical operation of Conquest: linear scaling, parallel scaling
Validation against standard plane-wave codes
Work in progress on Ge/Si hut-clusters
Si/Ge nanostructures
Epitaxial deposition of Ge on the Si (001) surface. There is a strain mismatch of a few percent, and the strain is relieved by formation of missing-row trenches. Evolution of surface structure with increasing coverage...
Early 2 x N
Late 2 x N
M x N
‘Hut’ clusters
Why traditional methods scale as $N^2$ or worse
• There are N Kohn-Sham orbitals $\psi_n(\mathbf{r})$.
• Each Kohn-Sham orbital $\psi_n(\mathbf{r})$ extends over the entire volume V, so the information in each is proportional to N. Hence, the amount of information to be calculated is proportional to $N^2$.
• In standard methods, we have to calculate all overlap integrals:
$S_{mn} = \int d\mathbf{r}\, \psi_m^*(\mathbf{r})\, \psi_n(\mathbf{r})$
• Hence, the number of operations is proportional to $N^3$. The prefactor of $N^3$ is small, so the practical dependence is $N^2$, except for very large systems.
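The counting argument can be made concrete with a toy operation count. This is only a sketch under stated assumptions: the number of grid points per orbital is taken to grow linearly with N (through the hypothetical parameter `points_per_orbital_unit`), and each overlap integral is assumed to cost one multiply-add per grid point. Neither number comes from the talk.

```python
def overlap_cost(n_orbitals, points_per_orbital_unit=100):
    """Toy count of multiply-adds needed to form every overlap integral S_mn.

    Assumptions (illustrative only): the grid grows linearly with the number
    of orbitals, and each S_mn costs one multiply-add per grid point.
    """
    grid_points = points_per_orbital_unit * n_orbitals  # information per orbital ~ N
    n_pairs = n_orbitals * (n_orbitals + 1) // 2         # S_mn with m <= n
    return n_pairs * grid_points                         # total work ~ N^3


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, overlap_cost(n))
```

Doubling `n_orbitals` multiplies the count by close to eight, which is the cubic growth that O(N) methods are designed to avoid.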
Density matrix is key to O(N)
• Density matrix is defined in terms of Kohn-Sham eigenfunctions as:
$\rho(\mathbf{r}, \mathbf{r}') = \sum_n f_n\, \psi_n(\mathbf{r})\, \psi_n^*(\mathbf{r}')$
with $f_n$ the orbital occupation numbers: at T = 0, $f_n$ = 0 or 1. The operator $\rho$ is idempotent: its eigenvalues are 0 or 1 (it is a projector).
• $E_{tot}$ in DFT can be expressed entirely in terms of $\rho(\mathbf{r}, \mathbf{r}')$, so we have a variational principle: minimize $E_{tot}$ with respect to the density matrix, subject to the conditions that $\rho$ (i) is symmetric; (ii) is idempotent; (iii) gives the correct number of electrons. It is enough to require that $\rho$ be weakly idempotent: its eigenvalues lie between 0 and 1.
• Now $\rho(\mathbf{r}, \mathbf{r}') \to 0$ as $|\mathbf{r} - \mathbf{r}'| \to \infty$. So we get an upper bound to $E_{tot}$ if we minimize $E_{tot}$ with the additional constraint that $\rho(\mathbf{r}, \mathbf{r}') = 0$ for $|\mathbf{r} - \mathbf{r}'| > R_c$.
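The properties listed on this slide can be checked numerically on a small model. The sketch below is not CONQUEST code: it assumes a one-dimensional tight-binding chain with alternating on-site energies (chosen only because it has a band gap), builds the density matrix from the occupied eigenvectors, and checks idempotency, electron count and spatial decay. The names `chain_hamiltonian`, `density_matrix` and `truncate` are hypothetical helpers introduced for illustration.

```python
import numpy as np


def chain_hamiltonian(n_sites, hopping=-1.0, stagger=1.0):
    """Open tight-binding chain with alternating on-site energies (gapped)."""
    h = np.zeros((n_sites, n_sites))
    for i in range(n_sites):
        h[i, i] = stagger if i % 2 == 0 else -stagger
        if i + 1 < n_sites:
            h[i, i + 1] = h[i + 1, i] = hopping
    return h


def density_matrix(h, n_occupied):
    """rho = sum_n f_n |psi_n><psi_n| with f_n = 1 for the lowest states."""
    _, psi = np.linalg.eigh(h)           # columns are orthonormal eigenvectors
    occupied = psi[:, :n_occupied]
    return occupied @ occupied.T


def truncate(rho, cutoff):
    """Zero rho[i, j] for |i - j| > cutoff: the sparsity pattern set by R_c."""
    i, j = np.indices(rho.shape)
    return np.where(np.abs(i - j) <= cutoff, rho, 0.0)


n_sites, n_occ = 40, 20
rho = density_matrix(chain_hamiltonian(n_sites), n_occ)
print("symmetric :", np.allclose(rho, rho.T))
print("idempotent:", np.allclose(rho @ rho, rho))
print("electrons :", np.trace(rho))
print("|rho[0, j]| for j = 0, 5, 10, 20:", np.abs(rho[0, [0, 5, 10, 20]]))
```

`truncate` only shows which elements the cut-off forces to zero. It is not the constrained minimization itself, in which the remaining elements are re-optimized with the cut-off in place.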
Density matrix and localised orbitals
• In practice, we cannot work directly with $\rho(\mathbf{r}, \mathbf{r}')$, since it is a function of two vector variables.
• Instead, express $\rho(\mathbf{r}, \mathbf{r}')$ as:
$\rho(\mathbf{r}, \mathbf{r}') = \sum_{i\alpha, j\beta} \phi_{i\alpha}(\mathbf{r})\, K_{i\alpha, j\beta}\, \phi_{j\beta}(\mathbf{r}')$
with localized orbitals $\phi_{i\alpha}(\mathbf{r})$ that vanish outside ‘localization regions’. In practice, take the localization regions to be spherical, of radius $R_{reg}$, and centred on the atoms. The label $\alpha$ specifies the different localized orbitals on a given atom i.
• Matrix $K_{i\alpha, j\beta}$ is the density matrix in the (non-orthogonal) representation of $\phi_{i\alpha}(\mathbf{r})$.
• Search for the ground state by minimizing $E_{tot}$ with respect to $\phi_{i\alpha}(\mathbf{r})$ and $K_{i\alpha, j\beta}$, subject to idempotency and correct electron number.
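A one-dimensional toy version of this representation may help. The sketch assumes polynomial "bump" functions as the localized orbitals (they vanish exactly outside their region) and picks an arbitrary set of occupied states in their span. The names `localized_basis` and `inverse_sqrt` and all grid parameters are illustrative assumptions, not CONQUEST's actual basis or API. It checks that in a non-orthogonal basis the electron number is Tr[KS] and idempotency reads KSK = K, with S the overlap matrix of the localized orbitals.

```python
import numpy as np


def localized_basis(grid, centres, radius):
    """Bump functions that vanish exactly outside |x - centre| < radius."""
    phi = np.zeros((grid.size, len(centres)))
    for j, c in enumerate(centres):
        u = (grid - c) / radius
        inside = np.abs(u) < 1.0
        phi[inside, j] = (1.0 - u[inside] ** 2) ** 2
    return phi


def inverse_sqrt(s):
    """S^(-1/2) for a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(s)
    return v @ np.diag(w ** -0.5) @ v.T


grid = np.linspace(0.0, 10.0, 2001)
dx = grid[1] - grid[0]
phi = localized_basis(grid, centres=np.arange(1.0, 10.0, 1.0), radius=1.5)

S = phi.T @ phi * dx                 # overlap matrix of the localized orbitals
n_occ = 4                            # singly occupied states, spin ignored
C = inverse_sqrt(S)[:, :n_occ]       # coefficients with C^T S C = identity
K = C @ C.T                          # density matrix in the phi representation
rho = phi @ K @ phi.T                # rho(x, x') sampled on the grid

print("Tr[KS]          :", np.trace(K @ S))
print("integral rho(x,x):", np.trace(rho) * dx)
print("KSK == K        :", np.allclose(K @ S @ K, K))
```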
Enforcing idempotency
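• In orthogonal tight-binding, the equivalent problem is to minimize:
$E_{tot} = \mathrm{Tr}[KH]$
subject to weak idempotency of K, and fixed electron number.
• An effective method is that of Li, Nunes and Vanderbilt, in which:
$K = 3L^2 - 2L^3$
with L the ‘auxiliary density matrix’. This works because of the properties of the polynomial $3x^2 - 2x^3$: it has turning points at x = 0 and x = 1, so eigenvalues of L near 0 or 1 give eigenvalues of K that are closer still to 0 or 1.
• The same scheme is applied in CONQUEST, but in a non-orthogonal version:
$K = 3LSL - 2LSLSL$
where
$S_{i\alpha, j\beta} = \int d\mathbf{r}\, \phi_{i\alpha}(\mathbf{r})\, \phi_{j\beta}(\mathbf{r})$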
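The effect of the polynomial $3x^2 - 2x^3$ is easy to see numerically. The sketch below applies it repeatedly to a symmetric matrix whose eigenvalues are near, but not at, 0 and 1 (McWeeny-style purification). The starting eigenvalues and the helper names `mcweeny` and `lnv_density` are assumptions made for illustration. The sketch does not perform the Li-Nunes-Vanderbilt energy minimization itself, only the mapping from L to K.

```python
import numpy as np


def mcweeny(x):
    """Orthogonal case: 3 x^2 - 2 x^3 applied to a symmetric matrix."""
    x2 = x @ x
    return 3.0 * x2 - 2.0 * x2 @ x


def lnv_density(l, s):
    """Non-orthogonal form K = 3 LSL - 2 LSLSL."""
    lsl = l @ s @ l
    return 3.0 * lsl - 2.0 * lsl @ s @ l


rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((6, 6)))    # random orthogonal matrix
start = np.array([-0.1, 0.2, 0.3, 0.7, 0.8, 1.1])   # not yet idempotent
x = q @ np.diag(start) @ q.T

for _ in range(10):
    x = mcweeny(x)                                  # eigenvalues move toward 0 or 1

print(np.round(np.linalg.eigvalsh(x), 8))
```

With `s` set to the identity, `lnv_density` reduces to `mcweeny`, which matches the relation between the orthogonal and non-orthogonal expressions above.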
Summary of CONQUEST
• Ground-state search is done as three nested loops:
1. inner: minimization with respect to the auxiliary density matrix $L_{i\alpha, j\beta}$, with cut-off distance $R_L$.
2. middle: self-consistency
3. outer: minimization with respect to $\phi_{i\alpha}$, with cut-off distance $R_{reg}$
• Minimization with respect to $L_{i\alpha, j\beta}$ at fixed Kohn-Sham potential and fixed $\phi_{i\alpha}$ is equivalent to non-orthogonal tight-binding; done by a combination of McWeeny purification and Li-Nunes-Vanderbilt.
• Self-consistency: reduction of density residual to zero: done by ‘Guaranteed Reduction Pulay’.
• Localized orbitals $\phi_{i\alpha}$ represented by B-splines, or by pseudo-atomic orbitals; minimization by conjugate gradients.
• Written as a parallel code from an early stage.
Parallel operation
Main aspects of CONQUEST parallel coding:
• Formation of overlap and Hamiltonian matrix elements by integration over grid points: integration grid is divided into ‘domains’, with one processor responsible for a domain.
• Matrix operations: atoms divided into ‘primary sets’, with one processor responsible for rows of matrices associated with one primary set.
• Fourier transformation (Hartree potential etc.) is done in parallel.
Practical linear scaling
Tests of convergence of total energy with respect to L-matrix cut-off RL and localization-region radius Rreg. Bulk silicon.
Practical parallel scaling
Parallel scaling: increasing number of processors, constant number of atoms per processor
O(N) scaling: constant number of processors, increasing number of atoms
Relaxation of Si (001) surface
Method               Basis      Bond length (Å)   Bond angle (deg)
SIESTA/NSC           SZ         2.50              15.9
CONQUEST/NSC         SZ         2.50              14.5
CONQUEST/NSC O(N)    SZ         2.42              15.0
CONQUEST/SC          B-spline   2.37              22.8
SIESTA/SC            DZP        2.40              19.9
VASP                 PW         2.41              19.7
Comparison of relaxed structure of Si (001) surface from non-self-consistent and self-consistent SIESTA and CONQUEST calculations using different basis sets with plane-wave results (VASP). Quantities compared are bond length and tilt angle of surface dimer pair.
Structure and energetics of Ge/Si hut clusters
Energetics of hut clusters with O(N) electronic-structure calculations. Aims of the investigation:
• The smallest stable hut-clusters contain ~10,000 atoms of Ge. Previous investigations have used continuum elasticity theory combined with DFT calculations. Energetics of faces and edges is important.
• Combination of DFT and elasticity theory is not easy, and it is not known how large the clusters must be for this approach to be valid.
• Our aim: to use the hierarchy of electronic-structure methods to test previous calculations. We are investigating the energetics using: tight-binding; non-self-consistent DFT tight-binding, and self-consistent DFT tight-binding. In the future, also full DFT with plane-wave accuracy.
The faces of the hut-clusters are Ge (105) surfaces
Ge(105): surface energy by different methods
Ge(105): surface energy (LDA)
empirical TB: 74.3 meV/Å²
NSC-AITB: 76.5 meV/Å²
SC-AITB: 81.5 meV/Å²
full DFT (blip): 74.8 meV/Å²
STATE (plane-wave): 70.0 meV/Å²
Ge/Si(105): surface energy (GGA)
STATE (12.25 Ry): 43.7 meV/Å²
STATE (36 Ry): 45.6 meV/Å²
by Hashimoto et al.
Ge(105): cutoff of (auxiliary) density matrix L
Ge(105): NSC-AITB
RL (bohr)   Etot (Ha/atom)   Fmax (Ha)   Esurf (eV/Å²)
15.4        -4.847635        0.0027      0.0801
20.4        -4.850438        0.0014      0.0753
25.4        -4.851420        0.0007      0.0752
30.4        -4.851811        0.0004      0.0755
diag        -4.852048        0.0001      0.0765
RL=20.4 (or 25.4) bohr is enough.
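To see how the conclusion about RL follows from the table, the deviations from the row labelled "diag" can be computed directly. The sketch below copies the Etot and Esurf values from the table and assumes that "diag" denotes a diagonalization reference calculation, which the slide does not state explicitly. The function name `errors_vs_diag` is hypothetical.

```python
# (R_L in bohr, E_tot in Ha/atom, E_surf in eV/Å^2), copied from the table
ROWS = [
    (15.4, -4.847635, 0.0801),
    (20.4, -4.850438, 0.0753),
    (25.4, -4.851420, 0.0752),
    (30.4, -4.851811, 0.0755),
]
# Row labelled "diag" (assumed here to be the diagonalization reference)
DIAG = (-4.852048, 0.0765)


def errors_vs_diag(rows, diag):
    """Return (R_L, dE_tot in mHa/atom, dE_surf in meV/Å^2) for each cut-off."""
    e_ref, s_ref = diag
    return [
        (r_l, 1e3 * (e_tot - e_ref), 1e3 * (e_surf - s_ref))
        for r_l, e_tot, e_surf in rows
    ]


if __name__ == "__main__":
    for r_l, d_etot, d_esurf in errors_vs_diag(ROWS, DIAG):
        print(f"R_L = {r_l:5.1f} bohr  dE_tot = {d_etot:6.3f} mHa/atom  "
              f"dE_surf = {d_esurf:5.1f} meV/Å^2")
```

The total-energy error falls steadily as RL grows, while the surface energy, being an energy difference, is already within about 1 meV/Å² of the reference from RL = 20.4 bohr onward.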
Strategy for calculations on hut clusters
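Quantity studied: total energy per Ge atom of the hut cluster,
(Etot - Esub) / (number of Ge hut atoms),
and its dependence on:
• size of hut cluster: a
• spacing of hut clusters: b
• thickness of Si substrate: d
Compared with: energy per Ge atom of the 2xN Ge/Si(001) surface.
Methods: (empirical) TB, DFT calculations.
[Figure: Ge hut cluster of size a on a Si (001) substrate of thickness d, with spacing b between clusters]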
Size:
• 28x28 on 32x32, thickness 8 layers
– 22,746 atoms
• Calculations are done on the Earth Simulator using 64 nodes (512 processors, 1/10 of the ES)
– DMM: ~10 min.
– Initial forces are already small.
– Structure optimisation at the NSC-AITB level
[Figure: hut-cluster model, with dimension labels 174 Å and 48 Å]
Ge hut cluster vs. 2xN on Si(001): Tight-binding O(N)
• Type 2 ‘hut’ is more stable than type 1 for small coverage.
• Crossing of the energy per Ge atom between 2xN and hut?
• Substrate of hut cluster: p2x2 (without dimer vacancies)
• Thicker substrates will reduce the energy per Ge atom of larger hut clusters.
-> hut should be more stable
• Similar calculations by order-N DFT in progress.
Conquest O(N) DFT on Ge/Si hut-clusters
• Type 1 ‘hut’ is more stable than type 2.
• Further calculations being done now.
• Self-consistent DFT has also been achieved for this size of system.
Minimal-basis, non-self-consistent DFT. Size of system: 22,746 atoms
Summary and continuation...
Conquest O(N) DFT has been tested on simple systems, and shows satisfactory agreement with calculations using pseudo-atomic orbitals (Siesta) and with plane-wave calculations.
For Ge (105) surface, O(N) DFT calculations with different density-matrix cut-off radius RL show rapid convergence with respect to RL. Full blip-function basis in Conquest (equivalent to plane waves) gives good agreement with plane-wave results for surface formation energy.
Ge/Si hut clusters: O(N) tight-binding calculations on systems of 20,000 – 30,000 atoms are straightforward. O(N) self-consistent DFT calculations on this size of system are feasible.
Preliminary results on Ge/Si hut-clusters suggest cross-over to stable hut-clusters at coverage of about 3 monolayers. However, much more calculation and analysis is still needed.
Credits
• National Institute for Materials Science, Tsukuba
• International Center for Young Scientists, NIMS, Tsukuba
• MEXT grant for Scientific Research in Priority Areas “Development of New Quantum Simulators and Quantum Design” (No. 17064017)
• Earth Simulator Center, Yokohama
• UK Engineering and Physical Sciences Research Council
• Royal Society