RPA:
LaFeAsO, κ-(ET)2Cu(SCN)2, EtMe3Sb[Pd(dmit)2]2
Kazuma Nakamura (School of Engineering, Univ. of Tokyo, A03-9)
Post constrained RPA Project: Reduction of spatial dimension
KAZUMA NAKAMURA (A03-9) YOSHIHIDE YOSHIMOTO (A02-5) MASATOSHI IMADA (A03-9)
Acknowledgment: YOSHIRO NOHARA (Max Planck Institute)
KN-Yoshimoto-Nohara-Imada, J. Phys. Soc. Jpn. 79, 123708 (2010)
[Figure: diagram with vertices z1, z2, z3, z4]
Aim and Background
Strong correlation and quantum fluctuation from first principles, and prediction of new phases and functions of correlated materials
Ab initio construction of an effective model describing low-energy properties
High-accuracy analysis of the derived model, taking strong correlation and quantum fluctuation into account
LDA + dynamical mean-field theory: V. I. Anisimov et al., J. Phys.: Condens. Matter 9, 767 (1997)
LDA + path-integral renormalization group: Y. Imai, I. V. Solovyev, M. Imada, PRL 95, 176405 (2005)
Feasibility Studies (2006-present)
(1) Iron-based superconductors:
- KN-Arita-Imada, JPSJ 77, 093711 (2008)
- Miyake-KN-Arita-Imada, JPSJ 79, 044705 (2009)
- Misawa-KN-Imada, JPSJ 80, 023704 (2011)
- KN-Yoshimoto-Nohara-Imada, JPSJ 79, 123708 (2010)
(2) Organic compounds:
- KN-Yoshimoto-Kosugi-Arita-Imada, JPSJ 78, 083710 (2009)
- Shinaoka-Misawa-KN-Imada, in preparation
(3) Alkali-cluster-in-zeolite systems:
- KN-Koretsune-Arita, PRB 80, 043941 (2009)
(4) Transition metal and its oxides:
- KN-Arita-Yoshimoto-Tsuneyuki, PRB 74, 235113 (2006)
- Miyake-Aryasetiawan-Imada, PRB 80, 155134 (2009)
(5) Excited states of semiconductors:
- KN-Yoshimoto-Arita-Tsuneyuki-Imada, PRB 77, 195126 (2008)
(6) Review:
- Imada-Miyake, JPSJ 79, 112001 (2010)
Low-energy Hamiltonian
1) Basis function: maximally localized Wannier function (Marzari-Vanderbilt 1997, Souza-Marzari-Vanderbilt 2002)
2) Transfer integral: matrix elements of the DFT Kohn-Sham Hamiltonian
3) Screened Coulomb, screened exchange: constrained RPA. Original idea: Aryasetiawan et al. 2004, Solovyev-Imada 2005. Practical details: KN-Arita-Imada, JPSJ 77, 093711 (2008)
[Figure: schematic of occupied (O), target (T), and virtual (V) bands around Ef; RPA polarizability, terms (1)-(4)]
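The constrained-RPA step can be illustrated with a toy sum. The sketch below is not the authors' code: it assumes a single k-point, unit matrix elements, the static limit, and made-up band energies, and the names `static_polarization`, `target`, and `skip` are hypothetical. It only shows the bookkeeping idea that transitions inside the target (T) bands are removed from the polarization, so that the remaining part screens the interaction of the low-energy model.

```python
from itertools import permutations


def static_polarization(energies, occupations, skip=lambda i, j: False):
    """Toy static polarization: sum over band pairs of (f_i - f_j) / (e_i - e_j).

    Matrix elements are set to 1 (an assumption made for illustration only).
    `skip(i, j)` returns True for transitions that must be left out.
    """
    chi = 0.0
    for i, j in permutations(range(len(energies)), 2):
        df = occupations[i] - occupations[j]
        if df == 0.0 or skip(i, j):
            continue
        chi += df / (energies[i] - energies[j])
    return chi


# Hypothetical four-band example: band 0 is occupied (O), bands 1 and 2 are
# the target (T) bands around the Fermi level, band 3 is virtual (V).
energies = [-5.0, -0.2, 0.3, 6.0]      # eV, made-up values
occupations = [1.0, 1.0, 0.0, 0.0]
target = {1, 2}

chi_full = static_polarization(energies, occupations)
# Only transitions with both bands in the target space.
chi_target = static_polarization(
    energies, occupations, skip=lambda i, j: not (i in target and j in target)
)
# Constrained polarization: everything except target-to-target transitions.
chi_rest = static_polarization(
    energies, occupations, skip=lambda i, j: i in target and j in target
)

print(chi_full, chi_target, chi_rest)
```

In this toy the full polarization splits exactly into the target part and the rest, and the constrained part is smaller in magnitude than the full one, which is why the cRPA interaction lies between the bare and the fully screened interaction.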
LaFeAsO: constrained RPA
[Figure: Interaction (eV) vs. r (Angstrom); curves: bare, constrained RPA, full RPA; guide curves 1/r and 1/(6.7r)]
KN-Arita-Imada, JPSJ 77, 093711 (2008)
LaFeAsO: the cRPA interaction is a 3D interaction with a long-range tail that decays as a power law
What's the problem? We derive ab initio parameters for a 3D model, while we solve a 2D model in the analysis stage
Treating strong quantum fluctuation effects with high accuracy is considerably difficult for the 3D model
We have a serious problem of “dimensional inconsistency”
LaFeAsO is a quasi-2D system: derived model = 3D model, analyzed model = 2D model
[Figure: FeAs layer and LaO layer]
Reducing 3D to 2D
KEY IDEA: renormalize the spatial dimension (“Dimensional Downfolding”)
We extend the cRPA idea to the spatial degrees of freedom
[Figure: 3D to 2D; the interlayer interaction is deleted, and the intralayer interaction becomes a renormalized interaction through interlayer screening]
Computational details:
[Equations 1-6 not preserved]
LaFeAsO: Band & Wannier
[Figure: band structure and Wannier functions (xy, yz, z2, x2-y2, zx) of LaFeAsO; FeAs and LaO layers; transfer labels t 300 meV and t 10 meV]
Typical quasi-2D system, a good target for the present study
LaFeAsO: 2D downfolded
[Figure: Interaction (eV) vs. r (Angstrom); curves: bare, 3D-cRPA, 2D-cRPA, full RPA]
Summary
We developed a new ab initio downfolding scheme for deriving effective low-energy models in low dimensions
It justifies 2D short-ranged Hubbard models as effective models from first principles
Nakamura-Yoshimoto-Nohara-Imada, J. Phys. Soc. Jpn. 79, 123708 (2010)
[Figure: two panels of Interaction (eV) vs. r (Angstrom)]
Performance Report for Massively-Parallel Project: constrained-RPA code
KAZUMA NAKAMURA (A03-9) YOSHIHIDE YOSHIMOTO (A02-5)
Acknowledgment: YOSHIRO NOHARA (Max Planck Institute), YUICHIRO MATSUSHITA (OSHIYAMA Lab), HIROAKI ISHIZUKA (MOTOME Lab)
Computational cost
Polarization cost ∝ (Nk)^2 (Nb)^2 NPW
Ratio: [(Nk)^2 (Nb)^2 NPW] / [Nk Nb NPW × O(10)] = Nk Nb / O(10) = 10,000 (if Nk = 100, Nb = 1,000)
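The cost estimate above can be checked with a few lines of arithmetic. This is a minimal sketch under two assumptions: the O(10) prefactor is taken to be exactly 10, and the quantity in the denominator is simply called the reference cost, since the slide does not name it in this copy. The function names are hypothetical.

```python
def polarization_cost(nk, nb, npw):
    """Operation count scaling as (Nk)^2 (Nb)^2 NPW, with no prefactor."""
    return nk**2 * nb**2 * npw


def reference_cost(nk, nb, npw, prefactor=10):
    """Operation count scaling as Nk Nb NPW times an O(10) prefactor (assumed 10)."""
    return prefactor * nk * nb * npw


# NPW cancels in the ratio, so its value here is arbitrary.
ratio = polarization_cost(100, 1000, 100_000) / reference_cost(100, 1000, 100_000)
print(ratio)
```

With Nk = 100 and Nb = 1,000 the ratio is Nk Nb / 10 = 10,000, matching the number quoted on the slide.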
Need to develop a “distributed-memory RPA code”
Need: distributed-memory code
Memory size of ~400 GB with Nband = 2000, Nk = 125, NPW = 100000
The data cannot be stored on a single node
EtMe3Sb[Pd(dmit)2]2
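The ~400 GB figure is reproduced if each plane-wave coefficient is stored as a double-precision complex number (16 bytes). That storage format is an assumption, not something stated on the slide, and `wavefunction_memory_gb` is a hypothetical helper name.

```python
def wavefunction_memory_gb(n_band, n_k, n_pw, bytes_per_coeff=16):
    """Memory for all wavefunction coefficients, in GB (10^9 bytes).

    bytes_per_coeff=16 assumes double-precision complex storage.
    """
    return n_band * n_k * n_pw * bytes_per_coeff / 1e9


total_gb = wavefunction_memory_gb(n_band=2000, n_k=125, n_pw=100_000)
# Share per process if the data were split evenly over 128 processes
# (128 is used here only as an example process count).
per_process_gb = total_gb / 128
print(total_gb, per_process_gb)
```

Split evenly, the per-process share drops to a few GB, which is the motivation for a distributed-memory layout.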
Massive parallelization I
[Figure: occ and unocc data divided over MPI processes 1-10. Steps 1-4, with labels: division of data, data split, data send to MPI, calc; partial results summed as 1 + 2 + ... + 10]
Proposed by YOSHIHIDE YOSHIMOTO
Massive parallelization II
[Figure: Steps 6-9 over MPI processes 1-10: calc, then data rotation with MPI_SENDRECV (block order 1 2 ... 10, then 10 1 2 ... 9, then 9 10 1 ... 8), repeated; partial results summed as 1 + 2 + ... + 10]
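The data-rotation pattern in the diagram (1 2 ... 10, then 10 1 ... 9, then 9 10 1 ... 8) can be sketched as a ring exchange. The simulation below runs in one process and does not call MPI; in a real code each rotation would be one MPI_SENDRECV per process. Which data stay fixed and which rotate is an assumption here (occ block fixed, unocc block rotating), and the function names are hypothetical.

```python
def rotate(held):
    """One ring rotation: process r receives the block held by process r - 1."""
    n = len(held)
    return [held[(r - 1) % n] for r in range(n)]


def ring_all_pairs(n_proc):
    """Return, per process, the (fixed block, rotating block) pairs it computes.

    Process r keeps its own fixed block r (assumed: occ data) and works on
    whichever rotating block (assumed: unocc data) it currently holds.
    """
    held = list(range(n_proc))          # rotating block currently on each process
    work = [[] for _ in range(n_proc)]
    for _ in range(n_proc):
        for r in range(n_proc):
            work[r].append((r, held[r]))   # "Calc" step
        held = rotate(held)                # "Data Rotation" step
    return work


work = ring_all_pairs(10)
covered = {pair for per_proc in work for pair in per_proc}
print(len(covered))
```

After as many rotations as there are processes, every fixed block has met every rotating block exactly once, and no process ever holds more than two blocks at a time. The partial results are then summed over processes, as in the "1 + 2 + ... + 10" line of the diagram.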
core = 128
- 8 MPI × 4 OMP per comm
- 4 comm (q-point groups q1, q2, q3, q4), created with MPI_COMM_SPLIT
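The 128-core layout can be written out as a colour/key table of the kind passed to MPI_COMM_SPLIT. The sketch below only computes the table and does not call MPI. Assigning consecutive world ranks to the same q-point group is an assumption, since the slide does not show how the colours are chosen, and `split_layout` is a hypothetical name.

```python
def split_layout(n_comm, mpi_per_comm):
    """Return (world_rank, color, key) for each MPI process.

    color selects the sub-communicator (one per q-point group) and key is the
    rank inside it, mirroring the (color, key) arguments of MPI_COMM_SPLIT.
    Consecutive world ranks share a color (an assumed, common choice).
    """
    layout = []
    for world_rank in range(n_comm * mpi_per_comm):
        color = world_rank // mpi_per_comm
        key = world_rank % mpi_per_comm
        layout.append((world_rank, color, key))
    return layout


n_comm, mpi_per_comm, omp_threads = 4, 8, 4
layout = split_layout(n_comm, mpi_per_comm)
total_cores = n_comm * mpi_per_comm * omp_threads
print(total_cores, layout[9])
```

Each sub-communicator then handles its own q-points independently, and communication such as the ring rotation stays inside one group of 8 MPI processes.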
Performance of our code: Benchmark for small system: SrVO3

SrVO3@kashiwa, 2q (n = MPI × OMP)
n            time (sec)   parallel fraction (%)   efficiency (%)   speedup
1            341.4        -                       -                -
4 (1x4)      89.9         98.2                    94.9             3.8
8 (2x4)      49.5         97.7                    86.2             6.9
12 (3x4)     33.8         98.3                    84.1             10.1
16 (4x4)     27.3         98.1                    78.2             12.5

SrVO3@kashiwa, 20q (n = COMM × MPI × OMP)
n              time (sec)   parallel fraction (%)   efficiency (%)   speedup
1              3590         -                       -                -
8 (1x2x4)      492          98.6                    91.3             7.3
32 (4x2x4)     126          99.6                    89.3             28.6
80 (10x2x4)    51           99.8                    87.6             70.1
160 (20x2x4)   26           99.9                    85.8             137.3

Test run on 2011/1/14: 2048-core calculation
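The three numbers that follow each timing in the benchmark rows are consistent with the parallel fraction from Amdahl's law, the parallel efficiency, and the speedup relative to the 1-core run. That reading is inferred from the arithmetic, since the column headers are not preserved in this copy. A minimal sketch, with the hypothetical helper `scaling_metrics`:

```python
def scaling_metrics(t1, tn, n):
    """Speedup, efficiency, and Amdahl parallel fraction from two timings.

    t1: time on 1 core, tn: time on n cores (n > 1).
    Amdahl's law: speedup = 1 / ((1 - p) + p / n), solved here for p.
    """
    speedup = t1 / tn
    efficiency = speedup / n
    parallel_fraction = (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)
    return speedup, efficiency, parallel_fraction


# SrVO3, 2q: 341.4 s on 1 core and 89.9 s on 4 cores (1 MPI x 4 OMP).
s, e, p = scaling_metrics(341.4, 89.9, 4)
print(round(s, 1), round(100 * e, 1), round(100 * p, 1))
```

For the 4-core SrVO3 row this gives a speedup of 3.8, an efficiency of 94.9 %, and a parallel fraction of 98.2 %, in agreement with the tabulated values.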
Performance of our code: Benchmark for large system: C60

C60@kashiwa, 1q (n = MPI × OMP)
n            time (sec)   parallel fraction (%)   efficiency (%)   speedup
1            15639.1      -                       -                -
4 (1x4)      4077.0       98.6                    95.9             3.8
8 (2x4)      2108.3       98.9                    92.7             7.4
16 (4x4)     1071.3       99.4                    91.2             14.6
32 (8x4)     542.6        99.6                    90.1             28.8
64 (16x4)    297.9        99.7                    82.0             52.5

C60@kashiwa, 32q (n = COMM × MPI × OMP)
n                 time (sec)   parallel fraction (%)   efficiency (%)   speedup
64 (1x16x4)       9202.83      -                       -                -
128 (2x16x4)      4657.06      99.98                   98.81            126.72
256 (4x16x4)      2352.64      99.99                   97.79            250.24
512 (8x16x4)      1166.69      100.00                  98.60            504.96
1024 (16x16x4)    589.33       100.00                  97.62            999.04
2048 (32x16x4)    305.81       100.00                  94.04            1925.76

Production run on 2011/2/11: 4096-core calculation
Constrained RPA for dmit
Condition of production run:
- Nk = 75 (5 × 5 × 3)
- Nband = 1000 (Nocc = 464, Npocc = 4, Nvir = 532)
- Ecut( ) = 36 Ry (100,000 PWs)
- Ecut( ) = 4.0 Ry (3,200 PWs)
Architecture and performance:
- SGI Altix ICE 8400EX system
- X5570 (4 core) × 2
- Ifort 11.1, SGI-oriented MPI, InfiniBand
- 4096 cores (4 comm × 128 MPI × 8 OMP)
- Total time = 43h19min
Dielectric function: dmit and κ-bedt
[Figure: two panels, dmit and κ-bedt; M(q+G) vs. |q + G| (a.u.)]
dmit: 4096 cores, 43h19min, kashiwa
κ-bedt: 128 cores, 384h (16 days), SR11000@ITC; 1/6 of dmit
Convergence:
[Figure: M(q+G) vs. |q + G| (a.u.) for dmit and κ-bedt, with labels 12.5 eV and 20.0 eV; axis Energy (eV)]
3D-cRPA Interaction: dmit and κ-bedt
[Figure: two panels, dmit and κ-bedt; Interaction (eV) vs. r (Angstrom); curves: bare, 3D-cRPA]
Unfortunately, the dmit result is not yet converged.
APPENDIX
Computational data:
κ-Cu(SCN)2: Band & Wannier
[Figure: geometry and Wannier functions; transfer labels t 65 meV and t 0.1 meV]
κ-Cu(SCN)2: 2D downfolded
[Figure: Interaction (eV) vs. r (Angstrom); curves: bare, 3D-cRPA, 2D-cRPA, full RPA]
Screening length
[Figure: two panels of Interaction (eV) vs. r (Å). LaFeAsO: c = 8.4 Å, zero at 8.4 Å. κ-Cu(SCN)2: c = 16.4 Å, zero at 16.4 Å]
Thus, the screening length of the interlayer screening corresponds to the c value
Feynman Diagram for Screened Interaction
The Coulomb interaction between electrons at z1 and z4 is screened by the RPA polarization of (z2, z3)
[Figure: diagram with vertices z1, z2, z3, z4]
Interlayer screening
Electrons at z1 and z4 are in the target layer, while the screening electrons at z2 and z3 are in another layer
[Figure: diagram with vertices z1, z2, z3, z4]
Other types of interlayer screening:
Computational details:
(0) Below is the post-cRPA story
(1) Target-band RPA
(2) Fourier transform of the polarization
Notation: wave vector in BZ; reciprocal lattice vector; in-plane and out-of-plane components
[Figure: five diagrams with vertices z1, z2, z3, z4 distributed over Layer 1, Layer 2 (target), and Layer 3]
We have to cut this polarization to avoid double counting
(3) Polarization cutting (cut part set to 0)
(4) Inverse FT of the cut polarization
(5) 2D dielectric function
g, g': reciprocal lattice vectors of the superlattice
(6) 2D screened Coulomb
(7) 2D screened exchange
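The cutting step can be illustrated with a deliberately crude toy that is not the authors' scheme: three layers, each reduced to a single site, a made-up bare interaction matrix v between layers, and a made-up layer-diagonal polarization chi. The screened interaction is taken as W = (1 - v chi)^(-1) v. "Cutting" sets the polarization inside the target layer (Layer 2) to zero, so that only the other layers screen the target-layer interaction. All numbers and function names below are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product for small lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def solve(a, b):
    """Return a^(-1) b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def screened_interaction(v, chi):
    """W = (1 - v chi)^(-1) v for small matrices."""
    n = len(v)
    vchi = matmul(v, chi)
    eps = [[(1.0 if i == j else 0.0) - vchi[i][j] for j in range(n)]
           for i in range(n)]
    return solve(eps, v)


# Made-up bare interaction between layers 1, 2 (target), 3.
v = [[4.0, 1.0, 0.5],
     [1.0, 4.0, 1.0],
     [0.5, 1.0, 4.0]]
# Made-up static polarization, diagonal in the layer index.
chi_full = [[-0.2, 0.0, 0.0], [0.0, -0.2, 0.0], [0.0, 0.0, -0.2]]
# Cut: remove the polarization inside the target layer (index 1).
chi_cut = [row[:] for row in chi_full]
chi_cut[1][1] = 0.0

w_full = screened_interaction(v, chi_full)
w_2d = screened_interaction(v, chi_cut)
print(v[1][1], w_2d[1][1], w_full[1][1])
```

In this toy the target-layer element with the cut polarization lies between the bare value and the fully screened value: the other layers screen it, while screening inside the target layer is left for the 2D model to handle, which is the double-counting argument made above.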
LaFeAsO: cRPA (previous slide)
[Figure: Interaction (eV) vs. r (Angstrom); curves: bare, 3D-cRPA, full RPA]
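Program structure: Polarization
do q = 1, Nk
  do pair = 1, Npair          ! loop index lost in the source; the name "pair" is assumed
    do k = 1, Nk
      call FFT module to calculate
    enddo
    call TETRAHEDRON module to calculate
    do G = 1, NPW
      do G' = 1, NPW
        do w = 1, N           ! loop index lost in the source; the name "w" is assumed
          do k = 1, Nk
          enddo
        enddo
      enddo
    enddo
  enddo
enddo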