

Machine Learning Modeling of Wigner Intracule Functionals for Two Electrons in One-Dimension

Rutvij Bhavsar1, 2 and Raghunathan Ramakrishnan2, a)

1) Department of Physics, Indian Institute of Technology Kanpur, Kanpur 208016, India
2) Tata Institute of Fundamental Research, Centre for Interdisciplinary Sciences, Hyderabad 500107, India

(Dated: 23 January 2019)

In principle, many-electron correlation energy can be precisely computed from a reduced Wigner distribution function (W) thanks to a universal functional transformation (F), whose formal existence is akin to that of the exchange-correlation functional in density functional theory. While the exact dependence of F on W is unknown, a few approximate parametric models have been proposed in the past. Here, for a dataset of 923 one-dimensional external potentials with two interacting electrons, we apply machine learning to model F within the kernel Ansatz. We deal with over-fitting of the kernel to a specific region of phase-space by a one-step regularization that does not depend on any hyperparameters. Reference correlation energies have been computed by performing exact and Hartree–Fock calculations using discrete variable representation. The resulting models require W calculated at the Hartree–Fock level as input, and the errors in the correlation energies predicted for new molecules decay monotonically with training, reaching sub-chemical accuracy.

I. INTRODUCTION

The pursuit of chemical accuracy (1 kcal/mol = 0.0434 eV) in first-principles predictions of atomic and molecular energetics is persistent. Although the first landmark paper¹ in molecular quantum mechanics²,³, by Hylleraas, showed how to accurately calculate the energy of the most straightforward non-trivial electronic system, the helium atom, achieving this feat for an arbitrary system remains far out of reach. The key challenge lies in incorporating many-body correlation in the wavefunction: widely used quantum chemistry hierarchies exhibit a very steep speed-to-accuracy trade-off, prohibiting accurate modeling of even such moderate-sized systems as benzene⁴. To exemplify, only in the last few years has it become possible to predict the vibrational spectrum of formaldehyde⁵ by incorporating anharmonic effects and to agree with experimental measurements to within ≈1 cm⁻¹. On the other hand, Kohn–Sham density functional theory (KS-DFT)⁶, within its domain of applicability, is so successful because its computational complexity is less than or similar to that of Hartree–Fock (HF), while its accuracy often exceeds that of even post-HF methods⁷. For this very reason, KS-DFT has found broad applicability in various domains such as catalysis, materials design, and even ab initio molecular dynamics simulations of protein-ligand complexes. However, reaching chemical accuracy for energetics using KS-DFT has been a long-standing problem.

In the past decades, several research groups have explored a variety of non-standard quantum mechanical methods⁸, ranging from intracule functional theory (IFT), density matrix renormalization group (DMRG), reduced-density matrix functional approach (aka 2-RDM method), Sturmian method, quantum Monte-Carlo (QMC), etc. Among these, IFT is the only method to have been included in a popular quantum chemistry package based on the Gaussian basis set framework, with performance benchmarked for the energetics of small molecules⁸,⁹. One of the most attractive features of IFT is that its central variable is the so-called Wigner intracule, which is a reduced Wigner distribution function expressed in pairwise relative distances and momenta. In the following, we denote the Wigner intracule using the symbol W.

a) Electronic mail: [email protected]

In analogy with the formal existence of an exact exchange-correlation (XC) functional in DFT that maps the one-electron reduced density uniquely to a system's ground state energy¹⁰, IFT seeks a functional, F, that maps W to the correlation energy, Ec.⁹ A very remarkable feature of this formalism is that the input variable, W, which is formally related to the pair density, may be approximated by the one obtained from a HF wavefunction. Gill et al.¹¹ have shown that a simple Gaussian form of F depending on 2-4 parameters predicts the correlation energies of 18 atoms and 56 small molecules rather well. In Ref. 8, Gill also proposed strategies to account for static correlation, so as to enhance the method's performance for unsaturated systems. In short, this avenue, to quote Gill et al.,¹² "is a fertile, but largely unexplored, middle ground between the simplicity of DFT and the complexity of many-electron wavefunction theories".

Be it DFT or IFT, the ultimate goal of finding a universal functional that forecasts energetics within the aforementioned chemical accuracy will likely remain elusive for the foreseeable future. For both problems, the main hurdle is that we do not know how to design the exact universal functional systematically, and to date, XC functionals have been developed mostly via empirical tuning. One of the more recent attempts at finding an exact XC functional utilized kernel-ridge-regression, a machine learning (ML) method, and resulted in a model for N noninteracting spinless fermions in a one-dimensional box¹³. Such data-driven modeling strategies, which need not be universally applicable but are tailored to a given dataset or domain, have nowadays been applied to a multitude of problems such as quantum mechanical properties of molecules¹⁴–²⁵ and extended systems²⁶–³¹. For more comprehensive reviews, see Refs. 32 and 33.

arXiv:1802.00873v2 [physics.chem-ph] 21 Jan 2019

The present study aims to apply ML to discover a numerically exact F for the IFT. In this first proof-of-concept study, we restrict our explorations to a dataset comprising one-dimensional atoms and molecules with two electrons. The total number of model systems considered is 923, which includes all atoms and molecules that can be formed by taking up to six atoms with atomic numbers ≤ 6. To use as training data, we calculate numerically exact Ec using discrete variable representation.

The rest of the paper is organized as follows. In the next section, we develop the IFT formalism suitable for model studies in one dimension (1D). We also discuss the electronic Schrödinger equation, the composition of the dataset, and details of ML. Then, we present the prediction errors of the models along with the shapes of the numerical F predicted by ML in a data-driven fashion. Finally, we conclude.

FIG. 1. Plots of 923 one-dimensional external potentials employed in this study. (Axes: Vext(x) [au] versus x [au].)

II. METHODS

A. Wigner Intracules in One-Dimension

The postulates of quantum mechanics state the existence of a Hermitian operator $\hat{S}$ for any physically measurable quantity S, and that the expectation value of $\hat{S}$ can be calculated for a given wavefunction, in either position (x) or momentum (p) representation, via the integral

$$\langle S \rangle = \int dx\, \psi^*(x)\, \hat{S}(x,p)\, \psi(x) = \int dp\, \phi^*(p)\, \hat{S}(x,p)\, \phi(p) \tag{1}$$

The phase-space formulation of quantum mechanics³⁴ enables calculation of the same expectation value directly using a scalar observable S(x, p) corresponding to the operator $\hat{S}$, by employing a distribution function, f(x, p):

$$\langle S \rangle = \int dx \int dp\, f(x,p)\, S(x,p) \tag{2}$$

To date, for practical applications, the most popular choice of f has been the Wigner distribution function (WDF), W(q, p)³⁵. In 1D, i.e., when q is scalar valued (q = x), the WDF is defined as

$$W(x,p) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} dy\, \psi^*\!\left(x+\frac{\hbar y}{2}\right)\psi\!\left(x-\frac{\hbar y}{2}\right)e^{ipy} \tag{3}$$

This function has also been widely considered to provide a suitable framework to describe the dynamics of interacting quantum systems³⁴,³⁶,³⁷. Despite its widespread popularity, the WDF is known to have counter-intuitive features. First of all, W(x, p) can take negative values; hence it is a quasi-probability distribution function. Here, without delving further into the epistemological interpretations of the WDF, for which we refer the readers to the excellent exposition by Zurek³⁸, we proceed to derive an analytic formula for the Wigner intracule starting from a WDF.
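As a rough numerical illustration of Eq. 3 and of the negativity mentioned above, the sketch below evaluates the 1D WDF by simple quadrature with ħ = 1. The harmonic-oscillator test states, the function names, and the quadrature settings are illustrative assumptions and are not taken from the paper; for these two states the closed-form values W(x, p) = exp(−x² − p²)/π and W(0, 0) = −1/π are standard results.

```python
import numpy as np

def wigner_1d(psi, x, p, y_max=20.0, n=4001):
    """Evaluate Eq. 3 (with hbar = 1) at one phase-space point by quadrature."""
    y = np.linspace(-y_max, y_max, n)
    dy = y[1] - y[0]
    integrand = np.conj(psi(x + y / 2)) * psi(x - y / 2) * np.exp(1j * p * y)
    # The integrand vanishes at +/- y_max, so a plain Riemann sum is adequate here.
    return float(np.real(integrand.sum()) * dy / (2 * np.pi))

def ho_ground(x):
    """Harmonic-oscillator ground state (illustrative test function)."""
    return np.pi ** -0.25 * np.exp(-x ** 2 / 2)

def ho_first_excited(x):
    """Harmonic-oscillator first excited state (illustrative test function)."""
    return np.sqrt(2.0) * np.pi ** -0.25 * x * np.exp(-x ** 2 / 2)

w0 = wigner_1d(ho_ground, 0.3, -0.5)        # positive everywhere for a Gaussian
w1 = wigner_1d(ho_first_excited, 0.0, 0.0)  # negative at the phase-space origin
```

The second value being negative is the quasi-probability feature in its simplest form: a perfectly valid wavefunction gives a distribution that cannot be read as a classical probability density.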

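For two electrons confined in 1D, the WDF is defined as the 2D integration

$$W(\mathbf{x},\mathbf{p}) = \frac{1}{4\pi^2}\int_{-\infty}^{+\infty} d\mathbf{y}\, \Psi^*\!\left(\mathbf{x}+\frac{\hbar \mathbf{y}}{2}\right)\Psi\!\left(\mathbf{x}-\frac{\hbar \mathbf{y}}{2}\right)e^{i\mathbf{p}\cdot\mathbf{y}} \tag{4}$$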

Note that W(x, p) ∈ ℝ, with x = {x1, x2} and p = {p1, p2} being the coordinate and momentum vectors of the two electrons. Further, the WDF satisfies the normalization convention

$$\int_{-\infty}^{+\infty} d\mathbf{x} \int_{-\infty}^{+\infty} d\mathbf{p}\, W(\mathbf{x},\mathbf{p}) = 1 \tag{5}$$

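and is symmetric with respect to the exchange of position and momentum variables:

$$W(\mathbf{x},\mathbf{p}) = \frac{1}{4\pi^2}\int_{-\infty}^{+\infty} d\mathbf{q}\, \Phi^*\!\left(\mathbf{p}+\frac{\hbar \mathbf{q}}{2}\right)\Phi\!\left(\mathbf{p}-\frac{\hbar \mathbf{q}}{2}\right)e^{i\mathbf{x}\cdot\mathbf{q}} \tag{6}$$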

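From W(x, p), an intracule, $\mathcal{W}(u, v)$, that is translationally and rotationally invariant in phase-space can be derived as

$$\mathcal{W}(u,v) = \int_{-\infty}^{+\infty} d\mathbf{x} \int_{-\infty}^{+\infty} d\mathbf{p}\, W(\mathbf{x},\mathbf{p})\, \delta\left(|x_1-x_2|-u\right)\delta\left(|p_1-p_2|-v\right) \tag{7}$$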


In principle, this 4D integration can be performed numerically, albeit using a tiny integration step that is required to capture the effect of the Dirac δ-functions. To arrive at an expression that is amenable to quick computation, the explicit dependence of Eq. 7 on the δ-functions has to be eliminated. To this end, we invoke the relation δ(|a| − b) = δ(a − b) + δ(−a − b), valid for b > 0, resulting in

$$\mathcal{W}(u,v) = \int_{-\infty}^{+\infty} d\mathbf{x} \int_{-\infty}^{+\infty} d\mathbf{p}\, W(\mathbf{x},\mathbf{p}) \left[\delta(x_1-x_2-u)+\delta(x_1-x_2+u)\right]\left[\delta(p_1-p_2-v)+\delta(p_1-p_2+v)\right], \tag{8}$$

where we have utilized the fact that δ(x) is an even function. We can now invoke the elementary properties of the δ-function and arrive at an expression with four displaced WDFs.

$$\mathcal{W}(u,v) = \frac{1}{\pi\hbar}\int_{-\infty}^{+\infty} dx_1 \int_{-\infty}^{+\infty} dp_1 \left[ W(x_1,p_1,x_1+u,p_1+v) + W(x_1,p_1,x_1-u,p_1+v) + W(x_1,p_1,x_1+u,p_1-v) + W(x_1,p_1,x_1-u,p_1-v) \right] \tag{9}$$

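The first of the four terms in Eq. 9 can be simplified through a straightforward analytic integration over the p1 variable:

$$W(x_1,p_1,x_1+u,p_1+v) = \frac{1}{4\pi^2}\int_{-\infty}^{+\infty} d\mathbf{y}\,dx_1\,dp_1\, \Psi^*\!\left(x_1+\frac{\hbar y_1}{2},\, x_1+u+\frac{\hbar y_2}{2}\right)\Psi\!\left(x_1-\frac{\hbar y_1}{2},\, x_1+u-\frac{\hbar y_2}{2}\right)\exp\left[i\left\{p_1 y_1+(p_1+v)\,y_2\right\}\right] \tag{10}$$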

Then, by making the substitutions α = (y1 + y2)/√2 and β = (y1 − y2)/√2 along with the standard formula

$$\int_{-\infty}^{+\infty} dp_1\, e^{i p_1 \sqrt{2}\alpha} = 2\pi\,\delta(\sqrt{2}\alpha), \tag{11}$$

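we can simplify the RHS of Eq. 10 as a 3D integration.

$$\frac{1}{2\pi}\int_{-\infty}^{+\infty} d\alpha\,d\beta\,dx_1\, \Psi^*\!\left(x_1+\frac{\alpha+\beta}{2\sqrt{2}},\, x_1+u+\frac{\alpha-\beta}{2\sqrt{2}}\right)\Psi\!\left(x_1-\frac{\alpha+\beta}{2\sqrt{2}},\, x_1+u-\frac{\alpha-\beta}{2\sqrt{2}}\right)e^{iv(\alpha-\beta)/\sqrt{2}}\,\delta(\sqrt{2}\alpha) \tag{12}$$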

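This expression can be further simplified to the following 2D integration by utilizing the sifting property of the δ-function.

$$\frac{1}{2\sqrt{2}\,\pi}\int_{-\infty}^{+\infty} d\beta\,dx_1\, \Psi^*\!\left(x_1+\frac{\beta}{2\sqrt{2}},\, x_1+u-\frac{\beta}{2\sqrt{2}}\right)\Psi\!\left(x_1-\frac{\beta}{2\sqrt{2}},\, x_1+u+\frac{\beta}{2\sqrt{2}}\right)e^{-iv\beta/\sqrt{2}} \tag{13}$$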

Finally, by collecting the simplified expressions for all four displaced WDFs in Eq. 9, we arrive at the following formula, which requires a wavefunction as the argument and computes the intracule through a 2D integration:

$$\mathcal{W}(u,v) = \frac{2}{\pi}\int_{-\infty}^{+\infty} d\beta\,dx_1 \left[\Psi^*(x_1+\beta,\, x_1+u-\beta)\,\Psi(x_1-\beta,\, x_1+u+\beta) + \Psi^*(x_1+\beta,\, x_1-u-\beta)\,\Psi(x_1-\beta,\, x_1-u+\beta)\right]\cos(2v\beta) \tag{14}$$
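The 2D integration of Eq. 14 can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the uncorrelated product of two unit-width Gaussians is a hypothetical test wavefunction, the function names and the quadrature grid are illustrative, and the closed form (2/π) exp(−(u² + v²)/2) used for comparison follows from carrying out the Gaussian integrals of Eq. 14 by hand for this particular wavefunction.

```python
import numpy as np

def intracule(psi, u, v, grid):
    """Evaluate Eq. 14 at a single (u, v) with a rectangle rule on a uniform grid."""
    h = grid[1] - grid[0]
    x1, beta = np.meshgrid(grid, grid, indexing="ij")
    term_plus = np.conj(psi(x1 + beta, x1 + u - beta)) * psi(x1 - beta, x1 + u + beta)
    term_minus = np.conj(psi(x1 + beta, x1 - u - beta)) * psi(x1 - beta, x1 - u + beta)
    integrand = (term_plus + term_minus) * np.cos(2.0 * v * beta)
    return float((2.0 / np.pi) * np.real(integrand.sum()) * h * h)

def psi_product(x1, x2):
    """Normalized product of two unit-width Gaussians (illustrative test case)."""
    return np.exp(-(x1 ** 2 + x2 ** 2) / 2) / np.sqrt(np.pi)

grid = np.linspace(-8.0, 8.0, 321)
w_numeric = intracule(psi_product, 0.7, 0.3, grid)
w_analytic = (2.0 / np.pi) * np.exp(-(0.7 ** 2 + 0.3 ** 2) / 2)
```

For this test case the analytic intracule integrates to one over u, v ≥ 0, in line with the normalization of Eq. 5. For a wavefunction stored on a grid rather than given analytically, `psi` would have to be replaced by an interpolant, which this sketch does not attempt.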

B. Computational Details and Dataset

In the following, we employ atomic units, i.e., the mass of the electron, me = 1, and ħ = 1. Within the Born–Oppenheimer approximation, the non-relativistic molecular electronic Hamiltonian for two electrons is given by

$$\hat{H} = \sum_{i=1}^{2} T(x_i) + \sum_{i=1}^{2} V_{\mathrm{ext}}(x_i) + \sum_{i=1}^{2}\sum_{j>i}^{2} V_{ee}(x_i,x_j). \tag{15}$$

The one-particle kinetic energy term takes the usual form

$$T(x_i) = -\frac{1}{2}\frac{d^2}{dx_i^2} \tag{16}$$


with the external potential defined as

$$V_{\mathrm{ext}}(x_i) = \sum_{A=1}^{M} \frac{-Z_A}{\sqrt{(x_i-x_A)^2+\alpha}}, \tag{17}$$

where M is the number of nuclei, Z_A is the nuclear charge of atom A located at x_A, and α ≥ 0; while the electron-electron interaction operator is given by

$$V_{ee}(x_i,x_j) = \frac{1}{\sqrt{(x_i-x_j)^2+\alpha}}. \tag{18}$$

For both attraction and repulsion potential energy operators, with α = 0 we recover the hard-Coulomb limit, while for other values of α, we have soft-Coulomb operators. In this study, we have used α = 1.0, which results in potential profiles qualitatively similar to that of a hard-Coulomb potential with an incomplete cusp³⁹. For multi-well systems, we have used a separation of 2 bohr. With the resulting Hamiltonians, we have performed HF and exact calculations using the basis-set-free approach, discrete variable representation (DVR)⁴⁰. In all calculations, we have used 128 grid points in the domain −15 ≤ x1, x2 ≤ +15 bohr. In the 2D DVR calculations, this results in matrices of size 128² = 16384.
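To make the grid-based setup concrete, the following sketch builds the soft-Coulomb external potential of Eq. 17 on the 128-point grid quoted above and diagonalizes a one-electron Hamiltonian. Several choices here are assumptions rather than the paper's procedure: the text does not say which DVR flavor was used, so the widely used sinc-DVR kinetic-energy matrix is taken as a stand-in; only the one-electron problem is solved (the paper's calculations are two-electron); and all function names are illustrative.

```python
import numpy as np

def soft_coulomb_vext(x, charges, centers, alpha=1.0):
    """External potential of Eq. 17 evaluated on the grid x."""
    v = np.zeros_like(x)
    for z, xa in zip(charges, centers):
        v -= z / np.sqrt((x - xa) ** 2 + alpha)
    return v

def sinc_dvr_kinetic(n, dx):
    """Sinc-DVR kinetic-energy matrix in atomic units (assumed DVR flavor)."""
    k = np.arange(n)
    diff = k[:, None] - k[None, :]
    sign = np.where(diff % 2 == 0, 1.0, -1.0)
    safe = np.where(diff == 0, 1, diff)  # placeholder to avoid 0-division on the diagonal
    t = np.where(diff == 0, np.pi ** 2 / 3.0, 2.0 * sign / safe ** 2)
    return t / (2.0 * dx ** 2)

def ground_state_energy(v, dx):
    """Lowest eigenvalue of T + V for one electron on the grid."""
    h = sinc_dvr_kinetic(v.size, dx) + np.diag(v)
    return float(np.linalg.eigvalsh(h)[0])

x = np.linspace(-15.0, 15.0, 128)   # grid quoted in the text
dx = x[1] - x[0]
e_atom = ground_state_energy(soft_coulomb_vext(x, [1], [0.0]), dx)  # Z = 1 single well
e_ho = ground_state_energy(0.5 * x ** 2, dx)                        # sanity check: 0.5 hartree
```

The harmonic-oscillator check is included only because its exact ground-state energy is known, which makes it a convenient test of the kinetic-energy matrix before moving to the soft-Coulomb wells.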

It is important to note that soft-Coulomb potentials are not approximations to exact potentials in 1D. Using Gauss's law, it can be shown that the true (i.e., exact) electrostatic interaction potential is obtained as the solution of Poisson's equation. In 1D, such a solution amounts to a potential depending linearly on the coordinate (see APPENDIX 1). Hence, the so-called hard-Coulomb potentials with α = 0 in Eq. 17 and Eq. 18 do not represent exact electrostatic potentials but rather serve as model potentials providing qualitative insights. This latter case is peculiar in its own right: such interactions lead to diverging energies. For instance, in the case of a 1D analogue of the helium atom, corresponding to Z = 2 in Eq. 17, exact calculations performed with 8192² = 67.1 × 10⁶ product basis functions yield ground state energies of −2.22420955, −5.83766144, −14.07977723, −30.76728906, and −60.61204718 hartree for α = 1, 0.1, 0.01, 0.001, and 0.0001, respectively. An analytic proof of this divergence is presented separately in APPENDIX 2. However, it may be worthwhile to note that the 1D hard-Coulomb problem can be tackled approximately by introducing the Dirichlet boundary condition⁴¹.

Our dataset includes all possible 1D atoms and molecules with two electrons. The total number of systems is limited by the maximal number of atoms (Nmax) and the maximal nuclear charge (Zmax). In this way, we arrive at Zmax single-wells, Zmax(Zmax + 1)/2 double-wells, Zmax(Zmax + 1)(Zmax + 2)/3! triple-wells, and in general, Zmax(Zmax + 1) . . . (Zmax + Nmax − 1)/Nmax! potentials with Nmax minima. Using Nmax = Zmax = 6, we arrive at 923 1D potentials, which are displayed in Fig. 1.
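The counting above can be checked with a few lines of code. The sketch below assumes, as the formula suggests, that each system is identified by an unordered collection of nuclear charges (a combination with repetition); how the charges are ordered along the axis is not specified here, and the function names are illustrative.

```python
from itertools import combinations_with_replacement
from math import comb

def count_potentials(z_max, n_max):
    """Sum of Zmax(Zmax+1)...(Zmax+N-1)/N! over N = 1..Nmax."""
    return sum(comb(z_max + n - 1, n) for n in range(1, n_max + 1))

def enumerate_compositions(z_max, n_max):
    """List every unordered choice of up to n_max charges from 1..z_max."""
    charges = range(1, z_max + 1)
    return [c for n in range(1, n_max + 1)
            for c in combinations_with_replacement(charges, n)]

per_size = [comb(6 + n - 1, n) for n in range(1, 7)]  # single-wells, double-wells, ...
total = count_potentials(6, 6)
```

With Nmax = Zmax = 6 the per-size counts are 6, 21, 56, 126, 252, and 462, which add up to the 923 systems quoted in the text.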

C. Intracule-kernel modeling

The exact Wigner intracule correlation functional (F) provides an injective mapping between the Wigner intracule derived from a HF wavefunction and the exact many-body correlation energy: F[W(θ)] = Ecorr. Here, θ collectively denotes the intracule variables {u, v}. In this work, we model F as a functional transformation that maps W(θ, Vext) to Ecorr(Vext), where Vext is the external potential. Our goal, in particular, is to model F as a kernel, G. In this case, the mapping is established through the inner product

$$E^{\mathrm{corr}}\left(V^{\mathrm{ext}}\right) = \int G(\theta)\, \mathcal{W}(\theta)\, d\theta \tag{19}$$

While it is not the purpose of this study to investigate the formal existence of such a kernel, we assume its existence as in Refs. 8, 9, 11, 12, and 42, and aim to find its numerical approximation, which we denote by $\hat{G}$.

The problem of finding an optimal kernel that minimizes the prediction error in a least-squares fashion leads to the unconstrained loss function

$$L = \min_{\hat{G}} \sum_k \left[ E_k^{\mathrm{corr}} - \int \hat{G}(\theta)\, \mathcal{W}_k(\theta)\, d\theta \right]^2 \tag{20}$$

The same equation, written in matrix-vector notation, is given as

$$L = \min_{\hat{G}} \left\| \mathbf{E}^{\mathrm{corr}} - \int \hat{G}(\theta)\, \mathbf{W}(\theta)\, d\theta \right\|_2^2, \tag{21}$$

where $\mathbf{E}^{\mathrm{corr}}$ is a column vector of correlation energies of the training set, and $\mathbf{W}$ is a super-matrix containing the intracules of all the training molecules. The notation ‖·‖₂ indicates the L2- or Euclidean norm. While this equation is exactly solvable, ensuring zero loss L, the solution, as for all rank-deficient systems of equations, is not unique, and one can arrive at one of infinitely many solutions, such as

$$\hat{G}(\theta) = \mathbf{W}^{+}\mathbf{E}^{\mathrm{corr}} \tag{22}$$

In the above equation, $\mathbf{W}^{+}$ is the Moore–Penrose pseudo-inverse⁴³,⁴⁴ of $\mathbf{W}$, which for a matrix with full row rank reads

$$\mathbf{W}^{+} = \mathbf{W}^{T}\left(\mathbf{W}\mathbf{W}^{T}\right)^{-1} \tag{23}$$
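The non-uniqueness and the role of the pseudo-inverse in Eqs. 22 and 23 can be illustrated on a toy underdetermined system. The random matrices below are stand-ins for discretized intracules and correlation energies, not data from the paper, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 12                      # fewer training systems than grid points
W = rng.normal(size=(m, n))       # stand-in intracules, one row per training system
E = rng.normal(size=m)            # stand-in correlation energies

W_plus = W.T @ np.linalg.inv(W @ W.T)   # Eq. 23, valid because W has full row rank
G = W_plus @ E                          # Eq. 22

# Adding any vector from the null space of W leaves the fit unchanged,
# which is why the zero-loss solution is not unique.
null_vec = (np.eye(n) - W_plus @ W) @ rng.normal(size=n)
G_other = G + null_vec
```

Both `G` and `G_other` reproduce the training energies exactly, but `G` has the smaller Euclidean norm; among all zero-loss solutions, the pseudo-inverse picks the minimum-norm one.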

Furthermore, the main drawback of the intracule functional thus obtained will be the lack of continuity in θ. In other words, the values of the kernel over a given range of u and v will tend to oscillate so rapidly that the overall performance will be governed by an excessive overfitting to the training set.

For such problems, one of the widely used procedures to quench overfitting is to regularize the model⁴⁵ by constraining the magnitude of the kernel. In kernel-ridge-regression (KRR), which has been widely applied in the ML modeling of properties across chemical space, it is the L2-norm of the coefficient vector that is added as a penalty to the loss function³² after multiplying with a Lagrange multiplier, i.e., the regularization-strength hyperparameter. In the present study, where our inference is not done via KRR (the kernel in KRR should not be confused with the intracule kernel functional, G), our goal is to include a penalty function in Eq. 21. For this purpose, we use the L2-norm of $\hat{G}$.

$$L = \min_{\hat{G}} \left\| \mathbf{E}^{\mathrm{corr}} - \int \hat{G}(\theta)\, \mathbf{W}(\theta)\, d\theta \right\|_2^2 + \left\| \hat{G}(\theta) \right\|_2^2 \tag{24}$$

It may be of interest to note that this problem, when employing an L1-norm, would be analogous to the least absolute shrinkage and selection operator (LASSO) approach⁴⁶, which has recently been successfully employed to map the structure of binary materials²⁹. While the LASSO approach involves a two-fold non-linear optimization, Eq. 24 has the desirable feature of being a convex problem that can be solved exactly using pure linear algebra, resulting in a unique minimum-norm solution $\hat{G}$. Accordingly, the exact solution to this problem can be obtained as

$$\hat{G}(\theta) = \mathbf{W}^{+}\mathbf{E}^{\mathrm{corr}} \tag{25}$$

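where $\mathbf{W}^{+}$ is the pseudoinverse of $\mathbf{W}$ obtained via rank-revealing QR (RRQR) factorization with column interchanges⁴⁷–⁴⁹. In this procedure, the matrix $\mathbf{W} \in \mathbb{R}^{m \times n}$ is first decomposed as

$$\mathbf{W} = \mathbf{Q}\mathbf{R}\mathbf{P}^{T} = \mathbf{Q}\begin{bmatrix} \mathbf{R}_{11} & \mathbf{R}_{12} \\ \mathbf{0} & \mathbf{R}_{22} \end{bmatrix}\mathbf{P}^{T} \tag{26}$$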

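where $\mathbf{Q} \in \mathbb{R}^{m \times m}$ is an orthogonal matrix satisfying $\mathbf{Q}^{T}\mathbf{Q} = \mathbf{I}$ and $\mathbf{R}_{11} \in \mathbb{R}^{q \times q}$ is an upper triangular matrix. The permutation matrix $\mathbf{P}$ and the effective rank q are chosen such that $\mathbf{R}_{11}$ is well-conditioned (i.e., its condition number, κ, is smaller than a threshold) and the L2-norm of the matrix $\mathbf{R}_{22} \in \mathbb{R}^{(m-q) \times (n-q)}$ is numerically negligible:

$$\mathbf{R} \approx \begin{bmatrix} \mathbf{R}_{11} & \mathbf{R}_{12} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \tag{27}$$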

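It is possible to further simplify the super-matrix $\mathbf{R}$ via an orthogonal transformation from the right, $\mathbf{V}$, to eliminate the off-diagonal matrix $\mathbf{R}_{12}$:

$$\mathbf{R} \approx \begin{bmatrix} \mathbf{R}_{11} & \mathbf{R}_{12} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} = \begin{bmatrix} \mathbf{T}_{11} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}\mathbf{V} \tag{28}$$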

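The overall decomposition can now be expressed as

$$\mathbf{W} = \mathbf{Q}\begin{bmatrix} \mathbf{T}_{11} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}\mathbf{V}\mathbf{P}^{T} \tag{29}$$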

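where $\mathbf{T}_{11}$ is also a triangular matrix. The pseudoinverse of $\mathbf{W}$ is now given by

$$\mathbf{W}^{+} = \mathbf{P}\mathbf{V}^{T}\begin{bmatrix} \mathbf{T}_{11}^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}\mathbf{Q}^{T} \tag{30}$$

and is used with Eq. 25.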
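The sequence of Eqs. 26 to 30 can be sketched with standard library routines. The version below is an illustration under assumptions, not the authors' code: it uses `scipy.linalg.qr` with `pivoting=True` for the column-pivoted QR step, picks the effective rank from the diagonal of R with an arbitrary relative tolerance `rtol` (the paper's threshold is not given), and eliminates R12 through a second, unpivoted QR of the transposed leading rows. The low-rank test matrix is synthetic.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def pinv_rrqr(W, rtol=1e-10):
    """Pseudoinverse via pivoted QR plus a complete orthogonal decomposition."""
    m, n = W.shape
    Q, R, piv = qr(W, mode="economic", pivoting=True)    # W[:, piv] = Q @ R  (Eq. 26)
    diag = np.abs(np.diag(R))
    q = int(np.sum(diag > rtol * diag[0]))                # effective rank (Eq. 27)
    # Eliminate R12: [R11 R12] = T11 @ V1, T11 lower triangular (Eq. 28)
    Z, S = np.linalg.qr(R[:q, :].T)                       # R[:q, :].T = Z @ S
    T11, V1 = S.T, Z.T
    X = V1.T @ solve_triangular(T11, Q[:, :q].T, lower=True)   # V^T T11^{-1} Q^T (Eq. 30)
    W_plus = np.empty((n, m))
    W_plus[piv, :] = X                                    # apply the permutation P
    return W_plus, q

rng = np.random.default_rng(1)
W_demo = rng.normal(size=(8, 3)) @ rng.normal(size=(3, 20))   # synthetic rank-3 matrix
W_demo_plus, rank = pinv_rrqr(W_demo)
```

In practice, the same complete orthogonal factorization is what LAPACK's `gelsy` driver uses, so `scipy.linalg.lstsq(..., lapack_driver="gelsy")` would be a reasonable alternative when only the solution vector, and not the pseudoinverse itself, is needed.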

FIG. 2. Distribution of correlation energies across 923 one-dimensional systems. (Histogram: number of systems versus Ec [kcal/mol].)

We note in passing that in the training of the ML models, we have always kept the 83 most localized potentials, corresponding to single-, double-, until four-wells, in the training set because such small systems are under-represented in the entire dataset. A similar strategy had been employed in an earlier ML study of molecular electronic properties⁵⁰. Such pruning, of course, is not necessary when dealing with large training sets as in large-scale KRR studies¹⁸,²³,²⁵. After training, the correlation energy of a new out-of-sample system, which was not used in the training of the ML model, is estimated as

$$E^{\mathrm{corr}}\left(V^{\mathrm{ext,new}}\right) = \int \hat{G}(\theta)\, \mathcal{W}^{\mathrm{new}}(\theta)\, d\theta \tag{31}$$
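On a uniform (u, v) grid, the inner product of Eq. 31 reduces to a weighted sum, and training amounts to a linear least-squares problem in the grid values of the kernel. The sketch below illustrates this with synthetic data; the grid size, the spacing `du`, `dv`, the random "intracules", and the function names are all hypothetical, and NumPy's SVD-based `pinv` stands in for the RRQR-based pseudo-inverse described in the text.

```python
import numpy as np

def predict_ecorr(G_grid, W_grid, du, dv):
    """Approximate Eq. 31 by a Riemann sum over a uniform (u, v) grid."""
    return float(np.sum(G_grid * W_grid) * du * dv)

def fit_kernel(W_train, E_train, du, dv):
    """Least-squares kernel on the grid; W_train has shape (systems, n_u, n_v)."""
    A = W_train.reshape(len(W_train), -1) * du * dv   # quadrature-weighted design matrix
    g = np.linalg.pinv(A) @ E_train                   # minimum-norm least-squares solution
    return g.reshape(W_train.shape[1:])

# Synthetic check: energies generated from a known kernel are recovered exactly
# when there are more training systems than grid points.
rng = np.random.default_rng(2)
du = dv = 0.5
G_true = rng.normal(size=(4, 4))
W_train = rng.normal(size=(40, 4, 4))
E_train = np.array([predict_ecorr(G_true, w, du, dv) for w in W_train])
G_fit = fit_kernel(W_train, E_train, du, dv)

W_new = rng.normal(size=(4, 4))
e_pred = predict_ecorr(G_fit, W_new, du, dv)
e_ref = predict_ecorr(G_true, W_new, du, dv)
```

With fewer training systems than grid points, which is the regime discussed in the text, the fit is underdetermined and the recovered kernel depends on how the pseudo-inverse is regularized.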

III. RESULTS AND DISCUSSION

Our target quantities of interest, the correlation energies of the 923 systems, range from −2 to −73 kcal/mol. The distribution of Ec over the systems is displayed in Fig. 2. The majority of the molecules exhibit moderate correlation energies of about 10 kcal/mol in magnitude. In comparison, the correlation energy of the 3D helium atom is −26.4 kcal/mol. The system exhibiting the weakest correlation coincides with the steepest single-well, the 1D C4+ ion, with Ec = −2.1 kcal/mol. This trend is understandable because, within the soft-Coulomb approximation, the electron-electron interaction is relatively weak compared to the steep external potential, which confines the system to a small region.

The system exhibiting the strongest correlation, with Ec = −72.8 kcal/mol, corresponds to a six-well with five Z = 1 atomic centers and a terminal atom with Z = 2. The molecule that exhibits the next strongest correlation is also a six-well, but with all Z = 1, making it the most delocalized system. To exemplify the trend in Ec across the dataset, for the extreme cases, we have plotted (see Fig. 3) the two-electron reduced density, ρ(x1, x2), and the corresponding W computed using Eq. 14. The figure features the same plots from exact and HF calculations. For the most localized external potential, we find both the correlated and uncorrelated wavefunctions to result in essentially similar ρ(x1, x2) and W(u, v), implying a weak post-HF correction that complies with a small Ec. On the other hand, for the most delocalized system, the HF density lacks a cusp that is present in the exact density along the x1 = x2 line. These observations are in line with the quantitative trends in Ec noted above.

FIG. 3. Two-electron probability density of the ground state, ρ(x1, x2) (LEFT), and the corresponding Wigner intracules, W(u, v) (RIGHT), plotted for the most localized (single-well, Z = 6) and most delocalized (six-wells, Zi = 1; i = 1, . . . , 6) potentials in the dataset: A) using the exact ground state wavefunction, and B) using the restricted Hartree–Fock wavefunction. The shapes of the external potentials are shown as gray curves. For clarity, probability densities are normalized to maximal values. (Axes: y [au] versus x [au] for the densities; v [au] versus u [au] for the intracules.)

FIG. 4. Out-of-sample errors in correlation energies predicted using Eq. 31 as a function of training set size: A) Mean absolute error (MAE) in kcal/mol. The inset shows errors in log scale; B) Mean percentage absolute error (MPAE). In both cases, blue squares correspond to prediction errors when the models were generated after removing the numerical noise in intracules (see text for more details). In all cases, the envelope encloses the standard deviation of the estimate from 100 independent runs.

Based on the trends noted for ρ(x1, x2), one can infer the intracule of the single-well in Fig. 3 to be characteristic of an uncorrelated system. In particular, the corresponding plots imply that an intracule localized at u = v = 0 can only arise from a wavefunction that intrinsically lacks a Coulomb hole. For the same system, i.e., the 1D C4+ ion, one could encounter a different W profile for a different choice of α. The intracule of the delocalized system, in contrast, exhibits a strong distortion from that of C4+. At the HF level, the profile corresponds to an elongated, and somewhat distorted, ellipse lacking a node. The exact intracule, on the other hand, exhibits a non-radial node showing lobes of opposite signs. A lobe centered at u = 0 and v = 1 implies that the two electrons tend to move in opposite directions with a relative velocity of 1 atomic unit.

Having discussed the composition of the dataset and the range of the target property to be modeled, we now discuss the performance of the ML-predicted F. Fig. 4 presents the out-of-sample prediction errors of the ML models as a function of the training set size. We present results from two sets of calculations. In the first, $\hat{G}$ is obtained by solving Eq. 24. In the second, numerical noise in the intracules was filtered before training. Such noise arises from the numerical integration of Eq. 14, resulting in non-zero W(u, v), of the order of 10⁻⁶, for large u and v. As a consequence, the resulting kernels show spurious non-zero values near the boundary. The learning rates in Fig. 4 show that, independent of the noise reduction, with sufficient training, the models forecast reference correlation energies to a mean absolute error (MAE) of less than 1 kcal/mol. However, excellent learning rates are seen only after the noise filtration. From the inset of this plot, we find that even for a training set with 300 potentials, the prediction errors drop below the desired threshold. We estimate the uncertainty in the model's performance arising from the training set bias by selecting one hundred different random sets. When using noise-free kernels, the prediction errors show sub-kcal/mol standard deviations for training set sizes over 300, while the mean percentage absolute error (MPAE) drops to less than 1% for training set sizes > 500.
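The error measures and the repeated random selection of training sets described above can be sketched as follows. The text does not spell out the MPAE formula, so the definition used here (mean of |predicted − reference| / |reference|, in percent) is an assumption, as are the function names and the convention that the 83 always-included potentials occupy the first indices of the dataset.

```python
import numpy as np

def mae(pred, ref):
    """Mean absolute error, in the units of the inputs."""
    return float(np.mean(np.abs(np.asarray(pred, float) - np.asarray(ref, float))))

def mpae(pred, ref):
    """Mean percentage absolute error (assumed definition, see lead-in)."""
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    return float(100.0 * np.mean(np.abs((pred - ref) / ref)))

def split_indices(n_total, n_train, fixed, rng):
    """Random train/test split that always keeps the `fixed` indices in training."""
    pool = np.setdiff1d(np.arange(n_total), fixed)
    extra = rng.choice(pool, size=n_train - len(fixed), replace=False)
    train = np.concatenate([fixed, extra])
    test = np.setdiff1d(np.arange(n_total), train)
    return train, test

ref = np.array([-10.0, -20.0, -40.0])    # toy reference energies, kcal/mol
pred = np.array([-9.0, -21.0, -42.0])    # toy predictions, kcal/mol
fixed = np.arange(83)                    # hypothetical indices of the localized potentials
train, test = split_indices(923, 300, fixed, np.random.default_rng(3))
```

Repeating `split_indices` with different seeds and collecting `mae` over the resulting test sets would give the mean and standard deviation of the kind shown as the envelope in Fig. 4.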

Since an analytic form for the ML intracule functional is unknown, we can only appreciate the shapes of these functions by plotting them on grids. In Fig. 5, we have collected these plots for training set sizes 100, 500, and 900. For all three choices, we have presented the kernels computed with and without noise filtration. While overall we note the essential shape of the functional to be preserved in all cases, we do find the profile to grow denser with more training data. Additionally, we observe de-noising to dampen $\hat{G}$ while improving its continuity.

IV. CONCLUSIONS

We have introduced a machine learning approach based on the rank-revealing QR decomposition to numerically identify the correlation intracule functional. While using Wigner intracules derived from Hartree–Fock wavefunctions, the ML-predicted intracule functional yields accurate correlation energies. Our reference data comprise 923 1D external potentials, with 6 atoms (single-wells) and 917 molecules (multi-wells), for which we have computed accurate HF and exact electronic energies using DVR. We have derived an efficient expression to compute the Wigner intracule in 1D, which requires as input the two-electron wavefunction on a fine grid or as an analytic function.

FIG. 5. ML-predicted intracule kernels, G, for training set sizes N = 100, 500 and 900, plotted on the (u, v) grid with both axes in au. A) ML training without noise filtration in W, and B) after filtering noise in W.

Based on the trends in quantum chemistry method development, it would seem that the problem of deriving a closed-form expression for the exact correlation intracule functional will remain an open challenge for the foreseeable future. However, no severe hurdle seems to stand in the way of data-driven modeling of such functionals, at least for 1D models of atoms and molecules with two electrons. It remains to be seen if the approach presented here can be extended to many-electron systems while still depending on a reduced Wigner function. Such efforts must also address whether Wigner intracules are N-representable, i.e., whether there is at least one antisymmetric N-electron wavefunction of which the Wigner function is a reduced function 51. With this note, we hope our present study will aid other researchers in the combined application of ML and IFT to study realistic 3D atoms and molecules.

V. ACKNOWLEDGMENTS

RB gratefully acknowledges TIFR for Visiting Students Research Programme (VSRP) and junior research fellowships. RR thanks TIFR for financial support. The authors thank anonymous referees for thought-provoking comments on an earlier version of the paper.

APPENDIX 1: COULOMB INTERACTION IN ONE-DIMENSION

We know that in 3D, the Coulomb potential, V(r), can be computed as the solution of the Poisson equation

$$\nabla^2 V(\mathbf{r}) = -\frac{\rho(\mathbf{r})}{\varepsilon_0}. \tag{32}$$

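For a unit positive charge located at r′, the charge density is a Dirac delta function, ρ(r) = δ(r − r′). The Poisson equation, when solved with the appropriate boundary condition, V(r) → 0 as |r| → ∞, results in

$$\nabla^2 V(\mathbf{r}) = -\frac{\delta(\mathbf{r}-\mathbf{r}')}{\varepsilon_0}; \quad V(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\,\frac{1}{|\mathbf{r}-\mathbf{r}'|}. \tag{33}$$

The solution may be checked using the Green's function

$$G(\boldsymbol{\zeta}) = -\frac{1}{4\pi|\boldsymbol{\zeta}|}; \quad \nabla^2 G(\boldsymbol{\zeta}) = \delta(\boldsymbol{\zeta}). \tag{34}$$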

In comparison, in 1D, for a unit positive charge located at x′, the Poisson equation takes the form

$$d_x^2 V(x) = -\frac{\delta(x-x')}{\varepsilon_0}, \tag{35}$$

the boundary condition being

$$d_x V(x)\big|_{x'+\Delta x} = -\,d_x V(x)\big|_{x'-\Delta x} \tag{36}$$

for any positive Δx. The solution is then the 1D Coulomb potential, which is linear in x − x′,

$$V(x) = -\frac{1}{2\varepsilon_0}\,|x-x'|. \tag{37}$$

This potential gives rise to an electric field, E = −d_x V, that is uniform up to a change in sign: E(x > x′) = 1/(2ε0); E(x < x′) = −1/(2ε0).
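As a quick numerical check of Eqs. 35 to 37, the sketch below evaluates the second finite difference of the 1D Coulomb potential on a uniform grid. The choice ε0 = 1, the grid, and all names are assumptions made only for this illustration.

```python
import numpy as np

EPS0 = 1.0  # assumption: units with eps0 = 1, chosen only for illustration

def coulomb_1d(x, x_src, eps0=EPS0):
    """1D Coulomb potential of a unit positive charge at x_src (Eq. 37)."""
    return -np.abs(x - x_src) / (2.0 * eps0)

x = np.linspace(-5.0, 5.0, 1001)
h = x[1] - x[0]
i_src = 500            # grid index of the charge
x_src = x[i_src]
V = coulomb_1d(x, x_src)

# Central second difference at the interior grid points x[1:-1].
d2V = (V[2:] - 2.0 * V[1:-1] + V[:-2]) / h**2

# d2V vanishes away from the charge; summed over the grid it should
# integrate to -1/eps0, i.e., a discrete delta function of weight -1/eps0.
integrated = float(np.sum(d2V) * h)

# One-sided slopes just to the right and left of the charge (Eq. 36).
slope_right = float((V[i_src + 1] - V[i_src]) / h)
slope_left = float((V[i_src] - V[i_src - 1]) / h)

print("integral of d2V:", integrated)
print("slopes right/left of the charge:", slope_right, slope_left)
```

The only non-zero entry of `d2V` sits at the grid point of the charge, with value −1/(ε0 h), which is the grid representation of −δ(x − x′)/ε0.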

APPENDIX 2: A NOTE ON HARD-COULOMB INTERACTIONS IN ONE-DIMENSION

Classical case: Let us consider a single-well (atomic) system with two electrons. For convenience, let the nucleus be fixed at the origin, x = 0. The hard-Coulomb external and electron-repulsion potentials are Vext(x1, x2) = −Z(1/|x1| + 1/|x2|) and Vee(x1, x2) = 1/|x1 − x2|, where x1 and x2 are the coordinates of the two electrons and Z is the atomic number. In classical mechanics, such a system reaches equilibrium when the net force on every electron vanishes, Fext,i + Fee,i = 0. Using symmetry arguments, it can be shown that this condition can be reached only when the particles are equally displaced from the nucleus, x1 = −x2. For electron 1 (i = 1), the force-balance criterion leads to the relation

$$\frac{Z}{|x_1|^2} - \frac{1}{|x_1-x_2|^2} = 0 \;\Rightarrow\; \left(Z - \frac{1}{4}\right)\frac{1}{|x_1|^2} = 0. \tag{38}$$

For any integer nuclear charge, Z, the system cannot be in equilibrium for finite x1. So, starting with any finite electronic positions, the system will approach the least-energy state corresponding to x1 = x2 = 0, with Vext + Vee → −∞.
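The classical argument can be illustrated with a few lines of arithmetic. The sketch below, in atomic units, evaluates the force balance of Eq. 38 and the total potential energy for the symmetric configuration x1 = −x2 = d; the function names and the sampled values of d are illustrative assumptions.

```python
def net_force(d, Z):
    """Attraction minus repulsion on electron 1 at x1 = d, with x2 = -d."""
    attraction = Z / d**2            # pull towards the nucleus at the origin
    repulsion = 1.0 / (2.0 * d)**2   # push from the other electron, 2d away
    return attraction - repulsion    # equals (Z - 1/4) / d**2

def potential_energy(d, Z):
    """V_ext + V_ee for the symmetric configuration x1 = -x2 = d."""
    return -2.0 * Z / d + 1.0 / (2.0 * d)

for d in (1.0, 0.1, 0.01):
    print(f"d = {d}: net force = {net_force(d, Z=1):.2f}, "
          f"V = {potential_energy(d, Z=1):.2f}")
```

For every Z ≥ 1 the net force stays attractive at all finite d, and the potential energy decreases without bound as d shrinks.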

Quantum mechanical case: In the quantum mechanical version of the same system, let us start with a (normalized) singlet trial wavefunction, ψ(x1, x2), satisfying the following two conditions:

1. Kinetic energy and electron-electron repulsion expectation values are finite, which also accounts for the Coulomb-hole condition |ψ(x1, x1)|² = 0 ∀ x1 ∈ ℝ.

2. For one electron at the origin, the wavefunction does not vanish (ψ(0, x2) ≠ 0) in the neighborhood of at least one non-trivial value of x2 = y ≠ 0. In other words, the minimal value of |ψ(x1, x2)|², β, is non-vanishing in the domain (−ε, y − ε) < (x1, x2) < (+ε, y + ε) for a small positive ε.

An example function satisfying both these conditions is

$$g(x_1, x_2) = N e^{a_1(x_1-x_2)^2 + a_2(x_1+x_2)^2}\,(x_1-x_2)^2, \tag{39}$$

where a1, a2 ∈ ℝ and N is the appropriate normalization factor.
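The two conditions can be verified numerically for the example function of Eq. 39. In the sketch below, the negative values a1 = a2 = −1 (needed for the Gaussian factor to decay), the unnormalized prefactor N = 1, and the values y = 1 and ε = 0.1 are assumptions chosen only for illustration.

```python
import math

def g(x1, x2, a1=-1.0, a2=-1.0, norm=1.0):
    """Example trial function of Eq. 39 (unnormalized when norm = 1)."""
    gaussian = math.exp(a1 * (x1 - x2) ** 2 + a2 * (x1 + x2) ** 2)
    return norm * gaussian * (x1 - x2) ** 2

# Condition 1: the function vanishes whenever the two electrons coincide.
coincident = [g(x, x) for x in (-2.0, -0.5, 0.0, 0.3, 1.7)]

# Condition 2: the density has a non-zero minimum, beta, in a small box
# around (0, y); here y = 1 and eps = 0.1, sampled on an 11 x 11 grid.
y, eps = 1.0, 0.1
offsets = [-eps + 2.0 * eps * k / 10 for k in range(11)]
beta = min(g(dx1, y + dx2) ** 2 for dx1 in offsets for dx2 in offsets)

print("values at x1 = x2:", coincident)
print("sampled minimum of |g|^2 near (0, y):", beta)
```

The function is also symmetric under exchange of x1 and x2, as required for the spatial part of a singlet state.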

While the second of the aforestated conditions ensures that the minimal value of |ψ(x1, x2)|², β, is non-vanishing around the point (0, y), it is easy to see that in the same domain, the expectation value of Vne diverges:

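$$\begin{aligned}
\langle V_{ne}\rangle &= -Z\iint_{-\infty}^{\infty} dx_1\,dx_2\,|\psi(x_1,x_2)|^2\left(\frac{1}{|x_1|}+\frac{1}{|x_2|}\right) \\
&= -2Z\iint_{-\infty}^{\infty} dx_1\,dx_2\,|\psi(x_1,x_2)|^2\,\frac{1}{|x_1|} \\
&\le -2Z\int_{y-\varepsilon}^{y+\varepsilon} dx_2\int_{-\varepsilon}^{\varepsilon} dx_1\,\beta\,\frac{1}{|x_1|} \\
&= -4Z\varepsilon\beta\int_{-\varepsilon}^{+\varepsilon} dx_1\,\frac{1}{|x_1|} = -8Z\varepsilon\beta\left[\ln|\varepsilon| - \ln(0)\right] \\
&\to -\infty
\end{aligned} \tag{40}$$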

Thus, for the trial wavefunction chosen, ⟨H⟩ → −∞. It is now fairly straightforward to apply the variational principle and show that any other choice of trial wavefunction ψ will satisfy ⟨ψ|H|ψ⟩ ≥ Eg, where Eg is the ground state energy. Hence, the upper bound for the ground state energy of a hard-Coulomb two-electron system always diverges towards negative infinity.
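The logarithmic divergence in Eq. 40 can be made explicit by excluding a small interval |x1| < δ around the nucleus and letting δ shrink. The cutoff δ, the midpoint-rule integrator, and the values of Z, ε and β used below are assumptions introduced only for this sketch.

```python
import math
import numpy as np

def log_integral_numeric(eps, delta, n=100_000):
    """Midpoint-rule value of the integral of 1/|x| over delta < |x| < eps."""
    h = (eps - delta) / n
    x = delta + (np.arange(n) + 0.5) * h
    # Factor 2: the integrand is even, so both sides of the origin contribute.
    return 2.0 * float(np.sum(1.0 / x) * h)

def regularized_bound(Z, eps, beta, delta):
    """Last line of Eq. 40 with ln(0) replaced by ln(delta), delta > 0."""
    return -8.0 * Z * eps * beta * (math.log(eps) - math.log(delta))

Z, eps, beta = 2, 0.1, 0.05  # illustrative values only

print("numeric :", log_integral_numeric(eps, 1e-3))
print("analytic:", 2.0 * math.log(eps / 1e-3))
for delta in (1e-3, 1e-6, 1e-12):
    print(f"delta = {delta:g}: bound = {regularized_bound(Z, eps, beta, delta):.4f}")
```

The bound decreases by a fixed amount for every decade by which δ is reduced, which is the slow but unbounded logarithmic divergence used in the argument above.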

REFERENCES

1 E. Hylleraas, Z. Phys. 48, 469 (1928).
2 H. F. Schaefer III, Quantum chemistry: The Development of ab initio Methods in Molecular Electronic Structure Theory (Dover Publications, 2004).
3 E. A. Hylleraas, Adv. Quantum Chem. 1, 1 (1964).
4 A. Szabo and N. S. Ostlund, Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory (Dover Publications, 1996).
5 R. Ramakrishnan and G. Rauhut, J. Chem. Phys. 142, 154118 (2015).
6 R. G. Parr and Y. Weitao, Density-Functional Theory of Atoms and Molecules (Oxford University Press, 1989).
7 F. Jensen, Introduction to computational chemistry (John Wiley & Sons, 2017).
8 P. L. A. Popelier, Solving the Schrödinger Equation: Has Everything Been Tried? (World Scientific, 2011).
9 P. M. Gill, D. L. Crittenden, D. P. O'Neill, and N. A. Besley, Phys. Chem. Chem. Phys. 8, 15 (2006).
10 P. Hohenberg and W. Kohn, Phys. Rev. 136, B864 (1964).
11 D. P. O'Neill and P. M. Gill, Mol. Phys. 103, 763 (2005).
12 P. M. Gill, Annu. Rep. Prog. Chem. Sect. C 107, 229 (2011).
13 J. C. Snyder, M. Rupp, K. Hansen, K.-R. Müller, and K. Burke, Phys. Rev. Lett. 108, 253002 (2012).
14 L. Hu, X. Wang, L. Wong, and G. Chen, J. Chem. Phys. 119, 11501 (2003).
15 X. Zheng, L. Hu, X. Wang, and G. Chen, Chem. Phys. Lett. 390, 186 (2004).
16 R. M. Balabin and E. I. Lomakina, Phys. Chem. Chem. Phys. 13, 11710 (2011).
17 E. O. Pyzer-Knapp, K. Li, and A. Aspuru-Guzik, Adv. Funct. Mater. (2015).
18 R. Ramakrishnan and O. A. von Lilienfeld, CHIMIA International Journal for Chemistry 69, 182 (2015).
19 M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, Phys. Rev. Lett. 108, 058301 (2012).
20 P. C. Hansen, Discrete inverse problems: insight and algorithms, Vol. 7 (SIAM, 2010).
21 F. A. Faber, L. Hutchison, B. Huang, J. Gilmer, S. S. Schoenholz, G. E. Dahl, O. Vinyals, S. Kearnes, P. F. Riley, and O. A. von Lilienfeld, J. Chem. Theory Comput. 13, 5255 (2017).
22 B. Huang and O. von Lilienfeld, J. Chem. Phys. 145, 161102 (2016).
23 R. Ramakrishnan, M. Hartmann, E. Tapavicza, and O. A. von Lilienfeld, J. Chem. Phys. 143, 084111 (2015).
24 K. Hansen, F. Biegler, R. Ramakrishnan, W. Pronobis, O. A. von Lilienfeld, K.-R. Müller, and A. Tkatchenko, J. Phys. Chem. Lett. 6, 2326 (2015).
25 R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. von Lilienfeld, J. Chem. Theory Comput. 11, 2087 (2015).
26 K. T. Schütt, H. Glawe, F. Brockherde, A. Sanna, K. R. Müller, and E. K. U. Gross, Phys. Rev. B 89, 205118 (2014).
27 B. Meredig, A. Agrawal, S. Kirklin, J. E. Saal, J. W. Doak, A. Thompson, K. Zhang, A. Choudhary, and C. Wolverton, Phys. Rev. B 89, 094104 (2014).
28 G. Pilania, C. Wang, X. Jiang, S. Rajasekaran, and R. Ramprasad, Sci. Rep. 3 (2013).
29 L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, and M. Scheffler, Phys. Rev. Lett. 114, 105503 (2015).
30 F. A. Faber, A. Lindmaa, O. A. von Lilienfeld, and R. Armiento, Phys. Rev. Lett. 117, 135502 (2016).
31 F. Faber, A. Lindmaa, O. A. von Lilienfeld, and R. Armiento, Int. J. Quantum Chem. 115, 1094 (2015).
32 R. Ramakrishnan and O. A. von Lilienfeld, Rev. Comp. Chem., 225 (2017).
33 O. A. von Lilienfeld, Angew. Chem. Int. Ed. (2017).
34 Y.-S. Kim and M. E. Noz, Phase space picture of quantum mechanics: group theoretical approach, Vol. 40 (World Scientific, 1991).
35 E. Wigner, Phys. Rev. 40, 749 (1932).
36 J. J. Włodarz, Phys. Lett. A 133, 459 (1988).
37 J. P. Dahl, Physica A 114, 439 (1982).
38 W. H. Zurek, Nature 412, 712 (2001).
39 R. Ramakrishnan and M. Nest, Phys. Rev. A 85, 054501 (2012).
40 D. T. Colbert and W. H. Miller, J. Chem. Phys. 96, 1982 (1992).
41 C. J. Ball, P.-F. Loos, and P. M. Gill, Phys. Chem. Chem. Phys. 19, 3987 (2017).
42 P. M. Gill, N. A. Besley, and D. P. O'Neill, Int. J. Quantum Chem. 100, 166 (2004).
43 E. H. Moore, Bull. Amer. Math. Soc. 26, 394 (1920).
44 R. Penrose, in Mathematical proceedings of the Cambridge philosophical society, Vol. 51 (Cambridge University Press, 1955) pp. 406–413.
45 B. Schölkopf and A. J. Smola, Learning with kernels: support vector machines, regularization, optimization, and beyond (MIT Press, 2002).
46 R. Tibshirani, J. R. Stat. Soc. Ser. B, 267 (1996).
47 E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, LAPACK Users' guide, Vol. 9 (SIAM, 1999).
48 G. H. Golub and C. F. Van Loan, Matrix computations, Vol. 3 (JHU Press, 2012).
49 M. Gu and S. C. Eisenstat, SIAM J. Sci. Comput. 17, 848 (1996).
50 G. Montavon, M. Rupp, V. Gobre, A. Vazquez-Mayagoitia, K. Hansen, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, New J. Phys. 15, 095003 (2013).
51 J. E. Harriman, in Energy, Structure and Reactivity: Proceedings of the 1972 Boulder summer research conference on theoretical chemistry (Wiley, 1972) pp. 221–236.