Preliminary Monte-Carlo QMC
Introduction to Quantum Monte-Carlo
Francesco Sottile
Ecole Polytechnique and ETSF
ESNUM 9 June 2016
Outline

Preliminary (statistical) concepts
Monte-Carlo: means, samplings and Markov chains
Quantum Monte-Carlo: variational and diffusion MC
Two theorems

Law of large numbers
If you perform the same experiment a large number of times, the average of the results obtained should be close to the expected value, and will tend to become closer as more trials are performed.

mean or expected value: µ = 〈x〉 = Σ_j p_j x_j = ∫ dx p(x) x
variance: σ² = 〈(x − µ)²〉 = Σ_j p_j x_j² − µ² = ∫ dx p(x) x² − µ²

S_n = (x₁ + x₂ + … + x_n)/n → µ

Central limit theorem
The mean of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed.

(S_n − µ) → N(0, σ²/n)

Large numbers + central limit:

S_n → µ ± ξ/√N
Pseudo-Random Number Generator (PRNG)

Two (+1) requirements for a good PRNG
• It has to be good: long period, good lattice structure, good sequences, etc.
• It has to be fast.
• (+1) It has to be reproducible.
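The reproducibility requirement is easy to illustrate. A minimal sketch (not from the slides), using Python's standard-library generator; the seed values 42 and 43 are arbitrary:

```python
# The "+1" requirement (reproducibility), illustrated with Python's
# standard-library PRNG (a Mersenne Twister, period 2**19937 - 1).
import random

def sequence(seed, n=5):
    """Return the first n uniform variates produced from a given seed."""
    rng = random.Random(seed)          # independent generator instance
    return [rng.random() for _ in range(n)]

# Same seed -> identical sequence: a run can be reproduced and debugged.
same = sequence(42) == sequence(42)
# Different seeds -> (in practice) different sequences.
different = sequence(42) != sequence(43)
```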
Random Numbers

• Today's libraries give reliable uniform random numbers (∈ [0, 1]).
• We are able, by transformation from the uniform distribution, to create random numbers distributed according to other (simple) functions, like the Gaussian.
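As a sketch of one such transformation (the slides do not specify which; the Box-Muller transform is assumed here), two independent uniforms are mapped to two independent standard Gaussian variates:

```python
# Box-Muller transform: (u1, u2) uniform on (0, 1] -> two independent
# standard Gaussian variates.
import math
import random

def gaussian_pair(rng):
    u1, u2 = rng.random(), rng.random()
    u1 = u1 or 1e-12                       # guard against log(0)
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

rng = random.Random(0)
samples = [z for _ in range(50_000) for z in gaussian_pair(rng)]
mean = sum(samples) / len(samples)
var = sum((z - mean) ** 2 for z in samples) / len(samples)
# mean ~ 0 and var ~ 1 for a standard Gaussian
```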
Outline

Preliminary (statistical) concepts
Monte-Carlo: means, samplings and Markov chains
Quantum Monte-Carlo: variational and diffusion MC
Monte-Carlo sampling

[Figure: random points (crosses) thrown in the unit square with corners O, A, C, B; N_hit counts the points falling inside the quarter circle of radius 1]

π/4 ∼ N_hit / N_tot
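The hit-or-miss estimate above can be sketched in a few lines of Python (sample size and seed are arbitrary choices):

```python
# Hit-or-miss Monte-Carlo: throw N_tot random points in the unit square and
# count those inside the quarter circle, so that 4 * N_hit / N_tot ~ pi.
import math
import random

rng = random.Random(1)
n_tot = 100_000
n_hit = sum(1 for _ in range(n_tot)
            if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
pi_est = 4.0 * n_hit / n_tot
```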
Monte-Carlo sampling

[Figure: f(x) = √(1 − x²) on [0, 1], evaluated at random points x₁, x₂, …]

π/4 = ∫₀¹ √(1 − x²) dx ∼ (V/N) Σ_{i=1}^{N} √(1 − x_i²)
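The mean-value estimate π/4 ∼ (V/N) Σ_i √(1 − x_i²) can be sketched as follows (sample size and seed arbitrary):

```python
# Mean-value Monte-Carlo integration of f(x) = sqrt(1 - x^2) on [0, 1]:
# I ~ (V/N) * sum_i f(x_i), with volume V = 1, so 4*I estimates pi.
import math
import random

rng = random.Random(2)
n = 100_000
acc = sum(math.sqrt(1.0 - rng.random() ** 2) for _ in range(n))
pi_est = 4.0 * acc / n          # V = 1 for the interval [0, 1]
```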
Barely relevant Monte-Carlo sampling

I = ∫ f(x) dx

advantages
• easy to implement

disadvantages
• converges only like O(1/√N), poorly compared to Simpson's method, O(1/N⁴)
Barely relevant Monte-Carlo sampling

I = ∫···∫ f(x) dᵈx

advantages
• easy to implement
• still converges like O(1/√N), compared to Simpson's method, O(1/N^{4/d})

disadvantages
• we hit a lot of empty (or barely relevant) space
Importance sampling

I = ∫₀¹ f(x) dx = ∫₀¹ [f(x)/p(x)] p(x) dx

I = 〈f〉 = 〈f/p〉_p

How to choose p(x)?
• Choose p(x) to minimize the variance.
• σ = 0 ⟸ p(x) = f(x)/I: quite useless? (it requires knowing I itself)
• Instead: p(x) close to f(x), but simple enough to be sampled.
Importance sampling

I = ∫₀¹ (eˣ − 1)/(e − 1) dx = 0.418

Sampling f(x) = (eˣ − 1)/(e − 1) with different probability distributions:

p₁(x) = 1             σ₁ = 0.3009540
p₂(x) = 2x            σ₂ = 0.0560286
p₃(x) = eˣ/(e − 1)    σ₃ = 0.1380024
p₄(x) = 3x²           σ₄ = 0.1838806

[Figure: f(x) and the four distributions p₁ … p₄ on [0, 1]]
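A minimal sketch of this comparison, for uniform p₁ versus p₂(x) = 2x (whose samples are obtained by inverse-CDF as x = √u): both estimators converge to 0.418, but the sample spread under p₂ is far smaller, in line with the σ values above.

```python
# Importance-sampling estimates of I = int_0^1 (e^x - 1)/(e - 1) dx = 0.418,
# averaging f/p over samples drawn from p.
import math
import random

def f(x):
    return (math.exp(x) - 1.0) / (math.e - 1.0)

def estimate(sampler, pdf, n, rng):
    """Monte-Carlo estimate of <f/p>_p and the sample std dev of f/p."""
    vals = [f(x) / pdf(x) for x in (sampler(rng) for _ in range(n))]
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / n
    return mean, math.sqrt(var)

rng = random.Random(3)
n = 50_000
# p1(x) = 1: plain uniform sampling
i1, s1 = estimate(lambda r: r.random(), lambda x: 1.0, n, rng)
# p2(x) = 2x: inverse-CDF sampling, x = sqrt(u) with u uniform in (0, 1]
i2, s2 = estimate(lambda r: math.sqrt(1.0 - r.random()), lambda x: 2.0 * x, n, rng)
# both i1 and i2 estimate 0.418; s2 << s1, as the sigma values show
```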
Importance sampling

✓ importance sampling is crucial in practice
✓ it relies on finding p(x)
✗ many-dimensional, complex p(x) are difficult to find and to sample

One solution: Markov chains
Markov Chains

Distribution functions p(x):

e^{−βH}/∫e^{−βH} ;   |ψ|²/∫|ψ|² ;   e^{−S(x)}/∫e^{−S(x)}

Two problems to overcome
• Hamiltonians, wavefunctions, actions are complicated (d-dimensional) functions (no way to find an analytic primitive).
• They are normalized by an integral that has intrinsically the same difficulty as the main integral.
Markov Chains

Markov chain sequence

x₁ →(P) x₂ →(P) x₃ … →(P) x_n

x₁, x₂, … random but not independent

Markov chain operator P(x → y)
It is possible to demonstrate that, no matter how complicated p(x) is:
• P(x → y) generates a sequence that, in the end, is distributed according to p(x)
• we don't need to know P(x → y) explicitly
• we don't need to know p(x) itself, but just a function proportional to p(x).
Markov Chains

Simple example: two-level system
Population of cityA and population of cityB. Every year, 40% of the people of cityA move to cityB, and 30% the contrary. Initially the populations are A and B, for cityA and cityB. So the second year will be

(A′; B′) = (0.6A + 0.3B; 0.4A + 0.7B) = [0.6 0.3; 0.4 0.7] (A; B)

P = [0.6 0.3; 0.4 0.7] is the stochastic matrix
Markov Chains

Finding the converged stable distribution (of people)
• Iterate the Markov process, applying P to (A, B) to produce (A′, B′), then (A″, B″), etc.
• At convergence, PX = X: the converged distribution is the eigenvector of the stochastic matrix associated with the unit eigenvalue.

This is the case in which we know the stochastic matrix and we find the final distribution function. But we want the opposite: to generate sequences according to a known final distribution function, without knowing P.
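Both statements can be checked quickly; the stationary vector (3/7, 4/7) is computed here for illustration, not stated on the slides:

```python
# Two-city migration: v' = P v, with the columns of P summing to 1.
P = [[0.6, 0.3],
     [0.4, 0.7]]

def step(P, v):
    """One year of migration."""
    return [P[0][0] * v[0] + P[0][1] * v[1],
            P[1][0] * v[0] + P[1][1] * v[1]]

v = [1.0, 0.0]            # everybody starts in cityA
for _ in range(60):       # second eigenvalue is 0.3, so this converges fast
    v = step(P, v)
# v is now the unit-eigenvalue eigenvector (3/7, 4/7), and the chain also
# satisfies detailed balance: p(A) P(A->B) = p(B) P(B->A).
detailed_balance_gap = abs(3 / 7 * 0.4 - 4 / 7 * 0.3)
```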
Markov Chains

Detailed balance principle (microreversibility)
It can be demonstrated that any stochastic matrix P converges to a distribution function p(x) if

p(x) P(x → y) = p(y) P(y → x)

Still missing: how to construct P in order to generate this sequence. And here it seems we have to know p(x).
Metropolis method

Method to generate a microreversible P(x → y)
• We are at x.
• We propose a trial move x_T according to a symmetric probability distribution F(x → x_T) = F(x_T → x).
• We accept the trial move x_T (and so put y = x_T) with probability min(1, p(x_T)/p(x)).

We don't need the exact p(x), just any function αp(x) proportional to it.
Metropolis method

In practice
• F(x → x_T) is a Gaussian centered on x, with σ dynamically adjusted.
• How do we accept a trial move with probability p(x_T)/p(x)? Draw a random ξ ∈ [0, 1] and accept if p(x_T)/p(x) > ξ.
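A sketch of the full recipe for an unnormalized 1D density p(x) ∝ exp(−x²/2); the target, proposal width and chain length are arbitrary choices:

```python
# Metropolis sampling of an unnormalized density: Gaussian trial moves
# centered on x, accepted if p(x_T)/p(x) > xi (done in log space).
import math
import random

def metropolis(log_p, x0, sigma, n_steps, rng):
    """Return a Markov chain sampling exp(log_p), known up to a constant."""
    x, chain = x0, []
    for _ in range(n_steps):
        x_t = x + rng.gauss(0.0, sigma)        # symmetric proposal F(x -> x_T)
        if math.log(rng.random() + 1e-300) < log_p(x_t) - log_p(x):
            x = x_t                            # accept: p(x_T)/p(x) > xi
        chain.append(x)
    return chain

rng = random.Random(4)
chain = metropolis(lambda x: -0.5 * x * x, 0.0, 1.0, 100_000, rng)
burn = chain[5_000:]                           # discard equilibration
mean = sum(burn) / len(burn)
var = sum((x - mean) ** 2 for x in burn) / len(burn)
# mean ~ 0 and var ~ 1 for the standard-normal target
```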
Metropolis method

The M(RT)² method is today used in many different applications, ranging from non-linear differential equations to the simulation of galaxy formation: what about electronic structure calculations?
Outline

Preliminary (statistical) concepts
Monte-Carlo: means, samplings and Markov chains
Quantum Monte-Carlo: variational and diffusion MC
Quantum Monte-Carlo

• A method to calculate the exact values of certain (ground-state) properties.
• Capable of reaching high accuracy.
• Wavefunction sampling is an alternative to brute-force wavefunction representation (CI, CC), with advantages and disadvantages:
  • QMC has better scaling (N³ vs N⁶)
  • QMC is subject to statistical errors
Quantum Monte-Carlo

• Variational Monte-Carlo
• Diffusion Monte-Carlo
• Path Integral Monte-Carlo, Reptation Monte-Carlo, Green's function Monte-Carlo
Variational Monte-Carlo

Variational Theorem
Given

〈E〉 = ∫dx ψ*(x) H ψ(x) / ∫dx ψ*(x) ψ(x)

the variational theorem states that 〈E〉 ≥ E₀, and 〈E〉 = E₀ if and only if ψ ∝ φ₀.
Variational Monte-Carlo

Idea of VMC
Let's consider a trial wavefunction ψ_T(x, {α}):

∫dx ψ_T*(x, {α}) H ψ_T(x, {α}) / ∫dx ψ_T*(x, {α}) ψ_T(x, {α}) = E({α}) ≥ E₀

Minimizing E({α}) with respect to the parameters {α} will give an (upper) estimate of E₀.
Of course, we will use Monte-Carlo methods to calculate the 3N-dimensional integrals.
Variational Monte-Carlo

What we don't do
Naïvely, we might uniformly sample ψHψ and ψψ for the two integrals

∫dx ψ_T* H ψ_T ;   ∫dx ψ_T* ψ_T

for any {α}.
Variational Monte-Carlo

What we do: importance sampling

〈E〉 = ∫dx ψ_T* H ψ_T / ∫dx |ψ_T|² = ∫dx |ψ_T|² (Hψ_T/ψ_T) / ∫dx |ψ_T|²

〈E〉 = ∫dx ρ(x, {α}) E_L(x, {α})

with the local energy and the sampling density

E_L(x, {α}) = Hψ_T(x, {α}) / ψ_T(x, {α})
ρ(x, {α}) = |ψ_T(x, {α})|² / ∫dx |ψ_T(x, {α})|²
Variational Monte-Carlo

VMC in practice
1. Generate a number of copies of the system, each one with different (random) electron coordinates x (the walkers).
2. Choose a form for the trial wavefunction ψ_T(x, {α}).
3. Use the Metropolis method to propagate the walkers.
4. Monitor some observables during the Markov chain: local energy, variance, etc.
5. When the walkers are distributed like |ψ_T|², say at step L, calculate the average local energy

E = (1/N) Σ_{i=L}^{L+N} E_L(x_i, {α})

6. Now change α and go back to step 3.
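The whole loop can be sketched for a toy problem: a 1D harmonic oscillator (ħ = m = ω = 1) with trial ψ_T(x) = e^{−αx²}. This example, and its local energy E_L(x) = α + x²(1/2 − 2α²), are assumptions of this note, not taken from the slides:

```python
# VMC for a 1D harmonic oscillator: Metropolis sampling of |psi_T|^2 and
# averaging of the local energy E_L(x) = alpha + x^2 (1/2 - 2 alpha^2).
# alpha = 1/2 is the exact ground state (E_L = 1/2 everywhere, zero variance);
# any other alpha gives <E> > 1/2 (variational bound).
import math
import random

def vmc_energy(alpha, n_steps=50_000, sigma=1.0, seed=5):
    """Metropolis sampling of |psi_T|^2 = exp(-2 alpha x^2); returns <E_L>."""
    rng = random.Random(seed)
    x, e_loc = 0.0, []
    for _ in range(n_steps):
        x_t = x + rng.gauss(0.0, sigma)
        # accept with probability min(1, |psi_T(x_t)|^2 / |psi_T(x)|^2)
        if math.exp(-2.0 * alpha * (x_t * x_t - x * x)) > rng.random():
            x = x_t
        e_loc.append(alpha + x * x * (0.5 - 2.0 * alpha * alpha))   # E_L(x)
    burn = e_loc[5_000:]            # discard equilibration
    return sum(burn) / len(burn)

e_exact = vmc_energy(0.5)   # zero-variance property: exactly 0.5
e_bad   = vmc_energy(0.3)   # above 0.5, consistent with the variational bound
```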
The trial wavefunction

Antisymmetric function for fermions

ψ_T(x) = D(x) J(x)

ψ_T(x) = Σ_ν^{few} c_ν det[ψ↑_{ν,n}(r_i)] det[ψ↓_{ν,m}(r_j)] e^{−V(x)}

V(x) = Σ_i V₁(r_i) + Σ_{i,j>i} V₂(r_ij)
The trial wavefunction

Jastrow factor for jellium spheres

J = exp( Σ_{i=1}^{N} V₁(r_i) ) exp( Σ_{i<j}^{N} V₂(r_ij) ) exp( V^(N) )

V₁(r_i) = Σ_{n=1}^{20} α_n^(i) j₀(nβr_i)

V₂^(λ)(r_ij) = [a^(λ) r_ij + c^(λ) r_ij² + e^(λ) r_ij³] / [1 + b^(λ)(r_i) r_ij + d^(λ) r_ij² + r_ij³]

with

b^(λ)(r_i) = b₀^(λ) + b₁^(λ) arctan[(r_i² − R_b²)/K^(λ)]

λ = A, P (antiparallel and parallel spins), and j₀ is the zero-order spherical Bessel function (j₀(x) = sin x / x).

V^(N) = γ(P_C)² + δ(P_S)²,   with P_C = Σ_i^N r_i and P_S = 2 Σ_i^N r_i S_z^(i)
Variational Monte-Carlo

• Variational Monte-Carlo gives high-quality results (it recovers ~90% of the correlation energy)
• but it is still approximate (it relies on the choice of the trial wavefunction).

We now want extremely accurate (exact) results for the ground-state energy.
Diffusion Monte-Carlo

ψ(t) = Σ_n c_n e^{−iE_n t/ħ} φ_n,   with Hφ_n = E_n φ_n

c_n = ∫dx φ_n(x) ψ(x, 0),   n = 0, 1, 2, …

In imaginary time,

ψ(τ) = c₀ e^{−E₀τ} φ₀ + c₁ e^{−E₁τ} φ₁ + c₂ e^{−E₂τ} φ₂ + …  → (τ → ∞)  ∝ φ₀

We want a practical scheme to do this imaginary-time evolution and recover the ground-state energy.
Diffusion Monte-Carlo

First step: shift of energy

iħ ∂ψ(x, t)/∂t = [−ħ²/(2m) ∇² + (V(x) − E_T)] ψ(x, t)

ψ(x, t) = Σ_n c_n e^{−i(E_n − E_T)t/ħ} φ_n(x)
Diffusion Monte-Carlo

Second step: Wick rotation in time

−ħ ∂ψ(x, τ)/∂τ = [−ħ²/(2m) ∇² + (V(x) − E_T)] ψ(x, τ)

ψ(x, τ) = Σ_n c_n e^{−(E_n − E_T)τ/ħ} φ_n(x)

Role of E_T
• E_T > E₀: the wavefunction will diverge exponentially fast
• E_T < E₀: the wavefunction will vanish exponentially fast
• E_T = E₀: the wavefunction will exponentially converge to φ₀!

We want a practical method that, starting from an initial wavefunction, performs an imaginary-time iteration, permitting successive adjustments to E_T, such that at the end the stationary solution corresponds to E_T(τ → ∞) = E₀.
DMC: practical scheme

First step: generation of walkers
Generate N_w replicas of the system, sampled from the initial wavefunction ψ_T(x, 0):

ψ(x, 0) = Σ_{i=1}^{N_w} w_i δ(x − x_i)

Second step: writing the propagator
The integral form of the imaginary-time Schrödinger equation involves the Green's function:

ψ(x′, τ + δτ) = ∫dx G(x, x′, δτ) ψ(x, τ)
DMC: practical scheme

The propagator G

• Only the diffusive term:

−∂ψ(x, τ)/∂τ = −D ∇²ψ(x, τ)        G_D(x, x′, δτ) = e^{−(x−x′)²/(2δτ)}

• Only the rate term (branching):

−∂ψ(x, τ)/∂τ = (V(x) − E_T) ψ(x, τ)        G_B(x, x, δτ) = e^{−(V(x)−E_T)δτ}

Each walker is copied with multiplicity M = int[ e^{−(V(x)−E_T)δτ} + ξ ], with ξ a uniform random number.
DMC: practical scheme

The propagator G

−∂ψ(x, τ)/∂τ = [−D∇² + (V(x) − E_T)] ψ(x, τ)

G(x, x′, δτ) = G_D(x, x′, δτ) · G_B(x, x′, δτ) + O(δτ²)
             = e^{−(x−x′)²/(2δτ) − (V(x)−E_T)δτ} + O(δτ²)

• diffuse a walker, and accept x′ with probability min(1, ψ(x′)G(x, x′) / [ψ(x)G(x′, x)])
• remove or proliferate the walker according to the multiplicity, calculated with the branching term
DMC: practical scheme

Third step: calculate the quantity of interest
Calculate the quantity of interest (at this step) averaging over the walkers:

E₀(τ) = (1/N_w) Σ_{i=1}^{N_w} E_L(x_i, τ),   where E_L(x, τ) = Hψ(x, τ)/ψ(x, τ)

Fourth step: adjust the trial energy

E_T^{new} = [E_T + E₀(τ)] / 2

We continue to propagate until E₀ = E_T, the exact result for the ground-state energy.
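For a single particle in a harmonic well, the plain diffusion + branching scheme can actually be run as written (the walker-number fluctuations that spoil it for many-body systems stay mild here). A toy sketch with arbitrary parameters, whose population-control rule for E_T is an assumption of this note:

```python
# Toy DMC for one particle in V(x) = x^2/2 (hbar = m = 1, D = 1/2):
# Gaussian diffusion step (G_D), integer branching multiplicity
# M = int(exp(-(V - E_T) dtau) + xi) (G_B), and E_T steered so the
# population stays near a target. The exact ground-state energy is 1/2.
import math
import random

rng = random.Random(6)
dtau, n_target = 0.01, 300
walkers = [rng.gauss(0.0, 1.0) for _ in range(n_target)]   # ~ phi_0 density
e_t, e_history = 0.0, []

for _ in range(2_000):
    new_walkers = []
    for x in walkers:
        x += rng.gauss(0.0, math.sqrt(dtau))                # diffusion (G_D)
        m = int(math.exp(-(0.5 * x * x - e_t) * dtau) + rng.random())  # G_B
        new_walkers.extend([x] * min(m, 3))                 # cap multiplicity
    walkers = new_walkers or [0.0]
    v_mean = sum(0.5 * x * x for x in walkers) / len(walkers)
    # population control: lower E_T when too many walkers, raise it when too few
    e_t = v_mean - 0.1 * math.log(len(walkers) / n_target)
    e_history.append(e_t)

tail = e_history[500:]
e0_est = sum(tail) / len(tail)      # estimate of E_0 = 1/2
```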
DMC: practical scheme

This is exactly how things... do not work.
DMC: two issues

Fluctuations
The branching term causes large fluctuations in the number of walkers, preventing convergence. Solution: importance sampling.

Interpretation
The wavefunction has to be positive everywhere (which is not the case for fermions) to be interpreted as a walker distribution density. Solution: fixed-nodes approximation.
DMC: Importance Sampling

f(x, τ) = ψ_T(x) ψ(x, τ),   f(x, 0) = |ψ_T(x)|²

so let's get some walkers from our previous Variational Monte-Carlo calculation.
DMC: Importance Sampling

−∂f(x, τ)/∂τ = [−D∇² + (E_L(x) − E_T)] f(x, τ) + D∇[f(x, τ) v(x)]

with

E_L(x) = Hψ_T/ψ_T   and   v(x) = ∇ψ_T/ψ_T

• The branching term is now related to the local energy, rather than to the potential.
• A new term appears, a drift term, for which the corresponding Green's function can be easily evaluated: G(x, x′, δτ) = δ(x − x′ − v(x)δτ).

So our final equation describes a drifted diffusion process (Brownian motion within an external field) plus branching.
DMC: Walkers evolution

[Figure: snapshots of the walker population during the imaginary-time evolution]
DMC: Fixed-nodes approximation

Fixed nodes: how?

f(x, τ) = ψ_T(x) ψ(x, τ) ≥ 0

This is possible if nodes(ψ(x, τ)) = nodes(ψ_T(x)) all along the imaginary-time evolution.
Walker refusal: during the drifted-diffusive process, a new position is proposed for the walker. If this position changes the sign of the wavefunction, the move is refused.
This is an approximation and implies a (small) error.
DMC: Fixed-nodes approximation
• FN-DMC depends uniquely on the nodes of the trial wave-function.

• Even within fixed nodes, the accuracy of DMC is very high, comparable to much more cumbersome methods like CI or CC.
DMC: Can we release the nodes?

• An antisymmetric function can be written as a difference between two positive functions, f1 and f2.

• Let's associate a set of walkers with f1, called W1, and a set of walkers with f2, called W2.

• The Schrödinger equation is linear, so we can perform the imaginary-time iteration of these two sets of walkers separately.

• The W1 and W2 wavefunctions have a non-negligible overlap with the bosonic ground state, which is lower in energy.

• In the absence of numerical errors, the bosonic parts of W1 and W2 cancel out, and we obtain an exact result.

• Since numerical errors are present, after a few steps ⇒ bosonic catastrophe.
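The decomposition in the first bullet is easy to verify numerically: writing $f_1 = \max(f, 0)$ and $f_2 = \max(-f, 0)$ gives two non-negative functions with $f = f_1 - f_2$ exactly. A small check (illustrative only) on the antisymmetric toy function $f(x_1, x_2) = (x_1 - x_2)\, e^{-(x_1^2 + x_2^2)/2}$:

```python
import numpy as np

# Sample the antisymmetric function on a grid
x1, x2 = np.meshgrid(np.linspace(-3, 3, 61), np.linspace(-3, 3, 61))
f = (x1 - x2) * np.exp(-(x1**2 + x2**2) / 2)

f1 = np.maximum(f, 0.0)    # positive part: carried by walkers W1
f2 = np.maximum(-f, 0.0)   # positive part: carried by walkers W2

# Both pieces are non-negative and their difference reproduces f exactly
print("decomposition exact:", np.allclose(f, f1 - f2))   # prints True
```

Note that f1 and f2 are each *not* antisymmetric, so each overlaps the symmetric (bosonic) ground state; the physical, antisymmetric signal lives only in their difference, which is why it is eventually swamped by statistical noise (the bosonic catastrophe).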
Fermion MC
Fermion Sign Problem: unresolved
Several solutions are proposed every year, BUT

• They have errors

• They are known not to work

• They have uncontrolled approximations

• Their scaling is not demonstrated
QMC for solids: many issues
• The Bloch theorem is only valid in one-particle theories, not for many-body wavefunctions.
Consequences: twisted boundary conditions, supercells, finite-size errors, Ewald sums not exact.

• The kinetic energy is large for deep (core) electrons, so pseudopotentials are mandatory.

• Non-local pseudopotentials worsen the fermion sign problem.