41
Monte Carlo Methods with applications to plasma physics Eric Sonnendr¨ ucker Max-Planck-Institut f¨ ur Plasmaphysik and Zentrum Mathematik der TU M¨ unchen Lecture notes Sommersemester 2014 version of July 7, 2014

Monte Carlo Methods with applications to plasma physics€¦ · Monte Carlo Methods with applications to plasma physics ... Introduction 5 1. Plasmas 5 2. Controlled ... Plasma_physics

  • Upload
    hamien

  • View
    217

  • Download
    0

Embed Size (px)

Citation preview

Monte Carlo Methods with applications to plasma

physics

Eric Sonnendrucker

Max-Planck-Institut fur Plasmaphysik

and

Zentrum Mathematik der TU Munchen

Lecture notes

Sommersemester 2014

version of July 7, 2014

Contents

Chapter 1. Introduction 51. Plasmas 52. Controlled thermonuclear fusion 53. The ITER project 74. The Vlasov-Maxwell and Vlasov-Poisson equations 75. Characteristics of a transport equation 96. The Particle In Cell (PIC) method 12

Chapter 2. Monte Carlo simulation 151. Principle of Monte Carlo method 152. Background in probability theory 153. Monte Carlo Simulation 214. Initialisation of given PDF 285. Variance reduction techniques 306. Coupling the Monte Carlo Vlasov solver with a grid based Poisson solver 34

Bibliography 41

3

CHAPTER 1

Introduction

1. Plasmas

When a gas is brought to a very high temperature (104K or more) electrons leave their orbitaround the nuclei of the atom to which they are attached. This gives an overall neutral mixture ofcharged particles, ions and electrons, which is called plasma. Plasmas are considered beside solids,liquids and gases, as the fourth state of matter.

You can also get what is called a non-neutral plasma, or a beam of charged particles, byimposing a very high potential difference so as to extract either electrons or ions of a metal chosenwell. Such a device is usually located in the injector of a particle accelerator.

The use of plasmas in everyday life have become common. These include, for example, neontubes and plasma displays. There are also a number industrial applications: amplifiers in telecom-munication satellites, plasma etching in microelectronics, production of X-rays.

We should also mention that while it is almost absent in the natural state on Earth, except theNorthern Lights at the poles, the plasma is 99% of the mass of the visible universe. Including thestars are formed from plasma and the energy they release from the process of fusion of light nucleisuch as protons. More information on plasmas and their applications can be found on the web siteen.wikipedia.org/wiki/Plasma_physics.

2. Controlled thermonuclear fusion

The evolution of energy needs and the depletion of fossil fuels make it essential to developnew energy sources. According to the well-known formula E = mc2, we can produce energy byperforming a transformation that removes the mass. There are two main types of nuclear reactionswith this. The fission reaction of generating two lighter nuclei from the nucleus of a heavy atomand the fusion reaction that is created from two light atoms a heavier nucleus. Fission is used inexisting nuclear power plants. Controlled fusion is still in the research stage.

The fusion reaction is the most accessible to fuse nuclei of deuterium and tritium, which areisotopes of hydrogen, for a helium atom and a neutron high energy will be used to produce theheat necessary to manufacture electricity (see Fig. 2).

The temperatures required for thermonuclear fusion exceed one hundred million degrees. Atthese temperatures the electrons are totally freed from their atoms so that one obtains a gas ofelectrons and ions which is a totally ionized plasma. To produce energy, it is necessary that theamplification factor Q which is the ratio of the power produced to the external power suppliedis greater than one. Energy balance allows for the Lawson criterion that connects the amplifica-tion factor Q the product nTtE where n is the plasma density, T its temperature and tE energyconfinement time in the plasma.

Fusion is the basis of the energy of stars in which a confinement at a sufficient density is providedby their mass. The research on controlled fusion on Earth is considering two approaches. On theone hand inertial confinement fusion aims at achieving a very high density for a relatively short timeby shooting on a capsule of deuterium and tritium beams with lasers. On the other hand magneticconfinement fusion consists in confining the plasma with a magnetic field at a lower density butfor a longer time. The latter approach is pursued in the ITER project whose construction has just

5

6 1. INTRODUCTION

Figure 1. Examples of plasmas at different densities and temperatures

Figure 2. The Deuterium-Tritium fusion reaction

started at Cadarache in the south-eastern France. The plasma is confined in a toroidal-shapedchamber called a tokamak that for ITER is shown in Figure 3.

There are also experimental facilities (NIF in the USA and LMJ in France) are being built forexperimental validation of the concept of inertial confinement fusion using lasers.

Note that an alternative option to lasers for inertial confinement using heavy ions beams is alsopursued. See http://hif.lbl.gov/tutorial/tutorial.html for more details.

More information on fusion can be found on wikipedia sites devoted to inertial fusion andmagnetic fusion: http://en.wikipedia.org/wiki/Inertial_confinement_fusion,http://en.wikipedia.org/wiki/Magnetic_confinement_fusion.

4. THE VLASOV-MAXWELL AND VLASOV-POISSON EQUATIONS 7

Figure 3. Artist view of the ITER Tokamak

The current record fusion power produced for a deuterium-tritium reaction is equal to 16megawatts, corresponding to an amplification factor Q = 0.64. It was obtained in the JET tokamakin England. It is well established that to obtain an amplification factor much greater than one, it isnecessary to use a greater machine, hence the need for the construction of the ITER tokamak, whichwill contain a plasma volume five times larger than that of JET, to demonstrate the feasibility ofa power plant based on magnetic fusion. The amplification factor provided in ITER should begreater than 10.

3. The ITER project

The ITER project is a partnership between the European Union, Japan, China, South Korea,Russia, the United States and India for which an international agreement was signed November 21,2006 in Paris. It aims to demonstrate the scientific and technical feasibility of producing electricityfrom fusion energy for which there are significant resources of fuel and which has a low impact onthe environment.

The construction of the ITER tokamak is under way in Cadarache in the south-eastern Franceand the operational phase is expected to begin in 2019 and last for two decades. The main objectivesof ITER are firstly to achieve an amplification factor greater than 10 and so really allow theproduction of energy, secondly to implement and test the technologies needed for a fusion powerplant and finally to test concepts for the production of Tritium from Lithium belt used to absorbthe energy of neutrons.

If successful the next step called DEMO will be to build a fusion reactor fusion that will actuallyproduce energy before moving on to commercial fusion power plants.

More information is available on the web site http://www.iter.org.

4. The Vlasov-Maxwell and Vlasov-Poisson equations

At the microscopic level, a plasma is composed of a number of particles that evolve followingthe laws of classical or relativistic dynamics. The force that is by far dominant is the electromag-netic interaction coming from the electromagnetic field generated by the particles. This can bemathematically modelled by Newton’s equation of motion for each particle and the direct compu-tation of the electromagnetic field generated by each particle. However there are way too manyparticles in a plasma for this too be usable for numerical computations. Moreover such a detailed

8 1. INTRODUCTION

description is not necessary for most applications. An approximate model which provides a veryaccurate description of the evolution of a plasma is the Vlasov equation obtained with tools fromstatistical physics. It is written for non-relativistic particles

∂fs∂t

+ v · ∂fs∂x

+q

m(E + v ×B) · ∂fs

∂v= 0,

where m is the mass of the particles, q their charge and f ≡ f(x,v, t) represents the particledensity in phase space at point (x,v) and at time t. It has the structure of a transport equationin phase space which includes the three dimensions of physical space and the three dimensions ofvelocity space (or momentum in the relativistic case). The self-consistent electromagnetic field canbe calculated by coupling with Maxwell’s equation with sources that are the charge densities andcurrent calculated from the particles:

− 1

c2

∂E

∂t+∇× B = µ0 J,

∂B

∂t+∇× E = 0,

∇ · E =ρ

ε0,

∇ · B = 0,

with

ρ(x, t) =∑s

qs

∫fs(x,v, t) dv, J(x, t) =

∑s

qs

∫fs(x,v, t)v dv.

There are many situations in which the time evolution of the electromagnetic field is slowcompared to the phenomenon being investigated. In this case a quasi static approximation ofMaxwell’s equations can be used. Often in these cases the electric field is by far dominant in theLorentz force q

m(E + v × B). Then the magnetic field can be neglected and Maxwell’s equationsreduce to

∇× E = 0, ∇ · E =ρ

ε0.

Under some geometric conditions on the computational domain ∇× E = 0 implies that there existsa scalar function φ called electrostatic potential such that E = −∇φ, then φ is a solution of thePoisson equation −∆φ = ρ

ε0and our model reduces to the Vlasov-Poisson system

∂fs∂t

+ v · ∂fs∂x− q

m∇φ · ∂fs

∂v= 0,

ρ(x, t) =∑s

qs

∫fs(x,v, t) dv,

−∆φ =ρ

ε0.

This is the system we will consider throughout the lecture.The macroscopic quantities which can be measured are defined using the first three velocity

moments of the distribution function f(x,v, t). Taking velocity moments of the Vlasov equationone can also derive relations between these quantities, which define a fluid model for plasmas.

• The particle density is defined by

n(x, t) =

∫f(x,v, t) dv,

• The mean velocity u(x, t) verifies

n(x, t)u(x, t) =

∫f(x,v, t)v dv,

5. CHARACTERISTICS OF A TRANSPORT EQUATION 9

• The pressure tensor P(x, t) is defined by

P(x, t) = m

∫f(x,v, t)(v − u(x, t))⊗ (v − u(x, t)) dv.

• The scalar pressure is one third of the trace of the pressure tensor

p(x, t) =m

3

∫f(x,v, t)|v − u(x, t)|2 dv,

• The temperature T (x, t) is related to the pressure and the density by

T (x, t) =p(x, t)

n(x, t).

In some cases inter-particle collisions rather than mere interactions through the mean elec-tromagnetic field play an important role for the evolution of the plasma. This is generally addedthrough a collision operator on the right-hand-side of the Vlasov equation. A fairly general collisionoperator is the Boltzmann operator Q(f, f) which is quadratic in the distribution function. In thislecture we will consider only a linear Fokker-Planck operator of the form

C(f) = ν∂

∂v

(∂f

∂v+v − uT

f

),

with ν a constant collision frequency, u(t, x) and T (t, x) being either given functions or defined selfconsistently from the distribution function by

n(t, x) =

∫f(t, x, v) dv, n(t, x)u(t, x) =

∫f(t, x, v)v dv, n(t, x)e(t, x) =

∫f(t, x, v)v2 dv,

and T (t, x) = e(t, x)− u(t, x)2.

End of lecture 1.

5. Characteristics of a transport equation

The Vlasov equation is a transport equation in phase space. It can be written in abstract form

(1)∂f

∂t+ A · ∇x,vf = 0,

where the gradient is now with respect to all the phase space variables and A = (v,− qm∇φ)T for

the Vlasov-Poisson equation. Note that ∇x,vA = 0 and therefore the Vlasov equation can also bewritten in conservative form

∂f

∂t+∇x,v · (Af) = 0.

An essentiel feature of the scalar transport equation as the Vlasov equation is that they can besolved using the characteristic which are the solutions of the ordinary differential equation

(2)dZ

dt= A(t,Z), Z(s) = z.

where Z = (X,V) denotes the phase space variable. The solution of the Cauchy problem (ODE +initial condition) (2) will be denoted by Z(t; s, z) which is the solution of the ODE at time t whosevalue was z at time s. It can be easily seen that the characteristics can be used for computing thesolution of the advection equation (1). Indeed, assuming Z(t; s, z) satisfies (2), we have

d

dtf(t,Z(t; s, z)) =

(∂f

∂t+dZ

dt· ∇f

)(t,Z(t; s, z)) =

(∂f

∂t+ A · ∇f

)(t,Z(t; s, z)) = 0

as f is a solution of (1).The characteristics can be travelled forward or backward in time and are uniquely determined

by any time-phase-space couple (s, z) which acts as an initial condition. Because they play an

10 1. INTRODUCTION

essential role in the theory and numerical methods for the Vlasov equation we shall study them abit more formally.

Let us recall the classical theorem of the theory of ordinary differential equations (ODE) whichgives existence and uniqueness of the solution of (2). The proof can be found in [2] for example.

Theorem 1. Assume that A ∈ Ck−1(Rd × [0, T ]), ∇A ∈ Ck−1(Rd × [0, T ]) for k ≥ 1 and that

|A(z, t)| ≤ κ(1 + |z|) ∀t ∈ [0, T ] ∀z ∈ Rd.

Then for all s ∈ [0, T ] and z ∈ Rd, there exists a unique solution Z ∈ Ck([0, T ]t × [0, T ]s × Rdz) of(2).

Proposition 1. Assuming that the vector field A satisfies the hypotheses of the previous the-orem, we have the following properties:

(i) ∀t1, t2, t3 ∈ [0, T ] and ∀z ∈ Rd

Z(t3; t2,Z(t2; t1, z)) = Z(t3; t1, z).

(ii) ∀(t, s) ∈ [0, T ]2, the application z 7→ Z(t; s, z) is a C1- diffeomorphism of Rd of inversey 7→ Z(s; t,y).

(iii) The jacobian J(t; s, 1) = det(∇Z(t; s, z)) verifies

∂J

∂t= (∇ ·A)(t; Z(t; s, z))J,

and J > 0. In particular if ∇ ·A = 0, J(t; s, 1) = J(s; s, 1) = det Id = 1, where Id is theidentity matrix of order d.

Proof. (i) The points z = Z(t1; t1, z), Z(t2; t1, z), Z(t3; t1, z) are on the same character-istic curve. This curve is characterized by the initial condition Z(t1) = z. So, taking anyof these points as initial condition at the corresponding time, we get the same solution of(2). We have in particular Z(t3; t2,Z(t2; t1, z)) = Z(t3; t1, z).

(ii) Taking t1 = t3 in the equality (i) we have

Z(t3; t2,Z(t2; t3, z)) = Z(t3; t3, z) = z.

Hence Z(t3; t2, .) is the inverse of Z(t2; t3, .) (we denote by g(.) the function x 7→ g(x)) andboth applications are of class C1 because of the previous theorem.

(iii) Let

J(t; s, 1) = det(∇Z(t; s, z)) = det((∂Zi(t; s, z)

∂zj)1≤i,j≤d).

But Z verifies dZdt = A(Z(t), t). So we get in particular taking the ith line of this equality

dZidt = Ai(Z(t), t). And taking the gradient we get, using the chain rule,

d

dt∇Zi =

d∑k=1

∂Ai∂zk∇Zk.

For a d × d matrix M the determinant of M is a d-linear alternating form taking asarguments the columns of M . So, denoting by (., . . . , .) this alternating d-linear form, wecan write detM = (M1, . . . ,Md) where Mj is the jth column of M . Using this notation

5. CHARACTERISTICS OF A TRANSPORT EQUATION 11

in our case, we get

∂J

∂t(t; s, 1) =

∂tdet(∇Z(t; s, z))

= (∂∇Z1

∂t,∇Z2, . . . ,∇Zd) + · · ·+ (∇Z1,∇Z2, . . . ,

∂∇Zd∂t

)

= (d∑

k=1

∂A1

∂zk∇Zk,∇X2, . . . ,∇Zd) + . . .

+ (∇Z1,∇Z2, . . . ,d∑

k=1

∂Ad∂zk∇Zk)

=∂A1

∂z1J + · · ·+ ∂Ad

∂zdJ,

as (., . . . , .) is alternating and d-linear. Thus we have ∂J∂t (t; s, 1) = (∇ ·A)J . On the other

hand ∇Z(s; s, z) = ∇z = Id and so J(s; s, 1) = det Id = 1. J is a solution of the differentialequation

dJ

dt= (∇ ·A) J, J(s) = 1,

which admits as the unique solution J(t) = e∫ ts ∇·A(τ ;s,z)) dτ > 0 and in particular, if

∇ ·A = 0, we have J(t; s, 1) = 1 for all t.

After having highlighted the properties of the characteristics, we can now express the solutionof the linear advection equation (1) using the characteristics.

Theorem 2. Let f0 ∈ C1(Rd) and A satisfying the hypotheses of the previous theorem. Thenthere exists a unique solution of the linear advection equation (1) associated to the initial conditionf(z, 0) = f0(z). It is given by

(3) f(z, t) = f0(Z(0; t, z)),

where Z represent the characteristics associated to A.

Proof. The function f given by (3) is C1 as f0 and Z are, and Z is defined uniquely. Let’sverify that f is a solution of (1) and that it verifies the initial condition. First taking t = 0 in (3)we get

f(z, 0) = f0(Z(0; 0, z)) = f0(z)

so that the initial condition is verified.Let’s take the time derivative of (3)

∂f

∂t(z, t) =

∂Z

∂s(0; t, z) · ∇f0(Z(0; t, z)),

and taking the gradient of (3)

∇f(z, t) = ∇(f0(Z(0; t, z))

=d∑

k=1

∂f0

∂zk∇Zk(0; t, z)),

= ∇Z(0; t, z)T∇f0(Z(0; t, z)),

in the sense of a matrix vector product with the jacobian matrix

∇Z(0; t, z) = ((∂Zk∂zl

(0; t, z)))1≤k,l≤d.

12 1. INTRODUCTION

We then get

(4) (∂f

∂t+ A · ∇f)(z, t) =

∂Z

∂s(0; t, z) · ∇f0(Z(0; t, z)) + A(z, t) ·

(∇Z(0; t, z)T∇f0(Z(0; t, z))

).

Because of the properties of the characteristics we also have that

Z(t; s,Z(s; r, z)) = Z(t; r, z)

and taking the derivative with respect to s, we get

∂Z

∂s(t; s,Z(s; r, z)) +∇Z(t; s,Z(s; r, z))

∂Z

∂t(s; r, z) = 0.

But by definition of the characteristics ∂Z∂t (s; r, z) = A(Z(s; r, z), s) and as this equation is verified

for all values of t, r, s and so in particular for r = s. It becomes in this case

∂Z

∂s(t; s, z) +∇Z(t; s, z)A(z, s) = 0.

Plugging this expression into (4) we obtain

(∂f

∂t+ A · ∇f)(z, t) = −∇Z(0; t, z)A(z, t)) · ∇f0(Z(0; t, z))

+ A(z, t) ·(∇Z(0; t, z)T∇f0(Z(0; t, z))

).

But for a matrix M ∈ Md(R) and two vectors u,v ∈ Rd, on a (Mu) · v = uTMTv = u · (MTv).Whence we get

∂f

∂t+ A · ∇f = 0,

which means that f defined by (3) is solution of (1).The problem being linear, if f1 and f2 are two solutions we have

∂t(f1 − f2) + A · ∇(f1 − f2) = 0,

and using the characteristics ddt(f1−f2)(Z(t), t) = 0. So if f1 and f2 verify the same initial condition,

they are identical, which gives the uniqueness of the solution which is thus the function given byformula (3).

6. The Particle In Cell (PIC) method

The principle of a particle method is to approximate the distribution function f solution ofthe Vlasov equation by a sum of Dirac masses centered at the particle positions in phase space(xk(t),vk(t))1≤k≤N of a number N of macro-particles each having a weight wk. The approximateddistribution function that we denote by fN then writes

fN (x,v, t) =N∑k=1

wkδ(x− xk(t)) δ(v − vk(t)).

Positions x0k, velocities v0

k and weights wk are initialised such that fN (x,v, 0) is an approximation,in some sense that remains to be precised, of the initial distribution function f0(x,v). The timeevolution of the approximation is done by advancing the macro-particles along the characteristicsof the Vlasov equation, i.e. by solving the system of differential equations

dxkdt

= vk

dvkdt

=q

mE(xk, t)

xk(0) = x0k, vk(0) = v0

k.

6. THE PARTICLE IN CELL (PIC) METHOD 13

Proposition 2. The function fN is a solution in the sense of distributions of the Vlasovequation associated to the initial condition f0

N (x,v) =∑N

k=1wkδ(x− x0k) δ(v − v0

k).

Proof. Let ϕ ∈ C∞c (R3 × R3×]0,+∞[). Then fN defines a distribution of R3 × R3×]0,+∞[in the following way:

〈fN , ϕ〉 =N∑k=1

∫ T

0wkϕ(xk(t),vk(t), t) dt.

We then have

〈∂fN∂t

, ϕ〉 = −〈fN ,∂ϕ

∂t〉 = −

N∑k=1

wk

∫ T

0

∂ϕ

∂t(xk(t),vk(t), t) dt,

butd

dt(ϕ(xk(t),vk(t), t)) =

dxkdt· ∇xϕ+

dvkdt· ∇vϕ+

∂ϕ

∂t(xk(t),vk(t), t),

and as ϕ has compact support in R3 × R3×]0,+∞[, it vanishes for t = 0 and t = T . So∫ T

0

d

dt(ϕ(xk(t),vk(t), t)) dt = 0.

It follows that

〈∂fN∂t

, ϕ〉 =N∑k=1

wk

∫ T

0(vk · ∇xϕ+

q

mE(xk, t) · ∇vϕ) dt

= −〈v · ∇xfN +q

mE(xk, t) · ∇vfN , ϕ〉.

Which means that fN verifies exactly the Vlasov equation in the sense of distributions.

Consequence: If it is possible to solve exactly the equations of motion, which is sometimes thecase for a sufficiently simple applied field, the particle method gives the exact solution for an initialdistribution function which is a sum of Dirac masses.

The self-consistent electromagnetic field is computed on a mesh of physical space using a clas-sical method (e.g. Finite Elements, Finite Differences, ...) to solve the Maxwell or the Poissonequations.

In order to determine completely a particle method, it is necessary to precise how the initialcondition f0

N is chosen and what is numerical method chosen for the solution of the characteristicsequations and also to define the particle-mesh interaction.

Let us detail the main steps of the PIC algorithm:Choice of the initial condition.

• Deterministic method: Define a phase space mesh (uniform or not) and pick as the initialposition of the particles (x0

k,v0k) the barycentres of the cells and for weights wk associated

to the integral of f0 on the corresponding cell: wk =∫Vkf0(x,v) dxdv so that

∑k wk =∫

f0(x,v) dxdv.• Monte-Carlo method: Pick the initial positions in a random or pseudo-random way using

the probability density associated to f0.

Remark 1. Note that randomization occurs through the non-linear processes, which are gener-ally such that holes appear in the phase space distribution of particles when they are started froma grid. Moreover the alignment of the particles on a uniform grid can also trigger some smallphysical, e.g. two stream, instabilities. For this reason a pseudo-random initialization is usuallythe best choice and is mostly used in practice.

14 1. INTRODUCTION

The particle approximation fN of the distribution function does not naturally give an expressionfor this function at all points of phase space. Thus for the coupling with the field solver which isdefined on the mesh a regularizing step is necessary. To this aim we need to use a smooth convolutionkernel S for this regularization procedure. S could typically be a Gaussian or preferably in practice asmooth piecewise polynomial spline function which has the advantage of having a compact support.

The source for Poisson’s equations ρ is defined from the numerical distribution function fN , fora particle species of charge q by

ρN = q∑k

wkδ(x− xk).

We then apply the convolution kernel S to define ρ at any point of space and in particular at thegrid points:

ρh(x, t) =

∫S(x− x′)ρN (x′) dx′ = q

∑k

wkS(x− xk),

Time scheme for the particles. Let us consider first only the case when the magnetic fieldvanishes (Vlasov-Poisson). Then the macro-particles obey the following equations of motion:

dxkdt

= vk,dvkdt

=q

mE(xk, t).

This system being hamiltonian, it should be solved using a symplectic time scheme in order toenjoy long time conservation properties. The scheme which is used most of the time is the Verletscheme, which is defined as follows. We assume xnk , vnk and En

k known.

vn+ 1

2k = vnk +

q∆t

2mEnk(xnk),(5)

xn+1k = xnk + ∆tv

n+ 12

k ,(6)

vn+1k = v

n+ 12

k +q∆t

2mEn+1k (xn+1

k ).(7)

We notice that step (7) needs the electric field at time tn+1. It can be computed after step (6) bysolving the Poisson equation which uses as input ρn+1

h that needs only xn+1k and not vn+1

k .Time loop. Let us now summarize the main stages to go from time tn to time tn+1:

(1) We compute the charge density ρh on the grid using relations (6).(2) We update the electrostatic field using a classical mesh based solver (finite differences,

finite elements, spectral, ....).(3) We compute the fields at the particle positions.(4) Particles are advanced using a numerical scheme for the characteristics for example Verlet

(5)-(7).

CHAPTER 2

Monte Carlo simulation

1. Principle of Monte Carlo method

The basic idea of Monte Carlo methods is to use probability calculations to compute integrals,assuming that a probability can be approximated by a large number of random events. A simpleexample of a Monte Carlo algorithm is given by the computation of π using the area of a quartercircle: Consider the quarter circle Q of radius one centred at zero embedded in the unit squareS = [0, 1]2. The the area of Q is |Q| = π

4 and the area of S is |S| = 1. Considering a uniformly

distributed sample in S, the ratio |Q||S| is the probability of an event being in Q. This can be

approximated by the rationQn , where n is the total number of draws and nQ the number of draws

in Q. Hence|Q||S|

4≈nQn

which yields an approximation of π by 4nQn which is all the better that the number of draws n is

large.Computations of continuous probabilities are strongly related to the computation of integrals,

so that in practice one can recast the computation of integrals in the framework of probabilitiesand then use a large number of samples to approximate them. The purpose of this chapter is toformalise this and also give a way to estimate the error committed in this approximation.

2. Background in probability theory

As the most convenient framework for defining integrals is the Lebegues theory, which startsby defining measurable sets using σ−algebras, the good framework for abstract probability theoryalso needs these objects. However, after having defined them to make the connection with themathematical probability literature, we will only consider probabilities on Rn.

2.1. Probability spaces. Let us recall some standard definitions that can be found in anyprobability textbook.Let Ω be a nonempty set.

Definition 1. A σ-algebra is a collection F of subsets of Ω with the properties

(i) Ω ∈ F ,(ii) If A ∈ F then Ac := Ω\A ∈ F ,

(iii) If A1, A2, · · · ∈ F then⋃i

Ai ∈ F .

Note that axioms (i) and (ii) imply that ∅ ∈ F and axioms (ii) and (iii) imply that if A1, A2, · · · ∈F then

⋂i

Ai ∈ F , as (⋃i

Aci )c =

⋂i

Ai.

Definition 2. Let F be a σ-algebra of subsets of Ω. Then P : F → [0, 1] is called a probabilitymeasure provided:

(ii) For all A ∈ F we have 0 ≤ P (A) ≤ 1,(ii) P (Ω) = 1,

15

16 2. MONTE CARLO SIMULATION

(iii) If A1, A2, · · · ∈ F are disjoint then P (⋃i

Ai) =∑i

P (Ai).

It follows from (ii) and (iii) as Ω and ∅ are both in F and disjoint that P (∅) = 0. If follows from(i) and (iii) that if A ⊂ B then B is the disjoint union of A and B\A, so P (A) ≤ P (A)+P (B\A) =P (B).

End of lecture 2.

Definition 3. A triple (Ω,F , P ) is called probability space provided Ω is any set, F is a σ-algebra and P a probability measure on F .

Terminology. A set A ∈ F is called an event, points ω ∈ Ω are called sample points and P (A)is the probability of event A.

Let B denote the Borel subsets of Rn which is the smallest σ-algebra containing all the theopen subsets of Rn. In particular it contains all the product intervals (open, semi-open, or closed).

Example 1. Let Ω = ω1, ω2, . . . , ωN be a finite set, and suppose we are given N numbers

0 ≤ pi ≤ 1 for i = 1, . . . , N satisfying∑N

i=1 pi = 1. We take F to be all the possible subsets of Ω.Then for each A = ωi1 , . . . ωim ∈ F with 1 ≤ ωi1 < · · · < ωim ≤ N we define

P (A) := pi1 + pi2 + . . . pim .

Let us consider two concrete examples:1) Throwing once a dice can be analysed with the following probability space: Ω = 1, 2, 3, 4, 5, 6,

F consists of all the subsets of Ω and pi = 16 for i = 1, . . . , 6. An event is a subset of Ω, for example

A = 2, 5. The probability of a sample point to be in A is then P (A) = p2 + p5 = 13 .

2) Consider throwing a coin twice. Then the set Ω of all possible events is (H,H), (H,T ), (T,H), (T, T )where H stands for heads and T for tail, F consists of all the subsets of Ω and pi = 1

4 for i = 1, . . . , 4.A possible event A would be to throw heads at least once: A = (H,H), (H,T ), (T,H) and P (A) =34 , and other possible event B would be to throw tail the second time, then B = (H,T ), (T, T )and P (B) = 1

2 .Example 2. The Dirac mass. Let z ∈ Rn fixed and define for sets A ∈ B

P (A) :=

1 if z ∈ A,0 if z /∈ A.

We call P the Dirac mass at z and denote it by P = δz.Example 3. Assume f is a non negative integrable function such that

∫Rn f(x) dx = 1. We

define for sets A ∈ BP (A) :=

∫Af(x) dx.

We call f the density of the probability measure P .

2.2. Random variables. A probability space is an abstract construction. In order to defineobservables it is necessary to introduce mappings X from Ω to Rn.

Definition 4. Let (Ω,F , P ) be a probability space. A mapping

X : Ω→ Rn

is called a n-dimensional random variable if for each B ∈ B, we have

X−1(B) ∈ F .

In other words, X is n-dimensional random variable on the probability space if it is F-measurable.This definition enables to define probabilities of events related to X by inducing a probability

law on (Rn,B).

2. BACKGROUND IN PROBABILITY THEORY 17

Proposition 3. Let X be an n-dimensional random variable. Then PX : B → [0, 1] defined by

PX(B) = P (X−1(B))

is a probability law on (Rn,B).

Proof. For B ∈ B, the measurability of X implies that X−1(B) ∈ B. So the probabilityP (X−1(B)) is well defined and we just need to check the properties of a probability law, which isstraightforward.

Notation 1. The probability PX is often denoted conveniently PX(B) = P (X ∈ B).

2.3. Distribution function. Let X be a n-dimensional random variable on the probabilityspace (Ω,F , P ). Let us say that for two vectors x ≤ y if xi ≤ yi all the components of the vectors.

Definition 5. We call (cumulative) distribution function (CDF) of a random variable X thefunction FX : Rn → [0, 1] defined by

FX(x) = P (X ≤ x), for x ∈ Rn.

Definition 6. Assume X is a n-dimensional random variable and F = FX its distributionfunction. If there exists a non negative, integrable function f : Rn → R such that

F (x) = F (x1, . . . , xn) =

∫ x1

−∞. . .

∫ xn

−∞f(y1, . . . , yn) dy1 . . . dyn,

then f is called the (probability) density function (PDF) for X.

It follows, in this case, that all probabilities related to X can be expressed as integrals on Rnusing the density function:

PX(B) = P (X ∈ B) =

∫Bf(x) dx for all B ∈ B.

Probability measures for which a density exists are called absolutely continuous. We shall onlyconsider such probability measures in the sequel.

Note that if we work directly in the probability space (Rn,B, P ), we can take the randomvariable to be the identity and a probability density directly defines the probability.

Note also that a random variable X induces a σ-algebra FX on Ω, which is defined by FX =X−1(B)|B ∈ B. FX is the smallest σ-algebra which makes X measurable.

Examples:

(1) The uniform distribution on interval [a, b] is given by the PDF

f(x) =

1b−a if x ∈ [a, b],

0 else.

The associated distribution function is

F (x) =

∫ x

−∞f(x) dx =

0 if x < a,x−ab−a if x ∈ [a, b],

1 if x > b.

(2) The normal or gaussian distribution is defined by a PDF of the form

f(x) =1

σ√

2πe−

(x−µ)2

2σ2 .

18 2. MONTE CARLO SIMULATION

2.4. Expected value, variance. The integration on a probability space is similar to thedefinition of the Lebesgues integral. One starts by defining the integral for simple functions of theform X =

∑i aiχAi , where χAi is the characteristic functions of the set Ai ∈ F , i.e. χAi(ω) = 1 if

ω ∈ Ai and 0 else. Then we define

E(X) :=

∫X dP :=

∑i

aiP (Ai).

Then, because any measurable functions is the limit of a sequence of simple functions, this definitioncan then be extended by taking limits to any random variable (which is a F-measurable function).For vector valued random variables, the integration is performed component by component.

Definition 7. A random variable X is said integrable with respect to the probability measureP , if E(|X|) < +∞. Then the value E(X) :=

∫X dP is called expected value (or expectation, or

mean value) of the random variable X.If E(|X|2) < +∞, the value

V(X) = E(|X− E(X)|2) =

∫Ω|X− E(X)|2 dP ≥ 0

is called variance of the random variable X, and

σ(X) =√V(X)

is called standard deviation of the random variable X.

The variance can be also expressed by V(X) = E(|X|2)− E(X)2. Indeed

V(X) =

∫Ω|X− E(X)|2 dP =

∫Ω

(|X|2 − 2X · E(X) + |E(X)|2) dP = E(|X|2)− E(X)2.

If the probability measure is absolutely continuous its density provides a convenient way forevaluation of expectations using the so-called transfer theorem.

Theorem 3 (Transfer theorem). Let g be a measurable function of Rn and X an n-dimensionalrandom variable. Then, if f is the density of the law of X

E(g(X)) =

∫Ωg(X) dP =

∫Rng(x) dPX(x) =

∫Rng(x)f(x) dx.

Formally dPX(x) = f(x) dx. If f depends on x the probability measure PX is not translationinvariant. This is highlighted by the notation dPX(x).

Proof. Let us check the formula for positive simple random variables. The general case isthen obtained using the appropriate limit theorems.

So let g =∑n

i=1 aiχAi be a positive simple function. Then

g(X(ω)) =

n∑i=1

aiχAi(X(ω)) =n∑i=1

aiχX−1(Ai)(ω).

Hence

E(g(X)) =

n∑i=1

aiP (X−1(Ai)) =

n∑i=1

aiPX(Ai) =

∫Rng(x) dPX(x).

The last definition is just the definition of the integral for simple functions. Moreover, if PX hasdensity f , then by definition of the density

n∑i=1

aiPX(Ai) =n∑i=1

ai

∫Ai

f(x) dx =

n∑i=1

ai

∫RnχAi(x)f(x) dx =

∫Rng(x)f(x) dx.

2. BACKGROUND IN PROBABILITY THEORY 19

The formula given by the transfer theorem will be used for actual computations. In particularfor the variance

V(X) =

∫Ω|X− E(X)|2 dP =

∫Rn

(x− E(X))2f(x) dx.

The variance can help quantify the deviation of a random variable X from its mean:

Proposition 4 (Chebyshev inequality). Assume E(X2) < +∞. Then for any ε > 0

P(|X− E(X)| ≥ ε

)≤ V(X)

ε2.

Proof. Denote by A = |X− E(X)| ≥ ε. Then

V(X) =

∫Ω|X− E(X)|2 dP ≥

∫A|X− E(X)|2 dP ≥

∫Aε2 dP = ε2P (A)

which gives the result.

2.5. Conditional probabilities and independence. Let (Ω,F , P ) be a probability spaceand let A and B be two events.

Knowing that some random sample point ω ∈ Ω is in A we are interested in obtaining theprobability that ω ∈ B. This defines the conditional probability:

Definition 8. Let A an event of probability P (A) > 0. Then the probability of B given A isdefined by

P (B|A) =P (A ∩B)

P (A).

Let us verify that P (·|A) defines a probability measure on (Ω,F): As A ∩B ⊂ A, P (A ∩B) ≤P (A). Hence 0 ≤ P (B|A) ≤ 1 and axiom (i) is verified. Ω ∩ A = A hence P (Ω|A) = 1 and axiom(ii) is verified. If B1, B2, . . . are disjoint, so are their intersections with A and axiom (iii) follows.

Two events A,B are independent if P (B|A) = P (B). Then by definition of the conditional

probability P (B) = P (A∩B)P (A) and we get the more symmetric definition of the independence of A

and B.

Definition 9. Two events A and B are said to be independent if

P (A ∩B) = P (A)P (B).

This definition extends to random variables:

Definition 10. We say that the random variables Xi : Ω→ Rn, i = 1, . . . ,m are independent,if for all choices of Borel sets B1, . . . , Bm ⊆ Rn

P (X1 ∈ B1, . . . ,Xm ∈ Bm) = P (X1 ∈ B1) · · ·P (Xm ∈ Bm).

Theorem 4. The random variables Xi : Ω→ Rn, i = 1, . . . ,m are independent, if and only iftheir distribution functions verify

FX1,...,Xm(x1, . . . ,xm) = FX1(x1) · · ·FXm(xm) for all x1, . . . ,xm ∈ Rn.If the random variables have densities this is equivalent to

fX1,...,Xm(x1, . . . ,xm) = fX1(x1) · · · fXm(xm) for all x1, . . . ,xm ∈ Rn.

The marginal densities fXi are obtained from the joined density fX1,...,Xm by integrating on Rnover all the other variables, for example

fX1(x1) =

∫fX1,...,Xm(x1, . . . ,xm) dx2 . . . dxm.

From this theorem follows the following important result:

20 2. MONTE CARLO SIMULATION

Theorem 5. If X1, . . . , Xm are independent real valued random variables with E(|Xi|) < +∞then E(|X1 . . . Xm|) < +∞ and

E(X1 · · ·Xm) = E(X1) · · ·E(Xm).

Proof. The results is easy to prove by applying the previous theorem assuming that each Xi

is bounded and has a density:

E(X1 · · ·Xm) =

∫Rm

x1 · · ·xmfX1,...,Xm(x1, . . . , xm) dx1 . . . dxm,

=

∫Rm

x1fX1(x1) · · ·xmfXm(xm) dx1 . . . dxm,

= E(X1) · · ·E(Xm).

End of lecture 3.Moreover for independent variables the variance of the sum is the sum of variances. This is

known as Bienayme’s equality:

Theorem 6 (Bienayme). If X1, . . . , Xm are independent real valued random variables withV(|Xi|) < +∞ then

V(X1 + · · ·+Xm) = V(X1) + · · ·+ V(Xm).

Proof. This can be proved by induction. We prove it only the case of two random variables.Let m1 = E(X1), m2 = E(X2). Then by linearity of the integral m1 +m2 = E(X1 +X2) and

V(X1 +X2) =

∫Ω

(X1 +X2 − (m1 +m2))2 dP,

=

∫Ω

(X1 −m1)2 dP +

∫Ω

(X2 −m2)2 dP + 2

∫Ω

(X1 −m1)(X2 −m2) dP,

= V(X1) + V(X2) + 2E(X1 −m1)E(X2 −m2)

using the independence of the random variables and the previous theorem in the last line. We thenget the desired result by noticing that E(X1 −m1) = E(X2 −m2) = 0.

Definition 11. Let X and Y be two square integrable real valued random variables, then theircovariance is defined by

Cov(X,Y ) = E((X − E(X))(Y − E(Y ))).

By linearity of the expected value, we easily get

Cov(X,Y ) = E(XY )− E(X)E(Y ).

Looking at the proof of the Bienayme equality we see that in the general case we have

V(X + Y ) = V(X) + V(Y ) + 2Cov(X,Y ),

the last term vanishing if the two random variables are independent. A more precise measure ofthe linear independence of two random variables is given by the correlation coefficient defined by

ρ(X,Y ) =Cov(X,Y )

σ(X)σ(Y ).

3. MONTE CARLO SIMULATION 21

3. Monte Carlo Simulation

3.1. Principle. We want to define a Monte Carlo algorithm to approximate some real numbera which represents for example the value of an integral. To this aim, we need to construct a realvalued random variable X such that

E(X) = a.

Then we define an approximation by considering a sequence of independent random variables (Xi)idistributed like X and approximate E(X) by the sample mean

(8) MN =1

N

N∑i=1

Xi.

In order for this procedure to be useful, we need first to be able to recast our problem in theform of the computation of an expected value of an adequate random variable X that we need todefine. Then we need to be able to draw independent variables distributed like X and finally weneed to check that the approximation we defined converges in some sense to the exact value andpossibly estimate the speed of convergence.

Here the sample mean is an example of what is called an estimator in statistics, which is arule for computing some statistical quantity, which is a function of the random variable, here theexpected value, from sample data.

Definition 12. The difference between the expected value of the estimator and the statisticalquantity it approximates is called bias. If this difference is zero, the estimator is said to be unbiased.

Let us compute the bias of the sample mean given by (8), we easily get as the Xi are alldistributed like X and thus have the same expected value that

E(MN ) =1

N

N∑i=1

E(Xi) = E(X)

so that the bias is zero and our sample mean is unbiased.Assuming the sample number N ≥ 2 an unbiased estimator of the variance is given by the

following sample variance

(9) VN =1

N − 1

N∑i=1

(Xi −MN )2 =1

N − 1

N∑i=1

(Xi −

1

N

N∑i=1

Xi

)2

.

Indeed, let us compute the expected value of VN . Denoting by a = E(Xi) for i = 1, . . . , N , we have

VN =1

N − 1

N∑i=1

((Xi − a) + (a−MN ))2 =1

N − 1

N∑i=1

(Xi − a)2 − N

N − 1(MN − a)2,

as 2∑N

i=1(Xi − a)(a−MN ) = −2N(MN − a)2 . Hence

E(VN ) =1

N − 1

N∑i=1

E((Xi − a)2)− N

N − 1E((MN − a)2) =

1

N − 1

N∑i=1

V(Xi)−N

N − 1V(MN ).

And because of Bienayme’s theorem

N2V(MN ) = V(

N∑i=1

Xi) =

N∑i=1

V(Xi) = NV(X).

Hence

E(VN ) =N

N − 1V(X)− 1

N − 1V(X) = V(X).

22 2. MONTE CARLO SIMULATION

Remark 2. Note the 1/(N − 1) factor in the variance estimator instead of the 1/N that onewould expect at the first glance. Using 1/N instead would also yield an estimator of the variance,but this one would be biased, i.e. it would not have the right expected value.

End of lecture 4.

3.2. Estimation of the error in a Monte Carlo simulation. Let us first compute in ageneral way the mean square error (MSE) of an estimator. The MSE is defined by

MSE(θ) = E((θ − θ)2) =

∫(θ − θ)2dP.

Note that the root mean square error or RMS error, which is the square root of the MSE, is theclassical L2 error.

Assume θ is an estimator for the statistical quantity θ which is a real number that can becomputed as a function of a random variable X.

Lemma 1. Assume the random variable θ is an estimator for θ and E(θ2) < +∞. Then

(10) MSE(θ) = E((θ − θ)2) = V(θ) +Bias(θ)2.

Proof. A straightforward calculation yields

MSE(θ) = E((θ − θ)2) = E(θ2) + θ2 − 2θE(θ)

= E(θ2)− E(θ)2 + E(θ)2 + θ2 − 2θE(θ)

= (E(θ2)− E(θ)2) + (E(θ)− θ)2

= V(θ) + (Bias(θ))2.

Assume that the random variable X defining our Monte Carlo simulation verifies E(X2) < +∞.Then we can apply the previous lemma to MN as an estimator of E(X), which yields

MSE(MN ) = V(MN ) + (E(MN )− E(X))2.

So the RMS error is composed of two parts, the error coming from the variance of the sampleand the possible bias on the sample occurring when the expected value of MN is not exactly equalto the expected value of the random variable X being approximated.

In many cases the bias can be made to be zero, but in some cases it can be useful to introducesome bias in order to decrease the variance of the sample and the total error.

Lemma 2. Assume E(X2) < +∞. Then the RMS error for an unbiased simulation based onthe random variable X is

erms = σ(MN ) =σ(X)√N.

Proof. The formula (10) giving the mean squared error of an estimator shows that if thesimulation is unbiased E(MN ) = E(X) and

erms =√V(MN ) = σ(MN ).

Now using Bienayme’s theorem we also have

N2V(MN ) = V(

N∑i=1

Xi) =

N∑i=1

V(Xi) = NV(X).

And thus V(MN ) = V(X)/N , which gives the result.

3. MONTE CARLO SIMULATION 23

On the other hand, Chebyshev’s inequality gives us, assuming E(X2) < +∞ that for any ε > 0we have, as E(X) = E(MN )

P(|MN − E(X)| ≥ ε

)≤ V(MN )

ε2=σ2(X)

Nε2.

Hence when N → +∞, we have that

P(|MN − E(X)| ≥ ε

)→ 0.

This means that MN converges to E(X) in probability. This is called the weak law of large numbers.The corresponding strong law of large numbers, the proof of which is more involved, states thatMN converges to E(X) almost surely, which means that

P

[ω| lim

N→+∞MN (ω) = E(X)

]= 1.

The law of large numbers, strong or weak, implies that the sample mean converges towards thedesired expected value, which justifies the Monte Carlo method.

Another major theorem of probability theory, the central limit theorem, gives a precise estima-tion of the error committed by an approximation. It claims that

(11) limN→+∞

P

[MN − E(X)

σ(X)/√N≤ λ

]=

1√2π

∫ λ

−λe−u

2/2 du.

This tells us that the asymptotic distribution of MN−E(X)

σ(X)/√N

is a unit normal distribution, or equiva-

lently that MN is a normal distribution with mean E(X) and standard deviation σ(X)/√N .

The right hand side of (11) is a number that can be computed explicitly, and that is calledconfidence coefficient. For λ = 3 the confidence coefficient is 0.9973 and for λ = 4 the confidencecoefficient is 0.9999 (see e.g. [5] for other values). This is the probability that the true mean lies

in the so-called confidence interval [MN − λσ(X)/√N,MN + λσ(X)/

√N ]. Note that as opposite

to deterministic error estimates, which are generally of the form hp or 1/Np, where h is a cell sizeand N a number of discretisation points, and lie on a deterministic curve. The error estimate ina Monte Carlo method is random, but it is alway a normal distribution with variance which tendsto 0 when the number of sample points tends to +∞. In practice a good estimate of the error isgiven by σ(X)/

√N , which is all the more interesting that the variance (or standard deviation) can

be well estimated by the sample variance (or sample standard deviation), which is an a posterioriestimate that can be directly used in actual computations to measure the error.

3.3. Error monitoring in PIC codes. In order to check the validity of simulation, it is im-portant to monitor the evolution of some key quantities. In particular quantities that are conservedin the continuous model should be computed and the accuracy with which they are conserved willgive a good indicator of the accuracy of the code: For the Vlasov Poisson system, key conservedquantities are total number of particles N = 1 =

∫f dx dv, total momentum P =

∫fv dx dv and

total energy E = 12

∫fv2 dx dv + 1

2

∫ρφ dx.

In our Monte Carlo approximation, assuming the particles are distributed according to the PDFf , we have

N = E(1), P = E(V ), E = E(1

2(V 2 + φ(X))).

E(1) = 1NN = 1 is conserved by construction, so there is nothing to monitor. For the others we can

compute for a given initial condition the error due to sampling and for subsequent time steps thesample mean and sample standard deviation divided by

√N can be monitored to give a measure

of the error. This can of coarse be compared to the error given by the actual sample with respectto the conserved value known from the initial condition.

24 2. MONTE CARLO SIMULATION

Example. Consider a Landau damping initial condition, on a 1-periodic interval in x:

f0 = (1 + α cos(kx))1√2πe

−v22 , (α < 1).

The random variable (X,V ) is then randomly drawn according to this distribution. We can thencompute

E(V ) =

∫vf0(x, v) dx dv = 0, V(V ) =

∫v2f0(x, v) dx dv − 0 = 1.

So the RMS error committed by approximating P by the unbiased estimator PN = 1N

∑Ni=1 Vi will

be 1/√N .

Let us now consider the total energy. First for the kinetic energy, we need to compute

E(V 2) =

∫v2f0(x, v) dx dv = 1, V(V 2) =

∫v4f0(x, v) dx dv − E(V 2)2 = 3− 1 = 2.

On the other hand he potential φ associated to the initial condition is solution of φ′′ = α cos(kx).Assuming 0 average, we get φ(x) = − α

k2cos(kx). We then can compute

E(φ(X)) = − αk2

∫cos(kx)f0(x, v) dx dv = − α2

2k2,

V(φ(X)) =α2

k4

∫cos2(kx)f0(x, v) dx dv − α4

4k4=α2(2− α2)

4k4.

It follows that the total energy of the initial condition is

E = E(1

2(V 2 + φ(X))) =

1

2− α2

4k2,

and as the random variable V and φ(X) are independent, the variance of the total energy is thesum of the variances of the kinetic energy and the potential energy.

A natural estimator for the energy based on the sample (Xi, Vi)1≤i≤N , distributed like (X,V ),used for the Monte Carlo simulation is here

EN =1

N

N∑i=1

1

2(V 2i + φ(Xi)),

from which it easily follows that E(EN ) = E(E) so that the estimator is unbiased. Moreover we cancompute the variance of the estimator using Bienayme’s equality

V(EN ) = V(1

2(V 2 + φ(X))/N =

1

4(V(V 2) + V(φ(X))))/N =

1

4N

(2 +

α2(2− α2)

4k4

).

which is also the MSE error of the estimator as the simulation is unbiased.After the initial time step, the exact distribution is not known, so that only empirical estimations

can be computed. In order to monitor the noise (or error) on each computed quantity θ(X), wedefine the relative error

R =σ(θN )

θN,

which is the inverse ratio of the estimated value and its standard deviation. We have

R =1

N − 1

√θ2 − θ2

θ,

where ¯θ(X) = 1N

∑Ni=1 θ(Xi)

3. MONTE CARLO SIMULATION 25

3.4. Error on the probability density. A standard way of estimating a probability densityin Rd from a sample is the kernel density estimator. It relies on a kernel which we shall call Sd,which is a real function in Rd, that we shall assume to be the product of d identical functions:Sd(x1, x2, . . . , xd) = S(x1)S(x2) . . . S(xd) verifying

∫S(x) dx = 1 and S(x) = S(−x) which implies∫

xS(x) dx = 0.

We then define for a given density f : Rd → R

fh(x1, . . . , xd) =1

hd

∫S

(x1 − y1

h

). . . S

(xd − ydh

)f(y1, y2, . . . , yd) dy1 . . . dyd,

=1

hdE(S

(x1 − Y1

h

). . . S

(xd − Yd

h

))where (Y1, . . . , Yd) are distributed according to the density f . From this we can define the followingestimator for f(x1, . . . , xd), we also use the fact that S is even:

fh,N (x1, . . . , xd) =1

Nhd

N∑i=1

S

(Y1,i − x1

h

). . . S

(Yd,i − xd

h

).

As usual to estimate the mean squared error committed by this estimator, we compute its bias andvariance

Bias(fh,N (x1, . . . , xd)) = E(fh,N (x1, . . . , xd))− f(x1, . . . , xd)

=1

hdE(S

(Y1 − x1

h

). . . S

(Yd − xd

h

))− f(x1, . . . , xd)

=1

hd

∫ (S

(y1 − x1

h

). . . S

(yd − xdh

))f(y1, . . . , yd) dy1 . . . dyd − f(x1, . . . , xd)

=

∫S(z1) . . . S(zd)f(x1 + hz1, . . . , xd + hzd) dz1 . . . dzd − f(x1, . . . , xd),

making the change of variables z1 = y1−x1h , . . . , zd = yd−xd

h . Finally as∫S(z) dz = 1, and Taylor

expanding f assuming enough smoothness we get

Bias(fh,N (x1, . . . , xd)) =

∫S(z1) . . . S(zd)(f(x1 + hz1, . . . , xd + hzd)− f(x1, . . . , xd)) dz1 . . . dzd

=

∫S(z1) . . . S(zd)h(z1

∂f

∂x1(x1, . . . , xd) + . . . zd

∂f

∂xd(x1, . . . , xd)

+h2

2zTH(f)z +O(h3)) dz1 . . . dzd,

where H(f) = ( ∂2f∂xi∂xj

)1≤i,j≤d is the Hessian matrix of f and z = (z1, . . . , zd)T . Because of the

symmetry of S, the terms in h as well as the off-diagonal second order terms and the third orderterms vanish. Hence the bias can be written

(12) Bias(fh,N (x1, . . . , xd)) =h2

2κ2(S)∆f(x1, . . . , xd) +O(h4),

where κ2(S) =∫x2S(x) dx is the second order moment of the kernel S and ∆f = ∂2f

∂x21+ · · ·+ ∂2f

∂x2dthe Laplace operator. We note that the bias depends only on h the width of the kernel, and noton the number of particles N . It goes to zero when h goes to 0.

Let us now compute the variance of the estimator. With Bienayme’s equality we get

V(fh,N (x1, . . . , xd)2) =

1

NV(

1

hdS

(Y1 − x1

h

). . . S

(Yd − xd

h

))

26 2. MONTE CARLO SIMULATION

Then with the same change of variables as for the bias,

E(

1

h2dS2

(Y1 − x1

h

). . . S2

(Yd − xd

h

))=

1

hd

∫S2(z1) . . . S2(zd)f(x1 + hz1, . . . , xd + hzd) dz1 . . . dzd

=1

hd

∫S2(z1) . . . S2(zd) dz1 . . . dzd(f(x1, . . . , xd) +O(h))

=1

hdR(S)d(f(x1, . . . , xd) +O(h))

where the R(S) =∫S2(x) dx is called the roughness of the kernel S. On the other hand, using the

previous computation of the bias and the fact that∫S(x) dx = 1, we have

E(1

hdS

(Y1 − x1

h

). . . S

(Yd − xd

h

))2 = (f(x1, . . . , xd) +O(h))2.

When h→ 0 this term can be neglected compared to the other contribution to the variance. Hence

(13) V(fh,N (x1, . . . , xd)) =R(S)d

Nhdf(x1, . . . , xd) +O(

1

N).

And finally, the mean squared error of the estimator is the sum of its variance and squared bias,which yields

(14) MSE(fh,N (x1, . . . , xd)) =R(S)d

Nhdf(x1, . . . , xd)+

h4

4κ2

2(S)(∆f)2(x1, . . . , xd)+O(1

N)+O(h6).

Note that for the MSE to converge, one needs obviously the number of samples N → +∞,h → 0, but also Nhd → +∞ for the first term to tend to 0. As hd is a measure of the cell sizein a d-dimensional space, this means that the number of particles per cell needs to converge to+∞. In general in PIC methods, one is not really interested in the convergence of the distributionfunction, but it is essential to have a good convergence of the density in physical space. For thisreason, one generally imposes the number of particles per cell in physical space to be large enough,and all the larger that the cells become smaller. Keeping the number of particles per cell constantwhen h decreases does not yield convergence of the method.

To get a unique parameter yielding an order of convergence, one can minimise the dominatingterms of MSE(fh,N (x1, . . . , xd)) with respect to h, yielding an expression of h in function of N .

Standard kernels in statistics beyond the top hat kernel, are the Gaussian kernel and Epanech-nikov type kernels of the form S(x) = cs(1−x2)s for |x| < 1 and 0 else, where cs is a normalisationconstant insuring that

∫S(x) dx = 1. s is a small integer, typically 1,2 or 3 giving the smoothness

of the kernel.In PIC codes S is generally chosen to be a spline function. A spline function of degree m is

a piecewise polynomial of degree m and which is in Cm−1. It can be defined by recurrence: Thedegree 0 B-spline that we shall denote by S0 is defined by

S0(x) =

1 if − 1

2 ≤ x <12 ,

0 else.

Higher order B-splines are then defined by:For all m ∈ N∗,

Sm(x) = (S0)∗m(x),

= S0 ∗ Sm−1(x),

=

∫ x+ 12

x− 12

Sm−1(y) dy.

3. MONTE CARLO SIMULATION 27

In particular the degree 1 spline is

S1(x) =

(1− |x|) if |x| < 1,0 else,

the degree 2 spline is

S2(x) =

12(3

2 − |x|)2 if 1

2 < |x| <32 ,

34 − x

2 if |x| < 12 ,

0 else,

the degree 3 spline is

S3(x) =1

6

(2− |x|)3 if 1 ≤ |x| < 2,4− 6x2 + 3|x|3 if 0 ≤ |x| < 1,0 else.

3.5. Aliasing. In PIC codes, where the evolution of the density function given by the Vlasovequation, needs to be coupled with the computation of the electric field on a grid, aliasing whichis inherent to sampling on a grid plays an important role in the choice of the kernel.

Theorem 7 (Shannon). If support of f is included in[−πh ,

πh

], then

f(t) =

+∞∑k=−∞

f(kh) sinc (π(t− kh)

h),

where sinc t = sin tt is called the sinus cardinal function.

This means that f is completely determined by sampling with uniform step h if it has boundedsupport in

[−πh ,

πh

]. However the support of an arbitrary function is generally not in [−π

h ,πh ]. If

the support is bounded, it is enough to take h small enough. If the support is not bounded butf tends to 0 fast enough at infinity one also gets a good approximation if h is small enough. Thequestion is what happens when h is not small enough to get a good approximation of f in [−π

h ,πh ].

In the case when supp(f) 6⊂ [−πh ,

πh ], in the formula giving the Fourier transform of a sampled

function

fh(ω) =1

h

+∞∑n=−∞

f(ω − 2nπ

h).

the supports of f(ω− 2nπh ) of different n will have a non empty intersection. In particular f(ω− 2nπ

h )intersects [−π

h ,πh ] for |n| ≥ 1. Which means that high frequencies will appear in a low frequency

interval. This is called ’aliasing.In this case with the reconstruction formula of Shannon’s theorem

f(t) = (gh ? fh)(t) =

+∞∑k=−∞

f(kh)gh(t− kh),

whose Fourier is

ˆf(ω) = fh(ω)gh(ω) = hfh(ω)χ[−π

h,πh

] = χ[−πh,πh

]

+∞∑k=−∞

f(ω − 2nπ

h)

which can be very different of f(ω) because of the high frequency contributions.

To suppress aliasing, f needs to be approximated by f which is the closest function in L2 whoseFourier transform is in [−π

h ,πh ].

28 2. MONTE CARLO SIMULATION

Due to Plancherel’s formula

‖f − f‖22 =1

∫ +∞

−∞|f(ω)− ˆ

f |2 dω

=1

∫|ω|>π

h

|f(ω)|2 dω +1

∫|ω|<π

h

|f(ω)− ˆf |2 dω.

The distance between the two functions is minimal when the second integral vanishes. We hence

takeˆf to be the restriction of f to [−π

h ,πh ], which writes

ˆf(ω) = f(ω)χ[−π

h,πh

](ω) =1

hf(ω)gh(ω),

with gh(t) = sinc πth . It then follows f = 1

hf ? gh.Using the sinc would thus suppress all aliasing problems. However it has the problem that its

support in physical space is unbounded, which means that all particles would contribute to all gridpoints, which is very time consuming in practice. Such an algorithm can only be used in practice,when working directly in Fourier space and only a few Fourier modes are needed.

On the other hand, the Fourier transforms of the B-splines are

Sm(k) = sincm+1(k

2),

which means that Sm decays like 1/km+1 in Fourier space, which is quite fast for quadratic or cubicsplines, thus limiting the aliasing problems.

4. Initialisation of given PDF

A Monte Carlo simulation relies on a random sequence following some given probability law.Such a sequence can be generated from a uniform random sequence on [0, 1]. Obtaining a goodapproximations of uniform random sequence is a complex task, but some good solutions are givenby libraries included with standard compilers or numerical software. We will rely on those. Letus just mention that a computer cannot generate a truly random sequence, but generates twokind of random sequences: the first one called pseudo-random has the objective to provide goodapproximations of truly random sequences and the other one called quasi-random is designed to fillin the interval as uniformly as possible, yielding a smaller variance.

Then having a good random generator for a uniform random sequence in [0, 1], there are differentways to draw values for any other probability density function. Some are specific to a given form ofPDF like normal distributions, other are limited to some class of PDF like products of 1D functionsand others are very general. A large number of techniques is described in the book [5]. We willdescribe the techniques that are the most useful for PIC simulations.

4.1. Inversion of the CDF. Let F be the cumulative distribution function (CDF) of therandom variable we wish to simulate.

Proposition 5. Assume F : [a, b] → [0, 1] is a strictly increasing function. Let U be a uni-formly distributed random variable on [0, 1], then X = F−1(U) is a real value random variable withdistribution function F .

Proof. Let x ∈ [a, b]. Then F−1(U) ≤ x⇔ U ≤ F (x).The distribution function of X is defined by

FX(x) = P (X ≤ x) = P (U ≤ F (x)) = F (x)

as U has a uniform distribution.

4. INITIALISATION OF GIVEN PDF 29

In many cases F can be inverted analytically and when F (x) can be computed, it can beinverted numerically using a fine grid and the assumption that F grows linearly between two gridpoints.

Examples:

(1) Uniform distribution on $[a, b]$. The uniform distribution on $[a, b]$ has the distribution function $F(x) = \frac{x-a}{b-a}$, and to get its inverse we solve the equation $y = F(x) = \frac{x-a}{b-a}$. The solution is
$$ x = a + (b - a)y = F^{-1}(y). $$

(2) Numerical inversion of an analytically known distribution function $F$. This amounts, for a given point $y$ obtained from a uniform distribution on $[0, 1]$, to computing $x$ such that $F(x) = y$, which means solving $y - F(x) = 0$. The most efficient way, in general, to do this numerically is Newton's method, which computes $x$ as the limit of the iterations
$$ x_{n+1} = x_n + \frac{y - F(x_n)}{F'(x_n)}. $$

(3) Numerical inversion of a function known at discrete grid points. We assume the values of $F$ are known on a grid $a = x_0 < x_1 < \dots < x_{N_x} = b$. Because an approximation is involved in interpolating the values between the grid points, rather than computing directly the inverse for each value $y$ given by the uniform random generator, we start by computing $F^{-1}(y_j)$ where $0 = y_0 < y_1 < \dots < y_{N_y} = 1$ is a uniform grid of $[0, 1]$. This can be done very easily, as $F$ is an increasing function, using the following algorithm:
• $F^{-1}(0) = a$, $F^{-1}(1) = b$, $i = 0$
• For $j = 1, \dots, N_y - 1$
  – Find $i$ such that $F(x_{i-1}) < y_j \leq F(x_i)$ (while $F(x_i) < y_j$ do $i = i + 1$)
  – Interpolate $F^{-1}(y_j)$ linearly between $F(x_{i-1})$ and $F(x_i)$ (in order to maintain that $F^{-1}$ is non decreasing).
Once $F^{-1}(y_j)$ is known on the grid $0 = y_0 < y_1 < \dots < y_{N_y} = 1$, for any $y$ drawn uniformly on $[0, 1]$, find $j$ such that $y_j \leq y < y_{j+1}$ and interpolate $F^{-1}(y)$ linearly.

Remark 3. This method can also be used when $F$ is analytically known, by first computing its values on a fine grid. This is generally more efficient than Newton's method and most of the time accurate enough.
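The following is a minimal Python/NumPy sketch of this tabulated inversion; the function name, the grid sizes and the example CDF are illustrative choices rather than part of the algorithm described above.

import numpy as np

def build_inverse_cdf(F, a, b, nx=1000, ny=1000):
    # Tabulate the strictly increasing CDF F on a fine grid of [a, b],
    # with F(a) = 0 and F(b) = 1.
    x = np.linspace(a, b, nx + 1)
    Fx = F(x)
    # Pre-invert on a uniform grid of [0, 1]: since Fx is increasing,
    # swapping abscissa and ordinate in np.interp yields a piecewise
    # linear, non decreasing approximation of F^{-1}.
    y = np.linspace(0.0, 1.0, ny + 1)
    Finv_tab = np.interp(y, Fx, x)
    # Return an interpolant of F^{-1} to be evaluated at uniform draws.
    return lambda u: np.interp(u, y, Finv_tab)

# Example: the CDF F(x) = x^2 on [0, 1], i.e. the density 2x.
rng = np.random.default_rng(0)
Finv = build_inverse_cdf(lambda x: x**2, 0.0, 1.0)
samples = Finv(rng.uniform(size=100000))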

4.2. Acceptance-rejection method. This is also sometimes simply called the rejection method. Assume we want to draw according to the PDF $f$ and we know how to draw from the PDF $g$, with $f(x) \leq c\,g(x)$ for some given constant $c$. If $f$ vanishes outside of a compact set $K$ we can take for example $g$ uniform on $K$ and $c = \max(f/g)$.

The rejection algorithm is then the following:
(1) Draw $x$ from $g$,
(2) Draw a uniform random number $u$ on $[0, 1]$,
(3) If $u \leq f(x)/(c\,g(x))$, accept $x$,
(4) else reject $x$ and start again from (1).

The rejection method is very general, but in order to be efficient the number of rejections should be held as small as possible and $cg$ chosen as close as possible to $f$, with the constraint of course that one needs to be able to draw from $g$.
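A minimal sketch of the rejection algorithm, assuming we can evaluate $f$ and both sample from and evaluate $g$; the Python function and the example densities are illustrative.

import numpy as np

def rejection_sample(f, g_sample, g_pdf, c, n, rng=np.random.default_rng()):
    # Draw n values with density f using proposals from g, where f <= c*g.
    out = []
    while len(out) < n:
        x = g_sample(rng)                 # (1) draw x from g
        u = rng.uniform()                 # (2) draw u uniformly on [0, 1]
        if u <= f(x) / (c * g_pdf(x)):    # (3) accept x with probability f/(c g)
            out.append(x)                 # (4) otherwise start again from (1)
    return np.array(out)

# Example: f(x) = (3/2) x^2 on [-1, 1] with g uniform on [-1, 1],
# so that c = max(f/g) = 3.
f = lambda x: 1.5 * x**2
samples = rejection_sample(f, lambda rng: rng.uniform(-1.0, 1.0),
                           lambda x: 0.5, c=3.0, n=10000)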


4.3. Composition method. This method is also known as the probability mixing method and can be used when the PDF that one wants to sample from is a weighted sum of simpler PDFs. Given two PDFs $f_1$, $f_2$ that we know how to sample from, let
$$ f(x) = \alpha f_1(x) + (1 - \alpha) f_2(x), \quad\text{with } 0 < \alpha < 1. $$

A value $x$ can be sampled from $f$ by the following procedure:
(1) Select a random number $r_i$ from a uniform distribution on $[0, 1]$,
(2) If $r_i < \alpha$ draw $x_i$ according to the PDF $f_1$,
(3) Else draw $x_i$ according to the PDF $f_2$.

This can be extended to the weighted sum of an arbitrary number of probability density functions. If
$$ f(x) = \alpha_1 f_1(x) + \dots + \alpha_n f_n(x), \quad\text{with } \alpha_1 + \dots + \alpha_n = 1, $$
one can then draw from $f$ by drawing a random number $r$ from a uniform distribution on $[0, 1]$ and then drawing from $f_i$ if $\alpha_0 + \dots + \alpha_{i-1} < r \leq \alpha_0 + \dots + \alpha_i$, $1 \leq i \leq n$, where we set $\alpha_0 = 0$.
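A minimal sketch of the composition method for a general mixture; the example, a two-component velocity distribution reminiscent of a bump-on-tail initial condition, is only illustrative.

import numpy as np

def composition_sample(alphas, samplers, n, rng=np.random.default_rng()):
    # Draw n values from f = sum_i alpha_i f_i: first pick the component i
    # with probability alpha_i, then sample from the corresponding f_i.
    idx = rng.choice(len(alphas), size=n, p=alphas)
    out = np.empty(n)
    for i, sampler in enumerate(samplers):
        mask = idx == i
        out[mask] = sampler(rng, mask.sum())
    return out

# Example: 0.9 * N(0, 1) + 0.1 * N(4, 0.5^2).
samplers = [lambda rng, m: rng.normal(0.0, 1.0, m),
            lambda rng, m: rng.normal(4.0, 0.5, m)]
v = composition_sample([0.9, 0.1], samplers, n=100000)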

5. Variance reduction techniques

As we saw, the Monte Carlo error for the approximation of the expected value of a random variable $X$ is in $\sigma(X)/\sqrt{N}$. Apart from increasing the number of realisations $N$, the most efficient method to reduce the error is to use available information to replace $X$ by another random variable with the same expected value but a lower variance. We shall describe a few techniques to do that in the context of Particle in Cell methods.

5.1. Control variates. Consider the standard Monte Carlo problem of approximating $a = E(X)$, for a given random variable $X$, by a sample mean.

Assume now that there exists a random variable $Y$, the expected value of which is known, that is somehow correlated to $X$. For a given $\alpha \in \mathbb{R}$, let us define the new random variable
$$ Z_\alpha = X - \alpha(Y - E(Y)). $$

Obviously, we have for any $\alpha$ that $E(Z_\alpha) = E(X) = a$, which means that the sample mean of $Z_\alpha$,
$$ M_{N,\alpha} = \frac{1}{N}\sum_{i=1}^{N}\big(X_i - \alpha(Y_i - E(Y))\big) = \alpha E(Y) + \frac{1}{N}\sum_{i=1}^{N}(X_i - \alpha Y_i), $$
could be used instead of the sample mean of $X$ to approximate $a$. The random variable $\alpha Y$ is called a control variate for $X$.

End of lecture 5.

Let us now look under what conditions the variance of $Z_\alpha$ is lower than the variance of $X$.

We assume that both V(X) > 0 and V(Y ) > 0.

Lemma 3. If the random variables $X$ and $Y$ are not independent, there exists a value of $\alpha$ for which the variance of $Z_\alpha$ is smaller than the variance of $X$. More precisely
$$ \min_{\alpha\in\mathbb{R}} V(Z_\alpha) = V(X)\big(1 - \rho^2(X,Y)\big) = V(Z_{\alpha^*}), \quad\text{with } \alpha^* = \frac{\mathrm{Cov}(X,Y)}{V(Y)}. $$
Moreover
$$ V(Z_\alpha) < V(X) \;\Leftrightarrow\; \begin{cases} \alpha < 2\alpha^* & \text{if } \alpha > 0,\\ \alpha > 2\alpha^* & \text{if } \alpha < 0.\end{cases} $$


Proof. As $Z_\alpha = X - \alpha Y + \alpha E(Y)$ and $E(Z_\alpha) = E(X)$, we have
$$
\begin{aligned}
V(Z_\alpha) &= E(Z_\alpha^2) - E(X)^2\\
&= E\big((X - \alpha Y)^2\big) + 2\alpha E(Y)E(X - \alpha Y) + \alpha^2 E(Y)^2 - E(X)^2\\
&= E(X^2) - 2\alpha E(XY) + \alpha^2 E(Y^2) + 2\alpha E(Y)E(X) - 2\alpha^2 E(Y)^2 + \alpha^2 E(Y)^2 - E(X)^2\\
&= V(X) - 2\alpha\,\mathrm{Cov}(X,Y) + \alpha^2 V(Y)\\
&= \sigma^2(X) - 2\alpha\,\sigma(X)\sigma(Y)\rho(X,Y) + \alpha^2\sigma^2(Y),
\end{aligned}
$$
introducing the standard deviation of a random variable, $\sigma^2(X) = V(X)$, and the correlation coefficient of two random variables, $\rho(X,Y) = \mathrm{Cov}(X,Y)/(\sigma(X)\sigma(Y))$.

So the variance of $Z_\alpha$ is a second order polynomial in $\alpha$, the minimum of which is reached for
$$ \alpha^* = \frac{\sigma(X)}{\sigma(Y)}\rho(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sigma^2(Y)}, $$
and plugging this into the expression of $V(Z_\alpha)$, we get
$$ V(Z_{\alpha^*}) = \sigma^2(X) - 2\sigma^2(X)\rho^2(X,Y) + \sigma^2(X)\rho^2(X,Y) = V(X)\big(1 - \rho^2(X,Y)\big). $$

On the other hand
$$ V(Z_\alpha) - V(X) = \alpha\sigma(Y)\big(\alpha\sigma(Y) - 2\sigma(X)\rho(X,Y)\big). $$
Hence for $\alpha > 0$,
$$ V(Z_\alpha) < V(X) \;\Leftrightarrow\; \alpha < 2\frac{\sigma(X)}{\sigma(Y)}\rho(X,Y) = 2\alpha^*, $$
and for $\alpha < 0$, $V(Z_\alpha) < V(X) \Leftrightarrow \alpha > 2\alpha^*$.

Remark 4. This result means that, provided $\mathrm{Cov}(X,Y) \neq 0$, i.e. $X$ and $Y$ are not independent, there is always an interval around the optimal value $\alpha^*$ for which $Z_\alpha$ has a lower variance than $X$. The more correlated $X$ and $Y$ are, the larger this interval is. So the most important point is to find a random variable $Y$, the expectation of which is known, that is as correlated with $X$ as possible. Then, if a good approximation of $\mathrm{Cov}(X,Y)$ can be computed, one can use it to get closer to $\alpha^*$ and reduce the variance as much as possible with the random variable $Y$.

A typical example is when $X = Y + \varepsilon\tilde Y$, where $\varepsilon$ is small, $E(Y)$ is known and, for simplicity, $Y$ and $\tilde Y$ are independent. Plugging this into the expression of $V(Z_\alpha)$ in the above proof yields
$$ V(Z_\alpha) = V(Y) + \varepsilon^2 V(\tilde Y) - 2\alpha V(Y) + \alpha^2 V(Y) = (1-\alpha)^2 V(Y) + \varepsilon^2 V(\tilde Y). $$
So taking $\alpha = 1$ yields that $V(Z_\alpha)$ is of order $\varepsilon^2$, assuming $V(\tilde Y)$ is of order 1. This is typically the form that is used in PIC simulations.
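The following short Python sketch illustrates this situation numerically; the particular random variables, $X = Y + \varepsilon\tilde Y$ with $Y$ and $\tilde Y$ standard normal, are an illustrative assumption.

import numpy as np

rng = np.random.default_rng(1)
N, eps = 100000, 0.1

# X = Y + eps*Ytilde, with E(Y) = 0 known and Y, Ytilde independent N(0, 1).
Y = rng.normal(size=N)
X = Y + eps * rng.normal(size=N)
EY = 0.0

# Plain sample mean of X versus the control variate estimators
# Z_alpha = X - alpha*(Y - E(Y)) for alpha = 1 and alpha = alpha*.
Z1 = X - 1.0 * (Y - EY)
alpha_star = np.cov(X, Y)[0, 1] / Y.var()
Zstar = X - alpha_star * (Y - EY)

print(X.mean(), Z1.mean(), Zstar.mean())   # all approximate E(X) = 0
print(X.var(), Z1.var(), Zstar.var())      # variance drops from O(1) to O(eps^2)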

5.2. Importance sampling. We are interested in computing, for some given probability density $f$, quantities of the form
$$ \int \psi(z) f(z)\, dz. $$
The standard Monte Carlo method for doing this is to define our integral as an expected value using a random variable $Z$ of density $f$. Then
$$ \int \psi(z) f(z)\, dz = E(\psi(Z)). $$
Depending on the function $\psi$ it might not be the best approach to use directly the density $f$ for drawing the random variable used in the simulation. Indeed, if $g$ is any other probability density


that does not vanish on the support of $f$, one can express our integral as an expectation using a random variable $Z$ of density $g$:
$$ \int \psi(z) f(z)\, dz = \int \psi(z)\frac{f(z)}{g(z)}\, g(z)\, dz = E\big(W(Z)\psi(Z)\big), $$
where the random variable $W(Z) = f(Z)/g(Z)$ is called the weight. The Monte Carlo approximation using independent random variables identically distributed with density $g$ can be expressed as
$$ M_N = \frac{1}{N}\sum_{i=1}^{N} W(Z_i)\psi(Z_i), $$
from which we get
$$ E(M_N) = E\big(W(Z)\psi(Z)\big) = \int \psi(z) f(z)\, dz. $$
So $M_N$ is another unbiased estimator of the integral we wish to compute, and the approximation error for a given number of samples $N$ is determined by its variance.

Let us now investigate how $g$ can be chosen to get a smaller variance. For this we need to compare the variance of $W(Z)\psi(Z)$ and the variance of $\psi(Z)$, knowing that both have the same expected value. On the one hand
$$ E\big(W(Z)^2\psi(Z)^2\big) = \int \psi(z)^2 W(z)^2 g(z)\, dz = \int \psi(z)^2 W(z) f(z)\, dz. $$
On the other hand
$$ E\big(\psi(Z)^2\big) = \int \psi(z)^2 f(z)\, dz. $$
So we see that there is a factor $W$ of difference between the two expressions, and obviously if $W < 1$ in the regions where $\psi$ is large, the procedure will lead to a smaller variance. Note that because $f$ and $g$ both have integral one, we cannot have $W < 1$ everywhere.

We also remark that, assuming $\psi(z)$ does not vanish, if we take $W(z) = E(\psi(Z))/\psi(z)$, which corresponds to $g(z) = f(z)\psi(z)/E(\psi(Z))$, we get
$$ E\big(W(Z)^2\psi(Z)^2\big) = E(\psi(Z))\int \psi(z) f(z)\, dz = E(\psi(Z))^2 = E\big(W(Z)\psi(Z)\big)^2, $$
so that $V(W(Z)\psi(Z)) = 0$. This of course cannot be done in practice, as $E(\psi(Z))$ is the unknown quantity we wish to approximate, but it can be used as a guideline to find a density $g$ that reduces the variance as much as possible: it tells us that the density $g$ should be proportional to the integrand $f\psi$, i.e. that markers should be distributed according to the integrand.
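A short numerical illustration of this principle in Python; the test integrand $\psi(z) = z^4$ with $f$ a standard normal and $g$ a wider normal are illustrative choices, not taken from the notes.

import numpy as np

rng = np.random.default_rng(2)
N = 100000

# Estimate int psi(z) f(z) dz with f = N(0,1) and psi(z) = z^4 (exact value 3).
psi = lambda z: z**4
f_pdf = lambda z: np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)

# Plain Monte Carlo: draw Z from f.
Z = rng.normal(size=N)
plain = psi(Z).mean()

# Importance sampling: draw Z from a wider normal g, so the markers follow
# the integrand f*psi more closely, and weight by W = f/g.
sg = 2.0
g_pdf = lambda z: np.exp(-z**2 / (2 * sg**2)) / (sg * np.sqrt(2 * np.pi))
Zg = rng.normal(0.0, sg, N)
W = f_pdf(Zg) / g_pdf(Zg)
weighted = (W * psi(Zg)).mean()

print(plain, weighted)   # both estimate 3; the weighted estimator has a lower variance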

5.3. Application to the PIC method. For the PIC method, we can combine the importance sampling method and the control variates method.

5.3.1. Importance sampling. The choice of a density for importance sampling depends on the expected value that we are interested in. There are many of those in a PIC code, but the accurate computation of the electric field, which determines the self-consistent dynamics, is arguably the most important. Depending on the physical problem we want to deal with, more particles will be needed in some specific phase space areas, for example in some region of the tail for a bump-on-tail instability. For this reason it is interesting, in a PIC code, to have the flexibility of drawing the particles according to any density, but one needs to be careful with the choice of this density as the results can become better or worse.

Initialisation. Assume we know the density $g_0$ according to which we want to draw the markers. Then we initialise the markers' phase space positions $z_i^0 = (x_i^0, v_i^0)$ as realisations of a random variable $Z^0$ with density $g_0$.


Time stepping. The markers evolve along the characteristics of the Vlasov equation so that at time $t$ the random variable $Z^t = (X^t, V^t)$ is distributed according to the density $g(t, z)$, which is the solution of the Vlasov-Poisson equation with initial condition $g_0$.

Then, as we saw, the different quantities we need to compute using the Monte Carlo approximation are of the form
$$ (15)\qquad \int \psi(z) f(t, z)\, dz = \int \psi(z)\frac{f(t, z)}{g(t, z)}\, g(t, z)\, dz = E\Big(\psi(Z^t)\,\frac{f(t, Z^t)}{g(t, Z^t)}\Big) $$
for some analytically known function $\psi(z)$. This means that we need to simulate the random variable $Y^t = \psi(Z^t)\frac{f(t, Z^t)}{g(t, Z^t)} = \psi(Z^t)\, W$, where the random variable $W$ is defined by $W = f(t, Z^t)/g(t, Z^t)$. Because $f$ and $g$ are conserved along the same characteristics, we have
$$ W = \frac{f(t, Z^t)}{g(t, Z^t)} = \frac{f_0(Z^0)}{g_0(Z^0)}, $$
so that the random variable $W$ does not depend on time and is set once and for all at the initialisation.

Using importance sampling, we obtain the so-called weighted PIC method, in which the particles, or markers, are advanced as in the standard PIC method, but carry in addition an importance weight which does not evolve in time. The drawback of this method is that the variance can increase when large importance weights and small importance weights are mixed close together in phase space, which often happens in long nonlinear simulations.

5.3.2. Control variates. We combine here control variates with importance sampling for maximum generality, but the method can also be used without importance sampling by taking $g_0 = f_0$.

In the PIC method, expected values of the form (15) cannot be computed exactly because the particle density in phase space $f(t, z)$ is not analytically known except at the initial time. However, in many problems, e.g. Landau damping or the bump-on-tail instability, the distribution function stays close to an analytically known distribution function $\bar f(t, z)$. Next to the random variable $Y^t$ associated to $f(t, z)$, this can be used to build the control variate $\bar Y^t$ associated to $\bar f(t, z)$, such that
$$ Y^t = \psi(Z^t)\frac{f(t, Z^t)}{g(t, Z^t)}, \qquad \bar Y^t = \psi(Z^t)\frac{\bar f(t, Z^t)}{g(t, Z^t)}. $$

Indeed we have
$$ E(\bar Y^t) = \int \psi(z)\frac{\bar f(t, z)}{g(t, z)}\, g(t, z)\, dz = \int \psi(z)\bar f(t, z)\, dz, $$
which can be computed analytically for simple enough functions $\psi$ and $\bar f$. Moreover, if $\bar f$ is close enough to $f$, then $\bar Y^t$ will be close to $Y^t$ and, from the previous discussion, a variance reduction of the order of the squared distance between the two random variables can be expected.

Let us now explain how this can be implemented in a PIC simulation.

Initialisation. As for importance sampling, the initial phase space positions of the markers are sampled as realisations $(z_i^0)_{1\leq i\leq N}$ of the random variable $Z^0$ of density $g_0$. The importance weights are then defined by the corresponding realisations of the random variable $W = f_0(Z^0)/g_0(Z^0)$, i.e. $w_i = f_0(z_i^0)/g_0(z_i^0)$.

We also initialise the importance weights for $\delta f = f - \bar f$, which are defined by the random variable
$$ W_\alpha^0 = \frac{f_0(Z^0) - \alpha\bar f(0, Z^0)}{g_0(Z^0)} = W - \alpha\frac{\bar f(0, Z^0)}{g_0(Z^0)}. $$

Time stepping. The markers $Z$ are advanced by numerically solving the characteristics of the Vlasov equation. This means that, given their positions $Z^n$ at time $t_n$, an ODE solver is used to compute an approximation of their positions $Z^{n+1}$ at time $t_{n+1}$. Because $f$ and $g$ satisfy the


same Vlasov-Poisson equation, they are conserved along the same characteristics, so that, as for importance sampling,
$$ W = \frac{f(t_n, Z^n)}{g(t_n, Z^n)} = \frac{f_0(Z^0)}{g_0(Z^0)} $$
is a random variable which does not depend on time. On the other hand, we know $\bar f$ analytically and know that $f$ and $g$ are conserved along the characteristics, so that we can compute the importance weight for $\delta f$ at time $t_n$ from the phase space positions of the markers at the same time:
$$ W_\alpha^n = \frac{f(t_n, Z^n) - \alpha\bar f(t_n, Z^n)}{g(t_n, Z^n)} = \frac{f_0(Z^0) - \alpha\bar f(t_n, Z^n)}{g_0(Z^0)} = W - \alpha\frac{\bar f(t_n, Z^n)}{g_0(Z^0)}. $$

So $W_\alpha^n$ is a time dependent random variable which can be computed explicitly using the analytical functions $\bar f$, $f_0$ and $g_0$. These values can be used to express the sample mean for the new simulated random variable $Y_\alpha = Y - \alpha(\bar Y - E(\bar Y))$. This is defined by
$$ M_{\alpha,N}^n = \frac{1}{N}\sum_{i=1}^{N}\big(Y_i^n - \alpha\bar Y_i^n\big) + \alpha E(\bar Y). $$
Plugging in the values for $Y_i^n$ and $\bar Y_i^n$ we get
$$ M_{\alpha,N}^n = \frac{1}{N}\sum_{i=1}^{N}\psi(Z_i^n)\,\frac{f(t_n, Z_i^n) - \alpha\bar f(t_n, Z_i^n)}{g(t_n, Z_i^n)} + \alpha E(\bar Y) = \frac{1}{N}\sum_{i=1}^{N} W_{\alpha,i}^n\,\psi(Z_i^n) + \alpha E(\bar Y). $$
This yields an estimator based on the weights $W_\alpha^n$ and the expected value $E(\bar Y)$, which can be computed analytically. If no estimation of the optimal $\alpha^*$ is available, this method is used with $\alpha = 1$.

This is classically known as the $\delta f$ method in the PIC literature [3, 1], as its interest lies in the splitting $f = \bar f + \delta f$ with $\bar f$ known. A large variance reduction is obtained for $\alpha = 1$ as long as $\delta f \ll \bar f$; otherwise one can still achieve some variance reduction by optimising over $\alpha$ [6].
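A minimal sketch of how the time dependent $\delta f$ weights can be evaluated during the time stepping; the Maxwellian background $\bar f$ and the stored arrays are illustrative assumptions about the surrounding code.

import numpy as np

def delta_f_weights(x, v, w, g0_init, fbar, alpha=1.0):
    # w        : time independent importance weights f0(z_i^0)/g0(z_i^0)
    # g0_init  : stored values g0(z_i^0), set once at initialisation
    # fbar     : analytically known background distribution, evaluated at the
    #            current marker positions (x, v)
    # Returns W^n_alpha = w - alpha * fbar(x, v) / g0(z^0).
    return w - alpha * fbar(x, v) / g0_init

# Illustration with a Maxwellian background on a periodic box of length L.
L = 4 * np.pi
fbar = lambda x, v: np.exp(-v**2 / 2) / (np.sqrt(2 * np.pi) * L)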

6. Coupling the Monte Carlo Vlasov solver with a grid based Poisson solver

The steps of the PIC algorithm are the following (a schematic sketch of the resulting loop is given after the list):
(1) Initialisation:
    (a) Draw markers $(x_i, v_i)$ according to the probability density $g_0(x, v)$; if $g_0$ is not the initial particle distribution $f_0$, compute the importance weights $w_i = f_0(x_i, v_i)/g_0(x_i, v_i)$.
    (b) Compute the initial electric field corresponding to the particle positions by solving the Poisson equation on a grid of physical space. For this a discrete value, depending on the Poisson solver being used, of the charge density $\rho(t, x) = 1 - \int f(t, x, v)\, dv$ is needed.
(2) Time stepping to go from $t_n$ to $t_{n+1}$:
    (a) Push the particles from $t_n$ to $t_{n+1}$ using the known discrete electric field. For this the electric field needs to be evaluated at the particle positions.
    (b) Compute the electric field corresponding to the new particle positions.
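The following Python skeleton mirrors these steps; the sampling, Poisson solve and field interpolation routines are placeholders for the building blocks discussed in the remainder of this section, and the simple explicit push is only illustrative.

def pic_loop(sample_g0, f0_pdf, g0_pdf, solve_poisson, interp_E, dt, nsteps):
    # (1a) draw markers from g0 and set the importance weights
    x, v = sample_g0()
    w = f0_pdf(x, v) / g0_pdf(x, v)
    # (1b) initial electric field from the initial charge density
    E_grid = solve_poisson(x, w)
    for n in range(nsteps):
        # (2a) push the particles with the known discrete field
        v = v - dt * interp_E(E_grid, x)
        x = x + dt * v
        # (2b) recompute the field from the new particle positions
        E_grid = solve_poisson(x, w)
    return x, v, w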

Next to the Monte Carlo solver for the Vlasov equation, an important building block is the grid based Poisson solver and the interaction between the two. We shall distinguish here Poisson solvers needing values at discrete points, like Finite Difference or spectral collocation methods, from solvers using finite dimensional function spaces, like Finite Elements, which are coupled with the markers in a different manner.

The two steps linked to the coupling are, on the one hand, the computation of the discrete charge density needed by the Poisson solver from the particle positions and, on the other hand, the computation of the electric field at the particle positions.


6.1. Finite Difference PIC methods. We consider the 1D Poisson equation on the interval $[0, L]$,
$$ -\Delta\phi = \rho = 1 - \int f(x, v)\, dv, $$
with periodic boundary conditions. This is well posed provided the average of $\phi$ vanishes, $\int_0^L \phi(x)\, dx = 0$.

We consider a uniform $N_x$ points discretisation of the periodic interval $[0, L]$, $x_j = j\Delta x = jL/N_x$ for $0 \leq j \leq N_x - 1$. Because of the periodicity we have, for any discrete function $(g_j)_{0\leq j\leq N_x-1}$, that $g_{j+kN_x} = g_j$ for any $k \in \mathbb{Z}$, where we denote by $g_j$ an approximation of $g(x_j)$. The standard second order centred Finite Difference scheme for solving this equation reads
$$ (16)\qquad \frac{-\phi_{j+1} + 2\phi_j - \phi_{j-1}}{\Delta x^2} = \rho_j \quad\text{for } 0 \leq j \leq N_x - 1. $$
This yields a system of $N_x$ equations with $N_x$ unknowns. However, all constant vectors are in the kernel of the associated matrix. Hence we need to set the constant to get a unique solution. This can be done thanks to the vanishing average hypothesis on $\phi$, which implies for the discrete function that $\sum_{j=0}^{N_x-1}\phi_j = 0$. A second order Finite Difference formula for computing the electric field then writes
$$ (17)\qquad E_j = -\frac{\phi_{j+1} - \phi_{j-1}}{2\Delta x} \quad\text{for } 0 \leq j \leq N_x - 1. $$
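As an illustration, the periodic system (16) and (17) can be solved in Fourier space, where the circulant Finite Difference matrix is diagonal; this is a minimal sketch, with the zero-average condition enforced by setting the k = 0 mode of φ to zero (the k = 0 mode of ρ is discarded, which assumes overall neutrality).

import numpy as np

def solve_poisson_fd(rho, dx):
    # Solve (16) with periodic boundary conditions: the circulant matrix with
    # stencil (-1, 2, -1)/dx^2 has eigenvalues (2 - 2*cos(2*pi*k/Nx))/dx^2.
    Nx = rho.size
    rho_hat = np.fft.fft(rho)
    k = np.arange(Nx)
    eig = (2.0 - 2.0 * np.cos(2.0 * np.pi * k / Nx)) / dx**2
    phi_hat = np.zeros_like(rho_hat)
    phi_hat[1:] = rho_hat[1:] / eig[1:]   # k = 0 mode set to zero: vanishing average
    phi = np.real(np.fft.ifft(phi_hat))
    # Electric field from the centred difference (17).
    E = -(np.roll(phi, -1) - np.roll(phi, 1)) / (2.0 * dx)
    return phi, E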

Proposition 6. Assume the electrostatic potential is computed from $(\rho_0, \dots, \rho_{N_x-1})$ using (16) and the electric field using (17), with periodic boundary conditions. Then we have the following relations:
$$ \sum_{j=0}^{N_x-1} E_j\rho_j = 0, \qquad \sum_{j=0}^{N_x-1}\phi_j\rho_j = \sum_{j=0}^{N_x-1}\frac{(\phi_{j+1}-\phi_j)^2}{\Delta x^2}. $$

Proof. Using (16) and (17) we first compute
$$ \sum_{j=0}^{N_x-1} E_j\rho_j = -\sum_{j=0}^{N_x-1}\frac{(-\phi_{j+1}+2\phi_j-\phi_{j-1})}{\Delta x^2}\,\frac{(\phi_{j+1}-\phi_{j-1})}{2\Delta x} = \frac{1}{2\Delta x^3}\sum_{j=0}^{N_x-1}\big(\phi_{j+1}^2 - \phi_{j-1}^2 - 2\phi_j\phi_{j+1} + 2\phi_j\phi_{j-1}\big) = 0. $$
Indeed, by change of index, using the periodicity, we have
$$ \sum_{j=0}^{N_x-1}\phi_{j+1}^2 = \sum_{j=0}^{N_x-1}\phi_{j-1}^2 \quad\text{and}\quad \sum_{j=0}^{N_x-1}\phi_j\phi_{j+1} = \sum_{j=0}^{N_x-1}\phi_j\phi_{j-1}. $$

Now, multiplying (16) by $\phi_j$ and summing, we get, using again periodicity and a change of index,
$$ \sum_{j=0}^{N_x-1}\phi_j\rho_j = -\sum_{j=0}^{N_x-1}\phi_j\,\frac{(\phi_{j+1}-\phi_j)-(\phi_j-\phi_{j-1})}{\Delta x^2} = \sum_{j=0}^{N_x-1}\frac{(\phi_{j+1}-\phi_j)^2}{\Delta x^2}. $$


Remark 5. The properties in this proposition are discrete versions of the following properties verified by the continuous equations with periodic boundary conditions:
$$ -\int_0^L \rho\,\nabla\phi\, dx = \int_0^L \Delta\phi\,\nabla\phi\, dx = 0, \quad\text{and}\quad \int_0^L \rho\,\phi\, dx = -\int_0^L \Delta\phi\,\phi\, dx = \int_0^L (\nabla\phi)^2\, dx. $$
These are necessary conditions for the conservation laws and, to have them satisfied at the discrete level, one needs a discrete version of them. As we verified, a standard centred second order scheme provides them, but there are many others, like higher order centred schemes or classical spectral Fourier schemes.

6.2. Finite Element PIC methods. Still for the 1D Poisson equation on the interval $[0, L]$,
$$ -\frac{d^2\phi}{dx^2} = \rho = 1 - \int f(x, v)\, dv, $$
with periodic boundary conditions, a variational formulation is obtained by multiplying by a smooth test function and integrating by parts the left hand side. The variational formulation then reads: find $\phi \in H^1_\sharp(0, L)$ such that
$$ (18)\qquad \int_0^L \phi'(x)\psi'(x)\, dx = \int_0^L \rho(x)\psi(x)\, dx, \quad \forall\psi \in H^1_\sharp(0, L), $$
where we denote by $H^1_\sharp(0, L)$ the space of $L$-periodic $H^1$ functions with vanishing mean.

A Finite Element approximation is a Galerkin approximation of (18), which means that we are looking for a function $\phi_h \in V_h$, with $V_h$ a finite dimensional subspace of $H^1_\sharp(0, L)$, the test functions $\psi_h$ also being in $V_h$. Expressing the unknown function $\phi_h$ and the test functions $\psi_h$ in the same finite dimensional basis of size $N_x$, the variational formulation in the finite dimensional space is algebraically equivalent to a non singular linear system of size $N_x$.

We consider now a Finite Element discretisation using the finite dimensional subspace of periodic spline functions of degree $p$ on the uniform grid $x_j = j\Delta x = jL/N_x$:
$$ S_h^p = \big\{\phi_h \in C_\sharp^{p-1}(0, L) \mid \phi_h|_{[x_j, x_{j+1}]} \in \mathbb{P}_p([x_j, x_{j+1}])\big\}, $$
where $C_\sharp^{p-1}(0, L)$ is the space of $L$-periodic, $p-1$ times continuously differentiable functions and $\mathbb{P}_p([x_j, x_{j+1}])$ the space of polynomials of degree $p$ on the interval $[x_j, x_{j+1}]$. Then a finite dimensional subspace of $H^1_\sharp(0, L)$ is
$$ V_h = \Big\{\phi_h \in S_h^p \;\Big|\; \int_0^L \phi_h(x)\, dx = 0\Big\}. $$

A basis of $S_h^p$ can be defined using the B-splines of degree $p$ on a uniform periodic grid of step $\Delta x$. These are defined by induction by the de Boor recursion formula: $S_j^0(x) = 1$ if $j\Delta x \leq x < (j+1)\Delta x$ and $0$ else, and for all $p \in \mathbb{N}^*$,
$$ (19)\qquad S_j^p(x) = \frac{x/\Delta x - j}{p}\, S_j^{p-1}(x) + \frac{(j+p+1) - x/\Delta x}{p}\, S_{j+1}^{p-1}(x). $$

From this definition, the formula for the derivative of a uniform B-spline also easily follows:
$$ (20)\qquad \frac{dS_j^p(x)}{dx} = \frac{S_j^{p-1}(x) - S_{j+1}^{p-1}(x)}{\Delta x}. $$

Using this B-spline basis, a function $\phi_h \in V_h$ writes $\phi_h = \sum_{j=0}^{N_x-1}\phi_j S_j^p(x)$, with $\sum_{j=0}^{N_x-1}\phi_j = 0$ so that the average of $\phi_h$ vanishes. Plugging this into the variational formulation (18) with test


function $\psi_h = S_k^p$ for $k = 0, \dots, N_x - 1$, we get the Galerkin approximation of the Poisson equation
$$ \sum_{j=0}^{N_x-1}\phi_j\int_0^L S_j'(x) S_k'(x)\, dx = \int_0^L \rho(t, x) S_k(x)\, dx = 1 - \int_0^L\int_{-\infty}^{+\infty} f(t, x, v) S_k(x)\, dx\, dv. $$

Now, for a random variable $(X^t, V^t)$ having density $g(t, x, v)$ and importance weight $W$ with respect to the density $f(t, x, v)$, we have
$$ \int_0^L\int_{-\infty}^{+\infty} f(t, x, v) S_k(x)\, dx\, dv = E\big(W S_k(X^t)\big) \approx \frac{1}{N_p}\sum_{i=1}^{N_p} w_i S_k(x_i) $$

with our Monte Carlo approximation. Hence the Poisson equation we need to solve, with a source term coming from the Monte Carlo approximation for Vlasov, becomes
$$ (21)\qquad \sum_{j=0}^{N_x-1}\phi_j\int_0^L S_j'(x) S_k'(x)\, dx = \frac{1}{N_p}\sum_{i=1}^{N_p} w_i S_k(x_i). $$

This yields the linear system $K\phi = b$, the coefficients of $K$ being $\int_0^L S_j'(x) S_k'(x)\, dx$, the components of the column vector $\phi$ being $\phi_j$ and the components of the column vector $b$ being $\frac{1}{N_p}\sum_{i=1}^{N_p} w_i S_k(x_i)$. The matrix $K$ is called the stiffness matrix in Finite Element terminology. In our case, because of the periodic boundary conditions, $K$ is singular of rank $N_x - 1$, as all constant vectors are in its kernel. To get a unique solution of the system we need the additional condition $\sum_{j=0}^{N_x-1}\phi_j = 0$ (for translation invariant basis functions).

As opposed to the Finite Difference discretisation, where we need an additional smoothing kernel, the Finite Element basis functions provide the needed regularisation naturally.
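As a minimal illustration for degree p = 1 (hat functions), where the stiffness matrix of (21) is the circulant matrix with stencil (-1, 2, -1)/Δx, the right hand side can be assembled directly from the markers and the singular system solved in Fourier space; higher degree splines only change the stencil, and this sketch is a simplification under those assumptions rather than a general implementation.

import numpy as np

def assemble_rhs_p1(x_markers, w, L, Nx):
    # b_k = (1/Np) * sum_i w_i S_k(x_i) for linear B-splines (hat functions)
    # on the periodic grid x_j = j*dx: each marker contributes to its two
    # neighbouring grid points with linear weights.
    dx = L / Nx
    j = np.floor(x_markers / dx).astype(int) % Nx
    frac = x_markers / dx - np.floor(x_markers / dx)
    b = np.zeros(Nx)
    np.add.at(b, j, w * (1.0 - frac))
    np.add.at(b, (j + 1) % Nx, w * frac)
    return b / x_markers.size

def solve_poisson_fem_p1(b, L, Nx):
    # For p = 1 the stiffness matrix K_jk = int S_j' S_k' dx is circulant with
    # stencil (-1, 2, -1)/dx, so it is diagonalised by the FFT; the constant
    # is fixed by zeroing the k = 0 mode, i.e. sum_j phi_j = 0.
    dx = L / Nx
    b_hat = np.fft.fft(b)
    k = np.arange(Nx)
    eig = (2.0 - 2.0 * np.cos(2.0 * np.pi * k / Nx)) / dx
    phi_hat = np.zeros_like(b_hat)
    phi_hat[1:] = b_hat[1:] / eig[1:]
    return np.real(np.fft.ifft(phi_hat))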

Remark 6. The Galerkin procedure also provides a kernel density estimate, by projecting the density orthogonally in $L^2$ onto the span of the $S_j$, $0 \leq j \leq N_x - 1$. This reads
$$ \sum_{j=0}^{N_x-1}\rho_j\int_0^L S_j(x) S_k(x)\, dx = \frac{1}{N_p}\sum_{i=1}^{N_p} w_i S_k(x_i), $$
or, as a linear system, $M\rho = b$, where $M$ is the matrix with coefficients $\int_0^L S_j(x) S_k(x)\, dx$, $\rho$ is the column vector of components $\rho_j$ and $b$ is defined as above. $M$ is called the mass matrix in Finite Element terminology.

6.3. Conservation properties.

6.3.1. Total number of particles. The total number of particles, in the importance sampling formulation, is
$$ N = \int f(t, x, v)\, dx\, dv = \int \frac{f(t, x, v)}{g(t, x, v)}\, g(t, x, v)\, dx\, dv = E(W) = 1, $$
the importance weight being defined by $W = f(t, X, V)/g(t, X, V)$. This does not depend on time, as $f$ and $g$ evolve along the same characteristics. Its estimator is defined by
$$ N_N = \frac{1}{N}\sum_{i=1}^{N} W_i. $$

Because the weights $W_i$ are drawn at the initialisation and not changed during the time stepping, $N_N$ is exactly conserved during the simulation.


With a control variate, the estimator for the total number of particles is
$$ N_N^\alpha = \frac{1}{N}\sum_{i=1}^{N} W_{\alpha,i}^t + \alpha, \quad\text{with}\quad W_{\alpha,i}^t = W_i - \alpha\frac{\bar f(t, X_i^t, V_i^t)}{g_0(X_i^0, V_i^0)}. $$
Here $W_{\alpha,i}^t$ evolves in time and there is no reason why $N_N^\alpha$ should be exactly conserved. However,

as the $W_{\alpha,i}^t$ are identically distributed, we can verify that the estimator is unbiased:
$$ E(N_N^\alpha) = \frac{1}{N}\sum_{i=1}^{N} E(W_{\alpha,i}^t) + \alpha = E(W_\alpha^t) + \alpha = E(W) + \alpha\Big(1 - E\Big(\frac{\bar f(t, X^t, V^t)}{g_0(X^0, V^0)}\Big)\Big) = E(W) = 1. $$

Indeed, as $g$ is conserved along the characteristics,
$$ E\Big(\frac{\bar f(t, X^t, V^t)}{g_0(X^0, V^0)}\Big) = E\Big(\frac{\bar f(t, X^t, V^t)}{g(t, X^t, V^t)}\Big) = \int \frac{\bar f(t, x, v)}{g(t, x, v)}\, g(t, x, v)\, dx\, dv = \int \bar f(t, x, v)\, dx\, dv = 1. $$

As it is important in practice to conserve the number of particles exactly, in order to avoid the build up of a spurious electric field which can become important in long times, we prefer to modify the estimator accordingly. To this aim, we replace $P_i^t = \bar f(t, X_i^t, V_i^t)/g_0(X_i^0, V_i^0)$ by
$$ P_i^t = 1 + \frac{\bar f(t, X_i^t, V_i^t)}{g_0(X_i^0, V_i^0)} - \frac{1}{N}\sum_{j=1}^{N}\frac{\bar f(t, X_j^t, V_j^t)}{g_0(X_j^0, V_j^0)}. $$

As the $(X_i^t, V_i^t)$ are identically distributed and the expected value is linear, it follows that $E(P_i^t) = 1$. Then, for any function $\psi$, the estimator
$$ M_N^\alpha = \frac{1}{N}\sum_{i=1}^{N}\big(W_i - \alpha P_i^t\big)\,\psi(Z_i^t) + \alpha E(\bar Y) $$
is unbiased, and the total number of particles is exactly conserved by construction, as $\frac{1}{N}\sum_{i=1}^{N} P_i^t = 1$.

6.4. Total momentum. In order to couple this Poisson solver to our Monte Carlo Vlasov solver, we need a way to define $\rho_j$, the value of the density at the grid points $x_j$. This is classically done using a kernel density estimator $S_{\Delta x}$, where $S$ is any smooth function with integral one; typically in PIC simulations $S$ is chosen to be a B-spline function, or a tensor product of B-spline functions in several dimensions, and $S_{\Delta x}(x) = S(x/\Delta x)/\Delta x$. Then we get
$$ (22)\qquad \rho_j(t) = 1 - \int S_{\Delta x}(x_j - y) f(t, y, v)\, dy\, dv = 1 - \int S_{\Delta x}(x_j - y)\, w(t, y, v)\, g(t, y, v)\, dy\, dv \approx 1 - \frac{1}{N_p}\sum_{i=1}^{N_p} w_i S_{\Delta x}(x_j - x_i(t)), $$
where $(X_i(t), V_i(t))$ are random variables of density $g(t, y, v)$ and the importance weight is $W_i = f(t, X_i(t), V_i(t))/g(t, X_i(t), V_i(t))$, which is one for the standard PIC method where the particle density is $f$. The realisations of the random variables are the particle, or marker, positions at time $t$ that are computed by numerically integrating the characteristics of the Vlasov equation.

The reverse part of the coupling consists in computing the electric field at the particle positions. The electric field is defined by $E(t, x) = -\frac{d\phi}{dx}(t, x)$ and can be obtained at the grid points $x_j$ by finite differences of the potential, for example centred finite differences for second order accuracy, $E_j = \frac{\phi_{j-1} - \phi_{j+1}}{2\Delta x}$. Then, to compute the electric field at the particle positions, a standard Lagrange interpolation could be used. However, in order to achieve conservation of total


momentum, we need to use a kernel smoothing with the same kernel as the one used for computing $\rho_j$. Hence we define, for any $x \in [0, L]$,
$$ (23)\qquad E(x) = \sum_{j=0}^{N_x-1} E_j S_{\Delta x}(x - x_j). $$
Note that in practice, for a given $x$, only the few $j$ such that $x - x_j$ lies in the support of $S_{\Delta x}$ give a non vanishing contribution.
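A minimal sketch of the two coupling operations, deposition (22) and field gathering (23), with the same linear B-spline kernel for both, as required for momentum conservation in the proposition below; the grid conventions and the usual normalisation of the gathering weights are illustrative assumptions.

import numpy as np

def deposit_rho(x_markers, w, L, Nx):
    # Charge density (22) on the grid, with a linear B-spline kernel S_dx.
    dx = L / Nx
    j = np.floor(x_markers / dx).astype(int) % Nx
    frac = x_markers / dx - np.floor(x_markers / dx)
    acc = np.zeros(Nx)
    np.add.at(acc, j, w * (1.0 - frac))
    np.add.at(acc, (j + 1) % Nx, w * frac)
    return 1.0 - acc / (x_markers.size * dx)   # S_dx = S(x/dx)/dx

def gather_E(E_grid, x_markers, L):
    # Field at the marker positions (23), using the same linear kernel with
    # the normalised hat weights, which sum to one at every position x.
    Nx = E_grid.size
    dx = L / Nx
    j = np.floor(x_markers / dx).astype(int) % Nx
    frac = x_markers / dx - np.floor(x_markers / dx)
    return (1.0 - frac) * E_grid[j] + frac * E_grid[(j + 1) % Nx]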

Proposition 7. Consider a PIC code for which $\rho_j$ is computed from the particles using (22) and the electric field is computed at the particle positions using (23) with the same smoothing kernel $S_{\Delta x}$, with a velocity update with steps of the form $v_i^{n+1} = v_i^n - hE(x_i)$, and such that the Poisson solver verifies $\sum_{j=0}^{N_x-1} E_j\rho_j = 0$ and $\sum_{j=0}^{N_x-1} E_j = 0$. Then the total momentum is conserved, i.e.
$$ \frac{1}{N_p}\sum_{i=1}^{N_p} w_i v_i^{n+1} = \frac{1}{N_p}\sum_{i=1}^{N_p} w_i v_i^n. $$

Proof. Multiplying $v_i^{n+1} = v_i^n - hE(x_i)$ by $w_i$ and summing over all the particles, we get
$$ \frac{1}{N_p}\sum_{i=1}^{N_p} w_i v_i^{n+1} = \frac{1}{N_p}\sum_{i=1}^{N_p} w_i v_i^n - \frac{h}{N_p}\sum_{i=1}^{N_p} w_i E(x_i). $$

We hence need to prove that $\frac{1}{N_p}\sum_{i=1}^{N_p} w_i E(x_i) = 0$. Using (23),
$$ \frac{1}{N_p}\sum_{i=1}^{N_p} w_i E(x_i) = \frac{1}{N_p}\sum_{i=1}^{N_p}\sum_{j=0}^{N_x-1} w_i E_j S_{\Delta x}(x_i^n - x_j) = \sum_{j=0}^{N_x-1} E_j\,\frac{1}{N_p}\sum_{i=1}^{N_p} w_i S_{\Delta x}(x_i^n - x_j) = \sum_{j=0}^{N_x-1} E_j\big(1 - \rho_j^n\big) = 0, $$
using (22), $\sum_{j=0}^{N_x-1} E_j\rho_j = 0$ and $\sum_{j=0}^{N_x-1} E_j = 0$.

Remark 7. Total momentum conservation implies that a particle cannot generate a force on itself, as is physically correct. This is obtained by applying the proposition to only one particle. This is known as the self force issue, which has raised many discussions in the Particle In Cell literature [4]. Obviously having only one particle is not useful in a non trivial Monte Carlo simulation; however, having an unbiased estimator of the total momentum is very important to avoid non physical drifts. Following the above proof, this is achieved if the electric field verifies $E(W E(X)) = 0$.

6.4.1. Total energy. At the continuous level, the total energy, defined by
$$ \mathcal{E} = \frac{1}{2}\int v^2 f(t, x, v)\, dx\, dv + \frac{1}{2}\int \phi'(x)^2\, dx, $$
is conserved. Taking $\psi_h = \phi_h$ in the variational formulation (18) immediately yields
$$ \int \phi_h'(x)^2\, dx = \int \rho\,\phi_h\, dx = -\int f(t, x, v)\,\phi_h(x)\, dx\, dv = -E(\phi_h(X)), $$
using that $\phi_h$ has vanishing average. So that
$$ \mathcal{E}_h = E(V^2/2) - \tfrac{1}{2}E(\phi_h(X)). $$

Bibliography

[1] Simon J. Allfrey and Roman Hatzky. A revised δf algorithm for nonlinear PIC simulation. Computer Physics Communications, 154(2):98–104, 2003.
[2] Herbert Amann. Ordinary Differential Equations: An Introduction to Nonlinear Analysis, volume 13. Walter de Gruyter, 1990.
[3] Ahmet Y. Aydemir. A unified Monte Carlo interpretation of particle simulations and applications to non-neutral plasmas. Physics of Plasmas, 1(4):822–831, 1994.
[4] Charles K. Birdsall and A. Bruce Langdon. Plasma Physics via Computer Simulation. CRC Press, 2004.
[5] William L. Dunn and J. Kenneth Shultis. Exploring Monte Carlo Methods. Academic Press, 2012.
[6] Ralf Kleiber, Roman Hatzky, Axel Konies, Karla Kauffmann, and Per Helander. An improved control-variate scheme for particle-in-cell simulations with collisions. Computer Physics Communications, 182:1005–1012, 2011.
