Advanced Computational Methods in Statistics: Lecture 1
Monte Carlo Simulation & Parallel Computing

Axel Gandy
Department of Mathematics, Imperial College London
http://www2.imperial.ac.uk/~agandy

London Taught Course Centre for PhD Students in the Mathematical Sciences
Autumn 2014

Lectures 2-5: Optimisation, MCMC, Bootstrap, Particle Filtering



Today’s Lecture

Part I Monte Carlo Simulation

Part II Introduction to Parallel Computing


Part I

Monte Carlo Simulation

Random Number Generation

Computation of Integrals

Variance Reduction Techniques


Outline

Random Number Generation
  Uniform Random Number Generation
  Nonuniform Random Number Generation

Computation of Integrals

Variance Reduction Techniques


Uniform Random Number Generation

- Basic building block of simulation: a stream of independent rv U_1, U_2, ... ~ U(0, 1).
- "True" random number generators:
  - based on physical phenomena
  - Example: http://www.random.org/; R-package random: "The randomness comes from atmospheric noise"
  - Disadvantages of physical systems:
    - cumbersome to install and run
    - costly
    - slow
    - cannot reproduce the exact same sequence twice [verification, debugging, comparing algorithms with the same stream]
- Pseudo random number generators: deterministic algorithms.
  Example: linear congruential generators:

      u_n = s_n / M,   s_{n+1} = (a s_n + c) mod M
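A minimal sketch of a linear congruential generator in R (the constants below are the Park-Miller "minimal standard" choice, used here purely for illustration; this is not R's default generator and not a generator to use in practice):

```r
# Linear congruential generator: s_{n+1} = (a*s_n + c) mod M, u_n = s_n / M.
# a, c, M are the Park-Miller "minimal standard" constants (illustration only).
lcg <- function(n, seed, a = 16807, c = 0, M = 2^31 - 1) {
  s <- seed
  u <- numeric(n)
  for (i in 1:n) {
    u[i] <- s / M           # output in [0, 1)
    s <- (a * s + c) %% M   # deterministic state transition
  }
  u
}
u <- lcg(5, seed = 1)  # same seed => exact same sequence, every time
```

The determinism is the point: rerunning with the same seed reproduces the stream exactly, which is what physical generators cannot do.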


General framework for Uniform RNG

(L’Ecuyer, 1994)

[Diagram: starting from the seed s, the transition function T produces states s_1, s_2, s_3, ..., and the output function G maps each state s_i to an output u_i.]

- s: initial state ('seed')
- S: finite set of states
- T: S → S, the transition function
- U: finite set of output symbols (often {0, ..., m − 1} or a finite subset of [0, 1])
- G: S → U, the output function
- s_i := T(s_{i−1}) and u_i := G(s_i)
- output: u_1, u_2, ...


Some Notes for Uniform RNG

- S finite ⟹ (u_i) is periodic.
- In practice: the seed s is often chosen by clock time as default.
- Good practice, to be able to reproduce simulations: save the seed!
- Default random number generator in R: Matsumoto, M. and Nishimura, T. (1998) Mersenne Twister: a 623-dimensionally equidistributed uniform pseudo-random number generator, ACM Transactions on Modeling and Computer Simulation, 8, 3-30. The 'state' is a 624-dimensional set of 32-bit integers plus a current position in that set.


Quality of Random Number Generators

- "Random numbers should not be generated with a method chosen at random" (Knuth, 1981, p.5). Some old implementations were unreliable!
- Desirable properties of random number generators:
  - statistical uniformity and unpredictability
  - period length
  - efficiency
  - theoretical support
  - repeatability, portability, jumping ahead, ease of implementation

  (for more on this see e.g. Gentle (2003), L'Ecuyer (2004), L'Ecuyer (2006), Knuth (1998))
- Usually you will do well with the generators in modern software (e.g. the default generators in R). Don't try to implement your own generator! (unless you have very good reasons)


Nonuniform Random Number Generation

- How to generate nonuniform random variables?
- Basic idea: apply transformations to a stream of iid U[0, 1] random variables.


Inversion Method

- Let F be a cdf.
- Quantile function (essentially the inverse of the cdf):

      F^{-1}(u) = inf{x : F(x) ≥ u}

- If U is uniform on [0, 1] then F^{-1}(U) ~ F. Indeed, assuming F is strictly increasing,

      P(X ≤ x) = P(F^{-1}(U) ≤ x) = P(U ≤ F(x)) = F(x)

- Only works if F^{-1} (or a good approximation of it) is available. Numerical approximation if only F is available: solve F(x) = U for x using your favourite numerical root-finder.
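As an example where F^{-1} is available in closed form, the Exponential(rate) distribution has F(x) = 1 − e^{−rate·x}, so F^{-1}(u) = −log(1 − u)/rate; a sketch in R (rate = 2 is an arbitrary choice):

```r
# Inversion method for the Exponential(rate) distribution:
# F(x) = 1 - exp(-rate*x), so F^{-1}(u) = -log(1 - u)/rate.
rexp_inv <- function(n, rate) {
  u <- runif(n)         # U_1, ..., U_n ~ U(0, 1)
  -log(1 - u) / rate    # F^{-1}(U_i) ~ Exp(rate)
}
set.seed(1)
x <- rexp_inv(1e5, rate = 2)
mean(x)  # should be close to 1/rate = 0.5
```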


Acceptance-Rejection Method

[Plot: a target density f(x) and an envelope C·g(x) lying above it, for x in [0, 20].]

- Target density f.
- Proposal density g (easy to generate from) such that for some C < ∞:

      f(x) ≤ C g(x) for all x

- Algorithm:
  1. Generate X from g.
  2. With probability f(X)/(C g(X)) return X; otherwise go to 1.
- 1/C = probability of acceptance; want it to be as close to 1 as possible.
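A sketch of the algorithm in R, for an arbitrarily chosen target f(x) = 6x(1 − x) (the Beta(2, 2) density) with uniform proposal g ≡ 1 on [0, 1]; then f(x) ≤ C g(x) with C = 1.5 = max f, so the acceptance probability is 1/C = 2/3:

```r
# Acceptance-rejection for the Beta(2, 2) density f(x) = 6x(1-x) on [0, 1],
# with proposal g = U(0, 1) and envelope constant C = 1.5 = max f.
rbeta22_rej <- function(n, C = 1.5) {
  f <- function(x) 6 * x * (1 - x)
  out <- numeric(n)
  for (i in 1:n) {
    repeat {
      x <- runif(1)                # 1. generate X from g
      if (runif(1) <= f(x) / C) {  # 2. accept with probability f(X)/(C g(X))
        out[i] <- x
        break                      #    otherwise go back to 1
      }
    }
  }
  out
}
set.seed(1)
x <- rbeta22_rej(2000)
mean(x)  # close to E Beta(2,2) = 0.5
```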


Further Algorithms

Examples:

- Ratio-of-uniforms
- Use of the characteristic function
- MCMC

For many of these techniques, and for techniques to simulate specific distributions, see e.g. Gentle (2003).


Outline

Random Number Generation

Computation of Integrals
  Numerical Integration
  Monte Carlo Integration
  Quasi Monte Carlo
  Comparison

Variance Reduction Techniques


Evaluation of an Integral

- Want to evaluate

      I := ∫_{[0,1]^d} g(x) dx

- Importance for statistics: computation of expected values (posterior means), probabilities (p-values), variances, normalising constants, .... For example, let X be a r.v. with pdf f. Then

      E(X) = ∫ x f(x) dx   and   P(X ∈ A) = ∫ I(x ∈ A) f(x) dx

- Often, d is large. In a random sample, often d = sample size.
- How to solve it?
  - Symbolically (programs such as Maple and Mathematica may help)
  - Numerical integration
  - Quasi Monte Carlo
  - Monte Carlo integration


Numerical Integration/Quadrature

- Main idea: approximate the function locally with simple functions/polynomials.
- Advantage: good convergence rate.
- Not useful for high dimensions: curse of dimensionality.


Midpoint Formula

- Basic:

      ∫_0^1 f(x) dx ≈ f(1/2) · (1 − 0)

- Composite: apply the rule in n subintervals (similar to Riemann sums). R-Demo

[Plot: composite midpoint approximation of a function on [0, 2].]

Error: O(1/n²) if f is twice continuously differentiable.


Trapezoidal Formula

- Basic:

      ∫_0^1 f(x) dx ≈ (1/2)(f(0) + f(1))

- Composite:

[Plot: composite trapezoidal approximation of a function on [0, 2].]

Error: O(1/n²).


Simpson’s rule

- Approximate the integrand by a quadratic function:

      ∫_0^1 f(x) dx ≈ (1/6)[f(0) + 4f(1/2) + f(1)]

- Composite Simpson:

[Plot: composite Simpson approximation of a function on [0, 2].]

Error: O(1/n⁴).
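The three composite rules side by side in R, on the arbitrary test integral ∫_0^1 x² dx = 1/3 (Simpson's rule is exact here, since the integrand is quadratic):

```r
# Composite midpoint, trapezoidal and Simpson rules on [0, 1] with n subintervals.
midpoint <- function(f, n) {
  h <- 1 / n
  h * sum(f((1:n - 0.5) * h))                    # evaluate at interval midpoints
}
trapezoid <- function(f, n) {
  h <- 1 / n
  x <- (0:n) * h
  h * (sum(f(x)) - (f(x[1]) + f(x[n + 1])) / 2)  # endpoints get weight 1/2
}
simpson <- function(f, n) {
  h <- 1 / n
  x <- (0:n) * h
  m <- (x[1:n] + x[2:(n + 1)]) / 2               # midpoint of each subinterval
  (h / 6) * sum(f(x[1:n]) + 4 * f(m) + f(x[2:(n + 1)]))
}
f <- function(x) x^2
c(midpoint(f, 100), trapezoid(f, 100), simpson(f, 100)) - 1 / 3  # errors
```

Doubling n shrinks the midpoint and trapezoidal errors by about a factor of 4, matching the O(1/n²) rates above.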


Advanced Numerical Integration Methods

- Newton-Cotes formulas (assume existence of higher derivatives)
- Adaptive methods (more points in areas where the function changes quickly)
- Unbounded integration interval: transformations (use substitution to transform an unbounded interval to a bounded one)

R-Demo


Curse of Dimensionality: Numerical Integration in Higher Dimensions

    I := ∫_{[0,1]^d} g(x) dx

- Naive approach:
  - write as an iterated integral

        I := ∫_0^1 ... ∫_0^1 g(x) dx_d ... dx_1

  - use a 1D scheme for each integral with, say, g points
  - n = g^d function evaluations needed

  For d = 100 (a moderate sample size) and g = 10 (which is not a lot): n > estimated number of atoms in the universe!

- Suppose we use the trapezoidal rule; then the error is O(1/n^{2/d}).
- More advanced schemes are not doing much better!


Monte Carlo Integration

    ∫_{[0,1]^d} g(x) dx ≈ (1/n) ∑_{i=1}^n g(X_i),   where X_1, X_2, ... ~ U([0,1]^d) iid.

- SLLN:

      (1/n) ∑_{i=1}^n g(X_i) → ∫_{[0,1]^d} g(x) dx   (n → ∞)

- CLT: the error is bounded by O_P(1/√n), independent of d.
- Can easily compute asymptotic confidence intervals.
- Note: the trapezoidal rule has faster convergence for d ≤ 3.
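A sketch in R, including the CLT-based confidence interval; the integrand g(x) = x_1 + ... + x_d (true value d/2) is an arbitrary test case:

```r
# Monte Carlo estimate of I = int_{[0,1]^d} g(x) dx with an asymptotic 95% CI.
mc_integrate <- function(g, d, n) {
  X <- matrix(runif(n * d), nrow = n)  # n iid points, uniform on [0,1]^d
  y <- apply(X, 1, g)
  est <- mean(y)
  se <- sd(y) / sqrt(n)                # CLT: error is O_P(1/sqrt(n)), any d
  c(estimate = est, lower = est - 1.96 * se, upper = est + 1.96 * se)
}
set.seed(1)
r <- mc_integrate(function(x) sum(x), d = 10, n = 1e4)
r  # true value is d/2 = 5
```

Note that nothing in the code depends on d except the cost of evaluating g; the same n gives the same order of error in any dimension.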


Quasi-Monte-Carlo

- Similar to MC, but instead of random X_i: use deterministic x_i that fill [0, 1]^d evenly, so-called "low-discrepancy sequences".
- R-package randtoolbox
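For illustration, a hand-rolled Halton sequence in base R (the randtoolbox package provides this and other low-discrepancy sequences ready-made); coordinate j of the sequence is the radical inverse of 1, 2, 3, ... written in the j-th prime base:

```r
# Radical inverse of the integer k in the given base (van der Corput sequence).
radical_inverse <- function(k, base) {
  r <- 0
  f <- 1 / base
  while (k > 0) {
    r <- r + f * (k %% base)   # next digit of k, reflected about the point
    k <- k %/% base
    f <- f / base
  }
  r
}
# d-dimensional Halton sequence (d <= 6 in this sketch).
halton <- function(n, d) {
  bases <- c(2, 3, 5, 7, 11, 13)[1:d]
  sapply(bases, function(b) sapply(1:n, radical_inverse, base = b))
}
p <- halton(1000, 2)  # 1000 evenly spread points in [0,1]^2
```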


Comparison between Quasi-Monte-Carlo and Monte Carlo: 2D

1000 points in [0, 1]² generated by a quasi-RNG and a pseudo-RNG

[Two scatterplots of 1000 points in [0, 1]²: Quasi-Random-Number-Generator (left) and Standard R Random Number Generator (right).]


Comparison between Quasi-Monte-Carlo and Monte Carlo

    ∫_{[0,1]^4} (x_1 + x_2)(x_2 + x_3)²(x_3 + x_4)³ dx

using Monte Carlo and quasi-MC.

[Plot: quasi-MC and MC estimates against n, up to n = 10^6.]


Bounds on the Quasi-MC error

- Koksma-Hlawka inequality (Niederreiter, 1992, Theorem 2.11):

      | (1/n) ∑ g(x_i) − ∫_{[0,1]^d} g(x) dx | ≤ V_d(g) D_n^*,

  where V_d(g) is the so-called Hardy-Krause variation of g and D_n^* is the discrepancy of the points x_1, ..., x_n in [0, 1]^d, given by

      D_n^* = sup_{A ∈ 𝒜} | #{x_i ∈ A : i = 1, ..., n}/n − λ(A) |

  where
  - λ is Lebesgue measure
  - 𝒜 is the set of all subrectangles of [0, 1]^d of the form ∏_{i=1}^d [0, a_i]

- Many sequences have been suggested, e.g. the Halton sequence (other sequences: Faure, Sobol, ...), with

      D_n^* = O(log(n)^{d−1} / n)

  → better convergence rate than MC integration! However, it does depend on d.
- Conjecture: for all point sets x_1, ..., x_n,

      D_n^* ≥ O(log(n)^{d−1} / n)

  (Niederreiter, 1992, p.32)


Comparison

R-Demo

The consensus in the literature seems to be:

- use numerical integration for small d
- quasi-MC is useful for medium d
- use Monte Carlo integration for large d


Outline

Random Number Generation

Computation of Integrals

Variance Reduction Techniques
  Importance Sampling
  Control Variates
  Further Variance Reduction Techniques


Importance Sampling

- Main idea: change the density we are sampling from.
- Interested in E(φ(X)) = ∫ φ(x) f(x) dx.
- For any density g,

      E(φ(X)) = ∫ φ(x) (f(x)/g(x)) g(x) dx

- Thus an unbiased estimator of E(φ(X)) is

      Î = (1/n) ∑_{i=1}^n φ(X_i) f(X_i)/g(X_i),

  where X_1, ..., X_n ~ g iid.
- How to choose g?
- Suppose g ∝ φf; then Var(Î) = 0. However, the corresponding normalizing constant is E(φ(X)), the quantity we want to estimate!
- A lot of theoretical work is based on large deviation theory.
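A rare-event sketch in R: estimating p = P(Z > 4) for Z ~ N(0, 1) (about 3.2e-05, so plain Monte Carlo with n = 10^5 sees only a handful of exceedances). The proposal N(4, 1), centred on the rare region, is a common textbook choice, not the optimal one:

```r
# Importance sampling for p = P(Z > 4), Z ~ N(0, 1), with proposal g = N(4, 1).
set.seed(1)
n <- 1e5
x   <- rnorm(n, mean = 4)             # X_i ~ g
w   <- dnorm(x) / dnorm(x, mean = 4)  # weights f(X_i)/g(X_i)
phi <- as.numeric(x > 4)              # phi(x) = I(x > 4)
est <- mean(phi * w)                  # unbiased estimator of p
est  # compare with pnorm(4, lower.tail = FALSE), about 3.2e-05
```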


Importance sampling - Comments

- Importance sampling can greatly reduce the variance for estimating the probability of rare events, i.e. φ(x) = I(x ∈ A) and E(φ(X)) = P(X ∈ A) small.
- It is not just useful for variance reduction; it can also be very useful for generating random variables.
- Be careful with support/tails!


Control Variates

- Interested in I = E(X).
- Suppose we can also observe Y and know E(Y).
- Consider T = X + a(Y − E(Y)).
- Then E(T) = I and

      Var(T) = Var(X) + 2a Cov(X, Y) + a² Var(Y),

  minimized for a = −Cov(X, Y)/Var(Y).
- Usually a is not known → estimate it.
- For Monte Carlo sampling:
  - generate an iid sample (X_1, Y_1), ..., (X_n, Y_n)
  - estimate Cov(X, Y) and Var(Y) based on this sample → â
  - Î = (1/n) ∑_{i=1}^n [X_i + â(Y_i − E(Y))]
- Î can be computed via standard regression analysis, hence the term "regression-adjusted control variates".
- Can easily be generalised to several control variates.
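A sketch in R on the toy problem X = e^U, U ~ U(0, 1), so I = E(X) = e − 1, with control variate Y = U and known E(Y) = 1/2 (all choices arbitrary, for illustration):

```r
# Regression-adjusted control variate estimate of E[exp(U)], U ~ U(0, 1).
set.seed(1)
n <- 1e4
u <- runif(n)
x <- exp(u)               # X: E(X) = e - 1 is what we estimate
y <- u                    # Y: control variate with known E(Y) = 1/2
a <- -cov(x, y) / var(y)  # estimated optimal coefficient a-hat
est <- mean(x + a * (y - 0.5))
c(plain = mean(x), control_variate = est, true = exp(1) - 1)
var(x + a * (y - 0.5)) / var(x)  # variance ratio: far below 1 here
```

Because exp(U) and U are highly correlated, the adjusted terms X_i + â(Y_i − 1/2) have a much smaller variance than the X_i themselves.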


Further Techniques

- If the density is symmetric around a point: antithetic sampling. Use X and −X if symmetric around 0.
- Conditional Monte Carlo: evaluate parts explicitly.
- Common random numbers: for comparing two procedures, use the same sequence of random numbers.
- Stratification:
  - Divide the sample space Ω into strata Ω_1, ..., Ω_s.
  - In each stratum, generate R_i replicates conditional on Ω_i and obtain an estimate Î_i.
  - Combine using the law of total probability:

        Î = p_1 Î_1 + ... + p_s Î_s

  - Need to know p_i = P(Ω_i) for all i.
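A stratification sketch in R for I = ∫_0^1 e^x dx = e − 1, in a deliberately simple setting: the strata are s equal subintervals of [0, 1], so p_i = 1/s and sampling conditional on Ω_i is uniform on the i-th subinterval:

```r
# Stratified Monte Carlo estimate of int_0^1 exp(x) dx with s equal strata.
strat_mc <- function(g, s, R) {
  p <- rep(1 / s, s)  # known strata probabilities p_i = P(Omega_i)
  I_hat <- sapply(1:s, function(i) {
    u <- runif(R, min = (i - 1) / s, max = i / s)  # R draws conditional on Omega_i
    mean(g(u))
  })
  sum(p * I_hat)      # combine via the law of total probability
}
set.seed(1)
strat_mc(exp, s = 10, R = 100)  # close to e - 1 = 1.71828...
```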


Part II

Parallel Computing

Introduction

Parallel RNG

Practical use of parallel computing (R)


Outline

Introduction
  Moore's Law
  Architecture of Parallel Computers
  Embarrassingly Parallel
  Speedup
  Communication between Processes

Parallel RNG

Practical use of parallel computing (R)


Moore’s Law

[Figure: Moore's law. Source: Wikipedia, Creative Commons Attribution ShareAlike 3.0 License]


Growth of Data Storage

[Plot: Growth of PC hard drive capacity, 1980-2010; capacity in GB on a logarithmic scale.]

- Not only the computer speed but also the data size is increasing exponentially!
- The increase in the available storage is at least as fast as the increase in computing power.


Introduction

- Recently: less increase in CPU clock speed
- → multi-core CPUs are available (eight cores readily available; 80 cores in labs)
- → software needs to be adapted to exploit this
- Traditional computing: the problem is broken into small steps that are executed sequentially.
- Parallel computing: steps are executed in parallel.


von Neumann Architecture

- The CPU executes a stored program that specifies a sequence of read and write operations on the memory.
- Memory is used to store both program and data instructions.
- Program instructions are coded data which tell the computer to do something.
- Data is simply information to be used by the program.
- A central processing unit (CPU) gets instructions and/or data from memory, decodes the instructions and then sequentially performs them.


Different Architectures

- Multicore computing
- Symmetric multiprocessing
- Distributed computing:
  - Cluster computing
  - Massively parallel processors
  - Grid computing

List of top 500 supercomputers at http://www.top500.org/


Flynn's taxonomy

                 Single Instruction   Multiple Instruction
Single Data      SISD                 MISD
Multiple Data    SIMD                 MIMD

Examples:

- SIMD: GPUs


Memory Architectures of Parallel Computers

- Traditional system: one CPU with its own memory.
- Shared memory system: several CPUs attached to a single shared memory.
- Distributed memory system: several CPUs, each with its own local memory.
- Distributed shared memory system: several shared-memory nodes (each with multiple CPUs and a common memory), connected together.


Embarrassingly Parallel Computations

A task that can be divided into parts that can be executed separately. Examples:

- Monte Carlo integration
- Bootstrap
- Cross-validation

Note: MCMC methods do not fall (easily) into this category.
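An embarrassingly parallel sketch using the base parallel package: each worker runs an independent chunk of a Monte Carlo estimate of π (an arbitrary example). mclapply forks, which is unavailable on Windows, so this sketch falls back to one core there; proper parallel random number streams are discussed in the next part:

```r
library(parallel)
# One chunk of an embarrassingly parallel job: count Monte Carlo hits
# inside the quarter circle to estimate pi.
one_chunk <- function(n) {
  x <- runif(n); y <- runif(n)
  sum(x^2 + y^2 <= 1)
}
cores <- if (.Platform$OS.type == "windows") 1L else 2L
n_per_chunk <- 1e5
hits <- mclapply(rep(n_per_chunk, 4), one_chunk, mc.cores = cores)
4 * sum(unlist(hits)) / (4 * n_per_chunk)  # estimate of pi
```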


Speedup

Ideally: computational time reduced linearly in the number of CPUs.

- Suppose only a fraction p of the total task can be parallelized.
- Supposing we have n parallel CPUs, the speedup is

      1 / ((1 − p) + p/n)   (Amdahl's Law)

  → no infinite speedup possible.

Example: p = 90% gives a maximum speedup by a factor of 10.
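Amdahl's law as a one-line R function; with p = 90% the speedup never exceeds 10, however many CPUs are added:

```r
# Amdahl's law: speedup with n CPUs when a fraction p of the task parallelizes.
amdahl <- function(p, n) 1 / ((1 - p) + p / n)
amdahl(0.9, 10)   # about 5.3: 10 CPUs give nowhere near a 10x speedup
amdahl(0.9, Inf)  # 10: the limit as n grows
```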


Communication between processes

- Forking
- Threading
- OpenMP (good for multicore machines): shared-memory multiprocessing
- PVM (Parallel Virtual Machine)
- MPI (Message Passing Interface; the de facto standard for large-scale parallel computations)
- For "Big Data": Hadoop and related approaches.

How to divide tasks? E.g. the master/slave concept.


Outline

Introduction

Parallel RNG
  Intro
  General Approach
  Implementations in R

Practical use of parallel computing (R)


Parallel Random Number Generation

Problems with RNG on parallel computers:

- Cannot use identical streams.
- Sharing a single stream: a lot of overhead.
- Starting from different seeds: danger of overlapping streams (in particular if seeding is not sophisticated or the simulation is large).
- Need independent streams on each processor...


Parallel Random Number Generation - sketch of general approach

[Diagram: from one seed s, functions f_1, f_2, f_3 produce the initial states s_11, s_21, s_31 of separate streams; within stream k, the transition function T generates s_k2, s_k3, ..., and the output function G maps each state s_kj to a number u_kj.]


Packages in R for Parallel Random Number Generation

rsprng: interface to the scalable parallel random number generators library (SPRNG), http://sprng.cs.fsu.edu/

rlecuyer: essentially starts with one random stream and partitions it into long substreams by jumping ahead; L'Ecuyer et al. (2002).
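The same jump-ahead idea is available in base R's parallel package (shown here as a sketch; rlecuyer itself has its own interface): RNGkind("L'Ecuyer-CMRG") selects L'Ecuyer's generator, and nextRNGStream jumps ahead to the start of the next independent substream.

```r
library(parallel)
RNGkind("L'Ecuyer-CMRG")  # L'Ecuyer's combined multiple-recursive generator
set.seed(42)
s1 <- .Random.seed        # seed for stream 1
s2 <- nextRNGStream(s1)   # jump ahead: seed for an independent stream 2
s3 <- nextRNGStream(s2)   # ... stream 3, one per worker
# hand stream 2 to a worker by installing its seed:
assign(".Random.seed", s2, envir = .GlobalEnv)
runif(3)                  # draws from stream 2
```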


Outline

Introduction

Parallel RNG

Practical use of parallel computing (R)
  Things to do before considering parallelisation
  Packages
  Some other packages


Profile

- Determine what part of the program uses the most time with a profiler.
- Improve the important parts (usually the innermost loop).
- R has a built-in profiler (see Rprof, summaryRprof, package profr).

Axel Gandy Parallel Computing 49


Use Vectorization instead of Loops

> a <- rnorm(1e7)
> system.time({x <- 0; for (i in 1:length(a)) x <- x + a[i]})[3]
elapsed
    8.5
> system.time(sum(a))[3]
elapsed
   0.02

Axel Gandy Parallel Computing 50


Just in time compilation - the compiler package I

I compiler package: can pre-compile R code into “Byte Code”

I R core - the base and recommended packages are now byte-compiled by default.

> library(compiler)
> f <- function() { j <- 0; for (i in 1:10000) j <- j + i; j }
> system.time(replicate(1000, f()))
   user  system elapsed
   4.98    0.00    4.98
> fc <- cmpfun(f)
> system.time(replicate(1000, fc()))
   user  system elapsed
   0.44    0.00    0.43

Axel Gandy Parallel Computing 51


Just in time compilation - the compiler package II

This is an extreme example; the speedup will usually not be as big. This functionality can be enabled automatically via

require(compiler)
enableJIT(3)

I Other JIT implementation: Ra. Not just a package - central parts of R are reimplemented (http://www.milbo.users.sonic.net/ra/); one needs to install Ra instead of R (Ra has not been updated for some time).

I Bill Venables (on the R help archive): “if you really want to write R code as you might C code, then jit can help make it practical in terms of time. On the other hand, if you want to write R code using as much of the inbuilt operators as you have, then you can possibly still do things better.”

Axel Gandy Parallel Computing 52


Use Compiled Code

I R is an interpreted language.

I Can include C, C++ and Fortran code.

I Can dramatically speed up computationally intensive parts (a factor of 100 is possible)

I No speedup if the computationally intensive part is already a vector/matrix operation.

I Downside: decreased portability, longer programming time

I Helpful libraries: Rcpp
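As a sketch of what this looks like with Rcpp (assuming Rcpp and a working C++ toolchain are installed; the example falls back to plain R otherwise so it still runs):

```r
# Inline C++ via Rcpp if available; otherwise a plain-R fallback.
if (requireNamespace("Rcpp", quietly = TRUE)) {
  Rcpp::cppFunction('
    double sumC(NumericVector x) {
      double total = 0;
      for (int i = 0; i < x.size(); i++) total += x[i];
      return total;
    }')
} else {
  sumC <- function(x) sum(x)  # fallback when Rcpp is unavailable
}

sumC(as.numeric(1:1000))  # 500500
```

For a trivial sum the built-in `sum` is already compiled C and will win; compiled code pays off for loops that cannot be expressed as vector operations.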

Axel Gandy Parallel Computing 53


R-package: parallel

I Part of R core since R version 2.14.

I Mostly for “embarrassingly parallel” computations

I Extends the “apply”-style functions to a cluster of machines

> detectCores()
[1] 4
> cl <- makeCluster(4)
> f <- function(i) mean(replicate(10000, mean(rnorm(10000, mean = i))))
> parSapply(cl, 1:4, FUN = f)
[1] 1.000165 2.000036 3.000067 3.999995
> system.time(parSapply(cl, 1:4, FUN = f))
   user  system elapsed
   0.00    0.00   16.24
> system.time(sapply(1:4, FUN = f))
   user  system elapsed
  44.32    0.01   44.63
> stopCluster(cl)


Random number generator

A parallel RNG is set up automatically:

> cl <- makeCluster(2)
> parLapply(cl, 1:2, function(i) rnorm(3))
[[1]]
[1] -0.1490175  1.4870953 -0.4753602

[[2]]
[1] -1.139051  0.202475  1.184057

> stopCluster(cl)
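The worker streams can also be seeded explicitly with clusterSetRNGStream, which distributes L'Ecuyer streams across the workers and makes parallel results reproducible; a minimal sketch:

```r
library(parallel)

cl <- makeCluster(2)
clusterSetRNGStream(cl, iseed = 123)  # one L'Ecuyer stream per worker
draws1 <- parLapply(cl, 1:2, function(i) rnorm(3))

clusterSetRNGStream(cl, iseed = 123)  # reset to the same streams
draws2 <- parLapply(cl, 1:2, function(i) rnorm(3))
stopCluster(cl)

stopifnot(identical(draws1, draws2))  # same seed, same draws
```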

Axel Gandy Parallel Computing 55


Rmpi

For more complicated parallel algorithms that are not embarrassingly parallel.
Tutorial under http://math.acadiau.ca/ACMMaC/Rmpi/
Hello world from this tutorial:

# Load the R MPI package if it is not already loaded.
if (!is.loaded("mpi_initialize")) library("Rmpi")

# Spawn as many slaves as possible
mpi.spawn.Rslaves()

# In case R exits unexpectedly, have it automatically clean up
# resources taken up by Rmpi (slaves, memory, etc...).
.Last <- function() {
  if (is.loaded("mpi_initialize")) {
    if (mpi.comm.size(1) > 0) {
      print("Please use mpi.close.Rslaves() to close slaves.")
      mpi.close.Rslaves()
    }
    print("Please use mpi.quit() to quit R")
    .Call("mpi_finalize")
  }
}

# Tell all slaves to return a message identifying themselves
mpi.remote.exec(paste("I am", mpi.comm.rank(), "of", mpi.comm.size()))

# Tell all slaves to close down, and exit the program
mpi.close.Rslaves()
mpi.quit()

(Not installable under Windows from CRAN - install from http://www.stats.uwo.ca/faculty/yu/Rmpi/)

Axel Gandy Parallel Computing 56


Some other Packages

multicore Use of parallel computing on a single machine via fork (Unix, MacOS) - very fast and easy to use.

GridR http://cran.r-project.org/web/packages/GridR/ Wegener et al. (2009, Future Generation Computer Systems)

rparallel http://www.rparallel.org/
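The fork-based approach of multicore now lives on as parallel::mclapply; a small sketch (forking is unavailable on Windows, hence the mc.cores guard):

```r
library(parallel)

# Fork-based apply; falls back to one core (ordinary lapply) on Windows.
cores <- if (.Platform$OS.type == "unix") 2 else 1
res <- mclapply(1:4, function(i) i^2, mc.cores = cores)
unlist(res)  # 1 4 9 16
```

Because the workers are forked copies of the current R process, no data needs to be exported to them, unlike with makeCluster.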

Axel Gandy Parallel Computing 57


GPUs

I graphical processing units - in graphics cards

I very good at parallel processing

I need to tailor code to the specific GPU.

I Packages in R:

gputools several basic routines.
cudaBayesreg Bayesian multilevel modeling for fMRI.

Axel Gandy Parallel Computing 58


Further Reading

I A tutorial on Parallel Computing: https://computing.llnl.gov/tutorials/parallel_comp/

I High Performance Computing task view on CRAN: http://cran.r-project.org/web/views/HighPerformanceComputing.html

I A talk on high performance computing with R: http://dirk.eddelbuettel.com/papers/useR2010hpcTutorial.pdf

Axel Gandy Parallel Computing 59


References

Part III

Appendix

Axel Gandy Appendix 60


Topics in the coming lectures:

I Optimisation

I MCMC methods

I Bootstrap

I Particle Filtering

Axel Gandy Appendix 61


References I

Gentle, J. (2003). Random Number Generation and Monte Carlo Methods. Springer.

Knuth, D. (1981). The Art of Computer Programming, Vol. 2: Seminumerical Algorithms. Addison-Wesley.

Knuth, D. (1998). The Art of Computer Programming, Vol. 2: Seminumerical Algorithms. 3rd ed. Addison-Wesley.

L’Ecuyer, P. (1994). Uniform random number generation. Annals of Operations Research 53, 77–120.

L’Ecuyer, P. (2004). Random number generation. In Handbook of Computational Statistics: Concepts and Methods, Springer.

L’Ecuyer, P. (2006). Uniform random number generation. In Handbooks in Operations Research and Management Science, Elsevier.

L’Ecuyer, P., Simard, R., Chen, E. J. & Kelton, W. D. (2002). An object-oriented random-number package with many long streams and substreams. Operations Research 50, 1073–1075. The code in C, C++, Java, and FORTRAN is available.

Axel Gandy Appendix 62


References II

Niederreiter, H. (1992). Random number generation and quasi-Monte Carlo methods. SIAM.

Axel Gandy Appendix 63