
QMC tutorial 1

Quasi-Monte Carlo

Art B. Owen

Stanford University

A tutorial reflecting personal views on what is important in QMC.

MCQMC 2016 will be at Stanford, August 14-19

mcqmc2016.stanford.edu

MCMSki, January 2016

QMC tutorial 2

These are the slides I presented at MCMSki 5 in Lenzerheide, Switzerland on Thursday January 8. I have added a few interstitial slides like this one after the fact. I correct some typos too.

It was my good fortune that the tutorial I gave was opposite the Imposteriors rehearsal in the other room. That gave the tutorial a near-plenary audience. While I regret not reaching the Imposteriors as well, I count myself lucky for having such good attendance.

What I present is a statistician's view of QMC, and so my emphases are different in some ways from the customary presentation of QMC.

Software

Matlab has an implementation of scrambled Sobol' and Halton points. Start with the documentation for sobolset and haltonset. They support skipping initial points and leaping (taking every $k$'th point for $k > 1$). I know of no reason to ever use leaping. Skipping over $b^M$ values might be helpful to get non-overlapping QMC sets of size $b^m$ for $m \le M$. Skipping can also improve Halton points, but then again, randomizing without skipping should be as good or better. A sketch of skipping in Python appears below.
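The same skip idea in a minimal Python sketch using scipy.stats.qmc, which offers scrambled Sobol' and Halton generators analogous to sobolset and haltonset; the sizes m, M and d here are illustrative, not from the slides.

```python
# Sketch: take 2^m scrambled Sobol' points after skipping the first 2^M,
# so that different skips give non-overlapping QMC point sets.
from scipy.stats import qmc

m, M, d = 10, 12, 5                          # illustrative sizes, m <= M
eng = qmc.Sobol(d=d, scramble=True, seed=1)
eng.fast_forward(2**M)                       # skip the initial 2^M points
pts = eng.random(2**m)                       # array of shape (2**m, d) in [0,1)^d
```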

QMC tutorial 3

Outline

1) What QMC is (stratification on steroids)
2) Why it works (discrepancy, variation and Koksma-Hlawka)
3) Digital constructions: van der Corput, Halton and nets
4) Lattices
5) Randomized QMC
6) Curse of dimension, tractability, weighted spaces (room for Bayesian thinking here)
7) QMC ∩ Particles
8) QMC ∩ MCMC

QMC tutorial 4

But first: orientation

There are landmark papers where Monte Carlo was introduced/adapted for some specific problems. Here is a subset of examples.

• Physics: Metropolis et al. (1953)
• Chemistry (reaction equations): Gillespie (1977)
• Financial valuation: Boyle (1977)
• Resampling: Efron (1979)
• OR (discrete event simulation): Tocher & Owen (1960)
• Bayes (maybe 5 landmarks in early days)
• Nonsmooth optimization: Kirkpatrick et al. (1983)
• Computer graphics (path tracing): Kajiya (1988)

QMC tutorial 5

Quasi-Monte Carlo

QMC is used in plain quadrature.

• Particle transport methods in physics: Jerome Spanier++
• Financial valuation, some early examples: Paskov & Traub, 1990s
• Graphical rendering: Alex Keller++ (They got an Oscar!)
• Solving PDEs: Frances Kuo, Christoph Schwab++, 2015
• Particle methods: Chopin & Gerber (2015)

What next

I expect that there are undiscovered landmark applications of QMC, where somebody applies/adapts/extends it for new problems, especially

• machine learning
• Bayes
• uncertainty quantification

QMC tutorial 6

The next slide presents a spectrum of methods to suit problems ranging from those so easy that they are considered solved to those so hard that they may never be solved. Things at the top can be embedded in your toaster. Things at the bottom may be nearly impossible or may require a team of PhDs to operate. There are no PhDs inside your toaster.

This is a riff on a comment Stephen Boyd made.

Plain QMC handles problems a bit better behaved than plain MC and delivers better results. Getting at the really cutting-edge problems tackled by methods at the bottom of the list may require switching 'plain QMC' to something else.

QMC tutorial 7

A spectrum

The harder the problem, the further down this list we go.

• Everybody just knows it, e.g., 42
• We have a closed form expression, e.g., $\sin(2\pi x_1) \times x_2$
• $\exists$ exact deterministic black box, $b(x)$
• classic quadratures, e.g., Simpson's or Gauss rule
• plain quasi-Monte Carlo
• plain Monte Carlo
• MCMC, SMC
• ABC, aMCMC, variational MC, ...
• No one will ever know

QMC tutorial 8

MC and QMC

We estimate
$$\mu = \int_{[0,1]^d} f(x)\,\mathrm{d}x \quad\text{by}\quad \hat\mu = \frac{1}{n}\sum_{i=1}^n f(x_i), \qquad x_i \in [0,1]^d.$$
In plain MC, the $x_i$ are IID $\mathbf{U}[0,1]^d$. In QMC they're 'spread evenly'.

Non-uniform
$$\mu = \int f(x)\,p(x)\,\mathrm{d}x \quad\text{and}\quad \hat\mu = \frac{1}{n}\sum_{i=1}^n f(\psi(x_i)), \qquad x_i \in [0,1]^d,$$
given a suitable $\psi$ with $\psi(x) \sim p$ whenever $x \sim \mathbf{U}[0,1]^d$.

Many methods fit this framework. Acceptance-rejection is awkward.
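A minimal sketch of both estimators in Python; the integrand, dimension, and sample size are illustrative, and scipy.stats.qmc stands in for any QMC point generator.

```python
# Sketch: plain MC vs scrambled Sobol' QMC for mu = int_{[0,1]^d} f(x) dx,
# plus the non-uniform case via psi = componentwise Gaussian inverse CDF.
import numpy as np
from scipy.stats import qmc, norm

d, n = 4, 2**12
f = lambda x: np.cos(x.sum(axis=1))              # illustrative integrand

rng = np.random.default_rng(0)
mu_mc = f(rng.random((n, d))).mean()             # IID U[0,1]^d points

x = qmc.Sobol(d=d, scramble=True, seed=0).random(n)
mu_qmc = f(x).mean()                             # evenly spread points

g = lambda z: np.exp(-np.abs(z).sum(axis=1))     # integrand on the z scale
mu_gauss = g(norm.ppf(x)).mean()                 # psi(x) = Phi^{-1}(x) ~ N(0, I)
```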

QMC tutorial 9

Illustration

[Figure: MC and two QMC methods in the unit square: Monte Carlo, Fibonacci lattice, and Hammersley sequence point sets.]

MC points always have clusters and gaps. What is random is where they appear. QMC points avoid clusters and gaps to the extent that mathematics permits.

QMC tutorial 10

Philosophies I don't share

1) Zaremba (1968): "The proper justification of the normal practice of Monte Carlo integration must be based not on the randomness of the procedure, which is spurious, but on equidistribution properties of the sets of points at which the integrand values are computed."

I very much like how randomness lets you estimate your errors.

2) Also: it is possible to object to frequentist tools (LLN and CLT) being introduced into Bayesian problems.

If it works, why not use it? This complaint is now very rare.

3) Pseudo-random numbers aren't really random.

Yes, but they're very well tested. We get more trouble from floating point numbers representing reals than we do from pseudo-randomness representing randomness.

QMC tutorial 11

Measuring uniformity

We need a way to verify that the points $x_i$ are 'spread out' in $[0,1]^d$. The most fruitful way is to show that
$$\mathbf{U}\{x_1, x_2, \ldots, x_n\} \doteq \mathbf{U}[0,1]^d.$$

Discrepancy

A discrepancy is a distance $\|F - F_n\|$ between the measures $F = \mathbf{U}[0,1]^d$ and $F_n = \mathbf{U}\{x_1, x_2, \ldots, x_n\}$. There are many discrepancies.

Technicality

More properly: $\mathrm{Dist}_n(x_i) \doteq \mathbf{U}[0,1]^d$ for $i \sim \mathbf{U}\{1,2,\ldots,n\}$. There could be ties.

QMC tutorial 12

Local discrepancy

Did the box $[0, a)$ get its fair share of points?

[Figure: a point set in the unit square with anchored boxes at $a = (0.6, 0.7)$ and $b = (0.42, 0.45)$; caption: local discrepancy at $a$, $b$.]

Local discrepancy at $a$:
$$\delta(a) = F_n([0,a)) - F([0,a)) = \frac{13}{32} - 0.6 \times 0.7 = -0.01375.$$

Star discrepancy
$$D_n^* = D_n^*(x_1, \ldots, x_n) = \sup_{a \in [0,1)^d} |\delta(a)|$$
For $d = 1$ this is Kolmogorov-Smirnov.
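A small sketch of the local discrepancy computation; the point set here is illustrative, so its value of $\delta(a)$ differs from the slide's 13/32 example.

```python
# Sketch: local discrepancy delta(a) = Fn([0,a)) - F([0,a)) for a point set.
import numpy as np
from scipy.stats import qmc

x = qmc.Halton(d=2, scramble=True, seed=0).random(32)

def local_disc(x, a):
    a = np.asarray(a)
    fn = np.mean(np.all(x < a, axis=1))   # fraction of points in [0, a)
    return fn - np.prod(a)                # minus vol([0, a)) = F([0, a))

print(local_disc(x, [0.6, 0.7]))          # the slide's 32-point set gave -0.01375
```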

QMC tutorial 13

More discrepancies
$$D_n^* = \sup_{a \in [0,1)^d} \big|F_n([0,a)) - F([0,a))\big|$$
$$D_n = \sup_{a, b \in [0,1)^d} \big|F_n([a,b)) - F([a,b))\big|$$
$$D_n^* \le D_n \le 2^d D_n^*$$

$L^p$ discrepancies
$$D_n^{*,p} = \left(\int_{[0,1)^d} |\delta(a)|^p \,\mathrm{d}a\right)^{1/p}$$

Also

Wrap-around discrepancies: Hickernell. Discrepancies over triangles, rotated rectangles, balls, ..., convex sets, ...: Beck, Chen, Schmidt. The best results are only for axis-aligned hyper-rectangles.

QMC tutorial 14

Koksma's inequality

For $d = 1$:
$$|\hat\mu - \mu| \le D_n^*(x_1, \ldots, x_n) \times \int_0^1 |f'(x)|\,\mathrm{d}x.$$
NB: $\int_0^1 |f'(x)|\,\mathrm{d}x$ is the total variation of $f$.

Koksma-Hlawka theorem
$$\left|\frac{1}{n}\sum_{i=1}^n f(x_i) - \int_{[0,1)^d} f(x)\,\mathrm{d}x\right| \le D_n^* \times V_{\mathrm{HK}}(f)$$
$V_{\mathrm{HK}}$ is the total variation in the sense of Hardy (1905) and Krause (1903).

Puzzler

Is this a 100% confidence interval?

QMC tutorial 15

For $d = 1$, Koksma

WLOG $0 \equiv x_0 \le x_1 \le x_2 \le \cdots \le x_n \le x_{n+1} \equiv 1$.

Integration and summation by parts:
$$\int_0^1 f(x)\,\mathrm{d}x = f(1) - \int_0^1 x f'(x)\,\mathrm{d}x$$
$$\frac{1}{n}\sum_{i=1}^n f(x_i) = f(1) - \frac{1}{n}\sum_{i=0}^n i\,\big(f(x_{i+1}) - f(x_i)\big) = f(1) - \frac{1}{n}\sum_{i=0}^n i \int_{x_i}^{x_{i+1}} f'(x)\,\mathrm{d}x$$

After simplification, for continuous $f'$:
$$\hat\mu - \mu = \frac{1}{n}\sum_{i=1}^n f(x_i) - \int_0^1 f(x)\,\mathrm{d}x = -\int_0^1 \delta(x) f'(x)\,\mathrm{d}x$$

Upshot
$$|\hat\mu - \mu| \le \|\delta\|_p \times \|f'\|_q, \qquad 1/p + 1/q = 1.$$

QMC tutorial 16

Rates of convergence

It is possible to get $D_n^* = O(\log(n)^{d-1}/n)$. Then
$$|\hat\mu - \mu| = o(n^{-1+\epsilon}) \quad\text{vs}\quad O_p(n^{-1/2}) \text{ for MC.}$$

What about those logs?

$\log(n)^{d-1}/n$ can be very large if $d$ is large. The $s$-dimensional coordinate projections (i.e., margins) of $x_1, \ldots, x_n$ typically have discrepancy $O(\log(n)^{s-1}/n)$. Log factors are not material for functions of low effective dimension (later).

Roth (1954)

$D_n^*$ of $o\big(\log(n)^{(d-1)/2}/n\big)$ is unattainable.

Randomization

Can also control the log factors (later).

QMC tutorial 17

Tight and loose bounds

They are not mutually exclusive.

Koksma-Hlawka is tight
$$|\hat\mu - \mu| \le (1-\epsilon)\, D_n^*(x_1, \ldots, x_n) \times V_{\mathrm{HK}}(f) \quad\text{fails for some } f.$$
KH holds as an equality for a worst case function like $f' \doteq \pm\delta$.

Koksma-Hlawka is also very loose

It can greatly over-estimate the actual error. Usually $\delta$ and $f'$ are dissimilar:
$$\hat\mu - \mu = -\langle \delta, f' \rangle.$$

Just like Chebychev's inequality

It is also tight and very loose. E.g., $\Pr\big(|\mathcal{N}(0,1)| \ge 10\big) \le 0.01$ is loose. Yes: $1.5 \times 10^{-23} \le 10^{-2}$.

QMC tutorial 18

Variation

Multidimensional variation can be a bit more subtle than one dimensional.
$$f(x_1, x_2) = \begin{cases} 1, & x_1 + x_2 \le 1/2 \\ 0, & \text{else} \end{cases}$$
has $V_{\mathrm{HK}}(f) = \infty$ on $[0,1]^2$, yet $V_{\mathrm{HK}}(f_\epsilon) < \infty$ for some $f_\epsilon$ with $\|f - f_\epsilon\|_1 < \epsilon$.

Vitali variation

It is almost $\int_{[0,1]^d} \big|\frac{\partial^d}{\partial x_1 \cdots \partial x_d} f(x)\big|\,\mathrm{d}x$. It vanishes if $f$ does not depend on $x_j$ for some $j \in \{1, \ldots, d\}$.

Hardy-Krause variation

Sum of Vitali variations on 'upper faces' of $[0,1]^d$ with $0$ to $d-1$ components fixed at 1.

QMC-friendly discontinuities

Axis parallel discontinuities give $V_{\mathrm{HK}} < \infty$. Xiaoqun Wang++

QMC tutorial 19

Effective dimension

Functional ANOVA decomposition: Hoeffding (1948), Sobol' (1969)
$$f(x) = \mu + f_1(x_1) + f_2(x_2) + \cdots + f_d(x_d) + f_{1,2}(x_1, x_2) + \cdots$$
Each of these sub-$f$'s integrates to zero. Then
$$\hat\mu - \mu = \frac{1}{n}\sum_{i=1}^n f_1(x_{i1}) + \cdots + \frac{1}{n}\sum_{i=1}^n f_d(x_{id}) + \frac{1}{n}\sum_{i=1}^n f_{1,2}(x_{i1}, x_{i2}) + \cdots$$

If $f$ is dominated by its main effects and low order interactions (Caflisch, Morokoff & O (1997)) then the error is like a lower dimensional QMC error. With some notational license,
$$|\hat\mu - \mu| \le \sum_{u \subseteq \{1,\ldots,d\}} D_n^*(x_{1u}, \ldots, x_{nu})\, V_{\mathrm{HK}}(f_u) = \sum_{u \subseteq \{1,\ldots,d\}} O\!\left(\frac{\log(n)^{|u|-1}}{n}\right) V_{\mathrm{HK}}(f_u).$$

QMC tutorial 20

For $d = 1$

Points $x_i = (i - 1/2)/n$ minimize both $D_n$ and $D_n^*$. But where do we put the $(n+1)$'st point?

Extensible sequences

Take the first $n$ points of $x_1, x_2, x_3, \ldots, x_n, x_{n+1}, x_{n+2}, \ldots$. Then we can get $D_n^* = O((\log n)^d / n)$. No known extensible constructions get $O((\log n)^{d-1}/n)$.

QMC tutorial 21

van der Corput

    i   base 2   reflected   phi_2(i)
    1   1        0.1         1/2  = 0.5
    2   10       0.01        1/4  = 0.25
    3   11       0.11        3/4  = 0.75
    4   100      0.001       1/8  = 0.125
    5   101      0.101       5/8  = 0.625
    6   110      0.011       3/8  = 0.375
    7   111      0.111       7/8  = 0.875
    8   1000     0.0001      1/16 = 0.0625
    9   1001     0.1001      9/16 = 0.5625

Take $x_i = \phi_2(i)$. Extensible with $D_n^* = O(\log(n)/n)$. Commonly $x_i = \phi_2(i-1)$ starts at $x_1 = 0$. A small implementation sketch follows.
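A short sketch of the radical inverse $\phi_b$ (illustrative code, not from the slides); it reproduces the table above for $b = 2$.

```python
# Sketch: base-b radical inverse phi_b(i), reflecting the digits of i
# about the radix point.
def radical_inverse(i: int, b: int = 2) -> float:
    x, denom = 0.0, 1.0
    while i > 0:
        denom *= b
        i, digit = divmod(i, b)
        x += digit / denom
    return x

print([radical_inverse(i) for i in range(1, 10)])
# [0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875, 0.0625, 0.5625]
```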

QMC tutorial 22

Halton sequences

The van der Corput trick works for any base. Use bases 2, 3, 5, 7, ...

[Figure: Halton sequence in the unit square: 72 Halton points, 864 Halton points, and 864 random points.]

Via base $b$ digital expansions,
$$i = \sum_{k=0}^{K} b^k a_{ik} \;\to\; \phi_b(i) \equiv \sum_{k=0}^{K} b^{-1-k} a_{ik},$$
$$x_i = (\phi_2(i), \phi_3(i), \ldots, \phi_p(i)).$$

QMC tutorial 23

The Halton sequence is easy to implement. It is a good tool for testing whether QMC will help on your problem. It is also well suited to homework problems.

If you need large prime numbers, then scrambling the digits in the Halton sequence is recommended. Simply using a random permutation of them, as Giray Okten advocates, will bring an improvement.
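For instance, a digit-scrambled Halton sequence is one line with scipy (an illustration; any implementation with randomly permuted digits will do):

```python
# Sketch: scrambled Halton points; scramble=True randomly permutes digits.
from scipy.stats import qmc

x = qmc.Halton(d=8, scramble=True, seed=0).random(1000)  # 1000 points in [0,1)^8
```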

QMC tutorial 24

Digital nets

Halton sequences are balanced if $n$ is a multiple of $2^a$ and $3^b$ and $5^c$ ... Digital nets use just one base $b$, and so balance all margins equally.

Elementary intervals

[Figure: some elementary intervals in base 5.]

QMC tutorial 25

Digital nets
$$E = \prod_{j=1}^{s} \left[\frac{a_j}{b^{k_j}}, \frac{a_j + 1}{b^{k_j}}\right), \qquad 0 \le a_j < b^{k_j}$$

(0, m, s)-net

$n = b^m$ points in $[0,1)^s$. If $\mathrm{vol}(E) = 1/n$ then $E$ has one of the $n$ points. E.g., Faure (1982) points, prime base $b \ge s$.

(t, m, s)-net

If $E$ deserves $b^t$ points it gets $b^t$ points. Integer $t \ge 0$. E.g., Sobol' (1967) points, base 2. Smaller $t$ is better (but a construction might not exist).

minT project

Schürer & Schmid give bounds on $t$ given $b$, $m$ and $s$.

Monographs

Niederreiter (1992); Dick & Pillichshammer (2010)

QMC tutorial 26

Example nets

[Figure: two digital nets in base 5, a (0,3,2)-net and a (0,4,2)-net.]

The (0, 4, 2)-net is a bivariate margin of a (0, 4, 5)-net. The parent net has $5^4 = 625$ points in $[0,1)^5$. It balances 43,750 elementary intervals. Think of 43,750 control variates for 625 observations. We should remove that diagonal striping artifact (later).

QMC tutorial 27

Extensible nets

Nets can be extended to larger sample sizes. It raises $D_n^*$ from $O((\log n)^{s-1}/n)$ to $O((\log n)^s/n)$.

(t, s)-sequence in base b

An infinite sequence of $(t, m, s)$-nets,
$$\underbrace{x_1, \ldots, x_{b^m}}_{(t,m,s)\text{-net}}\;\; \underbrace{x_{b^m+1}, \ldots, x_{2b^m}}_{(t,m,s)\text{-net}}\;\; \cdots\;\; \underbrace{x_{kb^m+1}, \ldots, x_{(k+1)b^m}}_{(t,m,s)\text{-net}}\;\; \cdots$$
simultaneously for all $m \ge t$, and $b$ consecutive $(t,m,s)$-nets together form a $(t, m+1, s)$-net.

Examples

Sobol': $b = 2$. Faure: $t = 0$. Niederreiter & Xing: $b = 2$ (mostly).

QMC tutorial 28

Lattices
$$x_i = \left(\frac{i}{n}, \frac{Z_2 i}{n}, \frac{Z_3 i}{n}, \ldots, \frac{Z_d i}{n}\right) \pmod 1, \qquad Z_j \in \mathbb{N}, \quad Z = (1, Z_2, Z_3, \ldots, Z_d)$$

• the other main family of QMC points
• an extensive literature, e.g., Sloan & Joe; also Kuo, Nuyens, Dick, Cools, Hickernell, ...
• less benefit from randomization

[Figure: some lattice rules for n = 377, with z = (1,41), z = (1,233), and z = (1,253).]
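A rank-1 lattice is a one-liner; this sketch generates the n = 377, z = (1, 233) rule from the figure above.

```python
# Sketch: rank-1 lattice rule x_i = i*z/n mod 1 for i = 0, ..., n-1.
import numpy as np

n, z = 377, np.array([1, 233])     # generating vector from the figure
i = np.arange(n)[:, None]          # column of indices
x = (i * z / n) % 1.0              # array of shape (n, 2), points in [0,1)^2
```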

QMC tutorial 29

Lattices ctd

The choice of $Z = (1, Z_2, Z_3, \ldots, Z_d) \in \mathbb{N}^d$ is critical.

We get to choose $Z$, so we can tune/optimize.
We have to choose $Z$, so we must search.

Korobov (1959) lattices
$$Z = (1, Z, Z^2, \ldots, Z^{d-1}), \qquad Z \in \mathbb{N}$$
Reduced search space. Minor performance penalty.

Extensibility

Lattices are not extensible. But 'shifted lattices' can be extended. Hickernell, Hong, L'Ecuyer & Lemieux (2000)

QMC tutorial 30

Lattice strategy

Sloan & Joe (1994) approach to lattices:

1) Suppose that $f$ is periodic.
2) Upper bound $|\hat\mu - \mu|$ for smooth periodic functions.
3) Find a lattice with a small upper bound.
4) "Make $f$ periodic."

The last step involves replacing $f$ by a periodic $\tilde f$ with $\int \tilde f(x)\,\mathrm{d}x = \int f(x)\,\mathrm{d}x$.

QMC tutorial 31

I skipped over some of these slides, to just show and explain the dual lattices a few slides hence.

This presentation of lattices is based on the book by Sloan and Joe. There has been considerable progress since then and I'm hopeful that somebody will put it together into a new monograph.

QMC tutorial 32

Smooth periodic functions
$$f(x + z) = f(x), \quad \forall z \in \mathbb{Z}^d$$

Fourier
$$f(x) = \sum_{h \in \mathbb{Z}^d} \hat f(h)\, e^{2\pi\sqrt{-1}\, h^\mathsf{T} x} \quad \text{(in mean square)}$$
$$\hat f(h) = \int_{[0,1]^d} f(x)\, e^{-2\pi\sqrt{-1}\, h^\mathsf{T} x}\,\mathrm{d}x$$

Substitute
$$\hat\mu = \frac{1}{n}\sum_{i=1}^n f(x_i) = \sum_{h \in \mathbb{Z}^d} \hat f(h) \left(\frac{1}{n}\sum_{i=1}^n e^{2\pi\sqrt{-1}\, h^\mathsf{T} x_i}\right)$$
$$\mu = \int f(x)\,\mathrm{d}x = \sum_{h \in \mathbb{Z}^d} \hat f(h) \int_{[0,1]^d} e^{2\pi\sqrt{-1}\, h^\mathsf{T} x}\,\mathrm{d}x = \hat f(0)$$

QMC tutorial 33

Averaging sinusoids

For $x_i = \dfrac{iZ}{n} \bmod 1$:
$$\frac{1}{n}\sum_{i=1}^n e^{2\pi\sqrt{-1}\, h^\mathsf{T} x_i} = \begin{cases} 1, & h^\mathsf{T} Z = 0 \pmod n \\ 0, & \text{else.} \end{cases}$$
$$\hat\mu - \mu = \sum_{\substack{h \in \mathbb{Z}^d \\ h^\mathsf{T} Z = 0 \,(\mathrm{mod}\, n)}} \hat f(h) \;-\; \hat f(0)$$

Dual lattices
$$L^\perp = \{h \in \mathbb{Z}^d \mid h^\mathsf{T} Z = 0 \pmod n\}$$

Smooth $f$ means $|\hat f(h)|$ decays as $h$ grows. So search for $Z$ with $L^\perp \setminus \{0\}$ 'far from the origin'.
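A quick numerical check of the sinusoid-averaging identity; the lattice and the two test vectors h are illustrative.

```python
# Sketch: (1/n) sum_i exp(2*pi*sqrt(-1)*h'x_i) is 1 if h'Z = 0 mod n, else 0.
import numpy as np

n, Z = 377, np.array([1, 233])
x = (np.arange(n)[:, None] * Z / n) % 1.0

def avg_sinusoid(h):
    return np.exp(2j * np.pi * x @ h).mean()

print(abs(avg_sinusoid(np.array([233, -1]))))  # h'Z = 0 mod n  -> about 1
print(abs(avg_sinusoid(np.array([5, 3]))))     # h'Z != 0 mod n -> about 0
```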

QMC tutorial 34

Dual lattice

[Figure: some integration lattices with n = 144 and z = (1,89), z = (1,5), z = (1,131), together with their dual lattices plotted on [-30, 30]^2.]

QMC tutorial 35

Periodizing transformation
$$\int_{[0,1]^d} f(x)\,\mathrm{d}x = \int_{[0,1]^d} f(\tau(x))\, J(x)\,\mathrm{d}x, \qquad \tau = \text{transformation}, \quad J = \text{Jacobian}.$$
Pick $\tau$ so that $J(x)$ vanishes on the boundary of $[0,1]^d$. Better: make $r$ derivatives of $(f \circ \tau) \times J$ vanish on $\partial [0,1]^d$.

Upshot

Good news: get a very good convergence rate.
Bad news: get a strong curse of dimension in the constant.
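One standard one-dimensional example, applied coordinatewise (my illustration, not from the slides), is the polynomial transformation
$$\tau(x) = 3x^2 - 2x^3, \qquad J(x) = \tau'(x) = 6x(1-x),$$
whose Jacobian vanishes at $x = 0$ and $x = 1$, so $f(\tau(\cdot))\,J(\cdot)$ extends to a continuous periodic function even when $f(0) \ne f(1)$.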

QMC tutorial 36

QMC error estimation
$$|\hat\mu - \mu| \le D_n^* \times V_{\mathrm{HK}}(f)$$

Not a 100% confidence interval:

• $D_n^*$ is hard to compute
• $V_{\mathrm{HK}}$ is harder to get than $\mu$
• $V_{\mathrm{HK}} = \infty$ is common, e.g., $f(x_1, x_2) = 1_{x_1 + x_2 \le 1}$
• We either get $|\hat\mu - \mu| < \infty$ or $|\hat\mu - \mu| \le \infty$ (and maybe we already knew)

Also

Koksma-Hlawka is worst case. It can be very conservative.

QMC tutorial 37

Randomized QMC

1) Make $x_i \sim \mathbf{U}[0,1)^d$ individually,
2) keeping $D_n^* = O(n^{-1+\epsilon})$ collectively.

$R$ independent replicates:
$$\hat\mu = \frac{1}{R}\sum_{r=1}^R \hat\mu_r, \qquad \widehat{\mathrm{Var}}(\hat\mu) = \frac{1}{R(R-1)}\sum_{r=1}^R (\hat\mu_r - \hat\mu)^2$$
If $V_{\mathrm{HK}}(f) < \infty$ then
$$E\big((\hat\mu - \mu)^2\big) = O(n^{-2+\epsilon}).$$

Random shift: Cranley & Patterson (1976). Scrambled nets: O (1995, 1997, 1998). Survey in L'Ecuyer & Lemieux (2005).
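A sketch of the replicated RQMC recipe; the integrand and sizes are illustrative.

```python
# Sketch: R independent scrambled Sobol' replicates give an unbiased estimate
# of mu along with a variance estimate for that estimate.
import numpy as np
from scipy.stats import qmc

d, n, R = 4, 2**10, 10
f = lambda x: np.cos(x.sum(axis=1))          # illustrative integrand

reps = np.array([f(qmc.Sobol(d=d, scramble=True, seed=r).random(n)).mean()
                 for r in range(R)])
mu_hat = reps.mean()
var_hat = reps.var(ddof=1) / R               # estimated Var(mu_hat)
```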

QMC tutorial 38

Rotation modulo 1

[Figure: Cranley-Patterson rotation, showing the points before, the shift, and after.]

Shift the points by $u \sim \mathbf{U}[0,1)^s$ with wraparound:
$$x_i \to x_i + u \pmod 1.$$
Commonly used on lattice rules. Can also be used with nets. At least it removes $x_1 = 0$.
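In code, the rotation is essentially one line (a sketch):

```python
# Sketch: Cranley-Patterson rotation of a point set x of shape (n, s).
import numpy as np

def cranley_patterson(x, rng):
    return (x + rng.random(x.shape[1])) % 1.0  # one shared shift u in [0,1)^s

x_rot = cranley_patterson(np.zeros((8, 2)), np.random.default_rng(1))
```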

QMC tutorial 39

Digit scrambling

1) Chop the space into $b$ slabs. Shuffle them.
2) Do the same within each of those $b$ slabs.
3) And so on within $b^2, b^3, \cdots$ sub-slabs.
4) And the same for all $s$ coordinates.

This operation yields $x_i \sim \mathbf{U}[0,1)^s$ and preserves the net property. O (1995)

QMC tutorial 40

Digit scrambling

Mathematically it is a permutation operation on the base $b$ digits of $x_{ij}$:
$$x_{ij} = \sum_{k=1}^{K} a_{ijk}\, b^{-k} \;\to\; \tilde x_{ij} = \sum_{k=1}^{K} \tilde a_{ijk}\, b^{-k}.$$
The mapping $a_{ijk} \to \tilde a_{ijk} = \pi_{jk}(a_{ijk})$ preserves the net property. In the previous 'nested' scramble, $\pi_{jk}$ also depends on $a_{ij1}, \ldots, a_{ij,k-1}$.

Simpler scrambles

Random linear permutations $a \to g + h \times a \bmod p$ (prime $p = b$) with $g \sim \mathbf{U}\{0, 1, \ldots, b-1\}$ and $h \sim \mathbf{U}\{1, 2, \ldots, b-1\}$. Matousek (1998) has nested linear permutations.

Digital shift: $a_{ijk} \to a_{ijk} + g_{jk} \pmod p$. For $b = 2$ this is $x_i \to \tilde x_i = x_i \oplus u$, a bitwise XOR.
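A sketch of the random linear scramble for one coordinate in prime base $b$, following the $g + h a \bmod b$ recipe above with one permutation per digit position (illustrative code):

```python
# Sketch: random linear scramble of the first K base-b digits of points x.
import numpy as np

def linear_scramble(x, b=2, K=32, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    g = rng.integers(0, b, size=K)            # g_k ~ U{0, ..., b-1}
    h = rng.integers(1, b, size=K)            # h_k ~ U{1, ..., b-1}, invertible
    xt = np.zeros_like(x)
    for k in range(K):
        digit = np.floor(x * b**(k + 1)).astype(int) % b   # digit k+1 of x
        xt += ((g[k] + h[k] * digit) % b) * float(b)**-(k + 1)
    return xt
```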

QMC tutorial 41

Example scrambles

Two components of the first 530 points of a Faure (0, 53)-net in base 53.

[Figure: randomized Faure points under a digital shift, a random linear scramble, and a nested uniform scramble.]

The digital shift is much like a Cranley-Patterson rotation. It uses just one random $u$ for all points: $\tilde x_i = x_i \oplus u$. Random linear (Matousek, 1998) and nested uniform (O, 1995) scrambles yield the same variance.

QMC tutorial 42

Unscrambled Faure

First $n = 11^2 = 121$ points of a Faure $(0, 11)$-net in $[0,1]^{11}$.

[Figure: two projections of 121 Faure points, plotting $2x_2 + x_3 - x_6 + x_8 - 2x_{10} - x_{11}$ against $x_3 - x_6 - x_8 + x_{11}$, and $x_4 + x_6 - x_{10} - x_{11}$ against $x_1 - x_2 - x_8 + x_9$.]

Unscrambled points are very structured.

QMC tutorial 43

Sobol' projections

Using code from Bratley & Fox (1988). Implementations differ based on choices of 'direction numbers'.

[Figure: three projections of 1024 Sobol' points: $x_2$ versus $x_1$, $x_{38}$ versus $x_{37}$, and $x_{26}$ versus $x_{31}$.]

At some higher sample size the empty blocks in $x_{26}$ vs $x_{31}$ fill in. Expect empty blocks in higher order margins to last longer than those in low dimensional margins.

QMC tutorial 44

Scrambled net properties

Using $\sigma^2 = \int (f(x) - \mu)^2\,\mathrm{d}x$:

    If                          Then                                       N.B.
    f ∈ L²                      Var(µ̂) = o(1/n)                            even if V_HK(f) = ∞
    f ∈ L²                      Var(µ̂) ≤ Γ_{t,b,s} σ²/n                    for t = 0, Γ ≤ exp(1) ≐ 2.718
    ∂^{1,2,...,s} f ∈ L²        Var(µ̂) = O(log(n)^{s-1}/n³)                O (1997, 2008)

$\Gamma < \infty$ rules out a $(\log n)^{s-1}$ catastrophe at finite $n$. Loh (2003) has a CLT for $t = 0$ (and fully scrambled points).

Geometrically

Scrambling breaks up the horizontal striping of the Faure nets. Scrambling Sobol' points moves the full/empty blocks around.

Improved rate

RMSE is $O(n^{-1/2})$ better than the QMC rate (cancellation). Holds for nested uniform and nested linear scrambles.

QMC tutorial 45

I suspect that the CLT will not hold for random linear scrambles. Hickernell and Yue found fourth moment differences in discrepancies for the two different kinds of scrambles.

I have a paper on recycling physical random numbers, EJS (2009), where the result always has an asymptotically symmetric distribution which is never Gaussian. That is what led me to guess (still unverified) that the CLT might not hold for the nested linear scramble of Matousek.

QMC tutorial 46

Scrambling vs shifting

Consider $n = 2^m$ points in $[0, 1)$.

QMC

van der Corput points $(i-1)/n$ for $i = 1, \ldots, n$:

• · · · · · · · · · |• · · · · · · · · ·|• · · · · · · · · ·|• · · · · · · · · ·|

Shift

Shift all points by $U \sim \mathbf{U}(0,1)$ with wraparound. Get one point in each $[(i-1)/n, i/n)$:

· · · • · · · · · · |· · · • · · · · · ·|· · · • · · · · · ·|· · · • · · · · · ·|

Scramble

Get a stratified sample, with independent $x_i \sim \mathbf{U}[(i-1)/n, i/n)$:

· · · • · · · · · · |· · · · · · · · • ·|· · · · · • · · · ·|· · • · · · · · · ·|

Random errors cancel, yielding an $O(n^{-1/2})$ improvement.
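A tiny simulation contrasting the two randomizations in $d = 1$ (illustrative integrand; both estimators are unbiased, but the scramble's stratified errors cancel):

```python
# Sketch: variance of shifted vs scrambled (stratified) van der Corput points.
import numpy as np

rng = np.random.default_rng(0)
n, R = 256, 2000
f = lambda x: np.exp(x)                           # illustrative integrand
base = np.arange(n) / n                           # points (i-1)/n

shift = [f((base + rng.random()) % 1.0).mean() for _ in range(R)]
scram = [f(base + rng.random(n) / n).mean() for _ in range(R)]
print(np.var(shift), np.var(scram))               # scramble variance is far smaller
```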

QMC tutorial 47

Higher order nets

Results from Dick, Baldeaux. Start with a net in $2s$ dimensions. 'Interleave' the digits of two variables to make a new one:
$$0.g_1 g_2 g_3 \cdots \text{ and } 0.h_1 h_2 h_3 \cdots \;\to\; 0.g_1 h_1 g_2 h_2 \cdots.$$
Get a 'higher order' net in $s$ dimensions. The error is $O(n^{-2+\epsilon})$ under increased smoothness (two partial derivatives with respect to each input). Scrambling gets RMSE $O(n^{-2-1/2+\epsilon})$.

Even higher

Start with $ks$ dimensions and interleave down to $s$. Get $O(n^{-k+\epsilon})$ and $O(n^{-k-1/2+\epsilon})$ (under still higher smoothness).

Upshot

Not yet widely used. Takes a lot of smoothness and a lot of inputs.
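A sketch of digit interleaving for one pair of coordinates in base 2 (illustrative; K sets the precision kept):

```python
# Sketch: interleave base-2 digits of coordinates g, h in [0,1)
# to produce one higher-order coordinate 0.g1 h1 g2 h2 ...
def interleave2(g: float, h: float, K: int = 26) -> float:
    out = 0.0
    for k in range(1, K + 1):
        gk = int(g * 2**k) % 2                    # k-th bit of g
        hk = int(h * 2**k) % 2                    # k-th bit of h
        out += gk * 2.0**-(2*k - 1) + hk * 2.0**-(2*k)
    return out
```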

QMC tutorial 48

The curse of dimension

Curse of dimension: larger $d$ makes integration harder.
$$C_M^r = \left\{ f: [0,1]^d \to \mathbb{R} \;\middle|\; \Big|\prod_j \frac{\partial^{\alpha_j}}{\partial x_j^{\alpha_j}} f\Big| \le M, \;\; \sum_j \alpha_j = r, \;\; \alpha_j \ge 0 \right\}$$

Bakhvalov I: For any $x_1, \ldots, x_n \in [0,1]^d$ there is $f \in C_M^r$ with $|\hat\mu_n - \mu| \ge k n^{-r/d}$. Ordinary QMC is like $r = d$.

Bakhvalov II: Random points can't beat RMSE $O(n^{-r/d - 1/2})$. Ordinary MC is like $r = 0$.

QMC tutorial 49

What if we beat those rates?

Sometimes we get high accuracy for large $d$. It does not mean we beat the curse of dimensionality. Bakhvalov never promised universal failure, only the existence of hard cases. We may have just had an easy, non-worst-case function.

Two kinds of easy

• Truncation: only the first $s \ll d$ components of $x$ matter
• Superposition: the components only matter "$s$ at a time"

Either way, $f$ might not be "fully $d$-dimensional".

QMC tutorial 50

Lifted curse

Two main tools to describe it:

• Weighted spaces and tractability
• ANOVA and effective dimension

Implications

Neither causes the curse to be lifted. They describe the happy circumstance where the curse did not apply. Both leave important gaps, described below.

QMC tutorial 51

Weighted spaces

Hickernell (1996), Sloan & Wozniakowski (1998)
$$\partial^u \equiv \prod_{j \in u} \frac{\partial}{\partial x_j}, \qquad \text{assume } \partial^{1:d} f \text{ exists.}$$

Inner product, with weights $\gamma_u > 0$:
$$\langle f, g \rangle = \sum_{u \subseteq 1:d} \frac{1}{\gamma_u} \int_{[0,1]^u} \left(\int_{[0,1]^{-u}} \partial^u f(x)\,\mathrm{d}x_{-u}\right) \left(\int_{[0,1]^{-u}} \partial^u g(x)\,\mathrm{d}x_{-u}\right) \mathrm{d}x_u$$

Function ball $B_{\gamma,C} = \{f \mid \langle f, f \rangle_\gamma \le C\}$. Small $\gamma_u$ forces small $\|\partial^u f\|$ in the ball.

Product weights

$\gamma_u = \prod_{j \in u} \gamma_j$ where the $\gamma_j$ decrease rapidly with $j$. Now $f \in B_{\gamma,C}$ implies $\partial^u f$ is small when $|u|$ is large.

QMC tutorial 52

Tractability

$\mathrm{Err}(d, n)$ ≡ worst case error in the function ball $B_\gamma$.
$n^*(d, \epsilon)$ ≡ first $n$ with $\mathrm{Err}(d, n) \le \epsilon \times \mathrm{Err}(d, 0)$.

Weak tractability
$$n^*(d, \epsilon) = \mathrm{poly}(d, 1/\epsilon); \qquad \limsup_{d \to \infty} \frac{\sum_{j=1}^d \gamma_j}{\log d} < \infty \;\text{ suffices.}$$

Strong tractability
$$n^*(d, \epsilon) = \mathrm{poly}(1/\epsilon); \qquad \sum_{j=1}^\infty \gamma_j < \infty \;\text{ suffices.}$$

Many versions and many contributors

Conditions above are for the spaces described by Sloan & Wozniakowski (1998). Novak, Wasilkowski, Hickernell, Dick, Kuo, ...

QMC tutorial 53

ANOVA and scrambled nets

Write the ANOVA of $f$:
$$f(x) = \sum_{u \subseteq \{1,2,\ldots,s\}} f_u(x)$$

Term by term:
$$\hat\mu = \sum_{u \subseteq \{1,2,\ldots,s\}} \hat f_u \quad\text{where}\quad \hat f_u = \frac{1}{n}\sum_{i=1}^n f_u(x_i)$$
$$\mathrm{Var}(\hat\mu) = \sum_{u \subseteq \{1,2,\ldots,s\}} \mathrm{Var}(\hat f_u) \qquad \text{O (1997)}$$

Expect $O(n^{-3+\epsilon})$ from small $|u|$ and $O(n^{-1})$ from large $|u|$. Lower order $f_u$ are formed by integration. They can be smoother than $f$. Griebel, Kuo, Sloan (2010, 2013, 2014)

QMC tutorial 54

Effective dimension $s \le d$

Caflisch, Morokoff & O (1997)

Often $f$ is dominated by its low order interactions. Then RQMC may make a huge improvement. Let $\sigma^2_u = \mathrm{Var}(f_u)$, the variance component for $u$.

Truncation sense: $\displaystyle\sum_{u \subseteq 1:s} \sigma^2_u \ge 0.99 \sum_{u \subseteq 1:d} \sigma^2_u$.

Superposition sense: $\displaystyle\sum_{|u| \le s} \sigma^2_u \ge 0.99 \sum_{u \subseteq 1:d} \sigma^2_u$.

Mean dimension: $\displaystyle\frac{\sum_u |u|\,\sigma^2_u}{\sum_u \sigma^2_u}$. Liu & O (2006). (Easier to estimate ... another story.)

QMC tutorial 55

Example

Kuo, Schwab, Sloan (2012) consider quadrature for
$$f(x) = \frac{1}{1 + \sum_{j=1}^d x_j^\alpha / j!}, \qquad 0 < \alpha \le 1.$$
For $\alpha = 1$ and $d = 500$, $R = 50$ replicated estimates of $\sum_v |v|\,\sigma^2_v / \sigma^2$ using $n = 10{,}000$ had mean 1.0052 and standard deviation 0.0058.

Upshot

$f(x)$ is nearly additive, with mean dimension between 1.00356 and 1.00684 ($\pm 2$ standard errors).
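The mean dimension equals $\sum_j \tau_j^2 / \sigma^2$ where $\tau_j^2 = \sum_{u \ni j} \sigma^2_u$ is the unnormalized total effect of input $j$. A sketch using Jansen's pick-freeze estimator of $\tau_j^2$ on this integrand, with smaller $n$ and $d$ than the slide for illustration:

```python
# Sketch: estimate mean dimension sum_j tau_j^2 / sigma^2 by pick-freeze
# (Jansen's estimator) for the Kuo-Schwab-Sloan integrand with alpha = 1.
import numpy as np
from scipy.special import factorial

d, n = 50, 4096                               # smaller than the slide's d = 500
rng = np.random.default_rng(0)
w = 1.0 / factorial(np.arange(1, d + 1))      # coefficients 1/j!
f = lambda x: 1.0 / (1.0 + x @ w)

x, z = rng.random((n, d)), rng.random((n, d))
fx = f(x)
tau2 = np.empty(d)
for j in range(d):
    xj = x.copy()
    xj[:, j] = z[:, j]                        # resample only coordinate j
    tau2[j] = 0.5 * np.mean((fx - f(xj))**2)  # Jansen: unnormalized total effect
print(tau2.sum() / fx.var())                  # mean dimension, close to 1 here
```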

QMC tutorial 56

Weighted spaces

Pro: the $\gamma_u$ ensure good performance for a class of $f$.
Con: tractability conditions greatly restrict $f$.
Pro: QMC can be tuned to a given $\gamma$, e.g., Dick & Pillichshammer (2010).
Con: we don't measure/observe a true $\gamma$.

Effective dimension

Pro: can be measured for a given $f$.
Con: does not imply smooth interactions $f_u$.
Pro: ANOVA components can be smoother than $f$: Sloan, Kuo, Griebel.

QMC tutorial 57

Description vs engineering

Effective dimension and weighted spaces both describe settings where $f$ can be integrated well. Some authors try to engineer changes in $f$ to make it more suited to QMC. E.g., cram the importance into the first few components of $x$.

MC vs QMC

MC places lots of effort on variance reduction. For QMC we gain by reducing effective dimension, or Hardy-Krause variation (but there the asymptotics are slow). E.g., when turning $u \sim \mathbf{U}[0,1]^d$ into $z \sim \mathcal{N}(0, \Sigma)$, the choice of $\Sigma^{1/2}$ affects QMC performance. Caflisch, Morokoff & O (1997); Acworth, Broadie & Glasserman (1998); Imai & Tan (2014)

QMC tutorial 58

Sampling Brownian motion

Feynman-Kac / Brownian bridge: the first few variables define a 'skeleton'. The rest fill in.

[Figure: Brownian bridge construction of Brownian motion, $B(t)$ versus time $t$.]

See also Mike Giles++ on multi-level MC.

QMC tutorial 59

Acworth, Broadie and Glasserman simulate BM in terms of its principal components. If the first few (or any few) PCs dominate the evaluation function, then it reduces effective dimension.

There can be no universal BM generator that is best for all integrands. Papageorgiou (2002) exhibits a financial option for which the ordinary step by step construction of Brownian motion works better than the Brownian bridge construction. Imai and Tan have a series of papers tuning the generation of Brownian motion to specific integrands.

The book edited by Schreider (1964), "Method of Statistical Testing, Monte Carlo Method", includes a Chapter II by I. M. Sobol' in which the Brownian bridge construction is described. Sobol' cites a 1958 article by Gel'fand, Frolov and Chentsov, "The evaluation of Wiener integrals by the Monte Carlo method", Izv. vys. uch. zav. ser. matem., No. 5, 32-45.

QMC tutorial 60

Choosing γ

Each $\gamma$ corresponds to a reproducing kernel Hilbert space (RKHS).

The question

Which RKHS should we use in a given problem? $\mathcal{H}_1$ or $\mathcal{H}_2$ or $\cdots$ or $\mathcal{H}_J$ $\cdots$

1) Sometimes $f \in \mathcal{H}_j$ for all $j = 1, \ldots, J$, and $f \in \mathcal{H}_1$ vs $\mathcal{H}_2$ have very different implications.
2) Sometimes $f$ belongs to none of them, while $|f - \tilde f| \le \epsilon$ for some $\tilde f \in \mathcal{H}$.

Bayes and empirical Bayes ideas might help choose $\mathcal{H}$. Maybe we want an $\mathcal{H}$ where $f$ is 'typical'. A natural $\gamma$ has
$$\gamma_u \propto \int_{[0,1]^d} \big(\partial^u f(x)\big)^2 \,\mathrm{d}x.$$
NB: the constant of proportionality is also important.

QMC tutorial 61

Asymptotics

Two ways to think about asymptotics.

View 1: We really want to know what happens in the limit as $n \to \infty$.

View 2: We want an approximation for finite $n$, like $\frac{1}{n} \approx 0$.

For QMC

If the dimension is large, then the asymptotics solve view 1, not view 2. Practical $n$ are likely to be on an 'initial transient'. Sometimes the asymptote 'sets in' at $n \approx 4^d$.

For MC

$\mathrm{Var}(\hat\mu) = \sigma^2/n$ 'sets in' for $n \ge 1$.

QMC tutorial 62

Sequential QMC

JRSS-B discussion paper by Chopin & Gerber (2015). $N$ particles in $\mathbb{R}^d$ for $T$ time steps. Advance $x_{nt}$ to $x_{n,t+1} = \phi(x_{nt}, u_{n,t+1})$, each $u \in [0,1)^S$. Do them all with $U_{t+1} \in [0,1)^{N \times S}$. Usually IID.

For QMC

We want the rows of $U_{t+1}$ to be 'balanced' wrt the positions of the $x_t$. Easy if $d = S = 1$: Lecot & Tuffin (2004). Take $U_{t+1} \in [0,1)^{N \times 2}$. Sort the first column along $x_{n,t}$. Update the points with the remaining column.

What to do if $d > 1$?

Alignment is tricky because $\mathbb{R}^d$ is not ordered. Related work by L'Ecuyer, Lecot, L'Archeveque-Gaudet on array-RQMC: $d$-dimensional alignments, huge variance reductions, limited theory.

QMC tutorial 63

Using Hilbert curves

These are space-filling curves as $m \to \infty$:

[Figure: Hilbert curves for m = 1, 2, 3, 4.]

Chopin & Gerber thread a curve through the $x_{n,t} \in \mathbb{R}^d$. Align particles via the first column of RQMC points $U_{t+1} \in [0,1)^{N \times (1+S)}$. Use the remaining $S$ columns to advance them.

Results

Squared error is $o(N^{-1})$. Good empirical results for modest $d$.

NB: I left out lots of details about particle weighting.

QMC tutorial 64

Recipe for QMC in MCMC

For MCMC we follow one particle.

1) Step $x_i \leftarrow \psi(x_{i-1}, v_i)$ for $v_i \in (0,1)^d$.
2) Form $v_1 = (u_1, \ldots, u_d)$, $v_2 = (u_{d+1}, \ldots, u_{2d})$, etc., so $n$ steps require $u_1, \ldots, u_{nd} \in (0,1)$.
3) MCMC uses $u_i \sim \mathbf{U}(0,1)$.
4) Replace IID by $N = nd$ 'balanced' points.

Reasons for caution

1) We're using 1 point in $[0,1]^{nd}$ with $n \to \infty$.
2) Our $x_i$ won't be Markovian.

QMC tutorial 65

QMC ∩ MCMC

Early references

Chentsov (1967): Plugs in 'completely uniformly distributed' points (defined later). Samples in a finite state space by inversion. Shows consistency. Beautiful coupling argument. (Resembles coupling from the past.)

Sobol' (1974): Has an $n \times \infty$ matrix of points $x_{ij} \in [0,1]$. Samples from a row until a return to the start state, then goes to the next row. Gets rate $O(1/n)$ ... if the transition probabilities are $a/2^b$ for integers $a, b$.

QMC tutorial 66

MCMC ≈ QMC transposed

    Method   Rows           Columns
    QMC      n points       d variables,  1 ≤ d ≪ n → ∞
    MCMC     r replicates   n steps,      1 ≤ r ≪ n → ∞

QMC is based on equidistribution; MCMC is based on ergodicity.

QMC tutorial 67

Severe failure is possible

van der Corput: $u_i \in [0, 1/2) \iff u_{i+1} \in [1/2, 1)$.

[Figure: $u_{i+1}$ vs $u_i$ for the van der Corput sequence.]

High proposal ⟺ low acceptance, and vice versa. Morokoff and Caflisch (1993) describe a heat particle leaving the region.

QMC tutorial 68

Completely uniformly distributed

$u_1, u_2, \dots \in [0,1]$ are CUD if $D_n^*(z_1, \ldots, z_n) \to 0$, where $z_i = (u_i, \ldots, u_{i+d-1})$, for all $d \ge 1$.

Overlapping blocks:
$$z_1 = (u_1, \ldots, u_d), \quad z_2 = (u_2, \ldots, u_{d+1}), \quad \ldots, \quad z_n = (u_n, \ldots, u_{n+d-1})$$

Chentsov (1967) shows we can use non-overlapping blocks $v_i = (u_{d(i-1)+1}, \ldots, u_{di})$, for all $d$.
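In code, the overlapping-blocks construction is just a sliding window (a sketch):

```python
# Sketch: overlapping d-blocks z_i = (u_i, ..., u_{i+d-1}) from a sequence u.
import numpy as np

def overlapping_blocks(u, d):
    n = len(u) - d + 1
    return np.stack([u[i:i + n] for i in range(d)], axis=1)  # shape (n, d)
```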

QMC tutorial 69

CUD ctd

CUD ≡ one of Knuth's definitions of randomness.

Recommendations

1) Use all the $d$-tuples from your RNG.
2) Be sure to pick a small RNG.

As considered in Niederreiter (1986); Entacher, Hellekalek, and L'Ecuyer (1999); L'Ecuyer and Lemieux (1999).

QMC tutorial 70

Some results

1) Metropolis on CUD points is consistent in finite state spaces: O & Tribble (2005)
2) Weakly CUD random points are OK: Tribble & O (2008)
3) Consistency in continuous problems (Gibbs and Metropolis): Chen, Dick & O (2010)
4) Chen (2011): rate $O(n^{-1+\epsilon})$ for some smooth ARMAs

QMC tutorial 71

Related refs

Liao (1998): reorders QMC points for MCMC
Craiu & Lemieux (2007): QMC in multiple-try Metropolis
Lemieux & Sidorsky (2006): QMC in exact sampling
Lemieux, Ormoneit, Fleet (2001): QMC for particle filters
Frigessi, Gasemyr, Rue (2000): MCMC ∩ antithetics
Craiu, Meng (2004): MCMC ∩ Latin hypercubes
Propp (2004): Rotor-Router

QMC tutorial 72

Thesis of Tribble (2007)

Results on CUD and weak CUD. Constructions of small RNGs. Examples with Gibbs and Metropolis.

QMC tutorial 73

Variance reductions from Tribble

                                n = 2^10        n = 2^12        n = 2^14
    Data sets                   min    max      min    max      min     max
    Pumps (d = 11)              286    1543     304    5003     1186    16089
    Vasorestriction (d = 42)    14     15       56     76       108     124

Pumps: hierarchical Poisson-Gamma model. Vasorestriction: probit model with 3 coefficients and 39 latent variables. Shown are the min and max variance reductions over all pump parameters and all non-latent vasorestriction parameters.

Details

Targets are posterior means of parameters. The CUD points were LFSRs, with Cranley-Patterson rotations.

QMC tutorial 74

Chen, Dick & O, Annals Stat.

Chen, Dick & O extend consistency to continuous state spaces. MCMC remains consistent when driven by $u_1, u_2, \ldots$ if

1) the $u_i$ are CUD (or CUD in probability),
2) the $m$-step transitions are Riemann integrable for all $m \ge 1$, and
3) for Metropolis-Hastings: there is a coupling region (the independence sampler can have one); for Gibbs: there is a contraction property (Gibbs for the probit model is proven to contract).

Either way, the chain has to forget its past. That could be by regeneration or exponential decay.

Aside: Henri Lebesgue made our lives so much easier!

QMC tutorial 75

Small RNGs

Matsumoto & Nishimura sent us some small linear feedback shift register RNGs. They have similar equidistribution properties to "small Mersenne twisters" but are not necessarily the same constructions. They come in sizes $M = 2^m - 1$ for $10 \le m \le 32$, giving $u_1, u_2, \ldots, u_M$.

We explore them for some simulations. Prepend one or more 0s,
$$0, \ldots, 0, u_1, \ldots, u_M,$$
put the values into a matrix, and apply Cranley-Patterson rotations.

QMC tutorial 76

Summary

    Bivariate Gaussian                apparent better convergence rate for the mean
    Hit and run, volume estimator     no improvement
    M/M/1 queue, average wait         mixed results
    Garch                             some big improvements
    Heston stochastic volatility      big improvements for the in-the-money case

Synopsis

The smoother the problem, the more CUD points can improve. Same as for finite dimensional QMC.

QMC tutorial 77

What next?

QMC helps MCMC the most when the transitions are smooth, i.e., no acceptance-rejection.

Some thoughts

• Gibbs is promising. Maybe tweak JAGS.
• Maybe Hamiltonian dynamics, using high acceptance.
• When there is acceptance-rejection, maybe the optimal acceptance rate for QMC is higher.

QMC tutorial 78

Thanks

• QMC ∩ MCMC co-authors: Seth Tribble, Su Chen, Josef Dick, Makoto Matsumoto, Takuji Nishimura
• MCMSki organizers
• Nicolas Chopin
• NSF

MCMSki, January 2016