
  • SAMSI Monte Carlo workshop 1

    Importance sampling the union of

    rare events, with an application

    to power systems analysis

    Art B. Owen

    Stanford University

    With Yury Maximov and Michael Chertkov

    of Los Alamos National Laboratory.

    December 15, 2017

  • SAMSI Monte Carlo workshop 2

    MCQMC 2018

    July 1–8, 2018

    Rennes, France

    MC, QMC, MCMC, RQMC, SMC, MLMC, MIMC, · · ·

    Call for papers is now online:

    http://mcqmc2018.inria.fr/

    December 15, 2017

  • SAMSI Monte Carlo workshop 3

    Rare event sampling

    Motivation: an electrical grid has N nodes, with power p1, p2, · · · , pN.

    • Random pi > 0, e.g., wind generation,

    • Random pi < 0, e.g., consumption,

    • Fixed pi for controllable nodes.

    AC phase angles θi

    • (θ1, . . . , θN) = F(p1, . . . , pN)

    • Constraints: |θi − θj| < θ̄ if i ∼ j

    • Find P(max_{i∼j} |θi − θj| > θ̄)

    Simplified model

    • (p1, . . . , pN) Gaussian

    • θ linear in p1, . . . , pN

    December 15, 2017

  • SAMSI Monte Carlo workshop 4

    Gaussian setup

    For x ∼ N(0, Id) and Hj = {x : ωjᵀx > τj}, find µ = P(x ∈ ∪_{j=1}^J Hj).

    WLOG ‖ωj‖ = 1. Ordinarily τj > 0. d is in the hundreds, J in the thousands.

    [Figure: two-dimensional illustration of the half-spaces Hj. Solid: deciles of ‖x‖. Dashed: 10⁻³ · · · 10⁻⁷.]

    December 15, 2017

  • SAMSI Monte Carlo workshop 5

    Basic bounds

    Let Pj ≡ P(ωjᵀx > τj) = Φ(−τj).

    Then

        max_{1≤j≤J} Pj ≡ µ̲ ≤ µ ≤ µ̄ ≡ ∑_{j=1}^J Pj

    Inclusion-exclusion

    For u ⊆ 1:J = {1, 2, . . . , J}, let

        Hu = ∪_{j∈u} Hj,   Hu(x) ≡ 1{x ∈ Hu},   Pu = P(Hu) = E(Hu(x))

    Then

        µ = ∑_{|u|>0} (−1)^{|u|−1} P(∩_{j∈u} Hj)

    Plethora of bounds

    Survey by Yang, Alajaji & Takahara (2014)

    December 15, 2017

  • SAMSI Monte Carlo workshop 6

    Other uses

    • Other engineering reliability

    • False discoveries in statistics: J correlated test statistics

    • Speculative: inference after model selection. Taylor, Fithian, Markovic, Tian

    December 15, 2017

  • SAMSI Monte Carlo workshop 7

    Importance sampling

    For x ∼ p, seek η = Ep(f(x)) = ∫ f(x) p(x) dx. Take

        η̂ = (1/n) ∑_{i=1}^n f(xi) p(xi) / q(xi),   xi iid∼ q

    Unbiased if q(x) > 0 whenever f(x)p(x) ≠ 0.

    Variance

    Var(η̂) = σq²/n, where

        σq² = ∫ f²p²/q − η² = · · · = ∫ (fp − ηq)²/q

    Numerator: seek q ≈ fp/η, i.e., nearly proportional to fp.
    Denominator: watch out for small q.
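    A minimal R sketch of this estimator on a toy rare event, assuming p = N(0,1), q = N(3,1) and f = 1{x > 4}; the names and numbers are illustrative, not from the talk.

        # Plain importance sampling: eta_hat = (1/n) sum_i f(x_i) p(x_i) / q(x_i), x_i ~ q
        set.seed(1)
        n <- 1e5
        x <- rnorm(n, mean = 3)                      # x_i ~ q
        w <- dnorm(x) / dnorm(x, mean = 3)           # likelihood ratio p(x)/q(x)
        f <- as.numeric(x > 4)                       # rare-event indicator
        eta_hat <- mean(f * w)
        c(estimate = eta_hat, truth = pnorm(-4), se = sd(f * w) / sqrt(n))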

    December 15, 2017

  • SAMSI Monte Carlo workshop 8

    Self-normalized IS

        η̂SNIS = [ ∑_{i=1}^n f(xi) p(xi)/q(xi) ] / [ ∑_{i=1}^n p(xi)/q(xi) ]

    Available for unnormalized p and / or q.

    Good for Bayes, limited effectiveness for rare events.

    Optimal SNIS puts 1/2 the samples in the rare event.

    Optimal plain IS puts all samples there.

    Best possible asymptotic coefficient of variation is 2/√n.
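    A sketch of the self-normalized version on the same kind of toy problem (p = N(0,1), q = N(3,1), f = 1{x > 4}, all illustrative); only a ratio of weighted sums is needed, so p and q may be unnormalized.

        # Self-normalized IS: ratio estimator, slightly biased but consistent
        set.seed(2)
        n <- 1e5
        x <- rnorm(n, mean = 3)                      # x_i ~ q
        w <- dnorm(x) / dnorm(x, mean = 3)           # p/q, known only up to a constant is enough
        f <- as.numeric(x > 4)
        eta_snis <- sum(f * w) / sum(w)
        eta_snis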

    December 15, 2017

  • SAMSI Monte Carlo workshop 9

    Mixture sampling

    For αj ≥ 0 with ∑j αj = 1,

        η̂α = (1/n) ∑_{i=1}^n f(xi) p(xi) / ∑_j αj qj(xi),   xi iid∼ qα ≡ ∑_j αj qj

    Defensive mixtures

    Take q0 = p, α0 > 0. Get p/qα ≤ 1/α0. Hesterberg (1995)
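    A sketch of a defensive mixture on the same toy problem, with q0 = p given weight α0 so that the likelihood ratio p/qα stays below 1/α0; the densities and weights here are illustrative choices, not the talk's.

        # Defensive mixture: q_alpha = alpha0*p + alpha1*q1 with p = N(0,1), q1 = N(3,1)
        set.seed(3)
        n     <- 1e5
        alpha <- c(0.1, 0.9)                                        # weight on p, weight on q1
        comp  <- sample(1:2, n, replace = TRUE, prob = alpha)
        x     <- ifelse(comp == 1, rnorm(n), rnorm(n, mean = 3))    # draw from the mixture
        qmix  <- alpha[1] * dnorm(x) + alpha[2] * dnorm(x, mean = 3)
        f     <- as.numeric(x > 4)
        eta_hat <- mean(f * dnorm(x) / qmix)                        # weights bounded by 1/alpha[1]
        eta_hat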

    Additional refs

    Use∫qj = 1 as control variates. O & Zhou (2000).

    Optimization over α is convex. He & O (2014).

    Multiple IS. Veach & Guibas (1994). Veach’s Oscar.

    Elvira, Martino, Luengo, Bugallo (2015) Generalizations.

    December 15, 2017

  • SAMSI Monte Carlo workshop 10

    Instantons

    µj = arg max_{x∈Hj} p(x).   Chertkov, Pan, Stepanov (2011)

    [Figure: the two-dimensional half-space picture again, with solid dots at the instantons µj.]

    • Solid dots: the µj

    • Initial thought: qj = N(µj, I) and qα = ∑j αj qj

    • I.e., a mixture of exponentially tilted distributions

    Conditional sampling

        qj = L(x | x ∈ Hj) = p(x) Hj(x) / Pj

    December 15, 2017

  • SAMSI Monte Carlo workshop 11

    Mixture of conditional sampling

    α0, α1, . . . , αJ ≥ 0,   ∑j αj = 1,   qα = ∑j αj qj,   q0 ≡ p

    µ = P(x ∈ ∪_{j=1}^J Hj) = P(x ∈ H1:J) = E(H1:J(x))

    Mixture IS

        µ̂α = (1/n) ∑_{i=1}^n H1:J(xi) p(xi) / ∑_{j=0}^J αj qj(xi)      (where qj = p Hj / Pj)

            = (1/n) ∑_{i=1}^n H1:J(xi) / ∑_{j=0}^J αj Hj(xi)/Pj

    with H0 ≡ 1 and P0 = 1.

    It would be nice to have αj/Pj constant.

    December 15, 2017

  • SAMSI Monte Carlo workshop 12

    ALORE: At Least One Rare Event

    Like AMIS of Cornuet, Marin, Mira, Robert (2012)

    Put α*0 = 0 and take α*j ∝ Pj. That is

        α*j = Pj / ∑_{j′=1}^J Pj′ = Pj / µ̄      (µ̄ is the union bound)

    Then

        µ̂α* = (1/n) ∑_{i=1}^n H1:J(xi) / ∑_{j=1}^J α*j Hj(xi)/Pj = µ̄ × (1/n) ∑_{i=1}^n H1:J(xi) / ∑_{j=1}^J Hj(xi)

    If x ∼ qα* then H1:J(x) = 1. So

        µ̂α* = µ̄ × (1/n) ∑_{i=1}^n 1/S(xi),      S(x) ≡ ∑_{j=1}^J Hj(x)
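    A compact R sketch of ALORE for Gaussian half-spaces Hj = {ωjᵀx > τj} with unit-norm ωj; the conditional draw uses the numerically stable recipe from the Sampling slide below, and the function and variable names are illustrative rather than the paper's code.

        # ALORE: mu_hat = mu_bar * mean(1/S(x_i)), with x_i ~ q_{alpha*} = sum_j (Pj/mu_bar) qj
        alore <- function(Omega, tau, n = 1000) {
          # Omega: d x J matrix of unit-norm normals; tau: length-J thresholds
          d <- nrow(Omega); J <- ncol(Omega)
          Pj <- pnorm(-tau)                        # P(omega_j' x > tau_j)
          mubar <- sum(Pj)                         # union bound
          jsamp <- sample.int(J, n, replace = TRUE, prob = Pj / mubar)
          invS <- numeric(n)
          for (i in 1:n) {
            w <- Omega[, jsamp[i]]; thr <- tau[jsamp[i]]
            y <- -qnorm(runif(1) * pnorm(-thr))    # truncated-normal draw, so w'x = y > thr
            z <- rnorm(d)
            x <- w * y + (z - w * sum(w * z))      # add the part orthogonal to w
            S <- sum(colSums(Omega * x) > tau)     # number of half-spaces containing x
            invS[i] <- 1 / S
          }
          c(estimate = mubar * mean(invS), se = mubar * sd(invS) / sqrt(n), union.bound = mubar)
        }

    As a sanity check, with a single half-space (J = 1) every S(xi) = 1, so the estimate equals the union bound Φ(−τ) exactly.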

    December 15, 2017

  • SAMSI Monte Carlo workshop 13

    Prior work

    Adler, Blanchet, Liu (2008, 2012) estimate

        w(b) = P(max_{t∈T} f(t) > b)

    for a Gaussian random field f(t) over T ⊂ Rᵈ. They also consider a finite set T = {t1, . . . , tN}.

    Comparisons

    • We use the same mixture estimate as them for finite T and Gaussian data.

    • Their analysis is for Gaussian distributions. We consider arbitrary sets of J events.

    • They take limits as b → ∞. We have non-asymptotic bounds and more general limits.

    • Our analysis is limited to finite sets. They handle extrema over a continuum.

    December 15, 2017

  • SAMSI Monte Carlo workshop 14

    Theorem (Owen, Maximov & Chertkov, 2017)

    Let H1, . . . , HJ be events defined by x ∼ p.
    Let qj(x) = p(x) Hj(x) / Pj for Pj = P(x ∈ Hj).
    Let xi iid∼ qα* = ∑_{j=1}^J α*j qj for α*j = Pj/µ̄.

    Take

        µ̂ = µ̄ × (1/n) ∑_{i=1}^n 1/S(xi),      S(xi) = ∑_{j=1}^J Hj(xi)

    Then E(µ̂) = µ and

        Var(µ̂) = (1/n) ( µ̄ ∑_{s=1}^J Ts/s − µ² ) ≤ µ(µ̄ − µ)/n

    where Ts ≡ P(S(x) = s). The RHS follows because ∑_{s=1}^J Ts/s ≤ ∑_{s=1}^J Ts = µ.

    December 15, 2017

  • SAMSI Monte Carlo workshop 15

    Remarks

    With S(x) equal to the number of rare events at x,

        µ̂ = µ̄ × (1/n) ∑_{i=1}^n 1/S(xi)

        1 ≤ S(x) ≤ J   =⇒   µ̄/J ≤ µ̂ ≤ µ̄

    Robustness

    The usual problem with rare event estimation is getting no rare events in n tries.
    Then µ̂ = 0.

    Here the corresponding failure is never seeing S ≥ 2 rare events.
    Then µ̂ = µ̄, an upper bound, and probably fairly accurate if all S(xi) = 1.

    Conditioning vs sampling from N(µj, I)

    Avoids wasting samples outside the failure zone.
    Avoids awkward likelihood ratios.

    December 15, 2017

  • SAMSI Monte Carlo workshop 16

    General Gaussians

    For y ∼ N(η, Σ) let the event be γjᵀy > κj.

    Translate into ωjᵀx > τj for

        ωj = Σ^{1/2} γj / (γjᵀ Σ γj)^{1/2}   and   τj = (κj − γjᵀη) / (γjᵀ Σ γj)^{1/2}

    Adler et al.

    Their context has all κj = b.
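    A sketch of this standardization in R, using the symmetric square root of Σ (a Cholesky factor would do equally well if used consistently); the function name is illustrative.

        # Map event {gamma' y > kappa}, y ~ N(eta, Sigma), to {omega' x > tau}, x ~ N(0, I)
        standardize_event <- function(gamma, kappa, eta, Sigma) {
          e <- eigen(Sigma, symmetric = TRUE)
          Shalf <- e$vectors %*% diag(sqrt(pmax(e$values, 0))) %*% t(e$vectors)  # Sigma^{1/2}
          s <- sqrt(drop(t(gamma) %*% Sigma %*% gamma))        # sd of gamma' y
          list(omega = drop(Shalf %*% gamma) / s,              # unit-norm direction
               tau   = (kappa - sum(gamma * eta)) / s)         # standardized threshold
        }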

    December 15, 2017

  • SAMSI Monte Carlo workshop 17

    Sampling

    Get x ∼ N(0, I) such that xᵀω > τ. Write y for xᵀω.

    1) Sample u ∼ U(0, 1).
    2) Let y = Φ⁻¹(Φ(τ) + u(1 − Φ(τ))). May easily get y = ∞.
    3) Sample z ∼ N(0, I).
    4) Deliver x = ωy + (I − ωωᵀ)z.

    Better numerics from

    1) Sample u ∼ U(0, 1).
    2) Let y = Φ⁻¹(uΦ(−τ)).
    3) Sample z ∼ N(0, I).
    4) Let x = ωy + (I − ωωᵀ)z.
    5) Deliver −x.

    I.e., sample x ∼ N(0, I) subject to xᵀω ≤ −τ and deliver −x.

    Step 4 via ωy + z − ω(ωᵀz).

    December 15, 2017
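    The second (numerically stable) recipe above, written as a small R function; a sketch, with names of my choosing.

        # Sample x ~ N(0, I_d) conditional on omega' x > tau, for unit-norm omega
        rcond_halfspace <- function(omega, tau) {
          u <- runif(1)
          y <- qnorm(u * pnorm(-tau))                      # y <= -tau via the lower tail
          z <- rnorm(length(omega))
          x <- omega * y + (z - omega * sum(omega * z))    # omega*y + (I - omega omega')z
          -x                                               # reflect so that omega' x >= tau
        }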

  • SAMSI Monte Carlo workshop 18

    More bounds

    Recall that Ts = P(S(x) = s). Therefore

        µ̄ = ∑_{j=1}^J Pj = E( ∑_{j=1}^J Hj(x) ) = E(S) = ∑_{s=1}^J s Ts

        µ = E( max_{1≤j≤J} Hj(x) ) = P(S > 0), so

        µ̄ = E(S | S > 0) × P(S > 0) = µ × E(S | S > 0)

    Therefore

        n × Var(µ̂) = µ̄ ∑_{s=1}^J Ts/s − µ²

                   = ( µ E(S | S > 0) )( µ E(S⁻¹ | S > 0) ) − µ²

                   = µ² ( E(S | S > 0) E(S⁻¹ | S > 0) − 1 )

    December 15, 2017

  • SAMSI Monte Carlo workshop 19

    Bounds continued

        Var(µ̂) ≤ (µ²/n) ( E(S | S > 0) × E(S⁻¹ | S > 0) − 1 )

    Lemma

    Let S be a random variable in {1, 2, . . . , J} for J ∈ N. Then

        E(S) E(S⁻¹) ≤ (J + J⁻¹ + 2)/4

    with equality if and only if S ∼ U{1, J}.

    Corollary

        Var(µ̂) ≤ (µ²/n) × (J + J⁻¹ − 2)/4.

    Thanks to Yanbo Tang and Jeffrey Negrea for an improved proof of the lemma.

    December 15, 2017

  • SAMSI Monte Carlo workshop 20

    Numerical comparison

    The R function pmvnorm in the mvtnorm package of Genz & Bretz (2009) computes

        P(a ≤ y ≤ b) = P( ∩j {aj ≤ yj ≤ bj} )   for y ∼ N(η, Σ)

    Their code makes sophisticated use of quasi-Monte Carlo.
    Adaptive, up to 25,000 evaluations, in FORTRAN.
    It was not designed for rare events.
    It computes an intersection.

    Usage

    Pack the ωj into Ω and the τj into T:

        1 − µ = P(Ωᵀx ≤ T) = P(y ≤ T),   y ∼ N(0, ΩᵀΩ)

    It can handle up to 1000 inputs, i.e., J ≤ 1000.

    Upshot

    ALORE works (much) better for rare events.
    pmvnorm works better for non-rare events.

    December 15, 2017

  • SAMSI Monte Carlo workshop 21

    Polygon example

    P(J, τ): regular J-sided polygon in R² circumscribing the circle of radius τ > 0.

        P = { x : (sin(2πj/J), cos(2πj/J))ᵀ x ≤ τ, j = 1, . . . , J }

    A priori bounds

        µ = P(x ∈ Pᶜ) ≤ P(χ²(2) > τ²) = exp(−τ²/2)

    This is pretty tight. A trigonometric argument gives

        1 ≥ µ / exp(−τ²/2) ≥ 1 − (J tan(π/J) − π) τ²/(2π) ≐ 1 − π²τ²/(6J²)

    Let's use J = 360.

    So µ ≐ exp(−τ²/2) for reasonable τ.
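    A sketch setting up the polygon constraints in R and checking these a priori bounds for J = 360 (the ALORE run itself would use the estimator sketched on the ALORE slide):

        # Regular J-gon circumscribing the circle of radius tau; x escapes P iff some omega_j' x > tau
        J <- 360; tau <- 6
        ang <- 2 * pi * (1:J) / J
        Omega <- rbind(sin(ang), cos(ang))                      # 2 x J unit normals
        mubar <- J * pnorm(-tau)                                # union bound
        upper <- exp(-tau^2 / 2)                                # P(chi^2_2 > tau^2) >= mu
        lower <- upper * (1 - (J * tan(pi / J) - pi) * tau^2 / (2 * pi))
        c(union.bound = mubar, chisq.upper = upper, trig.lower = lower)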

    December 15, 2017

  • SAMSI Monte Carlo workshop 22

    IS vs MVN

    ALORE had n = 1000. MVN had up to 25,000. 100 random repeats.

    τ    µ            E((µ̂ALORE/µ − 1)²)   E((µ̂MVN/µ − 1)²)

    2    1.35×10⁻¹    0.000399              9.42×10⁻⁸
    3    1.11×10⁻²    0.000451              9.24×10⁻⁷
    4    3.35×10⁻⁴    0.000549              2.37×10⁻²
    5    3.73×10⁻⁶    0.000600              1.81×10⁰
    6    1.52×10⁻⁸    0.000543              4.39×10⁻¹
    7    2.29×10⁻¹¹   0.000559              3.62×10⁻¹
    8    1.27×10⁻¹⁴   0.000540              1.34×10⁻¹

    For τ = 5 MVN had a few outliers.

    December 15, 2017

  • SAMSI Monte Carlo workshop 23

    Polygon again

    [Figure: two histograms of relative estimates. Left panel (ALORE): values concentrated near 0.96–1.04. Right panel (pmvnorm): values spread over roughly 1–6.]

    Results of 100 estimates of P(x ∉ P(360, 6)), divided by exp(−6²/2). Left panel: ALORE. Right panel: pmvnorm.

    December 15, 2017

  • SAMSI Monte Carlo workshop 24

    Symmetry

    Maybe the circumscribed polygon is too easy due to symmetry.

    Redo it with just ωj = (cos(2πj/J), sin(2πj/J))ᵀ for the 72 prime numbers j ∈ {1, 2, . . . , 360}.

    For τ = 6 the variance of µ̂ / exp(−18) is 0.00077 for ALORE (n = 1000), and 8.5 for pmvnorm.

    December 15, 2017

  • SAMSI Monte Carlo workshop 25

    High dimensional half spaces

    Take random ωj ∈ Rᵈ for large d. By concentration of measure they are nearly orthogonal.

        µ ≐ 1 − ∏_{j=1}^J (1 − Pj) ≐ ∑_j Pj = µ̄      (for rare events)

    Simulation

    • Dimension d ∼ U{20, 50, 100, 200, 500}

    • Constraints J ∼ U{d/2, d, 2d}

    • τ such that −log10(µ̄) ∼ U[4, 8] (then rounded to 2 places).
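    A sketch of one such random test problem in R (the specific d, J and target union bound below are illustrative choices within the ranges listed above):

        # One high-dimensional test case: J random unit vectors in R^d, tau set from a target union bound
        set.seed(4)
        d <- 200; J <- 2 * d
        Omega <- matrix(rnorm(d * J), d, J)
        Omega <- sweep(Omega, 2, sqrt(colSums(Omega^2)), "/")   # unit-norm, nearly orthogonal columns
        target_mubar <- 1e-6
        tau <- rep(-qnorm(target_mubar / J), J)                 # so that sum_j Phi(-tau_j) = target
        mu_approx <- 1 - prod(1 - pnorm(-tau))                  # near-independence approximation
        c(union.bound = J * pnorm(-tau[1]), mu.approx = mu_approx)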

    December 15, 2017

  • SAMSI Monte Carlo workshop 26

    High dimensional results

    [Figure: scatterplot of estimate / bound against the union bound (10⁻⁸ to 10⁻⁴, log scales) for ALORE and pmvnorm.]

    Results of 200 estimates of µ for varying high dimensional problems with nearly independent events.

    December 15, 2017

  • SAMSI Monte Carlo workshop 27

    Power systems

    Network with N nodes called busses and M edges.
    Typically M/N is not very large.

    Power pi at bus i. Each bus is Fixed or Random or Slack:
    the slack bus S has pS = −∑_{i≠S} pi.

        p = (pFᵀ, pRᵀ, pS)ᵀ ∈ Rᴺ

    Randomness driven by pR ∼ N(ηR, ΣRR)

    Inductances Bij

    A Laplacian: Bij ≠ 0 if i ∼ j in the network, and Bii = −∑_{j≠i} Bij.

    Taylor approximation

        θ = B⁺p      (pseudo-inverse)

    Therefore θ is Gaussian, as are all θi − θj for i ∼ j.
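    A toy R sketch of this DC/Taylor step: build a small graph Laplacian, take its pseudo-inverse, and read off phase angles from bus powers. The 4-bus network and its line parameters are made up purely for illustration.

        # Toy 4-bus example of theta = B^+ p
        edges <- rbind(c(1,2), c(2,3), c(3,4), c(4,1), c(1,3))   # network edges i ~ j
        b     <- c(5, 4, 6, 3, 2)                                # made-up line susceptances
        N <- 4
        B <- matrix(0, N, N)
        for (e in seq_len(nrow(edges))) {
          i <- edges[e, 1]; j <- edges[e, 2]
          B[i, j] <- B[i, j] - b[e]; B[j, i] <- B[j, i] - b[e]
        }
        diag(B) <- -rowSums(B)                                   # Laplacian: B_ii = -sum_{j != i} B_ij
        s <- svd(B); keep <- s$d > 1e-9
        Bplus <- s$v[, keep] %*% diag(1 / s$d[keep]) %*% t(s$u[, keep])   # pseudo-inverse B^+
        p <- c(1.0, -0.4, 0.7, 0); p[4] <- -sum(p[1:3])          # slack bus balances total power
        theta <- drop(Bplus %*% p)                               # phase angles; linear (hence Gaussian) in p
        theta[1] - theta[2]                                      # one phase difference across an edge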

    December 15, 2017

  • SAMSI Monte Carlo workshop 28

    Examples

    Our examples come from MATPOWER (a Matlab toolbox).
    Zimmerman, Murillo-Sánchez & Thomas (2011)

    We used n = 10,000.

    Polish winter peak grid

    2383 busses, d = 326 random busses, J = 5772 phase constraints.

    θ̄      µ̂           se/µ̂     µ̲           µ̄

    π/4    3.7×10⁻²³    0.0024    3.6×10⁻²³    4.2×10⁻²³
    π/5    2.6×10⁻¹²    0.0022    2.6×10⁻¹²    2.9×10⁻¹²
    π/6    3.9×10⁻⁷     0.0024    3.9×10⁻⁷     4.4×10⁻⁷
    π/7    2.0×10⁻³     0.0027    2.0×10⁻³     2.4×10⁻³

    θ̄ is the phase constraint, µ̂ is the ALORE estimate, se is the estimated standard
    error, µ̲ is the largest single event probability and µ̄ is the union bound.

    December 15, 2017

  • SAMSI Monte Carlo workshop 29

    Pegase 2869

    Fliscounakis et al. (2013): “large part of the European system”

    N = 2869 busses. d = 509 random busses. J = 7936 phase constraints.

    Results

    θ̄      µ̂           se/µ̂        µ̲           µ̄

    π/2    3.5×10⁻²⁰    0*           3.3×10⁻²⁰    3.5×10⁻²⁰
    π/3    8.9×10⁻¹⁰    5.0×10⁻⁵     7.7×10⁻¹⁰    8.9×10⁻¹⁰
    π/4    4.3×10⁻⁶     1.8×10⁻³     3.5×10⁻⁶     4.6×10⁻⁶
    π/5    2.9×10⁻³     3.5×10⁻³     1.8×10⁻³     4.1×10⁻³

    Notes

    θ̄ = π/2 is unrealistically large. We got µ̂ = µ̄. All 10,000 samples had S = 1.

    Some sort of Wilson interval or Bayesian approach could help.

    One half space was sampled 9408 times, another 592 times. December 15, 2017

  • SAMSI Monte Carlo workshop 30

    Other models

    IEEE case 14, IEEE case 300 and Pegase 1354 were all dominated by one
    failure mode, so µ ≐ µ̄ and no sampling is needed.

    Another model had random power corresponding to wind generators but phase

    failures were not rare events in that model.

    The Pegase 13659 model was too large for our computer. The Laplacian had

    37,250 rows and columns.

    The Pegase 9241 model was large and slow and phase failure was not a rare

    event.

    Caveats

    We used a DC approximation to AC power flow (which is common) and the phase

    estimates were based on a Taylor approximation.

    December 15, 2017

  • SAMSI Monte Carlo workshop 31

    Next steps

    1) Nonlinear boundaries

    2) Non-Gaussian models

    3) Optimizing cost subject to a constraint on µ

    Sampling half-spaces will work if we have convex failure regions and a

    log-concave nominal density.

    December 15, 2017

  • SAMSI Monte Carlo workshop 32

    Thanks

    • Michael Chertkov and Yury Maximov, co-authors

    • Center for Nonlinear Studies at LANL, for hospitality

    • Alan Genz for help on mvtnorm

    • Bert Zwart, pointing out Adler et al.

    • Yanbo Tang and Jeffrey Negrea, improved Lemma proof

    • NSF DMS-1407397 & DMS-1521145

    • DOE/GMLC 2.0 Project: Emergency monitoring and controls through new technologies and analytics

    • Jianfeng Lu, Ilse Ipsen

    • Sue McDonald, Kerem Jackson, Thomas Gehrmann

    December 15, 2017

  • SAMSI Monte Carlo workshop 33

    Bonus topic

    Thinning MCMC output: it really can improve statistical efficiency.

    Short story

    If it costs 1 unit to advance xi−1 → xi
    and θ > 0 units to compute yi = f(xi),
    then thinning to every k’th value lets us get larger n
    and less variance if ACF(yi) decays slowly.

    December 15, 2017

  • SAMSI Monte Carlo workshop 34

    MCMC (notation for)

    A simple MCMC generates

        xi = ϕ(xi−1, ui) ∈ Rᵈ,   ui ∼ U(0, 1)ᵐ

    More general ones use ui ∈ R^{mi} where mi may be random.
    E.g., step i consumes mi uniform random variables.

    The function ϕ is constructed so xi has the desired stationary distribution π.

    We approximate

        µ = ∫ f(x) π(x) dx   by   µ̂ = (1/n) ∑_{i=1}^n f(xi)

    For simplicity, ignore burn-in / warmup.

    December 15, 2017

  • SAMSI Monte Carlo workshop 35

    Thinning

    Get x_{ki} and f(x_{ki}) for i = 1, . . . , n and k ≥ 1.

    • Thinning by a factor k usually reduces autocorrelations.

    • Then the f(x_{ki}) are “more nearly IID”.

    • Thinning can also save storage of the xi.

    December 15, 2017

  • SAMSI Monte Carlo workshop 36

    Statistical efficiency

    Let yi = f(xi) ∈ R. Geyer (1992) shows that for k ≥ 1

        Var( (1/n) ∑_{i=1}^n yi ) ≤ Var( (1/⌊n/k⌋) ∑_{i=1}^{⌊n/k⌋} y_{ki} )

    Quotes

    Link & Eaton (2011):

    “Thinning is often unnecessary and always inefficient.”

    MacEachern & Berliner (1994):

    “This article provides a justification of the ban against sub-sampling

    the output of a stationary Markov chain that is suitable for presentation in

    undergraduate and beginning graduate-level courses.”

    Gamerman & Lopes (2006), on thinning the Gibbs sampler:

    “There is no gain in efficiency, however, by this approach and

    estimation is shown below to be always less precise than retaining all

    chain values.” December 15, 2017

  • SAMSI Monte Carlo workshop 37

    Not so fast

    The analysis assumes that we compute f(xi) and only use every k’th one.

    Suppose that it costs

    • 1 unit to advance the chain: xi → xi+1.

    • θ units to compute yi = f(xi).

    If we thin the chain we compute f less often and can use larger n.

    When it pays

    If θ is large and Corr(yi, yi+k) decays slowly, then thinning can be much more

    efficient.

    When can that happen?

    E.g., x describes a set of particles and f computes interpoint distances.

    Updating f can cost as much as updating x, and maybe much more.

    December 15, 2017

  • SAMSI Monte Carlo workshop 38

    Thinned estimate

        µ̂k = (1/nk) ∑_{i=1}^{nk} f(x_{ik})

    This takes k·nk advances xi → xi+1 and nk computations xi → yi = f(xi).

    Budget: k·nk + θ·nk ≤ B, so

        nk = ⌊ B/(k + θ) ⌋ ≈ B/(k + θ).

    Variance

        Var(µ̂k) ≐ (σ²/nk) ( 1 + 2 ∑_{ℓ=1}^∞ ρ_{kℓ} ),   ρ_ℓ = Corr(yt, yt+ℓ).

    December 15, 2017

  • SAMSI Monte Carlo workshop 39

    Efficiency

        eff(k) = Var(µ̂1) / Var(µ̂k)
               ≐ [ (σ²/n1)(1 + 2 ∑_{ℓ=1}^∞ ρ_ℓ) ] / [ (σ²/nk)(1 + 2 ∑_{ℓ=1}^∞ ρ_{kℓ}) ]
               = nk (1 + 2 ∑_{ℓ=1}^∞ ρ_ℓ) / [ n1 (1 + 2 ∑_{ℓ=1}^∞ ρ_{kℓ}) ]

    NB: nk < n1 but ordinarily ∑_{ℓ=1}^∞ ρ_ℓ > ∑_{ℓ=1}^∞ ρ_{kℓ}.

    Sample size ratio

        nk/n1 ≈ [ B/(k + θ) ] / [ B/(1 + θ) ] = (1 + θ)/(k + θ)

    Therefore

        eff(k) = (1 + θ)(1 + 2 ∑_{ℓ=1}^∞ ρ_ℓ) / [ (k + θ)(1 + 2 ∑_{ℓ=1}^∞ ρ_{kℓ}) ]

    December 15, 2017

  • SAMSI Monte Carlo workshop 40

    Autocorrelation models

    ACF plots often resemble an AR(1):

        ρ_ℓ = ρ^{|ℓ|},   0 < ρ < 1

    E.g., figures in Jackman (2009). Newman & Barkema (1999):
    “the autocorrelation is expected to fall off exponentially at long times”.

    Geyer (1991) notes an exponential upper bound under ρ-mixing.

    Monotone non-negative autocorrelations

        ρ1 ≥ ρ2 ≥ · · · ≥ ρ_ℓ ≥ ρ_{ℓ+1} ≥ · · · ≥ 0

    December 15, 2017

  • SAMSI Monte Carlo workshop 41

    Under the AR(1) model

        eff(k) = effAR(k) = (1 + θ)(1 + 2 ∑_{ℓ=1}^∞ ρ^ℓ) / [ (k + θ)(1 + 2 ∑_{ℓ=1}^∞ ρ^{kℓ}) ]

               = (1 + θ)(1 + 2ρ/(1 − ρ)) / [ (k + θ)(1 + 2ρᵏ/(1 − ρᵏ)) ]

               = (1 + θ)/(k + θ) × (1 + ρ)/(1 − ρ) × (1 − ρᵏ)/(1 + ρᵏ)

    For large θ and ρ near 1, thinning will be very efficient.
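    A small R sketch that evaluates this closed form and searches for the most efficient thinning factor; tables like the ones on the next slides can be reproduced this way (the brute-force search range is an arbitrary choice).

        # Efficiency of thinning by k under the AR(1) autocorrelation model
        eff_ar <- function(k, theta, rho) {
          (1 + theta) / (k + theta) * (1 + rho) / (1 - rho) * (1 - rho^k) / (1 + rho^k)
        }
        best_k <- function(theta, rho, kmax = 1e6) {
          k <- 1:kmax
          k[which.max(eff_ar(k, theta, rho))]
        }
        best_k(theta = 1, rho = 0.999)                      # should agree with the tabled optimum (182)
        eff_ar(best_k(1, 0.999), theta = 1, rho = 0.999)    # and its efficiency (about 1.98)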

    December 15, 2017

  • SAMSI Monte Carlo workshop 42

    Optimal thinning factor k

    θ \ ρ 0.1 0.5 0.9 0.99 0.999 0.9999 0.99999 0.999999

    0.001 1 1 1 4 18 84 391 1817

    0.01 1 1 2 8 39 182 843 3915

    0.1 1 1 4 18 84 391 1817 8434

    1 1 2 8 39 182 843 3915 18171

    10 2 4 17 83 390 1816 8433 39148

    100 3 7 32 172 833 3905 18161 84333

    1000 4 10 51 327 1729 8337 39049 181612

    θ is the cost to compute y = f(x).

    ρ is the autocorrelation.

    December 15, 2017

  • SAMSI Monte Carlo workshop 43

    Efficiency of optimal k

    θ \ ρ 0.1 0.5 0.9 0.99 0.999 0.9999 0.99999 0.999999

    0.001 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    0.01 1.00 1.00 1.00 1.01 1.01 1.01 1.01 1.01

    0.1 1.00 1.00 1.06 1.09 1.10 1.10 1.10 1.10

    1 1.00 1.20 1.68 1.93 1.98 2.00 2.00 2.00

    10 1.10 2.08 5.53 9.29 10.59 10.91 10.98 11.00

    100 1.20 2.79 13.57 51.61 85.29 97.25 100.17 100.82

    1000 1.22 2.97 17.93 139.29 512.38 845.38 963.79 992.79

    Versus k = 1.

    December 15, 2017

  • SAMSI Monte Carlo workshop 44

    Least k for 95% efficiency

    θ \ ρ 0.1 0.5 0.9 0.99 0.999 0.9999 0.99999 0.999999

    0.001 1 1 1 1 1 1 1 1

    0.01 1 1 1 1 1 1 1 1

    0.1 1 1 2 2 2 2 2 2

    1 1 2 5 11 17 19 19 19

    10 2 4 12 45 109 164 184 189

    100 2 5 22 118 442 1085 1632 1835

    1000 2 6 31 228 1182 4415 10846 16311

    Table 1: Smallest k to give at least 95% of the efficiency of the most efficient k, as

    a function of θ and the autoregression parameter ρ.

    December 15, 2017

  • SAMSI Monte Carlo workshop 45

    Additional

    • Thinning can pay when ρ1 > 0 and ρ1 ≥ ρ2 ≥ · · · ≥ 0.

    • θ > 0 reduces the optimal acceptance rate from 0.234. Jeffrey Rosenthal has a quantitative version.

    December 15, 2017