
  • SAMSI Monte Carlo workshop 1

    Importance sampling the union of

    rare events, with an application

    to power systems analysis

    Art B. Owen

    Stanford University

    With Yury Maximov and Michael Chertkov

    of Los Alamos National Laboratory.

    December 15, 2017

  • SAMSI Monte Carlo workshop 2

    MCQMC 2018

    July 1–8, 2018

    Rennes, France

    MC, QMC, MCMC, RQMC, SMC, MLMC, MIMC, · · ·

    Call for papers is now online:

    http://mcqmc2018.inria.fr/

    December 15, 2017

  • SAMSI Monte Carlo workshop 3

    Rare event sampling

    Motivation: an electrical grid has N nodes, with power p1, p2, · · · , pN.

    • Random pi > 0, e.g., wind generation,

    • Random pi < 0, e.g., consumption,

    • Fixed pi for controllable nodes.

    AC phase angles θi

    • (θ1, . . . , θN) = F(p1, . . . , pN)

    • Constraints: |θi − θj| < θ̄ if i ∼ j

    • Find P(max_{i∼j} |θi − θj| > θ̄)

    Simplified model

    • (p1, . . . , pN) Gaussian

    • θ linear in p1, . . . , pN

    December 15, 2017

  • SAMSI Monte Carlo workshop 4

    Gaussian setup

    For x ∼ N(0, Id) and Hj = {x : ωjᵀx > τj}, find µ = P(x ∈ ∪_{j=1}^J Hj).

    WLOG ‖ωj‖ = 1. Ordinarily τj > 0. d is in the hundreds, J in the thousands.

    [Figure: two-dimensional illustration of the half-spaces Hj. Solid: deciles of ‖x‖. Dashed: 10⁻³ · · · 10⁻⁷.]

    December 15, 2017

  • SAMSI Monte Carlo workshop 5

    Basic bounds

    Let Pj ≡ P(ωjᵀx > τj) = Φ(−τj).

    Then

        max_{1≤j≤J} Pj ≡ µ̲ ≤ µ ≤ µ̄ ≡ ∑_{j=1}^J Pj

    Inclusion-exclusion

    For u ⊆ 1:J = {1, 2, . . . , J}, let

        Hu = ∪_{j∈u} Hj,   Hu(x) ≡ 1{x ∈ Hu},   Pu = P(Hu) = E(Hu(x))

    Then

        µ = ∑_{|u|>0} (−1)^{|u|−1} P(∩_{j∈u} Hj)

    Plethora of bounds

    Survey by Yang, Alajaji & Takahara (2014)

    December 15, 2017

  • SAMSI Monte Carlo workshop 6

    Other uses

    • Other engineering reliability

    • False discoveries in statistics: J correlated test statistics

    • Speculative: inference after model selection. Taylor, Fithian, Markovic, Tian

    December 15, 2017

  • SAMSI Monte Carlo workshop 7

    Importance sampling

    For x ∼ p, seek η = Ep(f(x)) = ∫ f(x) p(x) dx. Take

        η̂ = (1/n) ∑_{i=1}^n f(xi) p(xi) / q(xi),   xi iid∼ q

    Unbiased if q(x) > 0 whenever f(x)p(x) ≠ 0.

    Variance

    Var(η̂) = σq²/n, where

        σq² = ∫ f²p²/q − η² = · · · = ∫ (fp − ηq)²/q

    Numerator: seek q ≈ fp/η, i.e., nearly proportional to fp.
    Denominator: watch out for small q.
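    A minimal R sketch of this estimator on a toy rare event, assuming p = N(0,1), q = N(3,1) and f = 1{x > 4}; the names and numbers are illustrative, not from the talk.

        # Plain importance sampling: eta_hat = (1/n) sum_i f(x_i) p(x_i) / q(x_i), x_i ~ q
        set.seed(1)
        n <- 1e5
        x <- rnorm(n, mean = 3)                      # x_i ~ q
        w <- dnorm(x) / dnorm(x, mean = 3)           # likelihood ratio p(x)/q(x)
        f <- as.numeric(x > 4)                       # rare-event indicator
        eta_hat <- mean(f * w)
        c(estimate = eta_hat, truth = pnorm(-4), se = sd(f * w) / sqrt(n))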

    December 15, 2017

  • SAMSI Monte Carlo workshop 8

    Self-normalized IS

        η̂SNIS = [ ∑_{i=1}^n f(xi) p(xi)/q(xi) ] / [ ∑_{i=1}^n p(xi)/q(xi) ]

    Available for unnormalized p and / or q.

    Good for Bayes, limited effectiveness for rare events.

    Optimal SNIS puts 1/2 the samples in the rare event.

    Optimal plain IS puts all samples there.

    Best possible asymptotic coefficient of variation is 2/√n.
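    A sketch of the self-normalized version on the same kind of toy problem (p = N(0,1), q = N(3,1), f = 1{x > 4}, all illustrative); only a ratio of weighted sums is needed, so p and q may be unnormalized.

        # Self-normalized IS: ratio estimator, slightly biased but consistent
        set.seed(2)
        n <- 1e5
        x <- rnorm(n, mean = 3)                      # x_i ~ q
        w <- dnorm(x) / dnorm(x, mean = 3)           # p/q, known only up to a constant is enough
        f <- as.numeric(x > 4)
        eta_snis <- sum(f * w) / sum(w)
        eta_snis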

    December 15, 2017

  • SAMSI Monte Carlo workshop 9

    Mixture sampling

    For αj ≥ 0 with ∑j αj = 1,

        η̂α = (1/n) ∑_{i=1}^n f(xi) p(xi) / ∑_j αj qj(xi),   xi iid∼ qα ≡ ∑_j αj qj

    Defensive mixtures

    Take q0 = p, α0 > 0. Get p/qα ≤ 1/α0. Hesterberg (1995)
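    A sketch of a defensive mixture on the same toy problem, with q0 = p given weight α0 so that the likelihood ratio p/qα stays below 1/α0; the densities and weights here are illustrative choices, not the talk's.

        # Defensive mixture: q_alpha = alpha0*p + alpha1*q1 with p = N(0,1), q1 = N(3,1)
        set.seed(3)
        n     <- 1e5
        alpha <- c(0.1, 0.9)                                        # weight on p, weight on q1
        comp  <- sample(1:2, n, replace = TRUE, prob = alpha)
        x     <- ifelse(comp == 1, rnorm(n), rnorm(n, mean = 3))    # draw from the mixture
        qmix  <- alpha[1] * dnorm(x) + alpha[2] * dnorm(x, mean = 3)
        f     <- as.numeric(x > 4)
        eta_hat <- mean(f * dnorm(x) / qmix)                        # weights bounded by 1/alpha[1]
        eta_hat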

    Additional refs

    Use∫qj = 1 as control variates. O & Zhou (2000).

    Optimization over α is convex. He & O (2014).

    Multiple IS. Veach & Guibas (1994). Veach’s Oscar.

    Elvira, Martino, Luengo, Bugallo (2015) Generalizations.

    December 15, 2017

  • SAMSI Monte Carlo workshop 10

    Instantons

    µj = arg max_{x∈Hj} p(x).   Chertkov, Pan, Stepanov (2011)

    [Figure: the two-dimensional half-space picture again, with solid dots at the instantons µj.]

    • Solid dots: the µj

    • Initial thought: qj = N(µj, I) and qα = ∑j αj qj

    • I.e., a mixture of exponentially tilted distributions

    Conditional sampling

        qj = L(x | x ∈ Hj) = p(x) Hj(x) / Pj

    December 15, 2017

  • SAMSI Monte Carlo workshop 11

    Mixture of conditional sampling

    α0, α1, . . . , αJ ≥ 0,   ∑j αj = 1,   qα = ∑j αj qj,   q0 ≡ p

    µ = P(x ∈ ∪_{j=1}^J Hj) = P(x ∈ H1:J) = E(H1:J(x))

    Mixture IS

        µ̂α = (1/n) ∑_{i=1}^n H1:J(xi) p(xi) / ∑_{j=0}^J αj qj(xi)      (where qj = p Hj / Pj)

            = (1/n) ∑_{i=1}^n H1:J(xi) / ∑_{j=0}^J αj Hj(xi)/Pj

    with H0 ≡ 1 and P0 = 1.

    It would be nice to have αj/Pj constant.

    December 15, 2017

  • SAMSI Monte Carlo workshop 12

    ALORE: At Least One Rare Event

    Like AMIS of Cornuet, Marin, Mira, Robert (2012)

    Put α*0 = 0 and take α*j ∝ Pj. That is

        α*j = Pj / ∑_{j′=1}^J Pj′ = Pj / µ̄      (µ̄ is the union bound)

    Then

        µ̂α* = (1/n) ∑_{i=1}^n H1:J(xi) / ∑_{j=1}^J α*j Hj(xi)/Pj = µ̄ × (1/n) ∑_{i=1}^n H1:J(xi) / ∑_{j=1}^J Hj(xi)

    If x ∼ qα* then H1:J(x) = 1. So

        µ̂α* = µ̄ × (1/n) ∑_{i=1}^n 1/S(xi),      S(x) ≡ ∑_{j=1}^J Hj(x)
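    A compact R sketch of ALORE for Gaussian half-spaces Hj = {ωjᵀx > τj} with unit-norm ωj; the conditional draw uses the numerically stable recipe from the Sampling slide below, and the function and variable names are illustrative rather than the paper's code.

        # ALORE: mu_hat = mu_bar * mean(1/S(x_i)), with x_i ~ q_{alpha*} = sum_j (Pj/mu_bar) qj
        alore <- function(Omega, tau, n = 1000) {
          # Omega: d x J matrix of unit-norm normals; tau: length-J thresholds
          d <- nrow(Omega); J <- ncol(Omega)
          Pj <- pnorm(-tau)                        # P(omega_j' x > tau_j)
          mubar <- sum(Pj)                         # union bound
          jsamp <- sample.int(J, n, replace = TRUE, prob = Pj / mubar)
          invS <- numeric(n)
          for (i in 1:n) {
            w <- Omega[, jsamp[i]]; thr <- tau[jsamp[i]]
            y <- -qnorm(runif(1) * pnorm(-thr))    # truncated-normal draw, so w'x = y > thr
            z <- rnorm(d)
            x <- w * y + (z - w * sum(w * z))      # add the part orthogonal to w
            S <- sum(colSums(Omega * x) > tau)     # number of half-spaces containing x
            invS[i] <- 1 / S
          }
          c(estimate = mubar * mean(invS), se = mubar * sd(invS) / sqrt(n), union.bound = mubar)
        }

    As a sanity check, with a single half-space (J = 1) every S(xi) = 1, so the estimate equals the union bound Φ(−τ) exactly.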

    December 15, 2017

  • SAMSI Monte Carlo workshop 13

    Prior work

    Adler, Blanchet, Liu (2008, 2012) estimate

        w(b) = P(max_{t∈T} f(t) > b)

    for a Gaussian random field f(t) over T ⊂ Rᵈ. They also consider a finite set T = {t1, . . . , tN}.

    Comparisons

    • We use the same mixture estimate as them for finite T and Gaussian data.

    • Their analysis is for Gaussian distributions. We consider arbitrary sets of J events.

    • They take limits as b → ∞. We have non-asymptotic bounds and more general limits.

    • Our analysis is limited to finite sets. They handle extrema over a continuum.

    December 15, 2017

  • SAMSI Monte Carlo workshop 14

    Theorem (Owen, Maximov & Chertkov, 2017)

    Let H1, . . . , HJ be events defined by x ∼ p.
    Let qj(x) = p(x) Hj(x) / Pj for Pj = P(x ∈ Hj).
    Let xi iid∼ qα* = ∑_{j=1}^J α*j qj for α*j = Pj/µ̄.

    Take

        µ̂ = µ̄ × (1/n) ∑_{i=1}^n 1/S(xi),      S(xi) = ∑_{j=1}^J Hj(xi)

    Then E(µ̂) = µ and

        Var(µ̂) = (1/n) ( µ̄ ∑_{s=1}^J Ts/s − µ² ) ≤ µ(µ̄ − µ)/n

    where Ts ≡ P(S(x) = s). The RHS follows because ∑_{s=1}^J Ts/s ≤ ∑_{s=1}^J Ts = µ.

    December 15, 2017

  • SAMSI Monte Carlo workshop 15

    Remarks

    With S(x) equal to the number of rare events at x,

        µ̂ = µ̄ × (1/n) ∑_{i=1}^n 1/S(xi)

        1 ≤ S(x) ≤ J   =⇒   µ̄/J ≤ µ̂ ≤ µ̄

    Robustness

    The usual problem with rare event estimation is getting no rare events in n tries.
    Then µ̂ = 0.

    Here the corresponding failure is never seeing S ≥ 2 rare events.
    Then µ̂ = µ̄, an upper bound, and probably fairly accurate if all S(xi) = 1.

    Conditioning vs sampling from N(µj, I)

    Avoids wasting samples outside the failure zone.
    Avoids awkward likelihood ratios.

    December 15, 2017

  • SAMSI Monte Carlo workshop 16

    General Gaussians

    For y ∼ N(η, Σ) let the event be γjᵀy > κj.

    Translate into ωjᵀx > τj for

        ωj = Σ^{1/2} γj / (γjᵀ Σ γj)^{1/2}   and   τj = (κj − γjᵀη) / (γjᵀ Σ γj)^{1/2}

    Adler et al.

    Their context has all κj = b.
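    A sketch of this standardization in R, using the symmetric square root of Σ (a Cholesky factor would do equally well if used consistently); the function name is illustrative.

        # Map event {gamma' y > kappa}, y ~ N(eta, Sigma), to {omega' x > tau}, x ~ N(0, I)
        standardize_event <- function(gamma, kappa, eta, Sigma) {
          e <- eigen(Sigma, symmetric = TRUE)
          Shalf <- e$vectors %*% diag(sqrt(pmax(e$values, 0))) %*% t(e$vectors)  # Sigma^{1/2}
          s <- sqrt(drop(t(gamma) %*% Sigma %*% gamma))        # sd of gamma' y
          list(omega = drop(Shalf %*% gamma) / s,              # unit-norm direction
               tau   = (kappa - sum(gamma * eta)) / s)         # standardized threshold
        }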

    December 15, 2017

  • SAMSI Monte Carlo workshop 17

    Sampling

    Get x ∼ N(0, I) such that xᵀω > τ. Write y for xᵀω.

    1) Sample u ∼ U(0, 1).
    2) Let y = Φ⁻¹(Φ(τ) + u(1 − Φ(τ))). May easily get y = ∞.
    3) Sample z ∼ N(0, I).
    4) Deliver x = ωy + (I − ωωᵀ)z.

    Better numerics from

    1) Sample u ∼ U(0, 1).
    2) Let y = Φ⁻¹(uΦ(−τ)).
    3) Sample z ∼ N(0, I).
    4) Let x = ωy + (I − ωωᵀ)z.
    5) Deliver −x.

    I.e., sample x ∼ N(0, I) subject to xᵀω ≤ −τ and deliver −x.

    Step 4 via ωy + z − ω(ωᵀz).

    December 15, 2017
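    The second (numerically stable) recipe above, written as a small R function; a sketch, with names of my choosing.

        # Sample x ~ N(0, I_d) conditional on omega' x > tau, for unit-norm omega
        rcond_halfspace <- function(omega, tau) {
          u <- runif(1)
          y <- qnorm(u * pnorm(-tau))                      # y <= -tau via the lower tail
          z <- rnorm(length(omega))
          x <- omega * y + (z - omega * sum(omega * z))    # omega*y + (I - omega omega')z
          -x                                               # reflect so that omega' x >= tau
        }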

  • SAMSI Monte Carlo workshop 18

    More bounds

    Recall that Ts = P(S(x) = s). Therefore

        µ̄ = ∑_{j=1}^J Pj = E( ∑_{j=1}^J Hj(x) ) = E(S) = ∑_{s=1}^J s Ts

        µ = E( max_{1≤j≤J} Hj(x) ) = P(S > 0), so

        µ̄ = E(S | S > 0) × P(S > 0) = µ × E(S | S > 0)

    Therefore

        n × Var(µ̂) = µ̄ ∑_{s=1}^J Ts/s − µ²

                   = ( µ E(S | S > 0) )( µ E(S⁻¹ | S > 0) ) − µ²

                   = µ² ( E(S | S > 0) E(S⁻¹ | S > 0) − 1 )

    December 15, 2017

  • SAMSI Monte Carlo workshop 19

    Bounds continued

        Var(µ̂) ≤ (µ²/n) ( E(S | S > 0) × E(S⁻¹ | S > 0) − 1 )

    Lemma

    Let S be a random variable in {1, 2, . . . , J} for J ∈ N. Then

        E(S) E(S⁻¹) ≤ (J + J⁻¹ + 2)/4

    with equality if and only if S ∼ U{1, J}.

    Corollary

        Var(µ̂) ≤ (µ²/n) × (J + J⁻¹ − 2)/4.

    Thanks to Yanbo Tang and Jeffrey Negrea for an improved proof of the lemma.

    December 15, 2017

  • SAMSI Monte Carlo workshop 20

    Numerical comparison

    The R function pmvnorm in the mvtnorm package of Genz & Bretz (2009) computes

        P(a ≤ y ≤ b) = P( ∩j {aj ≤ yj ≤ bj} )   for y ∼ N(η, Σ)

    Their code makes sophisticated use of quasi-Monte Carlo.
    Adaptive, up to 25,000 evaluations, in FORTRAN.
    It was not designed for rare events.
    It computes an intersection.

    Usage

    Pack the ωj into Ω and the τj into T:

        1 − µ = P(Ωᵀx ≤ T) = P(y ≤ T),   y ∼ N(0, ΩᵀΩ)

    It can handle up to 1000 inputs, i.e., J ≤ 1000.

    Upshot

    ALORE works (much) better for rare events.
    pmvnorm works better for non-rare events.

    December 15, 2017

  • SAMSI Monte Carlo workshop 21

    Polygon example

    P(J, τ): regular J-sided polygon in R² circumscribing the circle of radius τ > 0.

        P = { x : (sin(2πj/J), cos(2πj/J))ᵀ x ≤ τ, j = 1, . . . , J }

    A priori bounds

        µ = P(x ∈ Pᶜ) ≤ P(χ²(2) > τ²) = exp(−τ²/2)

    This is pretty tight. A trigonometric argument gives

        1 ≥ µ / exp(−τ²/2) ≥ 1 − (J tan(π/J) − π) τ²/(2π) ≐ 1 − π²τ²/(6J²)

    Let's use J = 360.

    So µ ≐ exp(−τ²/2) for reasonable τ.
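    A sketch setting up the polygon constraints in R and checking these a priori bounds for J = 360 (the ALORE run itself would use the estimator sketched on the ALORE slide):

        # Regular J-gon circumscribing the circle of radius tau; x escapes P iff some omega_j' x > tau
        J <- 360; tau <- 6
        ang <- 2 * pi * (1:J) / J
        Omega <- rbind(sin(ang), cos(ang))                      # 2 x J unit normals
        mubar <- J * pnorm(-tau)                                # union bound
        upper <- exp(-tau^2 / 2)                                # P(chi^2_2 > tau^2) >= mu
        lower <- upper * (1 - (J * tan(pi / J) - pi) * tau^2 / (2 * pi))
        c(union.bound = mubar, chisq.upper = upper, trig.lower = lower)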

    December 15, 2017

  • SAMSI Monte Carlo workshop 22

    IS vs MVN

    ALORE had n = 1000. MVN had up to 25,000. 100 random repeats.

    τ    µ            E((µ̂ALORE/µ − 1)²)   E((µ̂MVN/µ − 1)²)

    2    1.35×10⁻¹    0.000399              9.42×10⁻⁸
    3    1.11×10⁻²    0.000451              9.24×10⁻⁷
    4    3.35×10⁻⁴    0.000549              2.37×10⁻²
    5    3.73×10⁻⁶    0.000600              1.81×10⁰
    6    1.52×10⁻⁸    0.000543              4.39×10⁻¹
    7    2.29×10⁻¹¹   0.000559              3.62×10⁻¹
    8    1.27×10⁻¹⁴   0.000540              1.34×10⁻¹

    For τ = 5 MVN had a few outliers.

    December 15, 2017

  • SAMSI Monte Carlo workshop 23

    Polygon again

    [Figure: two histograms of relative estimates. Left panel (ALORE): values concentrated near 0.96–1.04. Right panel (pmvnorm): values spread over roughly 1–6.]

    Results of 100 estimates of P(x ∉ P(360, 6)), divided by exp(−6²/2). Left panel: ALORE. Right panel: pmvnorm.

    December 15, 2017

  • SAMSI Monte Carlo workshop 24

    Symmetry

    Maybe the circumscribed polygon is too easy due to symmetry.

    Redo it with just ωj = (cos(2πj/J), sin(2πj/J))ᵀ for the 72 prime numbers j ∈ {1, 2, . . . , 360}.

    For τ = 6 the variance of µ̂ / exp(−18) is 0.00077 for ALORE (n = 1000), and 8.5 for pmvnorm.

    December 15, 2017

  • SAMSI Monte Carlo workshop 25

    High dimensional half spaces

    Take random ωj ∈ Rᵈ for large d. By concentration of measure they are nearly orthogonal.

        µ ≐ 1 − ∏_{j=1}^J (1 − Pj) ≐ ∑_j Pj = µ̄      (for rare events)

    Simulation

    • Dimension d ∼ U{20, 50, 100, 200, 500}

    • Constraints J ∼ U{d/2, d, 2d}

    • τ such that −log10(µ̄) ∼ U[4, 8] (then rounded to 2 places).
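    A sketch of one such random test problem in R (the specific d, J and target union bound below are illustrative choices within the ranges listed above):

        # One high-dimensional test case: J random unit vectors in R^d, tau set from a target union bound
        set.seed(4)
        d <- 200; J <- 2 * d
        Omega <- matrix(rnorm(d * J), d, J)
        Omega <- sweep(Omega, 2, sqrt(colSums(Omega^2)), "/")   # unit-norm, nearly orthogonal columns
        target_mubar <- 1e-6
        tau <- rep(-qnorm(target_mubar / J), J)                 # so that sum_j Phi(-tau_j) = target
        mu_approx <- 1 - prod(1 - pnorm(-tau))                  # near-independence approximation
        c(union.bound = J * pnorm(-tau[1]), mu.approx = mu_approx)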

    December 15, 2017

  • SAMSI Monte Carlo workshop 26

    High dimensional results

    [Figure: scatterplot of estimate / bound against the union bound (10⁻⁸ to 10⁻⁴, log scales) for ALORE and pmvnorm.]

    Results of 200 estimates of µ for varying high dimensional problems with nearly independent events.

    December 15, 2017

  • SAMSI Monte Carlo workshop 27

    Power systems

    Network with N nodes called busses and M edges.
    Typically M/N is not very large.

    Power pi at bus i. Each bus is Fixed or Random or Slack:
    the slack bus S has pS = −∑_{i≠S} pi.

        p = (pFᵀ, pRᵀ, pS)ᵀ ∈ Rᴺ

    Randomness driven by pR ∼ N(ηR, ΣRR)

    Inductances Bij

    A Laplacian: Bij ≠ 0 if i ∼ j in the network, and Bii = −∑_{j≠i} Bij.

    Taylor approximation

        θ = B⁺p      (pseudo-inverse)

    Therefore θ is Gaussian, as are all θi − θj for i ∼ j.
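    A toy R sketch of this DC/Taylor step: build a small graph Laplacian, take its pseudo-inverse, and read off phase angles from bus powers. The 4-bus network and its line parameters are made up purely for illustration.

        # Toy 4-bus example of theta = B^+ p
        edges <- rbind(c(1,2), c(2,3), c(3,4), c(4,1), c(1,3))   # network edges i ~ j
        b     <- c(5, 4, 6, 3, 2)                                # made-up line susceptances
        N <- 4
        B <- matrix(0, N, N)
        for (e in seq_len(nrow(edges))) {
          i <- edges[e, 1]; j <- edges[e, 2]
          B[i, j] <- B[i, j] - b[e]; B[j, i] <- B[j, i] - b[e]
        }
        diag(B) <- -rowSums(B)                                   # Laplacian: B_ii = -sum_{j != i} B_ij
        s <- svd(B); keep <- s$d > 1e-9
        Bplus <- s$v[, keep] %*% diag(1 / s$d[keep]) %*% t(s$u[, keep])   # pseudo-inverse B^+
        p <- c(1.0, -0.4, 0.7, 0); p[4] <- -sum(p[1:3])          # slack bus balances total power
        theta <- drop(Bplus %*% p)                               # phase angles; linear (hence Gaussian) in p
        theta[1] - theta[2]                                      # one phase difference across an edge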

    December 15, 2017

  • SAMSI Monte Carlo workshop 28

    Examples

    Our examples come from MATPOWER (a Matlab toolbox).
    Zimmerman, Murillo-Sánchez & Thomas (2011)

    We used n = 10,000.

    Polish winter peak grid

    2383 busses, d = 326 random busses, J = 5772 phase constraints.

    θ̄      µ̂           se/µ̂     µ̲           µ̄

    π/4    3.7×10⁻²³    0.0024    3.6×10⁻²³    4.2×10⁻²³
    π/5    2.6×10⁻¹²    0.0022    2.6×10⁻¹²    2.9×10⁻¹²
    π/6    3.9×10⁻⁷     0.0024    3.9×10⁻⁷     4.4×10⁻⁷
    π/7    2.0×10⁻³     0.0027    2.0×10⁻³     2.4×10⁻³

    θ̄ is the phase constraint, µ̂ is the ALORE estimate, se is the estimated standard
    error, µ̲ is the largest single event probability and µ̄ is the union bound.

    December 15, 2017

  • SAMSI Monte Carlo workshop 29

    Pegase 2869

    Fliscounakis et al. (2013): “large part of the European system”

    N = 2869 busses. d = 509 random busses. J = 7936 phase constraints.

    Results

    θ̄      µ̂           se/µ̂        µ̲           µ̄

    π/2    3.5×10⁻²⁰    0*           3.3×10⁻²⁰    3.5×10⁻²⁰
    π/3    8.9×10⁻¹⁰    5.0×10⁻⁵     7.7×10⁻¹⁰    8.9×10⁻¹⁰
    π/4    4.3×10⁻⁶     1.8×10⁻³     3.5×10⁻⁶     4.6×10⁻⁶
    π/5    2.9×10⁻³     3.5×10⁻³     1.8×10⁻³     4.1×10⁻³

    Notes

    θ̄ = π/2 is unrealistically large. We got µ̂ = µ̄. All 10,000 samples had S = 1.

    Some sort of Wilson interval or Bayesian approach could help.

    One half space was sampled 9408 times, another 592 times. December 15, 2017

  • SAMSI Monte Carlo workshop 30

    Other models

    IEEE case 14, IEEE case 300 and Pegase 1354 were all dominated by one
    failure mode, so µ ≐ µ̄ and no sampling is needed.

    Another model had random power corresponding to wind generators but phase

    failures were not rare events in that model.

    The Pegase 13659 model was too large for our computer. The Laplacian had

    37,250 rows and columns.

    The Pegase 9241 model was large and slow and phase failure was not a rare

    event.

    Caveats

    We used a DC approximation to AC power flow (which is common) and the phase

    estimates were based on a Taylor approximation.

    December 15, 2017

  • SAMSI Monte Carlo workshop 31

    Next steps

    1) Nonlinear boundaries

    2) Non-Gaussian models

    3) Optimizing cost subject to a constraint on µ

    Sampling half-spaces will work if we have convex failure regions and a

    log-concave nominal density.

    December 15, 2017

  • SAMSI Monte Carlo workshop 32

    Thanks

    • Michael Chertkov and Yury Maximov, co-authors

    • Center for Nonlinear Studies at LANL, for hospitality

    • Alan Genz for help on mvtnorm

    • Bert Zwart, pointing out Adler et al.

    • Yanbo Tang and Jeffrey Negrea, improved Lemma proof

    • NSF DMS-1407397 & DMS-1521145

    • DOE/GMLC 2.0 Project: Emergency monitoring and controls through new technologies and analytics

    • Jianfeng Lu, Ilse Ipsen

    • Sue McDonald, Kerem Jackson, Thomas Gehrmann

    December 15, 2017

  • SAMSI Monte Carlo workshop 33

    Bonus topic

    Thinning MCMC output: it really can improve statistical efficiency.

    Short story

    If it costs 1 unit to advance xi−1 → xi
    and θ > 0 units to compute yi = f(xi),
    then thinning to every k’th value lets us get larger n
    and less variance if ACF(yi) decays slowly.

    December 15, 2017

  • SAMSI Monte Carlo workshop 34

    MCMC (notation for)

    A simple MCMC generates

        xi = ϕ(xi−1, ui) ∈ Rᵈ,   ui ∼ U(0, 1)ᵐ

    More general ones use ui ∈ R^{mi} where mi may be random.
    E.g., step i consumes mi uniform random variables.

    The function ϕ is constructed so xi has the desired stationary distribution π.

    We approximate

        µ = ∫ f(x) π(x) dx   by   µ̂ = (1/n) ∑_{i=1}^n f(xi)

    For simplicity, ignore burn-in / warmup.

    December 15, 2017

  • SAMSI Monte Carlo workshop 35

    Thinning

    Get x_{ki} and f(x_{ki}) for i = 1, . . . , n and k ≥ 1.

    • Thinning by a factor k usually reduces autocorrelations.

    • Then the f(x_{ki}) are “more nearly IID”.

    • Thinning can also save storage of the xi.

    December 15, 2017

  • SAMSI Monte Carlo workshop 36

    Statistical efficiency

    Let yi = f(xi) ∈ R. Geyer (1992) shows that for k ≥ 1

        Var( (1/n) ∑_{i=1}^n yi ) ≤ Var( (1/⌊n/k⌋) ∑_{i=1}^{⌊n/k⌋} y_{ki} )

    Quotes

    Link & Eaton (2011):

    “Thinning is often unnecessary and always inefficient.”

    MacEachern & Berliner (1994):

    “This article provides a justification of the ban against sub-sampling

    the output of a stationary Markov chain that is suitable for presentation in

    undergraduate and beginning graduate-level courses.”

    Gamerman & Lopes (2006), on thinning the Gibbs sampler:

    “There is no gain in efficiency, however, by this approach and

    estimation is shown below to be always less precise than retaining all

    chain values.” December 15, 2017

  • SAMSI Monte Carlo workshop 37

    Not so fast

    The analysis assumes that we compute f(xi) and only use every k’th one.

    Suppose that it costs

    • 1 unit to advance the chain: xi → xi+1.

    • θ units to compute yi = f(xi).

    If we thin the chain we compute f less often and can use larger n.

    When it pays

    If θ is large and Corr(yi, yi+k) decays slowly, then thinning can be much more

    efficient.

    When can that happen?

    E.g., x describes a set of particles and f computes interpoint distances.

    Updating f can cost as much as updating x, and maybe much more.

    December 15, 2017

  • SAMSI Monte Carlo workshop 38

    Thinned estimate

        µ̂k = (1/nk) ∑_{i=1}^{nk} f(x_{ik})

    This takes k·nk advances xi → xi+1 and nk computations xi → yi = f(xi).

    Budget: k·nk + θ·nk ≤ B, so

        nk = ⌊ B/(k + θ) ⌋ ≈ B/(k + θ).

    Variance

        Var(µ̂k) ≐ (σ²/nk) ( 1 + 2 ∑_{ℓ=1}^∞ ρ_{kℓ} ),   ρ_ℓ = Corr(yt, yt+ℓ).

    December 15, 2017

  • SAMSI Monte Carlo workshop 39

    Efficiency

        eff(k) = Var(µ̂1) / Var(µ̂k)
               ≐ [ (σ²/n1)(1 + 2 ∑_{ℓ=1}^∞ ρ_ℓ) ] / [ (σ²/nk)(1 + 2 ∑_{ℓ=1}^∞ ρ_{kℓ}) ]
               = nk (1 + 2 ∑_{ℓ=1}^∞ ρ_ℓ) / [ n1 (1 + 2 ∑_{ℓ=1}^∞ ρ_{kℓ}) ]

    NB: nk < n1 but ordinarily ∑_{ℓ=1}^∞ ρ_ℓ > ∑_{ℓ=1}^∞ ρ_{kℓ}.

    Sample size ratio

        nk/n1 ≈ [ B/(k + θ) ] / [ B/(1 + θ) ] = (1 + θ)/(k + θ)

    Therefore

        eff(k) = (1 + θ)(1 + 2 ∑_{ℓ=1}^∞ ρ_ℓ) / [ (k + θ)(1 + 2 ∑_{ℓ=1}^∞ ρ_{kℓ}) ]

    December 15, 2017

  • SAMSI Monte Carlo workshop 40

    Autocorrelation models

    ACF plots often resemble an AR(1):

        ρ_ℓ = ρ^{|ℓ|},   0 < ρ < 1

    E.g., figures in Jackman (2009). Newman & Barkema (1999):
    “the autocorrelation is expected to fall off exponentially at long times”.

    Geyer (1991) notes an exponential upper bound under ρ-mixing.

    Monotone non-negative autocorrelations

        ρ1 ≥ ρ2 ≥ · · · ≥ ρ_ℓ ≥ ρ_{ℓ+1} ≥ · · · ≥ 0

    December 15, 2017

  • SAMSI Monte Carlo workshop 41

    Under the AR(1) model

        eff(k) = effAR(k) = (1 + θ)(1 + 2 ∑_{ℓ=1}^∞ ρ^ℓ) / [ (k + θ)(1 + 2 ∑_{ℓ=1}^∞ ρ^{kℓ}) ]

               = (1 + θ)(1 + 2ρ/(1 − ρ)) / [ (k + θ)(1 + 2ρᵏ/(1 − ρᵏ)) ]

               = (1 + θ)/(k + θ) × (1 + ρ)/(1 − ρ) × (1 − ρᵏ)/(1 + ρᵏ)

    For large θ and ρ near 1, thinning will be very efficient.
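    A small R sketch that evaluates this closed form and searches for the most efficient thinning factor; tables like the ones on the next slides can be reproduced this way (the brute-force search range is an arbitrary choice).

        # Efficiency of thinning by k under the AR(1) autocorrelation model
        eff_ar <- function(k, theta, rho) {
          (1 + theta) / (k + theta) * (1 + rho) / (1 - rho) * (1 - rho^k) / (1 + rho^k)
        }
        best_k <- function(theta, rho, kmax = 1e6) {
          k <- 1:kmax
          k[which.max(eff_ar(k, theta, rho))]
        }
        best_k(theta = 1, rho = 0.999)                      # should agree with the tabled optimum (182)
        eff_ar(best_k(1, 0.999), theta = 1, rho = 0.999)    # and its efficiency (about 1.98)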

    December 15, 2017

  • SAMSI Monte Carlo workshop 42

    Optimal thinning factor k

    θ \ ρ 0.1 0.5 0.9 0.99 0.999 0.9999 0.99999 0.999999

    0.001 1 1 1 4 18 84 391 1817

    0.01 1 1 2 8 39 182 843 3915

    0.1 1 1 4 18 84 391 1817 8434

    1 1 2 8 39 182 843 3915 18171

    10 2 4 17 83 390 1816 8433 39148

    100 3 7 32 172 833 3905 18161 84333

    1000 4 10 51 327 1729 8337 39049 181612

    θ is the cost to compute y = f(x).

    ρ is the autocorrelation.

    December 15, 2017

  • SAMSI Monte Carlo workshop 43

    Efficiency of optimal k

    θ \ ρ 0.1 0.5 0.9 0.99 0.999 0.9999 0.99999 0.999999

    0.001 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

    0.01 1.00 1.00 1.00 1.01 1.01 1.01 1.01 1.01

    0.1 1.00 1.00 1.06 1.09 1.10 1.10 1.10 1.10

    1 1.00 1.20 1.68 1.93 1.98 2.00 2.00 2.00

    10 1.10 2.08 5.53 9.29 10.59 10.91 10.98 11.00

    100 1.20 2.79 13.57 51.61 85.29 97.25 100.17 100.82

    1000 1.22 2.97 17.93 139.29 512.38 845.38 963.79 992.79

    Versus k = 1.

    December 15, 2017

  • SAMSI Monte Carlo workshop 44

    Least k for 95% efficiency

    θ \ ρ 0.1 0.5 0.9 0.99 0.999 0.9999 0.99999 0.999999

    0.001 1 1 1 1 1 1 1 1

    0.01 1 1 1 1 1 1 1 1

    0.1 1 1 2 2 2 2 2 2

    1 1 2 5 11 17 19 19 19

    10 2 4 12 45 109 164 184 189

    100 2 5 22 118 442 1085 1632 1835

    1000 2 6 31 228 1182 4415 10846 16311

    Table 1: Smallest k to give at least 95% of the efficiency of the most efficient k, as

    a function of θ and the autoregression parameter ρ.

    December 15, 2017

  • SAMSI Monte Carlo workshop 45

    Additional

    • Thinning can pay when ρ1 > 0 and ρ1 ≥ ρ2 ≥ · · · ≥ 0.

    • θ > 0 reduces the optimal acceptance rate from 0.234. Jeffrey Rosenthal has a quantitative version.

    December 15, 2017