A Short Introduction to Stein's Method

Gesine Reinert
Department of Statistics, University of Oxford

Michaelmas Term 2011
Outline
Lecture 1: Normal approximation
  Motivation
  Distributional distances
  The Stein equation
  Local dependence
  Couplings

Lecture 2: Poisson approximation and other distributions
  Poisson approximation
  Poisson approximation: the local approach
  Poisson approximation: size bias couplings
  A general framework
  Chisquare distributions
  One-dimensional Gibbs distributions
  Final Remarks
Lecture 1: Normal approximation
Recall two famous distributional approximations. Example: let X_1, X_2, . . . , X_n be i.i.d. with P(X_i = 1) = p = 1 − P(X_i = 0). Then

n^{−1/2} ∑_{i=1}^n (X_i − p) ≈_d N(0, p(1 − p)),

but also

∑_{i=1}^n X_i ≈_d Poisson(np).

Which approximation should we choose? We would like to assess the distance between distributions via explicit bounds.
Weak convergence
Often distributional distances are phrased in terms of test functions. This approach is based on weak convergence. For c.d.f.s F_n, n ≥ 0, and F on the line we say that F_n converges weakly (converges in distribution) to F,

F_n →_D F,

if

F_n(x) → F(x)  (n → ∞)

for all continuity points x of F. For the associated probability distributions we write P_n →_D P.
Facts of weak convergence
P_n →_D P is equivalent to each of the following:

1. P_n(A) → P(A) for each P-continuity set A (i.e. P(∂A) = 0);
2. ∫ f dP_n → ∫ f dP for all bounded, continuous, real-valued functions f;
3. ∫ f dP_n → ∫ f dP for all bounded, infinitely often differentiable, real-valued functions f.

For a random variable X with distribution L(X),

L(X_n) →_D L(X)  ⟺  E f(X_n) → E f(X)

for all bounded, infinitely often differentiable, real-valued functions f.
Total variation distance
Let L(X) = P, L(Y) = Q; define the total variation distance

d_TV(P, Q) = sup_{A measurable} |P(A) − Q(A)|.

If the underlying space is discrete then convergence in total variation is equivalent to convergence in distribution. For probability measures on Z_+ we can write

d_TV(P, Q) = (1/2) ∑_{j≥0} |P{j} − Q{j}|.

If the underlying space is continuous then the total variation distance is very strong. For example, when approximating a discrete integer-valued random variable by a normal random variable we could choose as set A the set of integers; then the total variation distance would be 1.
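As a quick numerical illustration (our addition, not from the slides; numpy/scipy assumed, parameter choices arbitrary), the Z_+ formula lets us compute the distance in the motivating example exactly:

import numpy as np
from scipy import stats

n, p = 50, 0.05
ks = np.arange(n + 1)                      # support of the Binomial

binom_pmf = stats.binom.pmf(ks, n, p)
pois_pmf = stats.poisson.pmf(ks, n * p)

# d_TV = (1/2) sum_{j >= 0} |P{j} - Q{j}|; the Poisson mass beyond n
# enters as |0 - Q{j}| since the Binomial puts no mass there.
d_tv = 0.5 * (np.abs(binom_pmf - pois_pmf).sum() + stats.poisson.sf(n, n * p))
print(f"d_TV(Bin({n},{p}), Po({n * p})) = {d_tv:.5f}")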
Wasserstein distance
Consider the set of Lipschitz-continuous functions with Lipschitz constant 1,

L = {g : R → R; |g(y) − g(x)| ≤ |y − x|},

and the Wasserstein distance

d_W(P, Q) = sup_{g∈L} |E g(Y) − E g(X)| = inf E|Y − X|,

where the infimum is over all couplings (X, Y) such that L(X) = P, L(Y) = Q.
Using

F = {f ∈ L absolutely continuous, f(0) = f′(0) = 0}

we also have

d_W(P, Q) = sup_{f∈F} |E f′(Y) − E f′(X)|.
Stein’s Method for Normal Approximation
Stein (1972, 1986): Z ∼ N(µ, σ²) if and only if for all smooth functions f,

E(Z − µ) f(Z) = σ² E f′(Z).

For a random variable W with EW = µ, Var W = σ²: if

σ² E f′(W) − E(W − µ) f(W)

is close to zero for many functions f, then W should be close to Z in distribution.
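A minimal Monte Carlo sketch of this characterisation (our addition; f = sin is an arbitrary smooth test function): for N(0, 1) samples the two sides of E Z f(Z) = E f′(Z) agree up to simulation error, while for a skewed variable with the same mean and variance they do not.

import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000

z = rng.standard_normal(N)
# For Z ~ N(0,1): E Z sin(Z) = E cos(Z) = exp(-1/2) ~ 0.6065.
print(np.mean(z * np.sin(z)), np.mean(np.cos(z)))

w = rng.exponential(size=N) - 1.0        # mean 0, variance 1, but skewed
print(np.mean(w * np.sin(w)), np.mean(np.cos(w)))   # the two sides differ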
Sketch of proof for µ = 0, σ2 = 1:
Assume Z ∼ N(0, 1). Integration by parts:

(1/√(2π)) ∫ f′(x) e^{−x²/2} dx
  = [ (1/√(2π)) f(x) e^{−x²/2} ]_{−∞}^{∞} + (1/√(2π)) ∫ x f(x) e^{−x²/2} dx
  = (1/√(2π)) ∫ x f(x) e^{−x²/2} dx,

and so E f′(Z) = E Z f(Z).
Conversely, assume E Z f(Z) = E f′(Z). We solve the differential equation

f′(x) − x f(x) = g(x),   lim_{x→−∞} f(x) e^{−x²/2} = 0,

for any bounded function g, giving

f(y) = e^{y²/2} ∫_{−∞}^{y} g(x) e^{−x²/2} dx.

Take g(x) = 1(x ≤ x_0) − Φ(x_0); then

0 = E( f′(Z) − Z f(Z) ) = P(Z ≤ x_0) − Φ(x_0),

so Z ∼ N(0, 1).
The Stein equation
Let µ = 0. Given a test function h, let Nh = E h(Z/σ), and solve for f in the Stein equation

σ² f′(w) − w f(w) = h(w/σ) − Nh,

giving

f(y) = e^{y²/2} ∫_{−∞}^{y} ( h(x/σ) − Nh ) e^{−x²/2} dx.

Now evaluate the expectation of the r.h.s. via the expectation of the l.h.s. We can bound the solution f and its derivatives in terms of the test function h and its derivatives, e.g. ‖f′′‖ ≤ 2‖h′‖.
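For σ = 1 this solution can be checked numerically (our sketch; h = tanh is an arbitrary smooth choice): evaluating f by quadrature, the finite-difference derivative f′(w) − w f(w) should reproduce h(w) − Nh.

import numpy as np
from scipy import integrate, stats

h = np.tanh
Nh = integrate.quad(lambda x: h(x) * stats.norm.pdf(x), -np.inf, np.inf)[0]

def f(y):
    # f(y) = e^{y^2/2} * integral_{-inf}^{y} (h(x) - Nh) e^{-x^2/2} dx
    val, _ = integrate.quad(lambda x: (h(x) - Nh) * np.exp(-x**2 / 2),
                            -np.inf, y)
    return np.exp(y**2 / 2) * val

for w in (-1.0, 0.3, 2.0):
    eps = 1e-5
    f_prime = (f(w + eps) - f(w - eps)) / (2 * eps)
    print(f_prime - w * f(w), "vs", h(w) - Nh)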
Example: the sum of i.i.d. random variables
Let X, X_1, . . . , X_n be i.i.d. with EX = 0, Var X = 1/n; put W = ∑_{i=1}^n X_i and W_i = W − X_i = ∑_{j≠i} X_j. Then

E W f(W) = ∑_{i=1}^n E X_i f(W)
  = ∑_{i=1}^n E X_i f(W_i) + ∑_{i=1}^n E X_i² f′(W_i) + R
  = (1/n) ∑_{i=1}^n E f′(W_i) + R,

where the first sum vanishes because X_i is independent of W_i and E X_i = 0, and E X_i² f′(W_i) = (1/n) E f′(W_i) for the same reason.
So

E f′(W) − E W f(W) = (1/n) ∑_{i=1}^n E{ f′(W) − f′(W_i) } + R,

and we can bound the remainder term R.
Theorem: explicit bound on distance
For any smooth h,

|E h(W) − Nh| ≤ ‖h′‖ ( 2/√n + ∑_{i=1}^n E|X_i|³ ).

Note: this bound is true for any n; nothing goes to infinity.

This theorem extends to local dependence.
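A sanity check of the theorem in code (our addition): take standardised Bernoulli summands X_i = (B_i − p)/√(np(1 − p)), so that Var X_i = 1/n, and h = tanh with ‖h′‖ = 1; the simulated |Eh(W) − Nh| should sit below the bound.

import numpy as np
from scipy import integrate, stats

rng = np.random.default_rng(5)
n, p = 200, 0.3
h = np.tanh                                 # ||h'|| = 1
Nh = integrate.quad(lambda x: h(x) * stats.norm.pdf(x), -np.inf, np.inf)[0]

scale = np.sqrt(n * p * (1 - p))            # so that Var X_i = 1/n
E_abs_X3 = (p * (1 - p)**3 + (1 - p) * p**3) / scale**3   # E|X_i|^3

W = (rng.binomial(n, p, size=1_000_000) - n * p) / scale
lhs = abs(np.mean(h(W)) - Nh)
rhs = 2 / np.sqrt(n) + n * E_abs_X3
print(f"{lhs:.5f} <= {rhs:.5f}")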
Local dependence
Let X_1, . . . , X_n have mean zero and finite variances; put W = ∑_{i=1}^n X_i and assume Var W = 1. Suppose that for each i = 1, . . . , n there exist sets A_i ⊂ B_i ⊂ {1, . . . , n} such that X_i is independent of ∑_{j∉A_i} X_j, and ∑_{j∈A_i} X_j is independent of ∑_{j∉B_i} X_j. Define

η_i = ∑_{j∈A_i} X_j  and  τ_i = ∑_{j∈B_i} X_j.

Theorem
For any smooth h with ‖h′‖ ≤ 1,

|E h(W) − Nh| ≤ 2 ∑_{i=1}^n ( E|X_i η_i τ_i| + |E(X_i η_i)| E|τ_i| ) + ∑_{i=1}^n E|X_i η_i²|.
Example: word counts in DNA sequences
Let B = B_1 B_2 . . . B_n be a string of letters which are i.i.d. from the alphabet A = {A, C, G, T}. Let w = w_1 w_2 . . . w_m be a word of length m composed of letters from A. Let V = V(w) be the number of times that w occurs in the sequence B. Words occur in clumps; for example consider

B = T G A A C A A A C A A C A A A G A A C A A A A
w = A A C A A

Here V(w) = 4; there are two clumps of occurrences.
For j = 1, . . . , n − m + 1 define the word indicators

I_j = 1 if B_j B_{j+1} . . . B_{j+m−1} = w, and I_j = 0 otherwise.

Let p(x) = P(B_1 = x) for x = A, C, G, T; then

E I_j = p(w) := ∏_{i=1}^m p(w_i)

and

E V = (n − m + 1) p(w).
The variance
Define, for ℓ = 1, . . . , m,

p_w(ℓ) = p(w_{m−ℓ+1}) p(w_{m−ℓ+2}) · · · p(w_m),

which is the probability that w occurs, given that w_1 w_2 · · · w_{m−ℓ} has occurred. Define the self-overlap indicator

ξ(j) = 1( w_{j+1} = w_1, . . . , w_m = w_{m−j} ).

This overlap indicator equals 1 if the last m − j letters of w overlap exactly the first m − j letters of w. Then the variance Var(V) = σ² is given by

(n − m + 1) p(w)(1 − p(w)) + 2 p(w) ∑_{j=1}^{m−1} (n − m − j + 1) ξ(j) p_w(j) − 2(m − 1)(n − m + 1) p(w)².

The variance is of order n when m = o(n).
The neighbourhoods of dependence
Put

X_j = (I_j − p(w))/σ

and

A_i = { j ∈ {1, . . . , n} : |j − i| ≤ m − 1 }

as well as

B_i = { j ∈ {1, . . . , n} : |j − i| ≤ 2(m − 1) }.

Then X_i is independent of ∑_{j∉A_i} X_j, and ∑_{j∈A_i} X_j is independent of ∑_{j∉B_i} X_j. Moreover

η_i = ∑_{j:|j−i|≤m−1} X_j  and  τ_i = ∑_{j:|j−i|≤2(m−1)} X_j.
Consider the statement of the theorem,

|E h(W) − Nh| ≤ 2 ∑_{i=1}^n ( E|X_i η_i τ_i| + |E(X_i η_i)| E|τ_i| ) + ∑_{i=1}^n E|X_i η_i²|.

We bound |X_i| ≤ 1/σ and |η_i| ≤ (2m − 1)/σ as well as |τ_i| ≤ (4m − 1)/σ, and hence the right-hand side is at most

(2m − 1)( 4(4m − 1) + 1 ) n / σ³.
Couplings
The local approach does not work well when there is (weak) global dependence, for example as in simple random sampling. Instead we make use of couplings: changing one random variable should not have much effect on the other random variables if the dependence is weak. Here we consider two such couplings: the size-bias coupling and the zero-bias coupling. There are more, such as the exchangeable pair coupling.
Size-Bias coupling: µ > 0
If W ≥ 0 and EW > 0, then W^s has the W-size biased distribution if

E W f(W) = EW E f(W^s)

for all f for which both sides exist.

Example: If X ∼ Bernoulli(p), then E X f(X) = p f(1), and so X^s = 1.

Example: If X ∼ Poisson(λ), then

P(X^s = k) = k e^{−λ} λ^k / (k! λ) = e^{−λ} λ^{k−1} / (k − 1)!,

and so X^s = X + 1, where the equality is in distribution.
Using

E W f(W) = EW E f(W^s)

with f(x) = x we get

E W^s = E W² / µ.
With f(x) = x² we get

E (W^s)² = E W³ / µ.
Construction
(Goldstein and Rinott 1997) Suppose W = ∑_{i=1}^n X_i with X_i ≥ 0 and E X_i > 0 for all i. Choose an index V with probability proportional to the means, P(V = v) ∝ E X_v. If V = v: replace X_v by X_v^s having the X_v-size biased distribution, independently, and if X_v^s = x: adjust X_u, u ≠ v, such that

L(X_u, u ≠ v) = L(X_u, u ≠ v | X_v = x).

Then W^s = ∑_{u≠V} X_u + X_V^s has the W-size bias distribution.

Example: sum of independent Bernoullis, X_i ∼ Be(p_i) for i = 1, . . . , n. Then W^s = ∑_{u≠V} X_u + 1; see the simulation sketch below.
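A simulation sketch of this construction for independent Bernoullis (our addition; f(w) = w² and the p_i are arbitrary choices): the identity E W f(W) = EW · E f(W^s) should hold up to Monte Carlo error.

import numpy as np

rng = np.random.default_rng(4)
p = np.array([0.2, 0.5, 0.7, 0.1])       # arbitrary success probabilities
N = 500_000
f = lambda w: w**2                        # any test function will do

X = rng.binomial(1, p, size=(N, len(p)))
W = X.sum(axis=1)

# Pick V with P(V = i) proportional to p_i and set X_V := 1 (= X_V^s);
# by independence the remaining coordinates need no adjustment.
V = rng.choice(len(p), size=N, p=p / p.sum())
Ws = W - X[np.arange(N), V] + 1

print(np.mean(W * f(W)), p.sum() * np.mean(f(Ws)))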
How does the size-bias coupling help?
Let X, X_1, . . . , X_n be i.i.d. non-negative with EX = µ, Var X = σ², and W = ∑_{i=1}^n X_i. Then, heuristically,

E(W − µ) f(W) = µ E( f(W^s) − f(W) )
  ≈ µ E(W^s − W) f′(W)
  = µ (1/n) ∑_{i=1}^n E(X_i^s − X_i) f′(W)
  ≈ µ E f′(W) E(X^s − µ)
  = µ E f′(W) { (1/µ) E X² − µ }
  = σ² E f′(W).
As a theorem
(Goldstein and Rinott 1997) Let W ≥ 0 have mean µ > 0 and variance 1. Let W^s have the W-size bias distribution and assume that W^s is defined on the same probability space as W. Let h be a smooth function. Then

|E h(W − µ) − E h(Z)| ≤ (sup h − inf h) µ √( Var E(W^s − W | W) ) + (1/6) ‖h′‖ µ E[ (W^s − W)² ].
Example: sum of i.i.d.’s
Let X, X_1, . . . , X_n be i.i.d. non-negative with EX = µ/n, Var X = 1/n, and W = ∑_{i=1}^n X_i. Then

E(W^s − W | W) = (1/n) ∑_{i=1}^n E(X_i^s − X_i | W)
  = (1/n) ( ∑_{i=1}^n n E X²/µ − ∑_{i=1}^n X_i )
  = n E X²/µ − (1/n) W,

and so

Var E(W^s − W | W) = 1/n².
Moreover

E[ (W^s − W)² ] = (1/n) ∑_{i=1}^n E(X_i − X_i^s)²
  = E X² − 2 EX E X^s + E(X^s)²
  = E X² − 2 E X² + n E X³/µ
  ≤ n E X³/µ.

Thus the bound gives

|E h(W − µ) − E h(Z)| ≤ (1/2) ‖h′′‖ µ (1/n) + (1/6) ‖h^(3)‖ n E X³.

We think of X ≍ n^{−1/2}; then the overall bound is of order n^{−1/2}.
Example: Vertex degrees in Bernoulli random graphs
Let G(n, p) be a random graph on n vertices where each pair of vertices has probability p of making up an edge, independently. The degree D(v) of a vertex v is the number of its neighbours. Let W count the number of vertices with degree d. Then

W = ∑_{v=1}^n 1(D(v) = d).

The indicators are not independent: if we know that D(v) = 0 for v = 1, . . . , n − 1, then D(n) = 0 also.
Coupling
Take a random graph and pick a vertex v at random. Set its degree equal to d. If its degree was d already, nothing needs to be adjusted. If its degree was D(v) > d, then choose D(v) − d of its edges uniformly at random and delete them. If the degree was D(v) < d, then choose d − D(v) vertices among the remaining n − D(v) − 1 non-neighbours of v and create an edge between each of these and v.
Goldstein and Rinott (1997) calculate the bound; see also Lin and R. (2012) for a random graph model where p = p(u, v) may depend on the vertices.
For mean-zero random variables the size bias coupling is not easy to implement. In particular it would not be applicable to normally distributed random variables. This drawback led to the definition of the zero-bias coupling.
Zero bias coupling
Let X be a mean zero random variable with finite, nonzero variance σ². We say that X* has the X-zero biased distribution if for all differentiable f for which E X f(X) exists,

E X f(X) = σ² E f′(X*).

The zero bias distribution X* exists for all X that have mean zero and finite variance (Goldstein and R. 1997). It is easy to verify that W* has density

p*(w) = σ^{−2} E{ W 1(W > w) }.
Examples
Example: If X ∼ N(0, σ²), then X has the X-zero bias distribution; the normal is the only fixed point of the zero bias transformation.

Example: If X ∼ Bernoulli(p) − p, then

E{ X 1(X > x) } = p(1 − p) for −p < x < 1 − p

and is zero elsewhere, so X* ∼ Uniform(−p, 1 − p).
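A quick numerical check of the Bernoulli example (our addition; f(x) = x³ is an arbitrary differentiable test function): simulate X = Bernoulli(p) − p and X* ∼ Uniform(−p, 1 − p) and compare the two sides of the zero-bias identity.

import numpy as np

rng = np.random.default_rng(3)
p = 0.3
sigma2 = p * (1 - p)                 # Var(Bernoulli(p) - p)
N = 1_000_000

x = rng.binomial(1, p, N) - p        # X = Bernoulli(p) - p
x_star = rng.uniform(-p, 1 - p, N)   # claimed zero-bias distribution

# E X f(X) should equal sigma^2 E f'(X*) for f(x) = x^3, f'(x) = 3 x^2.
print(np.mean(x * x**3), sigma2 * np.mean(3 * x_star**2))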
Connection with Wasserstein distance
For W with mean zero and variance 1,

|E h(W) − Nh| = | E[ f′(W) − W f(W) ] |
  = | E[ f′(W) − f′(W*) ] |
  ≤ ‖f′′‖ E|W − W*|,

where ‖·‖ is the supremum norm. By (Stein 1986), for h absolutely continuous we have ‖f′′‖ ≤ 2‖h′‖ and hence

|E h(W) − Nh| ≤ 2‖h′‖ E|W − W*|;

thus

d_W(L(W), N(0, 1)) ≤ 2 E|W − W*|.
Construction: sum of i.i.d.s
Let W = ∑_{i=1}^n X_i, where the X_i's are independent mean zero variables with finite variances σ_i². Choose an index I with probability proportional to the variance, P(I = i) = σ_i²/σ², and zero-bias in that variable:

W* = W − X_I + X_I*.

Then, for any smooth f,

E W f(W) = ∑_{i=1}^n E X_i f(W) = ∑_{i=1}^n E X_i f( X_i + ∑_{t≠i} X_t )
  = ∑_{i=1}^n σ_i² E f′( X_i* + ∑_{t≠i} X_t ) = σ² ∑_{i=1}^n (σ_i²/σ²) E f′( W − X_i + X_i* )
  = σ² E f′( W − X_I + X_I* ) = σ² E f′(W*),

where we have used the independence of X_i and X_t, t ≠ i.
And here is the theorem
Let X_1, . . . , X_n be independent mean zero variables with variances σ_1², . . . , σ_n² and finite third moments, and let W = (X_1 + . . . + X_n)/σ, where σ² = σ_1² + . . . + σ_n². Then for all absolutely continuous test functions h,

|E h(W) − Nh| ≤ (2‖h′‖/σ³) ∑_{i=1}^n ( σ_i² E|X_i| + (1/2) E|X_i|³ ).

When the variables are i.i.d. with variance 1,

|E h(W) − Nh| ≤ 3‖h′‖ E|X_1|³ / √n.
The construction is (much) more involved under dependence. When third moments vanish and fourth moments exist: bounds of order n^{−1}. Also: Berry-Esseen bounds, combinatorial central limit theorem (Goldstein 2004).
Some remarks
◮ Left out:
  1. Exchangeable pair couplings, also used for variance reduction in simulations
  2. Multivariate versions, also coupling approaches
  3. Infinite-dimensional variables (Malliavin calculus), random measures
◮ Key feature: bounds in the presence of dependence
◮ In the i.i.d. case: the Berry-Esseen inequality is not quite recovered
Further reading
◮ Barbour, A.D. (1990). Stein's method for diffusion approximations. Probability Theory and Related Fields 84, 297-322.
◮ Barbour, A.D. and Chen, L.H.Y., eds (2005). An Introduction to Stein's Method. Lecture Notes Series 4, Inst. Math. Sciences, World Scientific Press, Singapore.
◮ Barbour, A.D. and Chen, L.H.Y., eds (2005). Stein's Method and Applications. Lecture Notes Series 5, Inst. Math. Sciences, World Scientific Press, Singapore.
◮ Chen, L.H.Y., Goldstein, L., and Shao, Q.-M. (2011). Normal Approximation by Stein's Method. Springer, Berlin and Heidelberg.
◮ Diaconis, P., and Holmes, S., eds (2004). Stein's Method: Expository Lectures and Applications. IMS Lecture Notes 46.
More further reading
◮ Reinert, G. (2011). Gaussian approximation of functionals: Malliavin calculus and Stein's method. In: Surveys in Stochastic Processes. J. Blath, P. Imkeller, and S. Roelly, eds. European Mathematical Society, Zurich, pp. 107-126.
◮ Reinert, G. (1998). Couplings for normal approximations with Stein's method. In: Microsurveys in Discrete Probability. D. Aldous, J. Propp, eds. DIMACS Series, AMS, 193-207.
◮ Reinert, G., Schbath, S., and Waterman, M.S. (2005). Probabilistic and statistical properties of finite words in finite sequences. In: Lothaire: Applied Combinatorics on Words. J. Berstel, D. Perrin, eds. Cambridge University Press, 268-352.
◮ Stein, C. (1972). A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. Proc. Sixth Berkeley Symp. Math. Statist. Probab. 2, 583-602. Univ. California Press, Berkeley.
◮ Stein, C. (1986). Approximate Computation of Expectations. IMS, Hayward, CA.
Poisson approximation and other distributions
We first look at Poisson approximation; then we will put the approximations in a general framework. We then apply the general framework to chisquare distributions as well as one-dimensional Gibbs measures.
Poisson approximation
The standard reference for this section is Barbour, Holst and Janson (1992). Recall the Poisson(λ) distribution,

p_k = e^{−λ} λ^k / k!,  k = 0, 1, . . . .

This is a discrete distribution, and for approximations on Z_+ we would usually take the total variation distance,

d_TV(P, Q) = sup_{A⊂Z_+} |P(A) − Q(A)|.
A Stein characterisation
Fact: Z ∼ Po(λ) ⟺ for all bounded functions g : Z_+ → R,

E{ λ g(Z + 1) − Z g(Z) } = 0.

This suggests using as test functions the indicators 1(j ∈ A) for sets A ⊂ Z_+, and as Stein equation

λ g(j + 1) − j g(j) = 1(j ∈ A) − Po{λ}(A).

For any random variable W we hence have

λ g(W + 1) − W g(W) = 1(W ∈ A) − Po{λ}(A),

and taking expectations,

d_TV(L(W), Po{λ}) ≤ sup_g | E{ λ g(W + 1) − W g(W) } |.

Here the sup is over all functions g which solve the Stein equation.
The solution of the Stein equation
λ g(j + 1) − j g(j) = 1(j ∈ A) − Po{λ}(A)

can be calculated iteratively. We only need the results that

sup_j |g(j)| ≤ min(1, λ^{−1/2})

and

Δg := sup_j |g(j + 1) − g(j)| ≤ min(1, λ^{−1}).

Here λ^{−1} is often referred to as the magic factor.
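The iteration can be summed in closed form, g(j) = ( ∑_{k<j} Po_λ{k} h(k) ) / ( λ Po_λ{j − 1} ) with h(k) = 1(k ∈ A) − Po_λ(A), which gives a quick numerical check of both bounds (our sketch; λ and A are arbitrary, and we keep j moderate since this forward form suffers cancellation for large j):

import numpy as np
from scipy import stats

lam = 4.0
A = [0, 2, 5, 7]                     # an arbitrary subset of Z_+
po = stats.poisson(lam)
PoA = po.pmf(A).sum()

def g(j):
    # Solves lam*g(j+1) - j*g(j) = 1(j in A) - Po_lam(A), with g(0) := 0.
    if j == 0:
        return 0.0
    ks = np.arange(j)
    h = np.isin(ks, A).astype(float) - PoA
    return (po.pmf(ks) * h).sum() / (lam * po.pmf(j - 1))

vals = np.array([g(j) for j in range(25)])
print(np.abs(vals).max(), "<=", min(1.0, lam ** -0.5))           # sup |g|
print(np.abs(np.diff(vals[1:])).max(), "<=", min(1.0, 1 / lam))  # sup |Delta g|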
Example: sum of independent Bernoulli variables
Let X_1, X_2, . . . be independent Bernoulli(p_i) and W = ∑_{i=1}^n X_i. With λ = ∑_{i=1}^n p_i,

E W g(W) = ∑ E X_i g(W) = ∑ E X_i g(W − X_i + 1) = ∑ p_i E g(W − X_i + 1),

and thus

E{ λ g(W + 1) − W g(W) } = ∑ p_i E( g(W + 1) − g(W − X_i + 1) )
  = ∑ p_i² E( g(W + 1) − g(W − X_i + 1) | X_i = 1 )

and

| E{ λ g(W + 1) − W g(W) } | ≤ Δg ∑ p_i² ≤ min(1, λ^{−1}) ∑ p_i².
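In code (our addition; the p_i are arbitrary): the exact law of W is a convolution of Bernoulli pmfs, so the bound can be compared with the true total variation distance.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
p = rng.uniform(0.01, 0.1, size=40)    # arbitrary success probabilities
lam = p.sum()

pmf = np.array([1.0])                  # exact pmf of W by convolution
for pi in p:
    pmf = np.convolve(pmf, [1 - pi, pi])

ks = np.arange(len(pmf))
d_tv = 0.5 * np.abs(pmf - stats.poisson.pmf(ks, lam)).sum()
d_tv += 0.5 * stats.poisson.sf(len(pmf) - 1, lam)   # Poisson tail beyond n
bound = min(1, 1 / lam) * (p**2).sum()
print(f"d_TV = {d_tv:.5f} <= {bound:.5f}")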
The local approach
Again the argument generalises to local dependence. Let X_1, . . . , X_n be Bernoulli random variables, X_i ∼ Be(p_i), and put W = ∑_{i=1}^n X_i. Let λ = ∑_{i=1}^n p_i. Suppose that for each i = 1, . . . , n there exists a set A_i ⊂ {1, . . . , n} such that X_i is independent of ∑_{j∉A_i} X_j. Define

η_i = ∑_{j∈A_i} X_j.

Theorem
With λ = EW,

d_TV(L(W), Po(λ)) ≤ ∑_{i=1}^n ( p_i E η_i + E(X_i η_i) ) min(1, λ^{−1}).
Example: word counts in DNA sequences
Let B = B_1 B_2 . . . B_n be a string of letters which are i.i.d. from the alphabet A = {A, C, G, T}. Let w = w_1 w_2 . . . w_m be a word of length m composed of letters from A. Let V = V(w) be the number of times that w occurs in the sequence B. For j = 1, . . . , n − m + 1 define the word indicators

X_j = 1 if B_j B_{j+1} . . . B_{j+m−1} = w, and X_j = 0 otherwise.

Let p(x) = P(B_1 = x) for x = A, C, G, T; then

E X_j = p(w) := ∏_{i=1}^m p(w_i).
The neighbourhood of dependence
Put A_i = { j ∈ {1, . . . , n} : |j − i| ≤ m − 1 }. Then X_i is independent of ∑_{j∉A_i} X_j. Moreover η_i = ∑_{j:|j−i|≤m−1} X_j.
Consider the statement of the theorem,

d_TV(L(W), Po(λ)) ≤ ∑_{i=1}^n ( p_i E η_i + E(X_i η_i) ) min(1, λ^{−1}).

Here p_i = p(w) for all i, and we think of λ = O(1), so that roughly p(w) = O(n^{−1}). Moreover E η_i ≤ (2m − 1) p(w), and E(X_i η_i) = ∑_{j:|j−i|≤m−1} E X_i X_j depends on the amount of self-overlap in w. If w does not have any self-overlap then E X_i X_j = 0 for 0 < |i − j| ≤ m − 1, and hence

d_TV(L(W), Po(λ)) ≤ n(2m − 1) p(w)² min(1, λ^{−1}).

If w has considerable self-overlap then E(X_i η_i) will not be small, and indeed a compound Poisson approximation would be advised.
Example: sequencing by hybridisation
(Arratia et al. 1996) Sequencing by hybridisation is a tool to determine a DNA sequence from the unordered list of all l-tuples in the sequence; typically l = 8, 10, 12. Assume that the multiset of all l-tuples is known. The sequence is then uniquely recoverable unless one of the following occurs in the sequence:

◮ a non-trivial rotation
◮ a non-trivial transposition using three way repeats
◮ a non-trivial transposition using two interleaved pairs of repeats.
Using a Poisson approximation for pairs and three way repeats we can bound the probability of the sequence being uniquely recoverable. For example, if all letters are equally likely, the probability of unique recoverability is at least .95 if

l = 8 and the sequence length is at most 85;
l = 10 and the sequence length is at most 469;
l = 12 and the sequence length is at most 2,288.
Recall: if W ≥ 0 and EW = λ, then W^s has the W-size biased distribution if

E W f(W) = λ E f(W^s)

for all f for which both sides exist.

Example: sum of independent Bernoullis, X_i ∼ Be(p_i) for i = 1, . . . , n; pick V ∝ p_i, then W^s = ∑_{u≠V} X_u + 1.

Theorem
For a sum of possibly dependent Bernoullis, with L(X_j, j ≠ i) = L(X_j, j ≠ i | X_i = 1),

d_TV(L(W), Po(λ)) = | ∑_{i=1}^n p_i ( E g(W + 1) − E g( ∑_{j≠i} X_j + 1 ) ) |
  ≤ Δg ∑_{i=1}^n p_i E| W − ∑_{j≠i} X_j |.
If the Bernoulli random variables are independent then

E| W − ∑_{j≠i} X_j | = E X_i = p_i

and we obtain the same bound as from the local approach.
A general framework
Start with a target distribution µ.

1. Find a characterization: an operator A such that X ∼ µ if and only if EAf(X) = 0 for all smooth functions f.
2. For each smooth function h find the solution f = f_h of the Stein equation

h(x) − ∫ h dµ = Af(x).

3. Then for any variable W,

E h(W) − ∫ h dµ = E Af(W).

We usually need to bound f, f′, or Δf. Here h is a smooth test function; for nonsmooth functions see techniques used by Shao, Chen, Rinott and Rotar, Götze.
The generator approach
Barbour 1989, 1990; Götze 1993. Choose A as the generator of a Markov process with stationary distribution µ, that is: let (X_t)_{t≥0} be a homogeneous Markov process and put T_t f(x) = E( f(X_t) | X(0) = x ). The generator is defined as

Af(x) = lim_{t↓0} (1/t) ( T_t f(x) − f(x) ).
Generator facts
(Ethier and Kurtz (1986), for example)

1. If µ is the stationary distribution, then X ∼ µ if and only if EAf(X) = 0 for all f for which Af is defined.
2. T_t h − h = A( ∫_0^t T_u h du ), and formally

∫ h dµ − h = A( ∫_0^∞ T_u h du )

if the r.h.s. exists.
Examples
1. Ah(x) = h′′(x) − x h′(x) is the generator of the Ornstein-Uhlenbeck process, with stationary distribution N(0, 1).

2. Ah(x) = λ( h(x + 1) − h(x) ) + x( h(x − 1) − h(x) ), or, with f(x) = h(x) − h(x − 1),

Af(x) = λ f(x + 1) − x f(x),

is the generator of an immigration-death process with immigration rate λ and unit per capita death rate; the stationary distribution is Poisson(λ).

Advantage: generalisations to multivariate settings, diffusions, measure spaces, . . .
Chisquare distributions
A generator for χ²_p is

Af(x) = x f′′(x) + (1/2)(p − x) f′(x).

Luk 1994: a Stein operator for Gamma(r, λ) is

Af(x) = x f′′(x) + (r − λx) f′(x),

and χ²_p = Gamma(p/2, 1/2). For χ²_p, A is the generator of a Markov process given by the solution of the stochastic differential equation

X_t = x + (1/2) ∫_0^t (p − X_s) ds + ∫_0^t √(2X_s) dB_s,

where B_s is standard Brownian motion.
The Stein equation is then

(χ²_p)  h(x) − χ²_p h = x f′′(x) + (1/2)(p − x) f′(x),

where χ²_p h is the expectation of h under the χ²_p-distribution.

Lemma
(Pickett 2002). Suppose h : R → R is absolutely bounded, |h(x)| ≤ c e^{ax} for some c > 0, a ∈ R, and the first k derivatives of h are bounded. Then the equation (χ²_p) has a solution f = f_h such that

‖f^(j)‖ ≤ ( √(2π)/√p ) ‖h^(j−1)‖,

with h^(0) = h.

(Improvement over Luk 1994 by a factor 1/√p.)
Example: squared sum (Pickett and R.)
Let X_i, i = 1, . . . , n, be i.i.d. with mean zero, variance one, and existing 8th moment. Put

S = (1/√n) ∑_{i=1}^n X_i  and  W = S².

Pickett and R. (2004): the bound on the distance to Chisquare(1) for smooth test functions is of order 1/n.

Also: Pearson's chisquare statistic. Robert Gaunt: variance-gamma distributions.
One-dimensional Gibbs distributions
(Eichelsbacher and R. (2008)) We consider discrete univariate Gibbs measures µ with supp(µ) = {0, . . . , N}, where N ∈ N_0 ∪ {∞}. Such a Gibbs measure can be written as

µ(k) = (1/Z) exp(V(k)) ω^k / k!,  k = 0, 1, . . . , N,

with Z = ∑_{k=0}^N exp(V(k)) ω^k / k!, where ω > 0 is fixed. We shall assume that the normalizing constant Z exists.
Conversely, for a given probability distribution (µ(k))_{k∈N_0} we can find a representation as a Gibbs measure by choosing

V(k) = log µ(k) + log k! + log Z − k log ω,  k = 0, 1, . . . , N,

with V(0) = log µ(0) + log Z. We have some freedom in the choice of ω and thus of V. Fix a representation.
A Markov process
To each Gibbs measure we associate a Markovian birth-death process with unit per-capita death rate d_k = k and birth rate

b_k = ω exp{ V(k + 1) − V(k) } = (k + 1) µ(k + 1) / µ(k)

for k, k + 1 ∈ supp(µ). It is easy to see that this birth-death process has invariant measure µ.
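A two-line check of the birth rate formula (our addition): computing (k + 1)µ(k + 1)/µ(k) from the pmfs recovers the constant rate λ for Poisson(λ), and the rate (n − k)p/(1 − p) for Binomial(n, p) that reappears in the Stein operators below.

import numpy as np
from scipy import stats

def birth_rates(pmf):
    # b_k = (k + 1) mu(k + 1) / mu(k); unit per-capita death rate d_k = k.
    k = np.arange(len(pmf) - 1)
    return (k + 1) * pmf[1:] / pmf[:-1]

lam, n, p = 3.0, 10, 0.4
print(birth_rates(stats.poisson.pmf(np.arange(12), lam)))    # constant lam
print(birth_rates(stats.binom.pmf(np.arange(n + 1), n, p)))  # (n-k) p/(1-p)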
The generator
Following the generator approach to Stein's method, we would therefore choose as generator

(Ah)(k) = ( h(k + 1) − h(k) ) ω exp{ V(k + 1) − V(k) } + k ( h(k − 1) − h(k) )

or, with the simplification f(k) = h(k) − h(k − 1),

(Af)(k) = f(k + 1) ω exp{ V(k + 1) − V(k) } − k f(k).
Examples
1. The Poisson distribution with parameter λ > 0: we use ω = λ, V(k) = −λ, Z = 1. The Stein operator is

(Af)(k) = λ f(k + 1) − k f(k),

as before.

2. The Binomial distribution with parameters n and 0 < p < 1: we use ω = p/(1 − p), V(k) = −log((n − k)!), and Z = ( n! (1 − p)^n )^{−1}. The Stein operator is

(Af)(k) = ( p(n − k)/(1 − p) ) f(k + 1) − k f(k).

3. The Hypergeometric distribution: the Stein operator is

(Af)(k) = (a − k)(n − k) f(k + 1) − (b − n + k) k f(k).
The solution of the Stein equation
For a given function h, a solution f of the Stein equation is given by f(0) = 0, f(k) = 0 for k ∉ supp(µ), and

f(j + 1) = ( j!/ω^{j+1} ) e^{−V(j+1)} ∑_{k=0}^{j} e^{V(k)} (ω^k/k!) ( h(k) − µ(h) )
  = −( j!/ω^{j+1} ) e^{−V(j+1)} ∑_{k=j+1}^{N} e^{V(k)} (ω^k/k!) ( h(k) − µ(h) ).

We can bound the solution and its first difference Δf.
Exploiting size-biasing
Recall that for W ≥ 0 with EW > 0 we say that W* has the W-size biased distribution if

E W g(W) = EW E g(W*)

for all g for which both sides exist. In particular we obtain that

E{ ω exp{V(X + 1) − V(X)} g(X + 1) − X g(X) }
  = E{ ω exp{V(X + 1) − V(X)} g(X + 1) } − EX E g(X*),

and, taking g ≡ 1, for X ∼ µ,

EX = ω E e^{V(X+1)−V(X)}.
A Gibbs measure characterisation
Let X ≥ 0 be such that 0 < E(X) < ∞, and let µ be a discrete univariate Gibbs measure on the non-negative integers. Then X ∼ µ if and only if for all bounded g we have that

ω E e^{V(X+1)−V(X)} g(X + 1) = ω E e^{V(X+1)−V(X)} E g(X*).
And here is the theorem
For any W ≥ 0 with 0 < EW < ∞ we have that

E h(W) − µ(h) = ω{ E e^{V(W+1)−V(W)} f(W + 1) − E e^{V(W+1)−V(W)} E f(W*) },

where f is the solution of the Stein equation.
Example: lattice approximation in statistical physics
Chayes and Klein (1994): Assume A ⊂ R^d is a rectangle; denote its volume by |A|. Consider the intersection of A with the d-dimensional lattice n^{−1}Z^d. With each site m in this intersection we associate a Bernoulli random variable X^n_m which takes the value 1, with probability p^n_m, if a particle is present at m, and 0 otherwise. If the collection (X^n_m)_m is independent, the joint distribution can be interpreted as the Gibbs distribution for an ideal gas on the lattice, and we have Poisson convergence.
We provide bounds on the approximation for such particles while allowing for interactions.
More details
The model is as follows. Pick n ∈ N and suppose that A can be partitioned into a regular array of d(n) sub-rectangles {S^n_1, . . . , S^n_{d(n)}} with volumes v(S^n_m) = z^n_m / z^n. For each m, n choose a point q^n_m ∈ S^n_m. We consider a sequence of functions (f_k)_k satisfying f_0 ≡ 1 and, for each k ≥ 1, f_k(x_1, . . . , x_k) is a nonnegative function, Riemann integrable on A^k, such that f_k is a symmetric function for each k.
Let X^n_m, 1 ≤ m ≤ d(n), be 0-1 random variables with joint density function

P( X^n_1 = a_1, . . . , X^n_{d(n)} = a_{d(n)} ) ∝ f_k( q^n_{i_1}, . . . , q^n_{i_k} ) ∏_{m=1}^{d(n)} (z^n_m)^{a_m},

where each a_i ∈ {0, 1} and the indices on the right-hand side are determined by k = ∑_{m=1}^{d(n)} a_m. Define S_n = ∑_{m=1}^{d(n)} X^n_m.
Then, due to the symmetry of f_k,

P(S_n = k) = [ ((z^n)^k / k!) ∑_{i_1=1}^{d(n)} · · · ∑_{i_k=1}^{d(n)} f_k( q^n_{i_1}, . . . , q^n_{i_k} ) ∏_{m=1}^{k} v(S^n_{i_m}) ]
  / [ ∑_{k=0}^{d(n)} ((z^n)^k / k!) ∑_{i_1=1}^{d(n)} · · · ∑_{i_k=1}^{d(n)} f_k( q^n_{i_1}, . . . , q^n_{i_k} ) ∏_{m=1}^{k} v(S^n_{i_m}) ].

Note that we can write this distribution as a Gibbs measure µ_n.
Let S be a nonnegative integer valued random variable defined by

P(S = k) = [ (z^k / k!) ∫_{A^k} f_k(x_1, . . . , x_k) dx_1 · · · dx_k ]
  / [ ∑_{k=0}^{∞} (z^k / k!) ∫_{A^k} f_k(x_1, . . . , x_k) dx_1 · · · dx_k ].

We write this distribution as a Gibbs measure µ. Using our approach we can bound the total variation distance between µ_n and µ.
Comparing distributions via generators
Let µ have generator A and corresponding (ω, V), and let µ_2 have generator A_2 and corresponding (ω_2, V_2). Assume that both birth-death processes have unit per-capita death rates. Then, for X ∼ µ_2, if the solution f of the Stein equation for µ is such that A_2 f exists,

E h(X) − µ(h) = E Af(X) = E(A − A_2) f(X).

Using that X* has the X-size bias distribution,

|E h(X) − µ(h)| ≤ ‖f‖ E(X) { |ω − ω_2|/ω_2 + (ω/ω_2) E| e^{ (V(X*)−V(X*−1)) − (V_2(X*)−V_2(X*−1)) } − 1 | }.
For example, for two Poisson distributions Poisson(λ_1) and Poisson(λ_2) this approach gives

| E h(X) − ∫ h dµ | ≤ ‖f‖ |λ_1 − λ_2|.

Note that the normalising constant Z in the Gibbs distribution, which is often difficult to calculate, is not needed explicitly in the Stein approach. This is one of the main advantages of the above approach.
Final Remarks
In the first example, X_1, X_2, . . . , X_n i.i.d. with P(X_i = 1) = p = 1 − P(X_i = 0), using Stein's method we can show that

sup_x | P( (np(1 − p))^{−1/2} ∑_{i=1}^n (X_i − p) ≤ x ) − P(N(0, 1) ≤ x) | ≤ 6 √( p(1 − p)/n )

and

sup_x | P( ∑_{i=1}^n X_i = x ) − P(Po(np) = x) | ≤ min(np², p).

So, if p < 36/(n + 36), the bound on the Poisson approximation is smaller than the bound on the normal approximation.
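Putting the two bounds into code (our sketch; the function names are our own) makes the crossover at p = 36/(n + 36) visible:

import numpy as np

def normal_bound(n, p):
    return 6 * np.sqrt(p * (1 - p) / n)      # bound from the slide above

def poisson_bound(n, p):
    return min(n * p**2, p)                  # bound from the slide above

n = 1000
for p in (0.01, 36 / (n + 36), 0.1, 0.3):
    print(f"p={p:.4f}  normal: {normal_bound(n, p):.4f}"
          f"  Poisson: {poisson_bound(n, p):.4f}")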
Application areas
◮ sequence analysis
◮ random graph statistics, and network statistics
◮ number theory
◮ MCMC convergence
◮ exceedances in time series
◮ interacting particle systems
◮ epidemics
◮ permutation tests
◮ Many more distributions
◮ Exchangeable pair couplings, also used for variance reduction in simulations
◮ Multivariate versions, also coupling approaches
◮ Infinite-dimensional: random measures, functionals of Gaussian random fields
◮ General distributional transformations
◮ Bounds in the presence of dependence
◮ In the i.i.d. case the Berry-Esseen inequality is not quite recovered.
Further reading
1. Arratia, R., Goldstein, L., and Gordon, L. (1989). Two moments suffice for Poisson approximation: The Chen-Stein method. Ann. Probab. 17, 9-25.
2. Barbour, A.D., Holst, L., and Janson, S. (1992). Poisson Approximation. Oxford University Press.
3. Reinert, G. (2005). Three general approaches to Stein's method. In: A Program in Honour of Charles Stein: Tutorial Lecture Notes. A.D. Barbour, L.H.Y. Chen, eds. World Scientific, Singapore, 183-221.