Bootstrap (Part 4)
Christof Seiler
Stanford University, Spring 2016, Stats 205
Overview
- So far:
  - Nonparametric bootstrap on the rows (e.g. regression, PCA with random rows and columns)
  - Nonparametric bootstrap on the residuals (e.g. regression)
  - Parametric bootstrap (e.g. PCA with fixed rows and columns)
  - Studentized bootstrap
- Today:
  - Bias-corrected-accelerated (BCa) bootstrap
  - From BCa to ABC
Motivation
- Correlation coefficient of a bivariate normal with $\rho = 0.577$
sigma = matrix(nrow = 2, ncol = 2)
diag(sigma) = 1
rho = 0.577
sigma[1,2] = sigma[2,1] = rho
sigma

##       [,1]  [,2]
## [1,] 1.000 0.577
## [2,] 0.577 1.000
- Distribution of the sample correlation coefficient (n = 10); a simulation sketch follows
- Compare: Percentile, Studentized, and Bias-Corrected-Accelerated (BCa) bootstrap
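
A minimal sketch of how the histogram of corhat on the next slide could be reproduced (the seed, the 10000 replicates, and the use of MASS::mvrnorm are assumptions, not from the original slides; sigma is the covariance matrix defined above):

library(MASS)  # for mvrnorm

set.seed(1)
n = 10
corhat = replicate(10000, {
  xy = mvrnorm(n, mu = c(0, 0), Sigma = sigma)  # one bivariate normal sample
  cor(xy[, 1], xy[, 2])                         # its sample correlation
})
hist(corhat)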
Motivation

[Figure: Histogram of corhat]
bias = rho - mean(corhat); bias
## [1] 0.0217078
Motivation
[Figure: Percentile Bootstrap]
Motivation
- Studentized bootstrap with variance stabilization fails due to numerical problems
Motivation
[Figure: Studentized Bootstrap Without Variance Stabilization]
Motivation
[Figure: BCa Bootstrap]
Motivation
[Figure: three panels comparing Percentile Bootstrap, Studentized Bootstrap Without Variance Stabilization, and BCa Bootstrap]
BCa Bootstrap
- The bias-corrected bootstrap is similar to the percentile bootstrap
- Recall the percentile bootstrap (a minimal R sketch follows this list):
  - Take bootstrap samples $\hat\theta^*_1, \ldots, \hat\theta^*_B$
  - Order them: $\hat\theta^*_{(1)}, \ldots, \hat\theta^*_{(B)}$
  - Define the interval as
    $$(\hat\theta^*_{(B\alpha)}, \hat\theta^*_{(B(1-\alpha))})$$
    (assuming that $B\alpha$ and $B(1-\alpha)$ are integers)
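
A minimal sketch of the percentile interval on a toy bivariate sample (the data, B = 2000, and alpha = 0.05 are illustrative assumptions):

set.seed(1)
n = 10
x = matrix(rnorm(2 * n), ncol = 2)  # toy bivariate sample
B = 2000
thetastar = replicate(B, {
  idx = sample(n, replace = TRUE)   # resample rows with replacement
  cor(x[idx, 1], x[idx, 2])         # bootstrap replicate of the correlation
})
quantile(thetastar, probs = c(0.025, 0.975))  # percentile interval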
BCa Bootstrap
- Assume that there is a monotone increasing transformation $g$ such that $\hat\phi = g(\hat\theta)$ and $\phi = g(\theta)$
- The BCa bootstrap is based on the model
  $$\frac{\hat\phi - \phi}{\sigma_\phi} \sim N(-z_0, 1) \quad \text{with} \quad \sigma_\phi = 1 + a\phi$$
- which is a generalization of the usual normal approximation
  $$\frac{\hat\theta - \theta}{\sigma} \sim N(0, 1)$$
BCa Bootstrap
- $z_0$ is the bias estimate
- $z_0$ measures the discrepancy between the median of $\hat\theta^*$ and $\hat\theta$
- It is estimated as (see the sketch below)
  $$\hat z_0 = \Phi^{-1}\left(\frac{\#\{\hat\theta^*_b < \hat\theta\}}{B}\right)$$
- We obtain $\hat z_0 = 0$ if exactly half of the $\hat\theta^*_b$ values fall below $\hat\theta$
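
Continuing the toy sketch above, the bias estimate could be computed as follows (variable names are assumptions, not the slides' code):

thetahat = cor(x[, 1], x[, 2])             # estimate on the original sample
z0hat = qnorm(mean(thetastar < thetahat))  # Phi^{-1} of the fraction of replicates below thetahat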
BCa Bootstrap
- $a$ is the skewness estimate
- $a$ measures the rate of change of the standard error of $\hat\theta$ with respect to the true parameter $\theta$
- It is estimated using the jackknife (see the sketch below):
  - Delete the $i$th observation from the original sample, denote the estimate on the reduced sample by $\hat\theta_{(i)}$, and compute
    $$\hat\theta_{(\cdot)} = \sum_{i=1}^n \frac{\hat\theta_{(i)}}{n}$$
  - Then
    $$\hat a = \frac{\sum_{i=1}^n (\hat\theta_{(\cdot)} - \hat\theta_{(i)})^3}{6\{\sum_{i=1}^n (\hat\theta_{(\cdot)} - \hat\theta_{(i)})^2\}^{3/2}}$$
BCa Bootstrap
- The bias-corrected version makes two additional corrections to the percentile version
- It redefines the lower level $\alpha_1$ and the upper level $\alpha_2$ as
  $$\alpha_1 = \Phi\left(\hat z_0 + \frac{\hat z_0 + z^{(\alpha)}}{1 - \hat a(\hat z_0 + z^{(\alpha)})}\right), \qquad \alpha_2 = \Phi\left(\hat z_0 + \frac{\hat z_0 + z^{(1-\alpha)}}{1 - \hat a(\hat z_0 + z^{(1-\alpha)})}\right)$$
  with $z^{(\alpha)}$ the $100\alpha$ percentile of the standard normal and $\Phi$ the standard normal CDF
- When $\hat a$ and $\hat z_0$ are equal to zero, $\alpha_1 = \alpha$ and $\alpha_2 = 1 - \alpha$
- The interval is then given by (see the sketch below)
  $$(\hat\theta^*_{(B\alpha_1)}, \hat\theta^*_{(B\alpha_2)})$$
  (assuming that $B\alpha_1$ and $B\alpha_2$ are integers)
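
Putting the pieces of the toy sketch together (a two-sided interval with alpha = 0.05 is an assumption):

alpha = 0.05
z = qnorm(c(alpha / 2, 1 - alpha / 2))                       # z^(alpha) for both tails
adj = pnorm(z0hat + (z0hat + z) / (1 - ahat * (z0hat + z)))  # corrected levels alpha_1, alpha_2
quantile(thetastar, probs = adj)                             # BCa interval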
BCa Bootstrap
- Same asymptotic accuracy as the studentized bootstrap
- Can also handle the out-of-range problem
- See Efron (1987) for a detailed justification of this model
BCa Bootstrap in R
library(bootstrap)
xdata = matrix(rnorm(30), ncol = 2); n = 15
theta = function(x, xdata) {
  cor(xdata[x, 1], xdata[x, 2])
}
results = bcanon(1:n, 100, theta, xdata, alpha = c(0.025, 0.975))
results$confpoints

##      alpha bca point
## [1,] 0.025  -0.39659
## [2,] 0.975   0.69326
Properties of Different Bootstrap Methods

                                      Standard  Percentile  Studentized*  BCa
Asymptotic accuracy                   O(1/√n)   O(1/√n)     O(1/n)        O(1/n)
Range-preserving                      No        Yes         No            Yes
Transformation-invariant              No        Yes         No            Yes
Bias-correcting                       No        No          No            Yes
Skewness-correcting                   No        Yes         Yes           Yes
σ, σ* required                        No        No          Yes           No
Analytic constant or variance-
  stabilizing transformation required No        No          Yes           Yes

* with variance stabilization
Properties of Different Bootstrap Methods

For the nonparametric bootstrap:

[Table omitted; source: Carpenter and Bithell (2000)]
Many More Topics
- Using the bootstrap for better confidence in model selection (Efron 2014)
- Using the jackknife and the infinitesimal jackknife for confidence intervals in random forest prediction or classification (Wager, Hastie, and Efron 2014)
Approximate Bayesian Computation (ABC)
- Goal: we wish to sample from the posterior distribution $p(\theta \mid D)$ given data $D$:
  $$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}$$
- Setting:
  - The likelihood $p(D \mid \theta)$ is hard to evaluate or expensive to compute (e.g. missing normalizing constant)
  - It is easy to sample from the likelihood $p(D \mid \theta)$
  - It is easy to sample from the prior $p(\theta)$
- Examples:
  - Population genetics (latent variables)
  - Ecology, epidemiology, systems biology (models based on differential equations)
Approximate Bayesian Computation (ABC)
- Sampling algorithm (with data $D = \{y_1, \ldots, y_n\}$):
  1. Sample $\theta_i \sim p(\theta)$
  2. Sample $x_i \sim p(x \mid \theta_i)$
  3. Reject $\theta_i$ if $x_i \neq y_j$ for $j = 1, \ldots, n$
- ABC sampling (define a statistic $\mu$, a distance $\rho$, and a tolerance $\varepsilon$; see the sketch below):
  1. Sample $\theta_i \sim p(\theta)$
  2. Sample $D_i = \{x_1, \ldots, x_k\} \sim p(x \mid \theta_i)$
  3. Reject $\theta_i$ if $\rho(\mu(D_i), \mu(D)) > \varepsilon$
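
A toy ABC rejection sampler for the mean of a normal; the prior, the summary statistic (the sample mean), the distance, and the tolerance are all illustrative assumptions:

set.seed(1)
y = rnorm(20, mean = 2)                    # observed data D
eps = 0.1                                  # tolerance epsilon
M = 50000                                  # number of proposals
theta = rnorm(M, mean = 0, sd = 5)         # 1. sample theta_i from the prior
mu_sim = sapply(theta, function(t)
  mean(rnorm(length(y), mean = t)))        # 2. simulate D_i, summarize with mu = mean
posterior = theta[abs(mu_sim - mean(y)) <= eps]  # 3. keep theta_i with rho <= eps
hist(posterior)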
References
- Efron (1987). Better Bootstrap Confidence Intervals
- Hall (1992). The Bootstrap and Edgeworth Expansion
- Efron and Tibshirani (1994). An Introduction to the Bootstrap
- Carpenter and Bithell (2000). Bootstrap Confidence Intervals: When, Which, What? A Practical Guide for Medical Statisticians
- Marin, Pudlo, Robert, and Ryder (2012). Approximate Bayesian Computational Methods
- Efron (2014). Estimation and Accuracy after Model Selection
- Wager, Hastie, and Efron (2014). Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife