Regression III
Lecture 5: Resampling Techniques
Dave Armstrong
University of Western Ontario
Department of Political Science
Department of Statistics and Actuarial Science (by courtesy)
e: [email protected]
w: www.quantoid.net/teachicpsr/regression3/
1 / 64
Goals of the Lecture
• Introduce the general idea of resampling - techniques to resample from the original data
  • Bootstrapping
  • Cross-validation
• An extended example of bootstrapping local polynomial regression models.
2 / 64
Resampling: An Overview
• Resampling techniques sample from the original dataset
• Some of the applications of these methods are:
  • To compute standard errors and confidence intervals (either when we have small sample sizes, dubious distributional assumptions, or for a statistic that does not have an easily derivable asymptotic standard error)
  • Subset selection in regression
  • Handling missing data
  • Selection of degrees of freedom in nonparametric regression (especially GAMs)
• For the most part, this lecture will discuss resampling techniques in the context of computing confidence intervals and hypothesis tests for regression analysis.
3 / 64
Resampling and Regression: A Caution
• There is no need whatsoever for bootstrapping in regression analysis if the OLS assumptions are met
  • In such cases, OLS estimates are unbiased and maximally efficient.
• There are situations, however, where we cannot satisfy the assumptions and thus other methods are more helpful
  • Robust regression (such as MM-estimation) often provides better estimates than OLS in the presence of influential cases, but only has reliable SEs asymptotically.
  • Local polynomial regression is often "better" (in the RSS sense) in the presence of non-linearity, but because of the unknown df, only has an approximate sampling distribution.
• Cross-validation is particularly helpful for validating models and choosing model fitting parameters
4 / 64
Bootstrapping: General Overview
• If we assume that a random variable X or statistic has a particular population value, we can study how a statistical estimator computed from samples behaves
• We don't always know, however, how a variable or statistic is distributed in the population
  • For example, there may be a statistic for which standard errors have not been formulated (e.g., imagine we wanted to test whether two additive scales have significantly different levels of internal consistency - Cronbach's α doesn't have an exact sampling distribution)
  • Another example is the impact of missing data on a distribution - we don't know how the missing data differ from the observed data
• Bootstrapping is a technique for estimating standard errors and confidence intervals (sets) without making assumptions about the distributions that give rise to the data
5 / 64
Bootstrapping: General Overview (2)
• Assume that we have a sample of size n for which we require more reliable standard errors for our estimates
  • Perhaps n is small, or alternatively, we have a statistic for which there is no known sampling distribution
• The bootstrap provides one "solution"
  • Take several new samples from the original sample, calculating the statistic each time
  • Calculate the average and standard error (and maybe quantiles) from the empirical distribution of the bootstrap samples
  • In other words, we find a standard error based on sampling (with replacement) from the original data
• We apply principles of inference similar to those employed when sampling from the population
  • The population is to the sample as the sample is to the bootstrap samples
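The steps above can be sketched in a few lines of base R; the data here are simulated purely for illustration:

```r
set.seed(42)
x <- rnorm(30, mean = 10, sd = 3)   # toy data, an assumption for illustration
R <- 2000
# resample with replacement and recompute the statistic each time
boot.means <- replicate(R, mean(sample(x, replace = TRUE)))
sd(boot.means)                       # bootstrap standard error of the mean
quantile(boot.means, c(.025, .975))  # a simple percentile interval
```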
6 / 64
Bootstrapping: General Overview (3)
There are several variants of the bootstrap. The two we are most interested in are:
1. Nonparametric bootstrap
  • No underlying population distribution is assumed
  • Most commonly used method
2. Parametric bootstrap
  • Assumes that the statistic has a particular parametric form (e.g., normal)
7 / 64
Bootstrapping the Mean
• Imagine, unrealistically, that we are interested in finding the confidence interval for the mean of a sample of only 4 observations
• Specifically, assume that we are interested in the difference in income between husbands and wives
• We have four cases, with the following mean differences (in $1000's): 6, -3, 5, 3, for a mean of 2.75 and a standard deviation of 4.031
• From classical theory, we can calculate the standard error:

  SE = S_x / √n = 4.031129 / √4 = 2.015564

• Now we'll compare the confidence interval to the one calculated using bootstrapping
8 / 64
Defining the Random Variable
• The first thing that bootstrapping does is estimate the population distribution of Y from the four observations in the sample
• In other words, the random variable Y* is defined:

  Y*   p*(Y*)
   6    0.25
  -3    0.25
   5    0.25
   3    0.25

• The mean of Y* is then simply the mean of the sample:

  E*(Y*) = Σ Y* p*(Y*) = 2.75 = Ȳ
9 / 64
The Sample as the Population (1)
• We now treat the sample as if it were the population, and resample from it
• In this case, we take all possible samples with replacement, meaning that we take nⁿ = 4⁴ = 256 different samples
• Since we found all possible samples, the mean of these samples is simply the original mean
• We then determine the standard error of Ȳ from these samples

  SE*(Ȳ) = √( Σ_{b=1}^{nⁿ} (Ȳ*_b − Ȳ)² / nⁿ ) = 1.74553

We now adjust for the sample size:

  SE(Ȳ) = √( n / (n − 1) ) × SE*(Ȳ*) = √(4/3) × 1.74553 = 2.015564
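The exhaustive calculation can be checked directly in R, using the slide's four observations:

```r
y <- c(6, -3, 5, 3)
n <- length(y)
# enumerate all n^n = 256 resamples (with replacement) by index
idx <- as.matrix(expand.grid(rep(list(1:n), n)))
boot.means <- apply(idx, 1, function(i) mean(y[i]))
se.star <- sqrt(sum((boot.means - mean(y))^2) / n^n)  # 1.74553
se.adj  <- sqrt(n / (n - 1)) * se.star                # 2.015564 = sd(y)/sqrt(n)
```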
10 / 64
The Sample as the Population (2)
• In this example, because we used all possible resamples of our sample, the bootstrap standard error (2.015564) is exactly the same as the original standard error
• This approach can be used for statistics for which we do not have standard error formulas, or we have small sample sizes
• In summary, the following analogies can be made to sampling from the population:
  • Bootstrap observations → original observations
  • Bootstrap mean → original sample mean
  • Original sample mean → unknown population mean μ
  • Distribution of the bootstrap means → unknown sampling distribution of the mean from the original sample
11 / 64
Characteristics of the Bootstrap Statistic
• The bootstrap sampling distribution around the original estimate of the statistic T is analogous to the sampling distribution of T around the population parameter θ
• The average of the bootstrapped statistics is simply:

  T̄* = E(T*) ≈ Σ_{b=1}^{R} T*_b / R

  where R is the number of bootstraps
• The bias of T can be seen as its deviation from the bootstrap average (i.e., it estimates T − θ):

  B* = T̄* − T

• The estimated bootstrap variance of T* is:

  V̂(T*) = Σ_{b=1}^{R} (T*_b − T̄*)² / (R − 1)
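Both quantities are easy to compute from a vector of bootstrap replicates; the statistic and data below are arbitrary choices for illustration:

```r
set.seed(1)
x <- rexp(50)                  # toy data
T.hat <- mean(x, trim = 0.1)   # the statistic T (here a 10% trimmed mean)
t.star <- replicate(1000, mean(sample(x, replace = TRUE), trim = 0.1))
bias.hat <- mean(t.star) - T.hat                                 # B* = T-bar* - T
v.hat <- sum((t.star - mean(t.star))^2) / (length(t.star) - 1)   # V(T*)
```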
12 / 64
Bootstrapping with Larger Samples
• The larger the sample, the more effort it is to calculate the bootstrap estimates
  • With large sample sizes, the possible number of bootstrap samples nⁿ gets very large and impractical (e.g., it would take a long time to calculate 1000¹⁰⁰⁰ bootstrap samples)
• Typically we want to take somewhere between 1000 and 2000 bootstrap samples in order to find a confidence interval of a statistic
• After calculating the standard error, we can easily find the confidence interval. Three methods are commonly used:
  1. Normal theory intervals
  2. Percentile intervals
  3. Bias-corrected percentile intervals
13 / 64
Evaluating Confidence Intervals
Accuracy:
• How quickly do coverage errors go to zero?
  • Prob{θ < T_lo} = α and Prob{θ > T_up} = α
• Errors go to zero at a rate of:
  • 1/n (second-order accurate)
  • 1/√n (first-order accurate)

Transformation respecting:
• For any monotone transformation of θ, φ = m(θ), can we obtain the right confidence interval on φ with the confidence intervals on θ mapped by m()? E.g.,

  [φ_lo, φ_up] = [m(θ_lo), m(θ_up)]
14 / 64
Bootstrap Confidence Intervals: Normal Theory Intervals
• Many statistics are asymptotically normally distributed
• Therefore, in large samples, we may be able to use a normality assumption to characterize the bootstrap distribution. E.g.,

  T* ~ N(T, se²)

  where se is √V̂(T*)
• This approach works well for the bootstrap confidence interval, but only if the bootstrap sampling distribution is approximately normally distributed
• In other words, it is important to look at the distribution before relying on the normal theory interval
15 / 64
Bootstrap Confidence Intervals: Percentile Intervals
• Uses percentiles of the bootstrap sampling distribution to find the end-points of the confidence interval
• If G is the CDF of T*, then we can find the 100(1 − α)% confidence interval with:

  [T_{%,lo}, T_{%,up}] = [G⁻¹(α), G⁻¹(1 − α)]

• The (1 − 2α) percentile interval can be approximated with:

  [T_{%,lo}, T_{%,up}] ≈ [T*_B^(α), T*_B^(1−α)]

  where T*_B^(α) and T*_B^(1−α) are the ordered (B) bootstrap replicates such that 100α% of them fall below the former and 100α% of them fall above the latter.
• These intervals do not assume a normal distribution, but they do not perform well unless we have a large original sample and at least 1000 bootstrap samples
16 / 64
Bootstrap Confidence Intervals: Bias-Corrected, Accelerated
Percentile Intervals (BCa)
• The BCa CI adjusts the confidence intervals for bias due to small samples by employing a normalizing transformation through two correction factors.
• This is also a percentile interval, but the percentiles are not necessarily the ones you would think.
• Using strict percentile intervals,

  [T_lo, T_up] ≈ [T*_B^(α), T*_B^(1−α)]

• Here,

  [T_lo, T_up] ≈ [T*_B^(α₁), T*_B^(α₂)]

• α₁ ≠ α and α₂ ≠ (1 − α₁), where

  α₁ = Φ( z₀ + (z₀ + z^(α)) / (1 − a(z₀ + z^(α))) )

  α₂ = Φ( z₀ + (z₀ + z^(1−α)) / (1 − a(z₀ + z^(1−α))) )
17 / 64
Bias correction: z0
z₀ = Φ⁻¹( #{T*_b ≤ T} / B )

• This just gives the inverse of the normal CDF for the proportion of bootstrap replicates less than T.
• Note that if #{T*_b ≤ T}/B = 0.5, then z₀ = 0.
• If T is unbiased, the proportion will be close to 0.5, meaning that the correction is close to 0.
18 / 64
Acceleration: a
a = Σᵢ ( T̄₍.₎ − T₍ᵢ₎ )³ / ( 6 [ Σᵢ ( T̄₍.₎ − T₍ᵢ₎ )² ]^{3/2} )

• T₍ᵢ₎ is the calculation of the original estimate T with each observation i jackknifed out in turn.
• T̄₍.₎ is Σᵢ T₍ᵢ₎ / n
• The acceleration constant corrects the bootstrap sample for skewness.
19 / 64
Evaluating Confidence Intervals

                            Normal      Percentile   BCa
Accuracy                    1st order   1st order    2nd order
Transformation-respecting   No          Yes          Yes
20 / 64
Number of BS Samples
Now, we can revisit the idea about the number of bootstrap samples.
• The real question here is how precise do you want the estimated quantities to be.
• Often, we're trying to estimate the 95% confidence interval (so the 2.5th and 97.5th percentiles of the distribution).
• Estimating a quantity in the tail of a distribution with precision takes more observations.
21 / 64
Simple Simulation: Bootstrapping the Mean (Lower CI)
22 / 64
Bootstrapping the Median
The median doesn't have an exact sampling distribution (though there is a normal approximation). Let's consider bootstrapping the median to get a sense of its sampling variability.
set.seed(123)
n <- 25
x <- rchisq(n,1,1)
meds <- rep(NA, 1000)
for(i in 1:1000){
  meds[i] <- median(x[sample(1:n, n, replace=T)])
}
median(x)
## [1] 1.206853
summary(meds)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2838 0.6693 1.2069 1.2897 1.7855 3.7398
23 / 64
Make CIs
• Normal Theory CI:

se <- sd(meds)
bias <- mean(meds) - median(x)
norm.ci <- median(x) - bias + qnorm((1+.95)/2)*se*c(-1,1)
• Percentile CI:

med.ord <- meds[order(meds)]
pct.ci <- med.ord[c(25,975)]
pct.ci <- quantile(meds, c(.025,.975))
• BCa CI:

zhat <- qnorm(mean(meds < median(x)))
medi <- sapply(1:n, function(i) median(x[-i]))
Tdot <- mean(medi)
ahat <- sum((Tdot-medi)^3)/(6*sum((Tdot-medi)^2)^(3/2))
zalpha <- 1.96
z1alpha <- -1.96
a1 <- pnorm(zhat + ((zhat + zalpha)/(1-ahat*(zhat + zalpha))))
a2 <- pnorm(zhat + ((zhat + z1alpha)/(1-ahat*(zhat + z1alpha))))
bca.ci <- quantile(meds, c(a1, a2))
a1 <- floor(a1*1000)
a2 <- ceiling(a2*1000)
bca.ci <- med.ord[c(a2, a1)]
24 / 64
Looking at CIs
mat <- rbind(norm.ci, pct.ci, bca.ci)
rownames(mat) <- c("norm", "pct", "bca")
colnames(mat) <- c("lower", "upper")
round(mat, 3)

##       lower upper
## norm -0.046 2.294
## pct   0.473 2.829
## bca   0.473 2.211
25 / 64
With boot package
library(boot)
set.seed(123)
med.fun <- function(x, inds){
  med <- median(x[inds])
  med
}
boot.med <- boot(x, med.fun, R=1000)
boot.ci(boot.med)

## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = boot.med)
##
## Intervals :
## Level      Normal              Basic
## 95%   (-0.107,  2.331 )   (-0.415,  1.940 )
##
## Level     Percentile            BCa
## 95%   ( 0.473,  2.829 )   ( 0.473,  2.211 )
## Calculations and Intervals on Original Scale
26 / 64
BS Difference in Correlations

We can bootstrap the difference in correlations as well.

set.seed(123)
library(carData)  # provides the Mroz data
data(Mroz)
cor.fun <- function(dat, inds){
  tmp <- dat[inds, ]
  cor1 <- with(tmp[which(tmp$hc == "no"), ],
               cor(age, lwg, use="complete"))
  cor2 <- with(tmp[which(tmp$hc == "yes"), ],
               cor(age, lwg, use="complete"))
  cor2-cor1
}
boot.cor <- boot(Mroz, cor.fun, R=2000, strata=Mroz$hc)
27 / 64
Bootstrap Confidence Intervals
boot.ci(boot.cor)
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 2000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = boot.cor)
##
## Intervals :
## Level      Normal               Basic
## 95%   (-0.0401,  0.2369 )   (-0.0394,  0.2342 )
##
## Level     Percentile             BCa
## 95%   (-0.0351,  0.2385 )   (-0.0350,  0.2386 )
## Calculations and Intervals on Original Scale
28 / 64
Bootstrapping Regression Models
There are two ways to do this:
• Random-X bootstrapping
  • The regressors are treated as random
  • Thus, we select bootstrap samples directly from the observations and calculate the statistic for each bootstrap sample
• Fixed-X bootstrapping
  • The regressors are treated as fixed - implies that the regression model fit to the data is "correct"
  • The fitted values of Y are then the expectation of the bootstrap
  • We attach a random error (usually resampled from the residuals) to each Ŷ, which produces the fixed-X bootstrap sample Y*_b
  • To obtain bootstrap replications of the coefficients, we regress Y*_b on the fixed model matrix for each bootstrap sample
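The fixed-X steps above can be sketched by hand in base R; the model and the built-in mtcars data are illustrative choices, not the slides' example:

```r
set.seed(123)
fit  <- lm(mpg ~ wt + hp, data = mtcars)  # illustrative model
X    <- model.matrix(fit)                 # the fixed design matrix
e    <- resid(fit)
yhat <- fitted(fit)
beta.star <- t(replicate(500, {
  y.star <- yhat + sample(e, replace = TRUE)  # attach resampled residuals
  coef(lm.fit(X, y.star))                     # regress Y*_b on the fixed X
}))
apply(beta.star, 2, sd)  # fixed-X bootstrap standard errors
```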
29 / 64
WVS Data
• secpay: Imagine two secretaries, of the same age, doing practically the same job. One finds out that the other earns considerably more than she does. The better paid secretary, however, is quicker, more efficient and more reliable at her job. In your opinion, is it fair or not fair that one secretary is paid more than the other?
• gini: Observed inequality captured by the GINI coefficient at the country level.
• democrat: Coded 1 if the country maintained a "partly free" or "free" rating on Freedom House's Freedom in the World measure continuously from 1980 to 1995.
30 / 64
[Figure: scatterplots of secpay against gini, distinguishing non-democracies from democracies; three panels, the last on a restricted secpay range]
31 / 64
Bootstrapping Regression: Example (1)
• We fit a linear model of secpay on the interaction of gini and democracy.

dat <- read.csv("http://quantoid.net/files/reg3/weakliem.txt", header=T)
dat <- dat[-c(25,49), ]
dat <- dat[complete.cases(
  dat[,c("secpay", "gini", "democrat")]), ]
mod <- lm(secpay ~ gini*democrat, data=dat)
summary(mod)

##
## Call:
## lm(formula = secpay ~ gini * democrat, data = dat)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.155604 -0.047674  0.004133  0.046637  0.179825
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)    1.059234   0.059679  17.749  < 2e-16 ***
## gini          -0.004995   0.001516  -3.294  0.00198 **
## democrat      -0.486071   0.088182  -5.512 1.86e-06 ***
## gini:democrat  0.010840   0.002476   4.378 7.53e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07118 on 43 degrees of freedom
## Multiple R-squared: 0.5176, Adjusted R-squared: 0.484
## F-statistic: 15.38 on 3 and 43 DF, p-value: 6.087e-07
32 / 64
Fixed-X Bootstrapping
• If we are relatively certain that the model is the "right" model (i.e., it is properly specified), then this makes sense.

library(car)  # Boot() is car's wrapper around boot::boot
boot.reg1 <- Boot(mod, R=500, method="residual")
33 / 64
Fixed-X CIs
perc.ci <- sapply(1:4, function(x){
  boot.ci(boot.reg1, type=c("perc"), index = x)$percent
})[4:5,]
bca.ci <- sapply(1:4, function(x){
  boot.ci(boot.reg1, type=c("bca"), index = x)$bca
})[4:5,]
colnames(bca.ci) <- colnames(perc.ci) <- names(coef(mod))
rownames(bca.ci) <- rownames(perc.ci) <- c("lower", "upper")
perc.ci

##       (Intercept)         gini   democrat gini:democrat
## lower   0.9427854 -0.008099226 -0.6507048   0.005767702
## upper   1.1808116 -0.002214939 -0.3211883   0.015383294

bca.ci

##       (Intercept)         gini   democrat gini:democrat
## lower   0.9398972 -0.008080956 -0.6448495   0.005651383
## upper   1.1769527 -0.002060481 -0.3079673   0.015282963
34 / 64
Random-X resampling (1)
• Random-X resampling is also referred to as observation resampling
• It selects B bootstrap samples (i.e., resamples) of the observations, fits the regression for each one, and determines the standard errors from the bootstrap distribution
• As with the fixed-X case, this is done fairly simply in R using the Boot function from the car package (a wrapper around boot).
35 / 64
Random-X Resampling (2)
boot.reg2 <- Boot(mod, R=500, method="case")
perc.ci <- sapply(1:4, function(x){
  boot.ci(boot.reg2, type=c("perc"), index = x)$percent
})[4:5,]
bca.ci <- sapply(1:4, function(x){
  boot.ci(boot.reg2, type=c("bca"), index = x)$bca
})[4:5,]
# change row and column names of output
colnames(bca.ci) <- colnames(perc.ci) <- names(coef(mod))
rownames(bca.ci) <- rownames(perc.ci) <- c("lower", "upper")
# print output
perc.ci

##       (Intercept)         gini   democrat gini:democrat
## lower   0.9373337 -0.007712521 -0.6668725   0.004825119
## upper   1.1586946 -0.001223048 -0.2716579   0.015987613

bca.ci

##       (Intercept)         gini   democrat gini:democrat
## lower   0.9598786 -0.008503679 -0.6943984   0.005732083
## upper   1.1854891 -0.001937200 -0.3025729   0.016770086
36 / 64
What if we Retained the Outliers?
37 / 64
Bootstrap Distribution

n <- c("Intercept", "GINI", "Democrat", "Gini*Democrat")
par(mfrow=c(2,2))
for(j in 1:4){
  plot(density(boot.reg1$t[,j]), xlab="T*",
       ylab="Density", main = n[j])
  lines(density(boot.reg2$t[,j]), lty=2)
}
38 / 64
The Bootstrap Distribution
[Figure: densities of the bootstrap replicates T* for the Intercept, GINI, Democrat, and Gini*Democrat coefficients, fixed-X solid and random-X dashed]
39 / 64
Jackknife-after-Bootstrap
• The jackknife-after-bootstrap provides a diagnostic of the bootstrap by allowing us to examine what would happen to the density if particular cases were deleted
• Given that we know that there are outliers, this diagnostic is important
• The jackknife-after-bootstrap plot is produced using the jack.after.boot function in the boot package. Once again, the index=2 argument asks for the results for the slope coefficient in the model
jack.after.boot(boot.reg1, index=2)
40 / 64
Jackknife-after-Bootstrap (2)
• The jack.after.boot plot is constructed as follows:
  • The horizontal axis represents the standardized jackknife value (the standardized value of the difference with that observation taken out)
  • The vertical axis represents various quantiles of the bootstrap statistic
  • Horizontal lines in the graph represent the bootstrap distribution at the various quantiles
  • Case numbers are labeled at the bottom of the graph so that each observation can be identified
41 / 64
R code for Jack After Boot
par(mfrow=c(2,2))
jack.after.boot(boot.reg1, index=1)
jack.after.boot(boot.reg1, index=2)
jack.after.boot(boot.reg1, index=3)
jack.after.boot(boot.reg1, index=4)

par(mfrow=c(2,2))
jack.after.boot(boot.reg2, index=1)
jack.after.boot(boot.reg2, index=2)
jack.after.boot(boot.reg2, index=3)
jack.after.boot(boot.reg2, index=4)
42 / 64
[Figure: jackknife-after-bootstrap plots for the four coefficients from the fixed-X bootstrap (boot.reg1)]
43 / 64
[Figure: jackknife-after-bootstrap plots for the four coefficients from the random-X bootstrap (boot.reg2)]
44 / 64
Fixed versus random
Why do the samples differ?

• Fixed-X resampling enforces the assumption that errors are randomly distributed by resampling the residuals from a common distribution
• As a result, if the model is not specified correctly (i.e., there is un-modeled nonlinearity, heteroskedasticity or outliers) these attributes do not carry over to the bootstrap samples
  • The effects of outliers were clear in the random-X case, but not with the fixed-X bootstrap
45 / 64
Bootstrapping the Loess Curve
We could also use bootstrapping to get the Loess curve. First, let's try the fixed-X resampling:

library(foreign)
dat <- read.dta("http://quantoid.net/files/reg3/jacob.dta")
rescale01 <- function(x) (x - min(x, na.rm=T)) / diff(range(x, na.rm=T))  # rescale to [0,1]
seq.range <- function(x) seq(min(x), max(x), length=250)
dat$perot <- rescale01(dat$perot)
dat$chal <- rescale01(dat$chal_vote)
s <- seq.range(dat$perot)
df <- data.frame(perot = s)
lo.mod <- loess(chal ~ perot,
                data=dat, span=.75)
resids <- lo.mod$residuals
yhat <- lo.mod$fitted
lo.fun <- function(dat, inds){
  boot.e <- resids[inds]
  boot.y <- yhat + boot.e
  tmp.lo <- loess(boot.y ~ perot,
                  data=dat, span=.75)
  preds <- predict(tmp.lo, newdata=df)
  preds
}
boot.perot1 <- boot(dat, lo.fun, R=1000)
46 / 64
Bootstrapping the Loess Curve (2)
Now, random-X resampling
lo.fun2 <- function(dat, inds){
  tmp.lo <- loess(chal ~ perot,
                  data=dat[inds,], span=.75)
  preds <- predict(tmp.lo, newdata=df)
  preds
}
boot.perot2 <- boot(dat, lo.fun2, R=1000)
47 / 64
Make Confidence Intervals
out1 <- sapply(1:250, function(i) boot.ci(boot.perot1,
                                          index=i, type="perc")$perc)
out2 <- sapply(1:250, function(i) boot.ci(boot.perot2,
                                          index=i, type="perc")$perc)
preds <- predict(lo.mod, newdata=df, se=T)
lower <- preds$fit - 1.96*preds$se
upper <- preds$fit + 1.96*preds$se
ci1 <- t(out1[4:5, ])
ci2 <- t(out2[4:5, ])
ci3 <- cbind(lower, upper)
plot(s, preds$fit, type = "l",
     ylim=range(c(ci1, ci2, ci3)))
poly.x <- c(s, rev(s))
polygon(poly.x, y=c(ci1[,1], rev(ci1[,2])),
        col=rgb(1,0,0,.25), border=NA)
polygon(poly.x, y=c(ci2[,1], rev(ci2[,2])),
        col=rgb(0,1,0,.25), border=NA)
polygon(poly.x, y=c(ci3[,1], rev(ci3[,2])),
        col=rgb(0,0,1,.25), border=NA)
legend("topleft",
       c("Random-X", "Fixed-X", "Analytical"),
       fill = rgb(c(1,0,0), c(0,1,0), c(0,0,1),
                  c(.25,.25,.25)), inset=.01)
48 / 64
Confidence Interval Graph
[Figure: loess fit with shaded 95% confidence bands - Random-X, Fixed-X, and Analytical]
49 / 64
When Might Bootstrapping Fail?
1. Incomplete data: since we are trying to estimate a population distribution from the sample data, we must assume that the missing data are not problematic. It is acceptable, however, to use bootstrapping if multiple imputation is used beforehand
2. Dependent data: bootstrapping assumes independence when sampling with replacement. It should not be used if the data are dependent
3. Outliers and influential cases: if obvious outliers are found, they should be removed or corrected before performing the bootstrap (especially for random-X). We do not want the simulations to depend crucially on particular observations
50 / 64
Cross-Validation (1)
• If no two observations have the same Y, a p-variable model fit to p + 1 observations will fit the data precisely
  • This implies, then, that discarding much of a dataset can lead to better fitting models
  • Of course, this will lead to biased estimators that are likely to give quite different predictions on another dataset (generated with the same DGP)
• Model validation allows us to assess whether the model is likely to predict accurately on future observations or observations not used to develop this model
• Three major factors that may lead to model failure are:
  1. Over-fitting
  2. Changes in measurement
  3. Changes in sampling
• External validation involves retesting the model on new data collected at a different point in time or from a different population
• Internal validation (or cross-validation) involves fitting and evaluating the model carefully using only one sample
51 / 64
Cross-Validation (2)
• A basic, but powerful, tool for statistical model building
• Cross-validation is similar to bootstrapping in that it resamples from the original data
• The basic form involves randomly dividing the sample into two subsets:
  • The first subset of the data (screening sample) is used to select or estimate a statistical model
  • The second subset is then used to test the findings
• Can be helpful in avoiding capitalizing on chance and over-fitting the data - i.e., findings from the first subset may not always be confirmed by the second subset
• Cross-validation is often extended to use several subsets (either a preset number chosen by the researcher or leave-one-out cross-validation)
52 / 64
Cross-Validation (3)
• The data are split into k subsets (usually 3 ≤ k ≤ 10), but we can also use leave-one-out cross-validation
• Each of the subsets is left out in turn, with the regression run on the remaining data
• Prediction error is then calculated as the sum of the squared errors:

  RSS = Σ (Yᵢ − Ŷᵢ)²

• We choose the model with the smallest average "error":

  MSE = Σ (Yᵢ − Ŷᵢ)² / n

• We could also look to the model with the largest average R²
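A hand-rolled version of this procedure, using an illustrative model on the built-in mtcars data:

```r
set.seed(99)
k <- 5
# randomly assign each observation to one of k folds
folds <- sample(rep(1:k, length.out = nrow(mtcars)))
cv.err <- sapply(1:k, function(j) {
  fit  <- lm(mpg ~ wt + hp, data = mtcars[folds != j, ])  # fit on k-1 folds
  pred <- predict(fit, newdata = mtcars[folds == j, ])    # predict held-out fold
  mean((mtcars$mpg[folds == j] - pred)^2)                 # fold-j squared error
})
mean(cv.err)  # the k-fold estimate of MSE
```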
53 / 64
Cross-Validation (4)
How many observations should I leave out from each fit?
• There is no rule on how many cases to leave out, but Efron (1983) suggests that grouped cross-validation (with approximately 10% of the data left out each time) is better than leave-one-out cross-validation

Number of repetitions
• Harrell (2001: 93) suggests that one may need to leave 1/10 of the sample out 200 times to get accurate estimates

Cross-validation does not validate the complete sample
• External validation, on the other hand, validates the model on a new sample
• Of course, limitations in resources usually prohibit external validation in a single study
54 / 64
Cross-Validation in R
• Cross-validation is done easily using the cv.glm function in the boot package

dat <- read.csv("http://www.quantoid.net/files/reg3/weakliem.txt",
                header=T)
dat <- dat[-c(25,49), ]
mod1 <- glm(secpay ~ poly(gini, 3)*democrat, data=dat)
mod2 <- glm(secpay ~ gini*democrat, data=dat)
deltas <- NULL
for(i in 1:25){
  deltas <- rbind(deltas, c(
    cv.glm(dat, mod1, K=5)$delta,
    cv.glm(dat, mod2, K=5)$delta)
  )
}
colMeans(deltas)
## [1] 0.008985782 0.008295404 0.005647222 0.005528784
55 / 64
Cross-validating Span in Loess
We could use cross-validation to tell us something about the span in our Loess model.

1. First, split the sample into K groups (usually 10).
2. For each of the k = 10 groups, estimate the model on the other 9 and get predictions for the omitted group's observations. Do this for each of the 10 subsets in turn.
3. Calculate the CV error: (1/n) Σ (yᵢ − ŷᵢ)²
4. Potentially, do this lots of times and average across the CV error.
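The steps above could be wrapped in a function; the course code calls such a function cv.lo2, whose definition is not shown on these slides, so the version below is a hypothetical sketch:

```r
# hypothetical stand-in for cv.lo2: K-fold CV error for a loess fit
# at a given span
cv.lo <- function(span, form, data, K = 10) {
  folds <- sample(rep(1:K, length.out = nrow(data)))
  errs <- sapply(1:K, function(j) {
    fit <- loess(form, data = data[folds != j, ], span = span,
                 control = loess.control(surface = "direct"))  # allow extrapolation
    pred <- predict(fit, newdata = data[folds == j, ])
    obs  <- model.response(model.frame(form, data[folds == j, ]))
    mean((obs - pred)^2, na.rm = TRUE)                         # fold-j CV error
  })
  mean(errs)
}
```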
set.seed(1)
n <- 400
x <- 0:(n-1)/(n-1)
f <- 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10
y <- f + rnorm(n, 0, sd = 2)
tmp <- data.frame(y=y, x=x)
lo.mod <- loess(y ~ x, data=tmp, span=.75)
56 / 64
Minimizing CV Criterion Directly
best.span <- optimize(cv.lo2, c(.05, .95), form=y ~ x, data=tmp,
  numiter=5, K=10)
best.span
## $minimum
## [1] 0.2271807
##
## $objective
## [1] 3.912508
There is also a canned function in fANCOVA that optimizes the spanvia AICc or GCV.
library(fANCOVA)
best.span2 <- loess.as(tmp$x, tmp$y, criterion="gcv")
best.span2$pars$span
## [1] 0.1483344
57 / 64
The Curve
[Figure: scatterplot of the simulated (x, y) data with the fitted loess curves; the legend distinguishes the CV-chosen span from the GCV-chosen span.]
58 / 64
Manually Cross-Validating λ in the Y-J Transform
Sometimes, optimizing the cross-validation criterion fails.
• Randomness in the CV procedure can produce a function that has several local minima.
• You could force the same random split at every evaluation by hand-coding the CV, but this might not be the best idea.
• If optimization of the CV criterion fails, you could always do it manually.
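The "same random split" idea in the second bullet can be sketched by drawing the fold assignment once, outside the criterion function, so that repeated evaluations are deterministic (the data and fold scheme here are illustrative assumptions):

```r
# Sketch: fix the fold assignment once so every evaluation of the
# CV criterion uses the same split (illustrative simulated data)
set.seed(123)
d <- data.frame(x = runif(60))
d$y <- sin(2 * pi * d$x) + rnorm(60, sd = 0.3)
folds <- sample(rep(1:5, length.out = nrow(d)))  # drawn once, up front

cv_err <- function(span){
  errs <- sapply(1:5, function(k){
    fit <- loess(y ~ x, data = d[folds != k, ], span = span,
                 control = loess.control(surface = "direct"))
    mean((d$y[folds == k] - predict(fit, d[folds == k, ]))^2)
  })
  mean(errs)
}

# With folds held fixed, the criterion depends only on span,
# so repeated calls with the same span return the same value
cv_err(0.5)
```

This removes the local minima caused by resampling noise, at the cost of tying the result to one particular split, which is why it may not be the best idea.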
59 / 64
Example
cvoptim_yj <- function(pars, form, data, trans.vars, K=5, numiter=10){
  require(boot)
  require(VGAM)
  form <- as.character(form)
  for(i in 1:length(trans.vars)){
    form <- gsub(trans.vars[i],
      paste("yeo.johnson(", trans.vars[i], ",", pars[i], ")", sep=""),
      form)
  }
  form <- as.formula(paste0(form[2], form[1], form[3]))
  m <- glm(form, data, family=gaussian)
  d <- lapply(1:numiter, function(x) cv.glm(data, m, K=K))
  mean(sapply(d, function(x) x$delta[1]))
}

lams <- seq(0, 2, by=.1)
s <- sapply(lams, function(x) cvoptim_yj(x,
  form = prestige ~ income + education + women,
  data = Prestige, trans.vars = "income", K=3))
plot(s ~ lams, type="l", xlab=expression(lambda), ylab="CV Error")
60 / 64
Figure
[Figure: CV error plotted against λ over the range 0 to 2.]
61 / 64
Summary and Conclusions
• Resampling techniques are powerful tools for estimating standard errors from small samples or when the statistics we are using do not have easily determined standard errors
• Bootstrapping involves taking a “new” random sample (with replacement) from the original data
• We then calculate bootstrap standard errors and statistical tests from the average of the statistic across the bootstrap samples
• Jackknife resampling takes new samples of the data by omitting each case individually and recalculating the statistic each time
• Cross-validation randomly splits the sample into groups, fitting the model on some groups and comparing its predictions to the observed values in the held-out group
62 / 64
Readings
Today: Resampling and Cross-validation
* Fox (2008) Chapter 21
* Stone (1974)
- Efron and Tibshirani (1993)
- Davison and Hinkley (1997)
- Ronchetti, Field and Blanchard (1997)
63 / 64
Davison, Anthony C. and D.V. Hinkley. 1997. Bootstrap Methods and Their Applications. Cambridge:Cambridge University Press.
Efron, Bradley and Robert Tibshirani. 1993. An Introduction to the Bootstrap. New York: Chapman &Hall.
Fox, John. 2008. Applied Regression Analysis and Generalized Linear Models, 2nd edition. ThousandOaks, CA: Sage, Inc.
Ronchetti, Elvezio, Christopher Field and Wade Blanchard. 1997. “Robust Linear Model Selection byCross-validation.” Journal of the American Statistical Association 92(439):1017–1023.
Stone, Mervyn. 1974. “Cross-validation and Assessment of Statistical Predictions.” Journal of the Royal Statistical Society, Series B 36(2):111–147.
64 / 64