
201ab Quantitative methods
Resampling: Cross Validation

Ed Vul

Resampling

Using our existing data to generate possible samples and thus obtain a sampling distribution:

- Bootstrap: of a statistic, for confidence intervals.
- Randomization: under the null, for NHST.
- Cross-validation: for prediction.

The problem: overfitting

[Figure: observed data points, y vs. x]

The problem: overfitting

[Figure: 9th-order polynomial fit to the 10 data points, y vs. x]

- Complex models can fit weird patterns.
- They will fit noise, not just signal.
- Fitting noise yields terrible prediction performance, even though the “fit” to observed data looks very good (see the sketch below).
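A minimal sketch of the point, using simulated data rather than the slide's actual data (the data-generating line below is my own):

# Simulate 10 noisy points and fit a 9th-order polynomial: with 10
# coefficients for 10 points, the "fit" to the observed data is perfect,
# but the curve is just chasing noise.
set.seed(1)
toy = data.frame(x = seq(0, 1, length.out = 10))
toy$y = 3 * toy$x + rnorm(10)
M9 = lm(data = toy, y ~ poly(x, 9))
sum(resid(M9)^2)                    # essentially 0: perfect in-sample fit
predict(M9, data.frame(x = 0.55))   # interpolated predictions can be wild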

Overfitting yields worse prediction error

[Figure: fitted polynomial and observed data, y vs. x]


The problem: overfitting

We want to...

- know how well our model will predict new data, not just how well it fits observed data/noise.
- pick models that will predict new data well, and not overfit.

But we obviously have not yet seen future data.

Solution: Hold out / validation data

- Use part of existing data as though we have not seen it: split the data into two sets:
  - training: used to fit the model
  - test (“holdout”): used to evaluate the model
- Doing this once is OK if we have a lot of data, so both training and test sets can be big even after the split.
- With little data we will have lots of variability in evaluation.

Cross-validation

We will do the hold-out process a bunch of times on the same data to try to reduce noise in our test-set performance.

This gives us a better estimate of prediction accuracy for the model class (but not any one particular set of parameter values!).

Hold-out: example

library(dplyr)  # for %>%, mutate(), filter()

dat <- dat %>%
  mutate(use_as = ifelse((1:n()) %% 2 == 1, 'train', 'test'))

training_data = dat %>%
  filter(use_as == 'train')

test_data = dat %>%
  filter(use_as == 'test')

    x      y  use_as
 0.00  -0.21  train
 0.11  -0.93  test
 0.22  -0.93  train
 0.33   0.65  test
 0.44  -1.06  train
 0.56   0.11  test
 0.67   2.40  train
 0.78   1.29  test
 0.89   0.99  train
 1.00   0.94  test

Hold-out: example

- Fit model on training data:
  M = lm(data = training_data, y ~ poly(x, 3))
- Generate predictions on test data:
  prediction = predict(M, test_data)
- Measure prediction error. Here: as sum of squared errors.
  sum((test_data$y - prediction)^2)

## [1] 7.616142

Train vs. test performance as a function of complexity

poly.order  train.SSE   test.SSE
         1       2.52       5.86
         2       1.88      23.67
         3       0.94     587.04
         4       0.00   15690.46

Note: 10 total data points, splitting into 5 train, 5 test. Over and over again. (More on this later.)
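A sketch of how a table like this might be computed on the toy data (the slide's exact procedure, number of repetitions, and seed aren't shown, so the numbers will differ):

# For each polynomial order, repeatedly split the 10 points into 5 train /
# 5 test, fit on the training half, and average SSE on both halves.
sse_by_order = sapply(1:4, function(ord){
  res = replicate(100, {
    train_idx = sample(nrow(dat), 5)
    train = dat[train_idx, ]
    test  = dat[-train_idx, ]
    M = lm(data = train, y ~ poly(x, ord))
    c(train = sum(resid(M)^2),
      test  = sum((test$y - predict(M, test))^2))
  })
  rowMeans(res)
})
t(sse_by_order)  # rows: poly order 1-4; columns: mean train and test SSE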

Leave-one-out cross-validation

Run the hold-out procedure n times for n data points: each time, use one data point as the test data and the remaining n-1 data points as training.

Leave-one-out cross-validation

n = nrow(dat)
loo_error = rep(NA, n)
for(i in 1:n){
  training_data = dat[(1:n)[-i], ]
  test_data = dat[i, ]
  M = lm(data = training_data, y ~ poly(x, 3))
  prediction = predict(M, test_data)
  loo_error[i] = (test_data$y - prediction)^2
}

Leave-one-out cross-validation

[Figure: histogram of the leave-one-out squared errors (count vs. error)]

mean(loo_error)

## [1] 0.9759489

Varieties of cross-validation

- Repeated random sub-sampling (suitable for larger sample sizes and replicates)
- Leave k out (LOO: k=1): exhaustive, for small sample sizes
- K-fold (LOO: k=n); see the sketch below

For both fitting and evaluation:
- Nested cross-validation

There are lots of varieties of error/fit measures depending on what you are after.
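K-fold isn't demonstrated elsewhere in these slides, so here is a minimal sketch on the toy data (K = 5 is an arbitrary choice of mine):

# Assign each row to one of K folds, then hold out one fold at a time.
K = 5
n = nrow(dat)
fold = sample(rep(1:K, length.out = n))
fold_sse = rep(NA, K)
for(f in 1:K){
  train = dat[fold != f, ]
  test  = dat[fold == f, ]
  M = lm(data = train, y ~ poly(x, 3))
  fold_sse[f] = sum((test$y - predict(M, test))^2)
}
sum(fold_sse)  # total cross-validated SSE across folds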

Larger-scale example: data

## Rows: 251
## Columns: 14
## $ bf.percent <dbl> 12.3, 6.1, 25.3, 10.4, 28.7, 20.9, 19.2, 12.4, 4.1, 11.7...
## $ age        <dbl> 23, 22, 22, 26, 24, 24, 26, 25, 25, 23, 26, 27, 32, 30, ...
## $ weight     <dbl> 154.25, 173.25, 154.00, 184.75, 184.25, 210.25, 181.00, ...
## $ height     <dbl> 67.75, 72.25, 66.25, 72.25, 71.25, 74.75, 69.75, 72.50, ...
## $ neck       <dbl> 36.2, 38.5, 34.0, 37.4, 34.4, 39.0, 36.4, 37.8, 38.1, 42...
## $ chest      <dbl> 93.1, 93.6, 95.8, 101.8, 97.3, 104.5, 105.1, 99.6, 100.9...
## $ abdomen    <dbl> 85.2, 83.0, 87.9, 86.4, 100.0, 94.4, 90.7, 88.5, 82.5, 8...
## $ hip        <dbl> 94.5, 98.7, 99.2, 101.2, 101.9, 107.8, 100.3, 97.1, 99.9...
## $ thigh      <dbl> 59.0, 58.7, 59.6, 60.1, 63.2, 66.0, 58.4, 60.0, 62.9, 63...
## $ knee       <dbl> 37.3, 37.3, 38.9, 37.3, 42.2, 42.0, 38.3, 39.4, 38.3, 41...
## $ ankle      <dbl> 21.9, 23.4, 24.0, 22.8, 24.0, 25.6, 22.9, 23.2, 23.8, 25...
## $ bicep      <dbl> 32.0, 30.5, 28.8, 32.4, 32.2, 35.7, 31.9, 30.5, 35.9, 35...
## $ forearm    <dbl> 27.4, 28.9, 25.2, 29.4, 27.7, 30.6, 27.8, 29.0, 31.1, 30...
## $ wrist      <dbl> 17.1, 18.2, 16.6, 18.2, 17.7, 18.8, 17.7, 18.8, 18.2, 19...

Large-scale example: Models

# assuming the body-fat data are in `dat`, as in the following slides
lm.model  = lm(data = dat, bf.percent ~ .)
svr.model = e1071::svm(data = dat, bf.percent ~ ., cross = 0)
lm2.model = lm(data = dat, bf.percent ~ polym(age, weight, height, neck,
                                              chest, abdomen, hip, thigh,
                                              knee, ankle, bicep, forearm,
                                              wrist, raw = T, degree = 2))
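An aside (hedged, from my reading of the e1071 documentation): the cross= argument in the svm() call above is the library's built-in K-fold cross-validation, with 0 disabling it; a positive value asks svm() to cross-validate internally and report the result through summary(). For example:

# assumption: for regression models, e1071 reports the K-fold mean squared
# error in summary() when cross > 0
svr.cv = e1071::svm(data = dat, bf.percent ~ ., cross = 10)
summary(svr.cv)  # includes the 10-fold cross-validation MSE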

Leave-50-out random sub-sampling

RMSE = function(true_y, predicted_y){
  sqrt(mean((predicted_y - true_y)^2))
}

n = nrow(dat)
k = 50
repetitions = 100
errors = rep(NA, repetitions)
for(i in 1:repetitions){
  test_idx = sort(sample(n, k, replace = F))
  train_idx = (1:n)[-test_idx]
  test_dat = dat[test_idx, ]
  train_dat = dat[train_idx, ]
  M = lm(data = train_dat, bf.percent ~ .)
  pred_y = predict(M, test_dat)
  errors[i] = RMSE(test_dat$bf.percent, pred_y)
}
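The results on the next slide compare train and test error for three model classes, while the loop above records only test error for the plain lm. Here is a sketch of one way the fuller comparison might be run, reusing the objects defined above (the slide's actual code isn't shown; the model list and bookkeeping are my own):

# Fit lm, lm2, and svr on the same random split each repetition and record
# both train and test RMSE (one way to produce the figure that follows).
fit_fns = list(
  lm  = function(d) lm(bf.percent ~ ., data = d),
  lm2 = function(d) lm(bf.percent ~ polym(age, weight, height, neck, chest,
                                          abdomen, hip, thigh, knee, ankle,
                                          bicep, forearm, wrist,
                                          raw = T, degree = 2), data = d),
  svr = function(d) e1071::svm(bf.percent ~ ., data = d)
)
results = data.frame()
for(i in 1:repetitions){
  test_idx = sort(sample(n, k, replace = F))
  test_dat  = dat[test_idx, ]
  train_dat = dat[-test_idx, ]
  for(m in names(fit_fns)){
    M = fit_fns[[m]](train_dat)
    results = rbind(results, data.frame(
      model = m,
      train.err = RMSE(train_dat$bf.percent, predict(M, train_dat)),
      test.err  = RMSE(test_dat$bf.percent,  predict(M, test_dat))))
  }
}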

Leave-50-out random sub-sampling: Results

[Figure: log10(MSE) by model (lm, lm2, svr), split by type (train.err vs. test.err)]

Resampling: Cross-validation

Goal: estimate prediction accuracy/error on future data without actually having data from the future.

Strategy: Repeat many times:

- Split existing data into training and test set.
- Fit model to training set, evaluate error on test set.

Resampling: Bootstrap

Goal: quantify sampling error in some statistic to get confidence intervals.

Strategy:

- Generate new hypothetical samples of the same size as the existing sample by resampling from it (with replacement!).
- Calculate the statistic on each sample to obtain many samples of the sampling distribution of the statistic.
- Use that to get confidence intervals via the quantile function (see the sketch below).
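A minimal sketch in R, with a placeholder sample and the mean as the statistic (both choices are mine, not the slides'):

# Bootstrap percentile CI for the mean of a numeric sample `x`.
x = rnorm(50)  # placeholder data; substitute your own sample
boot_stat = replicate(10000, mean(sample(x, replace = TRUE)))
quantile(boot_stat, c(0.025, 0.975))  # 95% percentile confidence interval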

Resampling: Randomization

Goal: test a null hypothesis that some structure/regularity does not exist in the data.

Strategy:

- Define a statistic to measure the structure.
- Define a shuffling (sampling without replacement) process to destroy only that structure.
- Repeat many times: statistic(shuffle(data)) to obtain many samples of the distribution of the statistic under the null.
- Calculate a p value from those samples (see the sketch below).
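A minimal sketch in R, with placeholder data and the correlation as the structure-measuring statistic (both choices are mine):

# Randomization test for an x-y relationship: shuffling y destroys only
# the x-y pairing, giving the null distribution of the correlation.
x = rnorm(50)                      # placeholder data
y = 0.3 * x + rnorm(50)            # placeholder data
observed = cor(x, y)
null_stat = replicate(10000, cor(x, sample(y)))
mean(abs(null_stat) >= abs(observed))  # two-tailed p value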

Resampling recap

Randomization: Shuffle data to obtain the sampling distribution of a statistic under the null, and thus test a null hypothesis.

Bootstrapping: Resample current data to obtain the sampling distribution of a statistic, and thus get a confidence interval.

Cross-validation: Subsample existing data into training and test sets to estimate prediction performance on unseen data.

Resampling: beware

You have lots of responsibility here: there is lots of room to make a mistake, and to only check for / catch it when the mistake is unfavorable.

Questions?
