
Page 1

Prediction, Estimation, and Attribution

Bradley Efron
[email protected]

Department of Statistics, Stanford University

Page 2

Regression: Gauss (1809), Galton (1877)

› Prediction: random forests, boosting, support vector machines, neural nets, deep learning

› Estimation: OLS, logistic regression, GLM (MLE)

› Attribution (significance): ANOVA, lasso, Neyman–Pearson

Page 3

Estimation: Normal Linear Regression

› Observe
  $y_i = \mu_i + \epsilon_i$ for $i = 1, \dots, n$
  $\mu_i = x_i^t \beta$, with $x_i$ a $p$-dimensional covariate and $\beta$ unknown
  $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$
  In matrix form: $y_n = X_{n \times p}\, \beta_p + \epsilon_n$

› Surface plus noise: $y = \mu(x) + \epsilon$

› The surface $\{\mu(x),\ x \in \mathcal{X}\}$ codes scientific truth (hidden by noise)

› Newton's second law: acceleration = force / mass
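
As a toy illustration of "surface plus noise," here is a minimal R sketch (all numbers invented for illustration): simulate $y = X\beta + \epsilon$ and recover $\beta$ by OLS, which is the MLE under normal errors.

    set.seed(1)
    n <- 100; p <- 4
    X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  # n x p design matrix
    beta <- c(2, 1, -1, 0.5)                             # the hidden "truth"
    y <- drop(X %*% beta + rnorm(n, sd = 2))             # noise hides the surface
    betahat <- coef(lm(y ~ X - 1))                       # OLS = MLE under normality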


Page 4

[Figure: the surface acceleration = force/mass, plotted over the (mass, force) plane. Caption: "Newton's 2nd law: acceleration = force/mass".]

Page 5

[Figure: the same (mass, force, acceleration) plot. Caption: "If Newton had done the experiment".]

Page 6

Example: The Cholesterol Data

› n = 164 men took cholestyramine

› Observe $(c_i, y_i)$:
  $c_i$ = normalized compliance (how much of the drug was taken)
  $y_i$ = reduction in cholesterol

› Model: $y_i = x_i^t \beta + \epsilon_i$, with $x_i^t = (1, c_i, c_i^2, c_i^3)$ and $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$

› n = 164, p = 4
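
A hedged R sketch of this fit; `comp` (normalized compliance) and `y` (cholesterol decrease) are hypothetical stand-ins for the actual data:

    fit <- lm(y ~ poly(comp, 3, raw = TRUE))   # cubic OLS regression
    summary(fit)                               # sigma-hat, per-coefficient t-tests,
                                               # adjusted R-squared (next slide)
    predict(fit, interval = "confidence")      # 95% bars for the fitted curve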


Page 7

[Figure: OLS cubic regression of cholesterol decrease on normalized compliance; bars show 95% confidence intervals for the curve. $\hat\sigma = 21.9$; only the intercept and linear coefficients are significant; adjusted $R^2 = .481$.]

Page 8

Neonate Example

› n = 800 babies in an African facility

› 600 lived, 200 died

› 11 covariates: apgar score, body weight, . . .

› Logistic regression with n = 800, p = 11:
  glm(y ~ X, binomial), with X the 800 × 11 covariate matrix
  $y_i$ = 1 or 0 as baby dies or lives
  $x_i$ = the $i$th row of X (vector of 11 covariates)

› Linear logistic surface, Bernoulli noise
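
A minimal sketch of this fit in R, assuming `X` is the 800 × 11 covariate matrix and `y` the 0/1 outcome (hypothetical names):

    fit <- glm(y ~ X, family = binomial)   # linear logistic surface, Bernoulli noise
    summary(fit)                           # the estimate / st.error / z-value / p-value table
    mean((fitted(fit) > 0.5) != y)         # apparent (in-sample) prediction error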


Page 9

Output of the logistic regression program (predictive error 15%):

           estimate   st.error   z-value   p-value
    gest     -.474      .163      -2.91     .004**
    ap       -.583      .110      -5.27     .000***
    bwei     -.488      .163      -2.99     .003**
    resp      .784      .140       5.60     .000***
    cpap      .271      .122       2.21     .027*
    ment     1.105      .271       4.07     .000***
    rate     -.089      .176      -.507     .612
    hr        .013      .108       .120     .905
    head      .103      .111       .926     .355
    gen      -.001      .109      -.008     .994
    temp      .015      .124       .120     .905

Page 10

Prediction Algorithms: Random Forests, Boosting, Deep Learning, . . .

› Data $d = \{(x_i, y_i),\ i = 1, 2, \dots, n\}$
  $y_i$ = response
  $x_i$ = vector of $p$ predictors
  (Neonate: n = 800, p = 11, y = 0 or 1)

› Prediction rule $f(x, d)$: a new $(x, ?)$ gives $\hat{y} = f(x, d)$

› Strategy: go directly for high predictive accuracy; forget (mostly) about surface plus noise

› Machine learning

Page 11

Classification Using Regression Trees

› n cases: $n_0$ zeros and $n_1$ ones

› p predictors (features)
  (Neonate: n = 800, n₀ = 600, n₁ = 200, p = 11)

› Split the cases into two groups, with the predictor and split value chosen to maximize the difference in rates

› Then split the splits, etc. (some stopping rule); see the R sketch below
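
A sketch of such a tree using the rpart package (whose defaults supply a stopping rule); `X` and `y` are the hypothetical predictor matrix and 0/1 labels:

    library(rpart)
    d <- data.frame(X, y = factor(y))
    tree <- rpart(y ~ ., data = d, method = "class")  # greedy binary splitting
    plot(tree); text(tree)                            # splits like the next slide's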


Page 12

[Figure: classification tree for the 800 neonates, 200 died (lived to the left, died to the right). Root split: cpap < 0.665; further splits on gest, ap, and resp; the worst bin is marked.]

Page 13

Random Forests (Breiman, 2001)

1. Draw a bootstrap sample of the original n cases

2. Make a classification tree from the bootstrap data set, except at each split use only a random subset of the p predictors

3. Do all this lots of times (≈ 1000)

4. Prediction rule: for any new x, predict y = the majority of the 1000 tree predictions (see the sketch below)
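
The randomForest package automates steps 1–4: `ntree` bootstrap samples, a random subset of `mtry` predictors at each split, majority vote for prediction. A minimal sketch (hypothetical `X`, `y`, `newx`):

    library(randomForest)
    rf <- randomForest(x = X, y = factor(y), ntree = 1000,
                       mtry = floor(sqrt(ncol(X))))  # default mtry for classification
    predict(rf, newx)                                # majority of the 1000 tree votes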


Page 14

The Prostate Cancer Microarray Study

› n = 100 men: 50 prostate cancer patients, 50 normal controls

› For each man, measure the activity of p = 6033 genes

› The data set d is a 100 × 6033 matrix ("wide")

› Wanted: a prediction rule f(x, d) that inputs a new 6033-vector x and outputs y, correctly predicting cancer/normal

Page 15

Random Forests for Prostate Cancer Prediction

› Randomly divide the 100 subjects into a "training set" of 50 subjects (25 + 25) and a "test set" of the other 50 (25 + 25)

› Run the R program randomForest on the training set

› Use its rule f(x, d_train) on the test set and see how many errors it makes (sketched below)
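
A hedged sketch of this experiment, with `X` the 100 × 6033 expression matrix and `y` the cancer/normal labels (hypothetical names):

    library(randomForest)
    train <- c(sample(which(y == 1), 25), sample(which(y == 0), 25))
    rf <- randomForest(X[train, ], factor(y[train]))
    mean(predict(rf, X[-train, ]) != y[-train])   # test-set error rate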


Page 16

[Figure: prostate cancer prediction using random forests; error rate versus number of trees. Black: cross-validated training error, 5.9%; red: test error, 2.0%.]

Page 17

[Figure: the same analysis with the boosting algorithm gbm; error rate versus number of trees. Training error 0%, test error 4%.]
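
A sketch of how the gbm run might look (again hypothetical `X`, `y`, `train`; the tuning parameters are guesses, not taken from the slide):

    library(gbm)
    d <- data.frame(X, y = y)                      # y numeric 0/1 for "bernoulli"
    fit <- gbm(y ~ ., data = d[train, ],
               distribution = "bernoulli", n.trees = 400)
    phat <- predict(fit, d[-train, ], n.trees = 400, type = "response")
    mean((phat > 0.5) != y[-train])                # test error rate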


Page 18

Now using deep learning ("Keras"); # parameters = 780,738

[Figure: loss and accuracy versus epoch, for training and validation data.]
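
A hedged sketch of a Keras fit of this general shape in R (the layer sizes are guesses chosen only to illustrate; the slide reports 780,738 parameters in total):

    library(keras)
    model <- keras_model_sequential() %>%
      layer_dense(128, activation = "relu", input_shape = 6033) %>%
      layer_dense(1, activation = "sigmoid")
    model %>% compile(optimizer = "adam", loss = "binary_crossentropy",
                      metrics = "accuracy")
    history <- model %>% fit(Xtrain, ytrain, epochs = 500,
                             validation_data = list(Xtest, ytest))
    plot(history)   # loss and accuracy curves like those above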


Page 19

Prediction is Easier than Estimation

› Observe $x_1, x_2, \dots, x_{25} \overset{\mathrm{ind}}{\sim} \mathcal{N}(\mu, 1)$; let $\bar{x}$ = mean, $\check{x}$ = median

› Estimation:
  $E\{(\mu - \check{x})^2\} \big/ E\{(\mu - \bar{x})^2\} = 1.57$

› Wish to predict a new $X_0 \sim \mathcal{N}(\mu, 1)$

› Prediction:
  $E\{(X_0 - \check{x})^2\} \big/ E\{(X_0 - \bar{x})^2\} = 1.02$
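
The two ratios are easy to check by simulation; a quick sketch:

    set.seed(1); B <- 100000; mu <- 0
    xbar <- xmed <- x0 <- numeric(B)
    for (b in 1:B) {
      x <- rnorm(25, mu)
      xbar[b] <- mean(x); xmed[b] <- median(x); x0[b] <- rnorm(1, mu)
    }
    mean((mu - xmed)^2) / mean((mu - xbar)^2)   # estimation ratio, about 1.57
    mean((x0 - xmed)^2) / mean((x0 - xbar)^2)   # prediction ratio, about 1.02

The prediction ratio is near 1 because the new observation's own variance swamps the difference between the two estimators.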


Page 20

Prediction is Easier than Attribution

› Microarray study with N genes: $z_j \overset{\mathrm{ind}}{\sim} \mathcal{N}(\delta_j, 1)$, $j = 1, 2, \dots, N$
  $N_0$ genes have $\delta_j = 0$ (null genes)
  $N_1$ genes have $\delta_j > 0$ (non-null)

› New subject's microarray: $x_j \sim \mathcal{N}(\pm\delta_j, 1)$ (+ if sick, − if healthy)

› Prediction: possible if $N_1 = O(N_0^{1/2})$

› Attribution: requires $N_1 = O(N_0)$

› Prediction allows accrual of "weak learners" (simulated below)
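
A small simulation of weak-learner accrual (all settings invented): with $N_1 = \sqrt{N}$ non-null genes, no single gene comes close to per-gene significance after multiplicity correction, yet a simple sum classifier recovers the new subject's status reliably.

    set.seed(2)
    N <- 10000; N1 <- floor(sqrt(N))
    delta <- c(rep(2, N1), rep(0, N - N1))   # N1 weak non-null effects
    z <- rnorm(N, delta)                     # training-study z-values
    s <- sample(c(1, -1), 1)                 # new subject: +1 sick, -1 healthy
    x <- rnorm(N, s * delta)                 # the subject's microarray
    sign(sum(x * z))                         # recovers s with high probability,
                                             # though effects of size 2 sit far below
                                             # the Bonferroni threshold sqrt(2*log(N)) ~ 4.3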


Page 21

Prediction and Medical Science

› The random forest test-set predictions made only 1 error out of 50!

› Promising for diagnosis

› Not so much for scientific understanding

› Next: "importance measures" for the predictor genes

Page 22

[Figure: importance measures for the genes in the randomForest prostate analysis, plotted against gene index. The top two genes are #1022 and #5569.]

Page 23

Were the Test Sets Really a Good Test?

› Prediction can be highly context-dependent and fragile

› Before: subjects were randomly divided into "training" and "test" sets

› Next: the 50 earliest subjects for training, the 50 latest for test (both 25 + 25)

Page 24

[Figure: random forests trained on the 50 earliest subjects and tested on the 50 latest; error rate versus number of trees. Training error 0%, test error 24% (2% before, with the random split).]

Page 25

[Figure: the same thing for boosting (gbm). Training error 0%, test error now 29% (4% before).]

Page 26

Truth, Accuracy, and Smoothness

› Estimation and attribution seek long-lasting scientific truths:
  physics
  astronomy
  medicine
  economics?

› Prediction algorithms seek truths and ephemeral relationships:
  credit scores
  movie recommendations
  image recognition

› Estimation and attribution: theoretical optimality (MLE, Neyman–Pearson)

› Prediction: training-test performance

› Nature: rough or smooth?

Page 27

[Figure: cholesterol data, randomForest estimate (X = poly(c, 8), 500 trees) compared with the cubic regression curve; cholesterol decrease versus compliance. Adjusted $R^2$: cubic .482, randomForest .404.]

Page 28

[Figure: the same comparison using the boosting algorithm gbm; cholesterol reduction versus adjusted compliance. Green dashed curve: 8th-degree polynomial fit, adjusted $R^2 = .474$. Cubic adjusted $R^2 = .482$; gbm cross-validated $R^2 = .461$.]

Page 29

Estimation v. Prediction Algorithms

1. Surface plus noise vs. direct prediction

2. Scientific truth (eternal, or at least long-lasting) vs. empirical prediction efficiency (could be ephemeral, e.g., commerce)

3. $X_{n \times p}$ with p < n (p moderate) vs. p > n (both possibly huge, "n = all")

4. X chosen parsimoniously (main effects ≫ interactions) vs. anti-parsimony (algorithms expand X)

5. Parametric modeling (condition on x's; smoothness) vs. mostly nonparametric ((x, y) pairs i.i.d.)

6. Homogeneous data (RCT) vs. very large heterogeneous data sets

7. Theory of optimal estimation (MLE) vs. training and test sets (CTF, asymptotics)

Page 30

Estimation and Attribution in the Wide-Data Era

› Large p (the number of features) affects estimation:
  the MLE can be badly biased for individual parameters
  what is the "surface" if, say, p = 6033?

› Attribution is still of interest

› GWAS example: n = 4000, p = 500,000 SNPs

› Two-sample p-values computed for each SNP (sketched below)

› Plotted: $-\log_{10}(p)$
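
A hedged sketch of the per-SNP computation (hypothetical objects: `G` an n × p matrix of per-SNP scores, `case` the case indices; real GWAS pipelines would use genotype-appropriate tests):

    pvals <- apply(G, 2, function(g)
      t.test(g[case], g[-case])$p.value)   # two-sample p-value per SNP
    plot(-log10(pvals), pch = ".")         # the Manhattan-style plot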


Page 31

[Figure: $-\log_{10}(p)$ values plotted for the 500,000 SNPs.]

Page 32

Attribution and Estimation for the Prostate Cancer Study

› $X_{n \times p}$: n = 100 men (50 + 50), p = 6033 genes
  gene $i$ gives $z_i \sim \mathcal{N}(\delta_i, 1)$, with $\delta_i$ the effect size

› Local false discovery rate: $\mathrm{fdr}(z_i) = \Pr\{\delta_i = 0 \mid z_i\}$

› Effect size estimate: $E(z_i) = E\{\delta_i \mid z_i\}$

› Bayes and empirical Bayes: the locfdr package (sketched below)
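
A minimal sketch using the locfdr package, with `z` holding the 6033 gene z-values (hypothetical name):

    library(locfdr)
    out <- locfdr(z)      # empirical-Bayes local false discovery rates
    sum(out$fdr < 0.2)    # genes flagged as interesting at fdr < .2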


Page 33

[Figure: $4 \cdot \mathrm{fdr}(z)$ and $E\{\text{effect size} \mid z\}$ plotted against z for the prostate data. Red triangles: the 29 genes with fdr < .2; green triangles: the first 29 glmnet genes. At z = 4: fdr = .22 and $E\{\delta \mid z\} = 2.3$.]

Page 34

Sparse Models and the Lasso

› We would like to use OLS, minimizing $\|y - X\beta\|^2$, but p is too big

› Instead minimize $\|y - X\beta\|^2 + \lambda \sum_{j=1}^{p} |\beta_j|$

› Large $\lambda$ gives a sparse $\hat\beta$

› glmnet does this for logistic regression (sketched below)

› In between classical OLS and boosting algorithms

› Have it both ways?
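
A sketch with glmnet for data of the prostate type (hypothetical `X`, `y`; cv.glmnet chooses $\lambda$ by cross-validation):

    library(glmnet)
    cvfit <- cv.glmnet(X, y, family = "binomial")  # lasso-penalized logistic regression
    b <- coef(cvfit, s = "lambda.min")             # sparse: most entries exactly zero
    sum(b != 0)                                    # number of selected genes (+ intercept)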


Page 35

Two Trends

› Making prediction algorithms better for scientific use:
  smoother
  more interpretable
  less brittle

› Making traditional estimation/attribution methods better for large-scale (n, p) problems:
  less fussy
  more flexible
  better scaled

Page 36

References

Algorithms: Hastie, Tibshirani, and Friedman (2009). The Elements of Statistical Learning, 2nd ed.

Random forests: Breiman (2001). "Random forests."

Data science: Donoho (2015). "50 years of data science."

CART: Breiman, Friedman, Olshen, and Stone (1984). Classification and Regression Trees.

locfdr: Efron (2010). Large-Scale Inference.

Lasso and glmnet: Friedman, Hastie, and Tibshirani (2010). "Regularization paths for generalized linear models via coordinate descent."