49
1 The Lasso: a brief review and a new significance test Robert Tibshirani, Stanford University PIMS-UBC Statistics Constance Van Eeden Lecture 2014 Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.

The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

1

The Lasso: a brief review and a new significancetest

Robert Tibshirani, Stanford University

PIMS-UBC Statistics Constance Van Eeden Lecture2014

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 2: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

2

Is Statistics boring?

The Statistician and the Biologist

They are both being executed, and are each granted one last request.

The Statistician asks that he be allowed to give one last lecture on his grandtheory of statistics.

The Biologist asks that he be executed first.

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 3: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

2

Is Statistics boring?

The Statistician and the Biologist

They are both being executed, and are each granted one last request.

The Statistician asks that he be allowed to give one last lecture on his grandtheory of statistics.

The Biologist asks that he be executed first.

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 4: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

2

Is Statistics boring?

The Statistician and the Biologist

They are both being executed, and are each granted one last request.

The Statistician asks that he be allowed to give one last lecture on his grandtheory of statistics.

The Biologist asks that he be executed first.

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 5: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

2

Is Statistics boring?

The Statistician and the Biologist

They are both being executed, and are each granted one last request.

The Statistician asks that he be allowed to give one last lecture on his grandtheory of statistics.

The Biologist asks that he be executed first.

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 6: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

3

Statistics is a hot field!

• our enrollments (Masters and PhD) are way up

• graduates find jobs in academia, drug companies, finance, and now: Google,FaceBook, Twitter etc.

• PhD graduates can step into Asst Professor positions right away, oftenwithout doing a postdoc

• Other fields– math, physics, biology etc– are not nearly as well off

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 7: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

4

Outline of this talk

• Sparsity and the Lasso- motivated by a problem in cancer surgery

• A new significance test for the Lasso

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 8: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

5

Sparsity and the Lasso

For motivation, a real problem:

• I am currently working in a cancer diagnosis project with co-workers atStanford;

• They have collected samples of tissue from a number of patients undergoingsurgery for stomach cancer.

• We are working to build a classifier than can distinguish three kinds of tissue:normal epithelial, normal stromal and cancer.

• Such a classifier could be used to assist surgeons in determining, in real time,whether they had successfully removed all of the tumor. Current pathologisterror rate for this call is about 20%.

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 9: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

6

Normal margin

Cancer

Stromal

Epithelial

Left in patient

Extracted part

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 10: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

7

Details

• 20 patients, each contributing a sample of epithelial, stromal and cancertissue. At each pixel in the image, the intensity of metabolites is measured bya new type of mass spectometry. Peaks in the spectrum representing differentmetabolites.

• The spectrum has been finely sampled, with the intensity measured at about11, 000 m/z sites across the spectrum, and 8000 pixels.

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 11: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

8

The data for one patient

��������������������������������������������������������������������������������������������������������������

��������������������������������������������������������������������������������������������������������������

������������������������������������������������

������������������������������������������������

������������������������������

������������������������������

Epithelial

Stromal

Cancer

Spectrum sampled at 11,000 m/z values

Spectrum for each pixel

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 12: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

9

The overall data

Patient 3

Patient 4

... Patient 20

−−− 11,000 m/z sites −−−−

Pixels

1.3

1

2

13

Patient 2

Patient 1

7.6

X

3=Cancer

1=Epithelial, 2= StromalLabel of pixel:Y

9.4

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 13: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

10

What we need to solve this problem

• A statistical classifier (algorithm) that sorts through the large number offeatures, and finds the most informative ones. This is called a sparse set offeatures.

• Then it must use these features in combination to accurately classify pixels.

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 14: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

11

The Lasso• Regression problem: We observe n feature-response pairs (xi , yi ), where xi is

a p-vector and yi is real-valued.

• Let xi = (xi1, xi2, . . . xip)

• Consider a linear regression model:

yi = β0 +∑j

xijβj + εi

yi= class of pixel i , (1,2, or 3); xij = height of spectrum at site j for pixel i .εi is an error term. βj is the weight given to height of spectrum at site j(not quite appropriate since yi is actually unordered)

• Least squares fitting is defined by

minimizeβ0,β

1

2

∑i

(yi − β0 −∑j

xijβj)2

• This won’t work when p > n (why not)?

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 15: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

12

The Lasso— continued

The Lasso is an estimator defined by the following optimization problem:

minimizeβ0,β

1

2

∑i

(yi − β0 −∑j

xijβj)2 subject to

∑|βj | ≤ s

• Penalty =⇒ sparsity (feature selection)

• Convex problem (good for computation and theory)

• Ridge regression uses penalty∑

j β2j ≤ s and does not yield sparsity

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 16: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

13

Why does the lasso give a sparse solution?

β^ β^2. .β

1

β2

β1β

Lasso∑

j |βj | ≤ s Ridge∑

j β2j ≤ s

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 17: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

14

Prostate cancer exampleN = 88, p = 8. Predicting log-PSA, in men after prostate cancer surgery

Shrinkage Factor s

Coe

ffici

ents

0.0 0.2 0.4 0.6 0.8 1.0

−0.

20.

00.

20.

40.

6

lcavol

lweight

age

lbph

svi

lcp

gleason

pgg45

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 18: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

15

Back to our problem

• K = 3 classes (epithelial vs stromal vs normal): multinomial model

logPr(Yi = k |x)

1− Pr(Yi = K |x)= β0k +

∑j

xijβjk , k = 1, 2, . . .K

Here xij is height of spectrum for sample i at jth m/z position

• We replace the least squares objective function by one that is appropriate fora 3-class outcome (no details here)

• Add lasso penalty∑|βj | ≤ s; optimize, using cross-validation to estimate

best value for budget s.

• yields a pixel classifier, and also reveals which m/z sites are informative.

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 19: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

16

Fast computation is essential

• Our lab has written a open-source R language package called glmnet forfitting lasso models

• It is very fast- can solve the current problem in a few minutes on a PC

• Not “off-the shelf”: Many clever computational tricks were used to achievethe impressive speed

Jerry Friedman Trevor Hastie

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 20: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

17

Results

Cross-validation- min at 129 peaks; overall error rate= 4.2%

Predictedtrue Epi Canc Strom Prop correctEpi 3277.00 80.00 145.00 0.94Canc 73.00 5106.00 13.00 0.98Strom 79.00 86.00 2409.00 0.94

Test set: overall error rate =5.7%

Predictedtrue Epi Canc Strom Prop correctEpi 1606.00 149.00 19.00 0.91Canc 23.00 1622.00 5.00 0.98Strom 67.00 5.00 1222.00 0.94

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 21: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

18

Peak # m/z value Epi Canc Strom1 101.52 107.5 0.093 110.5 0.44 0.444 123.55 132.5 0.066 134.5 0.10 0.107 135.5 0.13 0.138 137.5 -0.01 -0.019 145.5 -0.71

10 146.5 -0.4111 151.5 -0.15 0.35 0.3512 157.5 -0.07 -0.0713 170.514 171.515 174.5 0.0516 175.5 0.14 0.1417 179.518 188.519 212.520 214.5 -0.13 -0.1321 215.5 -1.17 -1.1722 222.5 0.2923 224.524 225.525 231.526 244.5 -0.0127 247.5 -0.31 -0.3128 258.5 0.2129 270.5 -0.14 -0.1430 278.5 0.3231 279.5 0.3932 280.533 285.534 289.535 293.5 0.2536 297.537 299.538 301.5 -0.45 0.21 0.2139 312.5 0.10 -0.44 -0.4440 322.541 324.5 0.0642 325.5 0.03 -0.00 -0.0043 333.5 0.82 0.8244 340.5 -0.02 -0.0245 341.546 347.5 0.0147 349.548 353.5 0.0549 355.5 0.0250 363.5 0.08 0.0851 364.5 -0.1052 365.5 0.48 0.4853 391.5 0.14 0.1454 392.5 0.14 0.1455 393.5 0.26 0.2656 394.5 -0.1257 412.5 -0.00 -0.0058 417.5 -0.0459 423.560 437.561 443.5 0.16 0.1662 444.5 0.08 0.0863 451.564 455.5 -0.07 -0.0765 496.566 505.5 -0.05 -0.0567 536.5 0.40 0.4068 614.5 0.29 0.2969 620.5 -0.02

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 22: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

19

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

patient= 167 at m/z= 788.6

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

true

●●●●●●●

●●●●●●

●●●●

●●●●●

●●●●●●●●●

●●●●

●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●

●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●

●●●●●●●●●●●●

●●●●●●●●●●●●●

●●●●●●

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

predicted

●●●●●●●

●●●●●●

●●●●

●●●●●

●●●●●●●●●

●●●●

●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●

●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●

●●●●●●●●●●●●

●●●●●●●●●●●●●

●●●●●●

epithelial

cancer

stromal

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 23: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

20

A Significance test for the lasso

Lockhart, Taylor, Ryan Tibshirani, Rob TibshiraniTo appear, Annals of Statistics 2014 with discussion

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 24: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

21

 

 

Richard Lockhart Jonathan Taylor Ryan Tibshirani Simon Fraser University                  Stanford  University   Asst Prof, CMU

Vancouver    PhD  Student  of  Keith  Worsley   PhD Student of Taylor, PhD . Student of David Blackwell,   Stanford 2011

Berkeley, 1979                                                

 

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 25: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

22

Setup and basic question

• Given an outcome vector y ∈ Rn and a predictor matrix X ∈ Rn×p, weconsider the usual linear regression setup:

y = Xβ∗ + σε, (1)

where β∗ ∈ Rp are unknown coefficients to be estimated, and thecomponents of the noise vector ε ∈ Rn are i.i.d. N(0, 1).

• What we show today: how to provide a p-value for each variable as it isadded to lasso model via the LAR algorithm

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 26: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

23

Forward stepwise regression

• This procedure enters predictors one a time, choosing the predictor that mostdecreases the residual sum of squares at each stage.

• Defining RSS to be the residual sum of squares for the model containing kpredictors, and RSSnull the residual sum of squares before the kth predictorwas added, we can form the usual statistic

Rk = (RSSnull − RSS)/σ2 (2)

(with σ assumed known for now), and compare it to a χ21 distribution.

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 27: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

24

Simulated example- Forward stepwise- F statistic

0 2 4 6 8 10

05

1015

Chi−squared on 1 df

Test

sta

tistic

N = 100, p = 10, true model null

Test is too liberal: for nominal size 5%, actual type I error is 39%.Can get proper p-values by sample splitting: but messy, loss of power

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 28: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

25

Quick review of least angle regression

Least angle regression is a method for constructing the path of lasso solutions.

A more democratic version of forward stepwise regression.

• find the predictor most correlated with the outcome,

• move the parameter vector in the least squares direction until some otherpredictor has as much correlation with the current residual.

• this new predictor is added to the active set, and the procedure is repeated.

• If a non-zero coefficient hits zero, that predictor is dropped from the activeset, and the process is restarted. [This is “lasso” mode, which is what weconsider here.]

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 29: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

26

Least angle regression

−1 0 1 2

−1

01

23

45

− log(λ)

Coe

ffici

ents

λ1 λ2 λ3 λ4 λ5

1

1

1

1

3

3

3

2

2

4

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 30: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

27

The covariance test statisticSuppose that we want a p-value for predictor 2, entering at the 3rd step.

−1 0 1 2

−1

01

23

45

− log(λ)

Coe

ffici

ents

λ1 λ2 λ3 λ4 λ5

1

1

1

1

3

3

3

2

2

4

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 31: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

28

Compute covariance at λ4: 〈y,Xβ̂(λ4)〉

−1 0 1 2

−1

01

23

45

− log(λ)

Coe

ffici

ents

λ1 λ2 λ3 λ4 λ5

1

1

1

1

3

3

3

2

2

4

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 32: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

29

Drop x2, yielding active yet A; refit at λ4, and compute resulting covariance at λ4,giving

T =(〈y,Xβ̂(λ4)〉 − 〈y,XAβ̂A(λ4)〉

)/σ2

−1 0 1 2

−1

01

23

45

− log(λ)

Coe

ffici

ents

λ1 λ2 λ3 λ4 λ5

1

1

1

1

3

3

3

2

2

4

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 33: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

30

The covariance test statistic: formal definition

• Suppose that we wish to test significance of predictor that enters LARS at λj .

• Let A be the active set before this predictor added

• Let the estimates at the end of this step be β̂(λj+1)

• We refit the lasso, keeping λ = λj+1 but using just the variables in A. This

yields estimates β̂A(λj+1). The proposed covariance test statistic is definedby

Tj =1

σ2·(〈y,Xβ̂(λj+1)〉 − 〈y,XAβ̂A(λj+1)〉

). (3)

• measures how much of the covariance between the outcome and the fittedmodel can be attributed to the predictor which has just entered the model.

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 34: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

31

Main result

Under the null hypothesis that all signal variables are in the model:

Tj =1

σ2·(〈y,Xβ̂(λj+1)〉 − 〈y,XAβ̂A(λj+1)〉

)→ Exp(1)

as p, n→∞.

More details to come

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 35: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

32

Comments on the covariance test

Tj =1

σ2·(〈y,Xβ̂(λj+1)〉 − 〈y,XAβ̂A(λj+1)〉

). (4)

• Generalization of standard χ2 or F test, designed for fixed linear regression,to adaptive regression setting.

• Exp(1) is the same as χ22/2; its mean is 1, like χ2

1: overfitting due to adaptiveselection is offset by shrinkage of coefficients

• Form of the statistic is very specific- uses covariance rather than residual sumof squares (they are equivalent for projections)

• Covariance must be evaluated at specific knot λj+1

• Applies when p > n or p ≤ n: although asymptotic in p, Exp(1) seem to be avery good approximation even for small p

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 36: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

33

Simulated example- Lasso- Covariance statisticN = 100, p = 10, true model null

0 1 2 3 4 5 6 7

05

1015

Exp(1)

Test

sta

tistic

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 37: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

34

Example: Prostate cancer data

Stepwise Lasso

lcavol 0.000 0.000lweight 0.000 0.052

svi 0.041 0.174lbph 0.045 0.929

pgg45 0.226 0.353age 0.191 0.650lcp 0.065 0.051

gleason 0.883 0.978

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 38: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

35

Simplifications

• For any design, the first covariance test T1 can be shown to equalλ1(λ1 − λ2).

• For orthonormal design, Tj = λj(λj − λj+1) for all j ; for general designs,Tj = cjλj(λj − λj+1)

• For orthonormal design, λj = |β̂(j)|, the jth largest univariate coefficient inabsolute value. Hence

Tj = (|β̂(j)|(|β̂(j)| − |β̂(j+1)|). (5)

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 39: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

36

Rough summary of theoretical results

Under somewhat general conditions, after all signal variables are in the model,distribution of T for kth null predictor → Exp(1/k)

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 40: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

37

Theory for orthogonal case

Global null case: first predictor to enter

Recall that in this setting,

Tj = λj(λj − λj+1)

and λj = |β̂(j)|, β̂j ∼ N(0, 1)

So the question is:

Suppose V1 > V2 . . . > Vn are the order statistics from a χ1 distribution (absolutevalue of a standard Gaussian).

What is the asymptotic distribution of V1(V1 − V2)?

[Ask Richard Lockhart!]

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 41: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

38

Theory for orthogonal caseGlobal null case: first predictor to enter

Lemma

Lemma 1: Top two order statistics: Suppose V1 > V2 . . . > Vp are the orderstatistics from a χ1 distribution (absolute value of a standard Gaussian) withcumulative distribution function F (x) = (2Φ(x)− 1)1(x > 0), where Φ(x) isstandard normal cumulative distribution function. Then

V1(V1 − V2)→ Exp(1). (6)

Lemma

Lemma 2: All predictors. Under the same conditions as Lemma 1,

(V1(V1 − V2), . . . ,Vk(Vk − Vk+1))→ (Exp(1),Exp(1/2), . . .Exp(1/k))

Proof uses a uses theorem from de Haan & Ferreira (2006). We were unable tofind these remarkably simple results in the literature.Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 42: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

39

Heuristically, the Exp(1) limiting distribution for T1 can be seen as follows:

• The spacings |β̂(1)| − |β̂(2)| have an exponential distribution asymptotically,

while |β̂(1)| has an extreme value distribution with relatively small variance.

• It turns out that |β̂(1)| is just the right (stochastic) scale factor to give thespacings an Exp(1) distribution.

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 43: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

40

General X results

Under appropriate condition on X, as p,N →∞,

1 Global null case: T1 = λ1(λ1 − λ2)→ Exp(1).

2 Non-null case: After the k strong signal variables have entered, under thenull hypothesis that the rest are weak,

Tk+1

n,p→∞≤ Exp(1)

Jon Taylor: “Something magical happens in the math”

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 44: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

41

Conditions on X

• The main condition is that for each (j , sj) the random variable Mj,s(g) growssufficiently fast.

• A sufficient condition: for any j , we require the existence of a subset S notcontaining j such that the variables Ui , i ∈ S are not too correlated, in thesense that the conditional variance of any one on all the others is boundedbelow. This subset S has to be of size at least log p.

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 45: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

42

HIV mutation dataN = 1057 samplesp = 217 mutation sites (xij=0 or 1)y = a measure of drug resistance

2 4 6 8 10

0.0

0.4

0.8

Step

Pva

lue

Forward stepwiseLasso

2 4 6 8 10

040

00

Step

Test

err

or

The data were randomly divided 50 times into training and test sets of size (150,907).

Top row shows the training set p-values for forward stage regression and the lasso. The

bottom panels show the test set error for the models of each size.

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 46: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

43

Case of Unknown σ

LetWk =

(〈y ,X β̂(λk+1)〉 − 〈y ,XAβ̂A(λk+1)〉

). (7)

and assuming n > p, let σ̂2 =∑n

i=1(yi − µ̂full)2/(n − p). Then asymptotically

Fk =Wk

σ̂2∼ F2,n−p (8)

[Wj/σ2 is asymptotically Exp(1) which is the same as χ2

2/2, (n − p) · σ̂2/σ2 isasymptotically χ2

n−p and the two are independent.]

When p > n, σ2 miust be estimated with more care.

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 47: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

44

Extensions

• Elastic Net: Tj is simply scaled by (1 + λ2), where λ2 multiplies the `2

penalty.

• Generalized likelihood models:

Tj = [S0I−1/20 Xβ̂(λj+1)− S0

T I−1/20 XAβ̂A(λj+1)]/2

where S0, I0 are null score and information matrices, respectively. Works e.g.for generalized linear models and Cox model.

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 48: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

45

More recent developments

• Exact post-selection inference with the lasso (2013) (Lee et al) in arXiv

• Post-selection adaptive inference for Least Angle Regression and the Lasso(2014) (Taylor et al) in arXiv (provides exact p-values for finite n, p)!

1− Φ(λ1/σ)

1− Φ(λ2/σ)∼ U(0, 1)

• False Discovery Rate Control for Sequential Selection Procedures, withApplication to the Lasso (2013). (G’Sell et al) in arXiv

• In the works: extensions to PCA, group lasso, forward stepwise regression

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?

Page 49: The Lasso: a brief review and a new significance teststatweb.stanford.edu/~tibs/ftp/ubctalk.pdf · 14 Prostate cancer example N = 88;p = 8. Predicting log-PSA, in men after prostate

45

de Haan, L. & Ferreira, A. (2006), Extreme Value Theory: An Introduction,Springer.

Robert Tibshirani, Stanford University The Lasso: a brief review and a new significance test.?