36
Cross Validation and Penalized Linear Regression 1 Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2019s/ Many slides attributable to: Erik Sudderth (UCI) Finale Doshi-Velez (Harvard) James, Witten, Hastie, Tibshirani (ISL/ESL books) Prof. Mike Hughes

04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

Cross Validation and Penalized Linear Regression

1

Tufts COMP 135: Introduction to Machine Learninghttps://www.cs.tufts.edu/comp/135/2019s/

Many slides attributable to:Erik Sudderth (UCI)Finale Doshi-Velez (Harvard)James, Witten, Hastie, Tibshirani (ISL/ESL books)

Prof. Mike Hughes

Page 2: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

CV & Penalized LR Objectives

• Regression with transformations of features

• Cross Validation

• L2 penalties

• L1 penalties

3Mike Hughes - Tufts COMP 135 - Spring 2019

Page 3: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

What will we learn?

4Mike Hughes - Tufts COMP 135 - Spring 2019

SupervisedLearning

Unsupervised Learning

Reinforcement Learning

Data, Label PairsPerformance

measureTask

data x

labely

{xn, yn}Nn=1

Training

Prediction

Evaluation

Page 4: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

5Mike Hughes - Tufts COMP 135 - Spring 2019

Task: RegressionSupervisedLearning

Unsupervised Learning

Reinforcement Learning

regression

x

y

y is a numeric variable e.g. sales in $$

Page 5: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

6Mike Hughes - Tufts COMP 135 - Spring 2019

Review: Linear Regression

minw,b

NX

n=1

⇣yn y(xn, w, b)

⌘2

Optimization problem: “Least Squares”

Exact formula for optimal values of w, b exist!

Math works in 1D and for many dimensions

[w1 . . . wF b]T = (XTX)1

X

Ty

X =

2

664

x11 . . . x1F 1x21 . . . x2F 1

. . .

xN1 . . . xNF 1

3

775

Page 6: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

Recap: solving linear regression• More examples than features (N > F)

• Same number of examples and features (N=F)

• Fewer examples than features (N < F) or low rank

7Mike Hughes - Tufts COMP 135 - Spring 2019

Then:Infinitely many optimal weight vectors exist with zero errorInverse of X^T X does not exist (naïvely, formula will fail)

And if inverse of X^T X exists (needs to be full rank):Then an optimal weight vector exists, can use formulaWill have zero error on training set.

And if inverse of X^T X exists (needs to be full rank)Then an optimal weight vector exists, can use formulaLikely has non-zero error (overdetermined)

Page 7: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

Recap

• Squared error is special• Exact formulas for estimating parameters

• Most metrics do not have exact formulas• Take derivative, set to zero, try to solve, …. HARD!• Example: absolute error

• General algorithm: Gradient Descent!• As long as first derivative exists, we can do

iterations to estimate optimal parameters

8Mike Hughes - Tufts COMP 135 - Spring 2019

Page 8: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

9Mike Hughes - Tufts COMP 135 - Spring 2019

Transformations of Features

Page 9: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

Fitting a line isn’t always ideal

10Mike Hughes - Tufts COMP 135 - Spring 2019

Page 10: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

Can fit linear functions to nonlinear features

11Mike Hughes - Tufts COMP 135 - Spring 2019

y(xi) = ✓0 + ✓1xi + ✓2x2i + ✓3x

3i

A nonlinear function of x:

Can be written as a linear function of

“Linear regression” means linear in the parameters (weights, biases)

Features can be arbitrary transforms of raw data

(xi) = [1 xi x2i x

3i ]

y(xi) =4X

g=1

✓gφg(xi) = ✓

Tφ(xi)

Page 11: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

What feature transform to use?• Anything that works for your data!

• sin / cos for periodic data

• polynomials for high-order dependencies

• interactions between feature dimensions

• Many other choices possible

12Mike Hughes - Tufts COMP 135 - Spring 2019

(xi) = [1 xi x2i . . .]

(xi) = [1 xi1xi2 xi3xi4 . . .]

Page 12: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

13Mike Hughes - Tufts COMP 135 - Spring 2019

Linear Regression with Transformed Features

Optimization problem: “Least Squares”

Exact solution:

y(xi) = ✓

Tφ(xi)

(xi) = [1 1(xi) 2(xi) . . .G1(xi)]

min✓PN

n=1(yn ✓

Tφ(xi))

2

⇤ = (T)−1Ty

=

2

6664

1 1(x1) . . . G1(x1)1 1(x2) . . . G1(x2)...

. . .

1 1(xN ) . . . G1(xN )

3

7775

N x G matrix

Page 13: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

Cross Validation

14Mike Hughes - Tufts COMP 135 - Spring 2019

Page 14: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

15Mike Hughes - Tufts COMP 135 - Spring 2019

Generalize: sample to population

Page 15: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

Labeled dataset

16Mike Hughes - Tufts COMP 135 - Spring 2019

x yEach row represents one example

Assume rows are arranged “uniformly at random” (order doesn’t matter)

Page 16: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

Split into train and test

17Mike Hughes - Tufts COMP 135 - Spring 2019

x y

train

test

Page 17: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

Model Complexity vs Error

18Mike Hughes - Tufts COMP 135 - Spring 2019

OverfittingUnderfitting

Page 18: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

How to fit best model?

19Mike Hughes - Tufts COMP 135 - Spring 2019

Option: Fit on train, select on validation1) Fit each model to training data2) Evaluate each model on validation data3) Select model with lowest validation error4)Report error on test set

train

test

validation

x y

Page 19: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

How to fit best model?

20Mike Hughes - Tufts COMP 135 - Spring 2019

Option: Fit on train, select on validation1) Fit each model to training data2) Evaluate each model on validation data3) Select model with lowest validation error4)Report error on test set

train

test

validation

x y

Concerns• Will train be too small?• Make better use of data?

Page 20: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

Estimating Heldout Error with Fixed Validation Set

21Mike Hughes - Tufts COMP 135 - Spring 2019Credit: ISL Textbook, Chapter 5

10 other random splitsSingle random split

Page 21: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

3-fold Cross Validation

22Mike Hughes - Tufts COMP 135 - Spring 2019

train

validation

x y x yx y

x yDivide labeled dataset into 3 even-sized parts

Fit model 3 independent times.Each time leave one fold as validation and keep remaining as training

fold 1

fold 2

fold 3

Heldout error estimate: average of the validation error across all 3 fits

Page 22: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

K-fold CV: How many folds K?

• Can do as low as 2 fold • Can do as high as N-1 folds (“Leave one out”)• Usual rule of thumb: 5-fold or 10-fold CV

• Computation runtime scales linearly with K• Larger K also means each fit uses more train data,

so each fit might take longer too

• Each fit is independent and parallelizable

23Mike Hughes - Tufts COMP 135 - Spring 2019

Page 23: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

24Mike Hughes - Tufts COMP 135 - Spring 2019

9 separate splitsEach one with 10 foldsLeave-one-out (N-1 folds)

Credit: ISL Textbook, Chapter 5

Estimating Heldout Error with Cross Validation

Page 24: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

What to do about underfitting?

• Increase model complexity • Add more features!

25Mike Hughes - Tufts COMP 135 - Spring 2019

Page 25: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

What to do about overfitting?

• Select complexity with cross validation

• Control single-fit complexity with a penalty!

26Mike Hughes - Tufts COMP 135 - Spring 2019

Page 26: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

Zero degree polynomial

27Mike Hughes - Tufts COMP 135 - Spring 2019

Credit: Slides from course by Prof. Erik Sudderth (UCI)

Page 27: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

1st degree polynomial

28Mike Hughes - Tufts COMP 135 - Spring 2019

Credit: Slides from course by Prof. Erik Sudderth (UCI)

Page 28: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

3rd degree polynomial

29Mike Hughes - Tufts COMP 135 - Spring 2019

Credit: Slides from course by Prof. Erik Sudderth (UCI)

Page 29: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

9th degree polynomial

30Mike Hughes - Tufts COMP 135 - Spring 2019

Credit: Slides from course by Prof. Erik Sudderth (UCI)

Page 30: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

Error vs Complexity

31Mike Hughes - Tufts COMP 135 - Spring 2019

polynomial degree

sqrtof

meansquared

errror

Page 31: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

32Mike Hughes - Tufts COMP 135 - Spring 2019

Polynomial degree0 1 3 9

Credit: Slides from course by Prof. Erik Sudderth (UCI)

Page 32: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

Idea: Penalize magnitude of weights

33Mike Hughes - Tufts COMP 135 - Spring 2019

J(✓) =1

2

NX

n=1

(yn ✓

Txn)

2 + ↵

X

f

2f

↵ 0Penalty strength:

Larger alpha means we prefer smaller magnitude weights

Page 33: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

Idea: Penalize magnitude of weights

34Mike Hughes - Tufts COMP 135 - Spring 2019

J(✓) =1

2

NX

n=1

(yn ✓

Txn)

2 + ↵

X

f

2f

J(✓) =1

2(y X✓)T (y X✓) + ↵✓

T✓

Written via matrix/vector product notation:

Page 34: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

Exact solution forL2 penalized linear regression

35Mike Hughes - Tufts COMP 135 - Spring 2019

Optimization problem: “Penalized Least Squares”

min✓

1

2(y X✓)T (y X✓) + ↵✓

T✓

Solution:

⇤ = (XTX + ↵I)−1

X

Ty

If alpha > 0 , this is always invertible!

Page 35: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

Slides on L1/L2 penalties

See slides 71-82 from UC-Irvine course here:https://canvas.eee.uci.edu/courses/8278/files/2735313/

36Mike Hughes - Tufts COMP 135 - Spring 2019

Page 36: 04 penalized linear regression - cs.tufts.edu€¦ · Prof. Mike Hughes. CV & Penalized LR Objectives •Regression with transformations of features •Cross Validation •L2 penalties

Pair Coding Activityhttps://github.com/tufts-ml-courses/comp135-19s-assignments/blob/master/labs/GradientDescentDemo.ipynb

• Try existing gradient descent code:• Optimizes scalar slope to produce minimum error• Try step sizes of 0.0001, 0.02, 0.05, 0.1

• Add L2 penalty with alpha > 0• Write calc_penalized_loss and calc_penalized_grad• What happens to estimated slope value w?

• Repeat with L1 penalty with alpha > 0

37Mike Hughes - Tufts COMP 135 - Spring 2019