
Page 1

©2018 Kevin Jamieson 1

Linear Regression

Machine Learning – CSE546 Kevin Jamieson University of Washington

Oct 2, 2018

Page 2

2

The regression problem

©2018 Kevin Jamieson

[Figure: scatter plot of Sale Price vs. # square feet]

Given past sales data on zillow.com, predict: y = House sale price from x = {# sq. ft., zip code, date of sale, etc.}

Training Data:

$\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$

Page 3

3

The regression problem

©2018 Kevin Jamieson

[Figure: scatter plot of Sale Price vs. # square feet]

Given past sales data on zillow.com, predict: y = House sale price from x = {# sq. ft., zip code, date of sale, etc.}

Training Data:

$\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$

Hypothesis: linear, $y_i \approx x_i^T w$

Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$

best linear fit

Page 4

4 ©2018 Kevin Jamieson

The regression problem in matrix notation

$y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} \qquad X = \begin{bmatrix} x_1^T \\ \vdots \\ x_n^T \end{bmatrix}$

$\hat{w}_{LS} = \arg\min_w \sum_{i=1}^n (y_i - x_i^T w)^2 = \arg\min_w (y - Xw)^T (y - Xw)$

Page 5

5 ©2018 Kevin Jamieson

The regression problem in matrix notation

$\hat{w}_{LS} = \arg\min_w \|y - Xw\|_2^2 = \arg\min_w (y - Xw)^T (y - Xw)$

Setting the gradient to zero:
$\nabla_w \|y - Xw\|_2^2 = -2X^T(y - Xw) = 0 \;\Rightarrow\; X^T y = X^T X w$

If $(X^T X)^{-1}$ exists, then $\hat{w}_{LS} = (X^T X)^{-1} X^T y$.
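As a sanity check on this closed form, here is a minimal numpy sketch (the data-generating setup is invented for illustration, not from the slides) comparing the normal-equations solution against numpy's built-in least-squares solver:

```python
# Minimal sketch: closed-form least squares vs. numpy's solver.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))                 # toy design matrix
w_true = np.array([1.0, -2.0, 0.5])         # invented ground truth
y = X @ w_true + 0.1 * rng.normal(size=n)   # noisy labels

# Normal equations: solve X^T X w = X^T y (assumes X^T X is invertible).
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Library solver for comparison.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(w_hat, w_lstsq)
```

Solving the linear system is numerically preferable to forming $(X^T X)^{-1}$ explicitly.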

Page 6

6 ©2018 Kevin Jamieson

The regression problem in matrix notation

$\hat{w}_{LS} = \arg\min_w \|y - Xw\|_2^2 = (X^T X)^{-1} X^T y$

What about an offset?

$\hat{w}_{LS}, \hat{b}_{LS} = \arg\min_{w,b} \sum_{i=1}^n \big(y_i - (x_i^T w + b)\big)^2 = \arg\min_{w,b} \|y - (Xw + \mathbf{1}b)\|_2^2$

Page 7

7 ©2018 Kevin Jamieson

Dealing with an offset

$\hat{w}_{LS}, \hat{b}_{LS} = \arg\min_{w,b} \|y - (Xw + \mathbf{1}b)\|_2^2$

Setting both gradients to zero:
$\nabla_w: \; -2X^T\big(y - Xw - \mathbf{1}b\big) = 0 \;\Rightarrow\; X^T X w = X^T y - b\, X^T \mathbf{1}$
$\nabla_b: \; -2\,\mathbf{1}^T\big(y - Xw - \mathbf{1}b\big) = 0 \;\Rightarrow\; nb = \mathbf{1}^T y - \mathbf{1}^T X w \;\Rightarrow\; \hat{b} = \frac{1}{n}\sum_{i=1}^n y_i - \bar{x}^T w, \quad \bar{x} = \tfrac{1}{n} X^T \mathbf{1}$

Page 8

8 ©2018 Kevin Jamieson

Dealing with an offset

If $X^T \mathbf{1} = 0$ (i.e., each feature is mean-zero), then

$\hat{w}_{LS} = (X^T X)^{-1} X^T y \qquad \hat{b}_{LS} = \frac{1}{n} \sum_{i=1}^n y_i$

This follows from the normal equations for $\hat{w}_{LS}, \hat{b}_{LS} = \arg\min_{w,b} \|y - (Xw + \mathbf{1}b)\|_2^2$:

$X^T X \hat{w}_{LS} + \hat{b}_{LS}\, X^T \mathbf{1} = X^T y$
$\mathbf{1}^T X \hat{w}_{LS} + \hat{b}_{LS}\, \mathbf{1}^T \mathbf{1} = \mathbf{1}^T y$

In practice: center the features, $\tilde{x}_i = x_i - \mu$ with $\mu = \frac{1}{n} \sum_{i=1}^n x_i$, so that $\tilde{X}^T \mathbf{1} = 0$.

Given a new $x$, predict $\hat{w}^T (x - \mu) + \hat{b}$.
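A minimal sketch of that centering recipe on an invented toy dataset: center the features, solve for $\hat{w}$ on the centered data, then recover the intercept as $\bar{y} - \mu^T \hat{w}$:

```python
# Sketch: least squares with an offset via feature centering.
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 2
X = rng.normal(loc=5.0, size=(n, d))             # features are not mean-zero
y = X @ np.array([2.0, -1.0]) + 3.0 + 0.1 * rng.normal(size=n)

mu = X.mean(axis=0)                              # feature means
Xc = X - mu                                      # centered: Xc^T 1 = 0
w_hat = np.linalg.solve(Xc.T @ Xc, Xc.T @ y)     # slope from centered data
b_hat = y.mean() - mu @ w_hat                    # intercept for raw features

x_new = np.array([4.5, 5.5])
y_pred = w_hat @ x_new + b_hat                   # = w_hat @ (x_new - mu) + y.mean()
```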

Page 9

9 ©2018 Kevin Jamieson

The regression problem in matrix notation

$\hat{w}_{LS} = \arg\min_w \|y - Xw\|_2^2 = (X^T X)^{-1} X^T y$

But why least squares?

Consider $y_i = x_i^T w + \epsilon_i$ where $\epsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$. Then

$P(y \mid x, w, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(y - x^T w)^2}{2\sigma^2}\right)$

Page 10

10

Maximizing log-likelihood

Maximize:

©2018 Kevin Jamieson

$\log P(\mathcal{D} \mid w, \sigma) = \log\!\left[\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!n} \prod_{i=1}^n e^{-\frac{(y_i - x_i^T w)^2}{2\sigma^2}}\right] = n \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - x_i^T w)^2$

Only the last term depends on $w$, so maximizing over $w$ is the same as minimizing $\sum_{i=1}^n (y_i - x_i^T w)^2$.

Page 11

11

MLE is LS under linear model

©2018 Kevin Jamieson

$\hat{w}_{LS} = \arg\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$

If $y_i = x_i^T w + \epsilon_i$ and $\epsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$:

$\hat{w}_{MLE} = \arg\max_w P(\mathcal{D} \mid w, \sigma)$

$\hat{w}_{LS} = \hat{w}_{MLE} = (X^T X)^{-1} X^T Y$
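A small numerical sketch of this equivalence (toy setup invented here): the Gaussian log-likelihood, viewed as a function of $w$, is maximized exactly at the least-squares solution, so perturbing $\hat{w}_{LS}$ can only lower it:

```python
# Sketch: the LS solution maximizes the Gaussian log-likelihood over w.
import numpy as np

rng = np.random.default_rng(2)
n, d, sigma = 50, 2, 0.5
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -1.0]) + sigma * rng.normal(size=n)

def log_lik(w):
    r = y - X @ w
    return -n * np.log(np.sqrt(2 * np.pi) * sigma) - (r @ r) / (2 * sigma**2)

w_ls = np.linalg.solve(X.T @ X, X.T @ y)
for _ in range(100):                     # random perturbations never help
    assert log_lik(w_ls) >= log_lik(w_ls + 0.1 * rng.normal(size=d))
```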

Page 12

12

Analysis of error

©2018 Kevin Jamieson

If $y_i = x_i^T w + \epsilon_i$ and $\epsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$, i.e., $Y = Xw + \epsilon$:

$\hat{w}_{MLE} = (X^T X)^{-1} X^T Y = (X^T X)^{-1} X^T (Xw + \epsilon) = w + (X^T X)^{-1} X^T \epsilon$

$\mathbb{E}[\hat{w}_{MLE}] = w$ since $\mathbb{E}[\epsilon] = 0$, and using $\mathbb{E}[\epsilon \epsilon^T] = \sigma^2 I$:

$\mathbb{E}\big[(\hat{w} - w)(\hat{w} - w)^T\big] = (X^T X)^{-1} X^T\, \mathbb{E}[\epsilon \epsilon^T]\, X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1}$

Page 13

13

Analysis of error

©2018 Kevin Jamieson

If $y_i = x_i^T w + \epsilon_i$ and $\epsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$, i.e., $Y = Xw + \epsilon$:

$\hat{w}_{MLE} = (X^T X)^{-1} X^T Y = (X^T X)^{-1} X^T (Xw + \epsilon) = w + (X^T X)^{-1} X^T \epsilon$

$\operatorname{Cov}(\hat{w}_{MLE}) = \mathbb{E}\big[(\hat{w} - \mathbb{E}[\hat{w}])(\hat{w} - \mathbb{E}[\hat{w}])^T\big] = \sigma^2 (X^T X)^{-1}$

$\hat{w}_{MLE} \sim \mathcal{N}\big(w,\; \sigma^2 (X^T X)^{-1}\big)$
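A Monte Carlo sketch of this sampling distribution (toy sizes invented; $X$ is held fixed while the noise is redrawn): the empirical mean of $\hat{w}$ should match $w$ and its empirical covariance should match $\sigma^2 (X^T X)^{-1}$:

```python
# Sketch: check E[w_hat] = w and Cov(w_hat) = sigma^2 (X^T X)^{-1}.
import numpy as np

rng = np.random.default_rng(3)
n, d, sigma = 100, 2, 1.0
X = rng.normal(size=(n, d))              # fixed design
w = np.array([0.5, 2.0])
A = np.linalg.inv(X.T @ X)

draws = []
for _ in range(20000):                   # redraw the label noise each time
    y = X @ w + sigma * rng.normal(size=n)
    draws.append(A @ (X.T @ y))
draws = np.array(draws)

print(draws.mean(axis=0))                # ~ w          (unbiased)
print(np.cov(draws.T))                   # ~ sigma^2 * A
print(sigma**2 * A)
```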

Page 14

14

The regression problem

©2018 Kevin Jamieson

[Figure: scatter plot of Sale Price vs. # square feet]

Given past sales data on zillow.com, predict: y = House sale price from x = {# sq. ft., zip code, date of sale, etc.}

Training Data:

$\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$

Hypothesis: linear, $y_i \approx x_i^T w$

Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$

best linear fit

Page 15

15

The regression problem

©2018 Kevin Jamieson

[Figure: scatter plot of Sale Price vs. date of sale]

Given past sales data on zillow.com, predict: y = House sale price from x = {# sq. ft., zip code, date of sale, etc.}

Training Data:

$\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$

Hypothesis: linear, $y_i \approx x_i^T w$

Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$

best linear fit

Page 16

16

The regression problem

Training Data:

$\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$

Hypothesis: linear, $y_i \approx x_i^T w$

Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$

Transformed data:

Page 17

17

The regression problem

Training Data:

$\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$

Hypothesis: linear, $y_i \approx x_i^T w$

Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$

Transformed data:

$h: \mathbb{R}^d \to \mathbb{R}^p$ maps the original features to a rich, possibly high-dimensional space.

For $d > 1$, generate $\{u_j\}_{j=1}^p \subset \mathbb{R}^d$ and use, e.g.:

$h_j(x) = \frac{1}{1 + \exp(u_j^T x)} \qquad h_j(x) = (u_j^T x)^2 \qquad h_j(x) = \cos(u_j^T x)$

In $d = 1$:

$h(x) = \begin{bmatrix} h_1(x) \\ h_2(x) \\ \vdots \\ h_p(x) \end{bmatrix} = \begin{bmatrix} x \\ x^2 \\ \vdots \\ x^p \end{bmatrix}$

Page 18

[Handwritten slide: fitting in the lifted feature space.] Given $x$, predict $\hat{w}^T h(x)$, where $\hat{w} = (H^T H)^{-1} H^T y$ and $H \in \mathbb{R}^{n \times p}$ is the matrix with rows $h(x_i)^T$.
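A sketch of this lifted-space fit, using the polynomial map from the previous slide on an invented 1-d toy problem (the $\sin$ target and the sizes are illustrative only):

```python
# Sketch: least squares after a feature map h; H has rows h(x_i)^T.
import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 5
x = rng.uniform(-1, 1, size=n)
y = np.sin(3 * x) + 0.1 * rng.normal(size=n)     # toy nonlinear target

def h(x, p):
    # Polynomial features [x, x^2, ..., x^p] from the slide.
    return np.stack([x**j for j in range(1, p + 1)], axis=-1)

H = h(x, p)                                      # n x p design matrix
w_hat = np.linalg.solve(H.T @ H, H.T @ y)

x_new = np.array([0.3])
y_pred = h(x_new, p) @ w_hat                     # predict w_hat^T h(x_new)
```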

Page 19

18

The regression problem

Training Data:

$\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$

Hypothesis: linear, $y_i \approx x_i^T w$

Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$

Transformed data:

$h(x) = \begin{bmatrix} h_1(x) \\ h_2(x) \\ \vdots \\ h_p(x) \end{bmatrix}$

Hypothesis: linear, $y_i \approx h(x_i)^T w$, $w \in \mathbb{R}^p$

Loss: least squares, $\min_w \sum_{i=1}^n \big(y_i - h(x_i)^T w\big)^2$

Page 20

19

The regression problem

Training Data:

$\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$

Transformed data:

$h(x) = \begin{bmatrix} h_1(x) \\ h_2(x) \\ \vdots \\ h_p(x) \end{bmatrix}$

Hypothesis: linear, $y_i \approx h(x_i)^T w$, $w \in \mathbb{R}^p$

Loss: least squares, $\min_w \sum_{i=1}^n \big(y_i - h(x_i)^T w\big)^2$

[Figure: Sale Price vs. date of sale; best linear fit]

Page 21

20

The regression problem

Training Data:

$\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$

Transformed data:

$h(x) = \begin{bmatrix} h_1(x) \\ h_2(x) \\ \vdots \\ h_p(x) \end{bmatrix}$

Hypothesis: linear, $y_i \approx h(x_i)^T w$, $w \in \mathbb{R}^p$

Loss: least squares, $\min_w \sum_{i=1}^n \big(y_i - h(x_i)^T w\big)^2$

[Figure: Sale Price vs. date of sale; small p fit]

Page 22

21

The regression problem

Training Data:

$\{(x_i, y_i)\}_{i=1}^n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$

Transformed data:

$h(x) = \begin{bmatrix} h_1(x) \\ h_2(x) \\ \vdots \\ h_p(x) \end{bmatrix}$

Hypothesis: linear, $y_i \approx h(x_i)^T w$, $w \in \mathbb{R}^p$

Loss: least squares, $\min_w \sum_{i=1}^n \big(y_i - h(x_i)^T w\big)^2$

[Figure: Sale Price vs. date of sale; large p fit]

What’s going on here?
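One way to see what is going on, as a simulation sketch (toy data invented here): as $p$ grows, training error keeps falling, but error on fresh samples from the same distribution eventually blows up:

```python
# Sketch: train vs. fresh-sample error as the feature dimension p grows.
import numpy as np

rng = np.random.default_rng(5)
n = 20
x = rng.uniform(-1, 1, size=n)
y = np.sin(3 * x) + 0.2 * rng.normal(size=n)
x_test = rng.uniform(-1, 1, size=1000)
y_test = np.sin(3 * x_test) + 0.2 * rng.normal(size=1000)

def feats(x, p):
    return np.stack([x**j for j in range(p + 1)], axis=-1)   # includes offset

for p in (1, 3, 15):
    H = feats(x, p)
    w, *_ = np.linalg.lstsq(H, y, rcond=None)    # stable for larger p
    train = np.mean((y - H @ w) ** 2)
    test = np.mean((y_test - feats(x_test, p) @ w) ** 2)
    print(f"p={p:2d}  train={train:.4f}  test={test:.4f}")
```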

Page 23

©2018 Kevin Jamieson 22

Bias-Variance Tradeoff

Machine Learning – CSE546 Kevin Jamieson University of Washington

Oct 5, 2018

Page 24

Statistical Learning

©2018 Kevin Jamieson

$P_{XY}(X = x, Y = y)$

Goal: Predict Y given X

Find the function $\eta$ that minimizes

$\mathbb{E}_{XY}[(Y - \eta(X))^2] = \mathbb{E}_X\big[\mathbb{E}_{Y|X}[(Y - \eta(x))^2 \mid X = x]\big]$

For each $x$, minimize over the constant $c = \eta(x)$:

$\frac{\partial}{\partial c}\, \mathbb{E}_{Y|X}[(Y - c)^2 \mid X = x] = -2\big(\mathbb{E}_{Y|X}[Y \mid X = x] - c\big) = 0 \;\Rightarrow\; c = \mathbb{E}_{Y|X}[Y \mid X = x]$

Page 25

Statistical Learning

©2018 Kevin Jamieson

$P_{XY}(X = x, Y = y)$

Goal: Predict Y given X

Find the function $\eta$ that minimizes

$\mathbb{E}_{XY}[(Y - \eta(X))^2] = \mathbb{E}_X\big[\mathbb{E}_{Y|X}[(Y - \eta(x))^2 \mid X = x]\big]$

$\eta(x) = \arg\min_c \mathbb{E}_{Y|X}[(Y - c)^2 \mid X = x] = \mathbb{E}_{Y|X}[Y \mid X = x]$

Under the LS loss, the optimal predictor is $\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$.
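A tiny numerical check of this claim (the conditional distribution of $Y$ given $X = x$ is invented as a Gaussian): among all constants $c$, the squared-error risk is minimized at the mean:

```python
# Sketch: the mean minimizes E[(Y - c)^2] over constants c.
import numpy as np

rng = np.random.default_rng(8)
y = rng.normal(loc=2.0, scale=1.5, size=100_000)   # samples of Y | X = x

cs = np.linspace(0.0, 4.0, 401)
risk = np.array([np.mean((y - c) ** 2) for c in cs])
print(cs[risk.argmin()], y.mean())                 # both ~ 2.0
```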

Page 26

Statistical Learning

©2017 Kevin Jamieson

[Figure: joint density $P_{XY}(X = x, Y = y)$ over the $(x, y)$ plane]

$\mathbb{E}_{XY}[(Y - \eta(X))^2]$

Page 27

Statistical Learning

©2018 Kevin Jamieson

[Figure: joint density $P_{XY}(X = x, Y = y)$ with conditional slices $P_{XY}(Y = y \mid X = x_0)$ and $P_{XY}(Y = y \mid X = x_1)$]

$\mathbb{E}_{XY}[(Y - \eta(X))^2]$

Page 28

Statistical Learning

©2018 Kevin Jamieson

[Figure: joint density $P_{XY}(X = x, Y = y)$ with conditional slices $P_{XY}(Y = y \mid X = x_0)$ and $P_{XY}(Y = y \mid X = x_1)$]

$\mathbb{E}_{XY}[(Y - \eta(X))^2]$

Ideally, we want to find:

$\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$

Page 29

Statistical Learning

©2018 Kevin Jamieson

[Figure: joint density $P_{XY}(X = x, Y = y)$]

Ideally, we want to find:

$\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$

Page 30

Statistical Learning

©2018 Kevin Jamieson

[Figure: joint density $P_{XY}(X = x, Y = y)$ with sampled points]

Ideally, we want to find:

$\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$

But we only have samples:

$(x_i, y_i) \overset{\text{i.i.d.}}{\sim} P_{XY}$ for $i = 1, \dots, n$

Page 31

Statistical Learning

©2018 Kevin Jamieson

[Figure: samples from $P_{XY}(X = x, Y = y)$]

Ideally, we want to find:

$\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$

But we only have samples:

$(x_i, y_i) \overset{\text{i.i.d.}}{\sim} P_{XY}$ for $i = 1, \dots, n$

and are restricted to a function class (e.g., linear), so we compute:

$\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2$

Page 32

Statistical Learning

©2018 Kevin Jamieson

[Figure: samples from $P_{XY}(X = x, Y = y)$ with a fitted curve]

Ideally, we want to find:

$\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$

But we only have samples:

$(x_i, y_i) \overset{\text{i.i.d.}}{\sim} P_{XY}$ for $i = 1, \dots, n$

and are restricted to a function class (e.g., linear), so we compute:

$\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2$

We care about future predictions: $\mathbb{E}_{XY}[(Y - \hat{f}(X))^2]$

Page 33

Statistical Learning

©2018 Kevin Jamieson

[Figure: samples from $P_{XY}(X = x, Y = y)$ with several fitted curves]

Each draw $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^n$ results in a different $\hat{f}$

Ideally, we want to find:

$\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$

But we only have samples:

$(x_i, y_i) \overset{\text{i.i.d.}}{\sim} P_{XY}$ for $i = 1, \dots, n$

and are restricted to a function class (e.g., linear), so we compute:

$\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2$

Page 34

Statistical Learning

©2018 Kevin Jamieson

[Figure: samples from $P_{XY}(X = x, Y = y)$ with several fitted curves and their average $\mathbb{E}_{\mathcal{D}}[\hat{f}(x)]$]

Each draw $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^n$ results in a different $\hat{f}$

Ideally, we want to find:

$\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$

But we only have samples:

$(x_i, y_i) \overset{\text{i.i.d.}}{\sim} P_{XY}$ for $i = 1, \dots, n$

and are restricted to a function class (e.g., linear), so we compute:

$\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2$

Page 35

Bias-Variance Tradeoff

©2018 Kevin Jamieson

$\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x] \qquad \hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2$

$\mathbb{E}_{Y|X}\big[\mathbb{E}_{\mathcal{D}}[(Y - \hat{f}_{\mathcal{D}}(x))^2] \mid X = x\big] = \mathbb{E}_{Y|X}\big[\mathbb{E}_{\mathcal{D}}[(Y - \eta(x) + \eta(x) - \hat{f}_{\mathcal{D}}(x))^2] \mid X = x\big]$

Expanding the square, the cross term vanishes because $\mathbb{E}_{Y|X}[Y - \eta(x) \mid X = x] = 0$ and $\hat{f}_{\mathcal{D}}$ is independent of the test label $Y$ (worked out on the next slide).

Page 36

Bias-Variance Tradeoff

©2018 Kevin Jamieson

irreducible error: caused by stochastic label noise

learning error: caused by using too "simple" a model, or not enough data to learn the model accurately

$\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2 \qquad \eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$

$\mathbb{E}_{Y|X}\big[\mathbb{E}_{\mathcal{D}}[(Y - \hat{f}_{\mathcal{D}}(x))^2] \mid X = x\big] = \mathbb{E}_{Y|X}\big[\mathbb{E}_{\mathcal{D}}[(Y - \eta(x) + \eta(x) - \hat{f}_{\mathcal{D}}(x))^2] \mid X = x\big]$

$= \mathbb{E}_{Y|X}\big[\mathbb{E}_{\mathcal{D}}[(Y - \eta(x))^2 + 2(Y - \eta(x))(\eta(x) - \hat{f}_{\mathcal{D}}(x)) + (\eta(x) - \hat{f}_{\mathcal{D}}(x))^2] \mid X = x\big]$

$= \mathbb{E}_{Y|X}[(Y - \eta(x))^2 \mid X = x] + \mathbb{E}_{\mathcal{D}}[(\eta(x) - \hat{f}_{\mathcal{D}}(x))^2]$

Page 37

Bias-Variance Tradeoff

©2018 Kevin Jamieson

$\mathbb{E}_{\mathcal{D}}[(\eta(x) - \hat{f}_{\mathcal{D}}(x))^2] = \mathbb{E}_{\mathcal{D}}\big[(\eta(x) - \mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)] + \mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)] - \hat{f}_{\mathcal{D}}(x))^2\big]$

$\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2 \qquad \eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$

Expanding, the cross term vanishes because $\mathbb{E}_{\mathcal{D}}\big[\mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)] - \hat{f}_{\mathcal{D}}(x)\big] = 0$ (worked out on the next slide).

Page 38

Bias-Variance Tradeoff

©2018 Kevin Jamieson

$\mathbb{E}_{\mathcal{D}}[(\eta(x) - \hat{f}_{\mathcal{D}}(x))^2] = \mathbb{E}_{\mathcal{D}}\big[(\eta(x) - \mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)] + \mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)] - \hat{f}_{\mathcal{D}}(x))^2\big]$

$= \mathbb{E}_{\mathcal{D}}\big[(\eta(x) - \mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)])^2 + 2(\eta(x) - \mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)])(\mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)] - \hat{f}_{\mathcal{D}}(x)) + (\mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)] - \hat{f}_{\mathcal{D}}(x))^2\big]$

$= \underbrace{(\eta(x) - \mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)])^2}_{\text{bias squared}} + \underbrace{\mathbb{E}_{\mathcal{D}}\big[(\mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)] - \hat{f}_{\mathcal{D}}(x))^2\big]}_{\text{variance}}$

$\hat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2 \qquad \eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$

Page 39

Bias-Variance Tradeoff

$\mathbb{E}_{Y|X}\big[\mathbb{E}_{\mathcal{D}}[(Y - \hat{f}_{\mathcal{D}}(x))^2] \mid X = x\big] = \underbrace{\mathbb{E}_{Y|X}[(Y - \eta(x))^2 \mid X = x]}_{\text{irreducible error}} + \underbrace{(\eta(x) - \mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)])^2}_{\text{bias squared}} + \underbrace{\mathbb{E}_{\mathcal{D}}\big[(\mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)] - \hat{f}_{\mathcal{D}}(x))^2\big]}_{\text{variance}}$
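A simulation sketch of this decomposition at a single test point $x_0$ (the true $\eta$, the noise level, and the deliberately misspecified linear fit are all invented for illustration): draw many training sets $\mathcal{D}$, refit each time, and read off the three terms:

```python
# Sketch: noise + bias^2 + variance at a fixed test point x0.
import numpy as np

rng = np.random.default_rng(6)
sigma, n, x0 = 0.3, 20, 0.5
eta = lambda x: np.sin(3 * x)                  # true conditional mean

preds = []
for _ in range(5000):                          # one fresh training set per loop
    x = rng.uniform(-1, 1, size=n)
    y = eta(x) + sigma * rng.normal(size=n)
    H = np.stack([np.ones(n), x], axis=1)      # misspecified: fit a line
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    preds.append(w[0] + w[1] * x0)
preds = np.array(preds)

bias_sq = (eta(x0) - preds.mean()) ** 2
variance = preds.var()
total = sigma**2 + bias_sq + variance          # expected squared error at x0
print(sigma**2, bias_sq, variance, total)
```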

Page 40

39

Example: Linear LS

©2018 Kevin Jamieson

If $y_i = x_i^T w + \epsilon_i$ and $\epsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$, i.e., $Y = Xw + \epsilon$:

$\hat{w}_{MLE} = (X^T X)^{-1} X^T Y = w + (X^T X)^{-1} X^T \epsilon$

$\hat{f}_{\mathcal{D}}(x) = \hat{w}^T x = w^T x + \epsilon^T X (X^T X)^{-1} x$

$\eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x] = \mathbb{E}[x^T w + \epsilon \mid X = x] = x^T w$

$\mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)] = w^T x + \mathbb{E}[\epsilon^T]\, X (X^T X)^{-1} x = w^T x = \eta(x)$, so the bias is zero.

Page 41

40

Example: Linear LS

©2018 Kevin Jamieson

If $y_i = x_i^T w + \epsilon_i$ and $\epsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$, i.e., $Y = Xw + \epsilon$:

$\hat{w}_{MLE} = w + (X^T X)^{-1} X^T \epsilon \qquad \hat{f}_{\mathcal{D}}(x) = \hat{w}^T x = w^T x + \epsilon^T X (X^T X)^{-1} x \qquad \eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$

irreducible error: $\mathbb{E}_{Y|X}[(Y - \eta(x))^2 \mid X = x] = \sigma^2$

bias squared: $(\eta(x) - \mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)])^2 = 0$

Page 42

41

Example: Linear LS

©2018 Kevin Jamieson

If $y_i = x_i^T w + \epsilon_i$ and $\epsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$, i.e., $Y = Xw + \epsilon$:

$\hat{w}_{MLE} = w + (X^T X)^{-1} X^T \epsilon \qquad \hat{f}_{\mathcal{D}}(x) = \hat{w}^T x = w^T x + \epsilon^T X (X^T X)^{-1} x$

variance: $\mathbb{E}_{\mathcal{D}}\big[(\mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)] - \hat{f}_{\mathcal{D}}(x))^2\big] = \mathbb{E}_{\mathcal{D}}\big[x^T (X^T X)^{-1} X^T \epsilon \epsilon^T X (X^T X)^{-1} x\big] = x^T (X^T X)^{-1} X^T\, \mathbb{E}[\epsilon \epsilon^T]\, X (X^T X)^{-1} x = \sigma^2 x^T (X^T X)^{-1} x$

Page 43

42

Example: Linear LS

©2018 Kevin Jamieson

If $y_i = x_i^T w + \epsilon_i$ and $\epsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$, i.e., $Y = Xw + \epsilon$:

$\hat{w}_{MLE} = w + (X^T X)^{-1} X^T \epsilon \qquad \hat{f}_{\mathcal{D}}(x) = \hat{w}^T x = w^T x + \epsilon^T X (X^T X)^{-1} x$

variance: $\mathbb{E}_{\mathcal{D}}\big[(\mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)] - \hat{f}_{\mathcal{D}}(x))^2\big] = \sigma^2 x^T (X^T X)^{-1} x = \sigma^2 \operatorname{Trace}\big((X^T X)^{-1} x x^T\big)$

For $n$ large, $X^T X = \sum_{i=1}^n x_i x_i^T \approx n\Sigma$, where $\Sigma = \mathbb{E}[XX^T]$, $X \sim P_X$. Averaging the variance over a test point $X = x \sim P_X$:

$\mathbb{E}_X\big[\sigma^2 \operatorname{Trace}((X^T X)^{-1} X X^T)\big] \approx \frac{\sigma^2}{n}\, \mathbb{E}_X\big[\operatorname{Trace}(\Sigma^{-1} X X^T)\big] = \frac{\sigma^2}{n} \operatorname{Trace}(\Sigma^{-1} \Sigma) = \frac{d\sigma^2}{n}$

since $\operatorname{Trace}(\Sigma^{-1}\Sigma) = \operatorname{Trace}(I_d) = d$.

Page 44

43

Example: Linear LS

©2018 Kevin Jamieson

If $y_i = x_i^T w + \epsilon_i$ and $\epsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$, i.e., $Y = Xw + \epsilon$:

$\hat{w}_{MLE} = w + (X^T X)^{-1} X^T \epsilon \qquad \hat{f}_{\mathcal{D}}(x) = \hat{w}^T x = w^T x + \epsilon^T X (X^T X)^{-1} x \qquad \eta(x) = \mathbb{E}_{Y|X}[Y \mid X = x]$

irreducible error: $\mathbb{E}_{Y|X}[(Y - \eta(x))^2 \mid X = x] = \sigma^2$

bias squared: $(\eta(x) - \mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)])^2 = 0$

variance, averaged over $X = x \sim P_X$: $\mathbb{E}_X\big[\mathbb{E}_{\mathcal{D}}[(\mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)] - \hat{f}_{\mathcal{D}}(x))^2]\big] = \frac{\sigma^2}{n}\, \mathbb{E}_X\big[\operatorname{Trace}(\Sigma^{-1} X X^T)\big] = \frac{d\sigma^2}{n}$
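A Monte Carlo sketch of the $d\sigma^2/n$ formula (assuming $x \sim \mathcal{N}(0, I)$ so that $\Sigma = I$, an invented setup): average the variance term $\sigma^2 x^T (X^T X)^{-1} x$ over fresh test points:

```python
# Sketch: averaged variance term ~ d * sigma^2 / n.
import numpy as np

rng = np.random.default_rng(7)
n, d, sigma = 500, 5, 1.0
X = rng.normal(size=(n, d))                   # Sigma = E[x x^T] = I here
A = np.linalg.inv(X.T @ X)

x_test = rng.normal(size=(100_000, d))
var_term = sigma**2 * np.einsum('ij,jk,ik->i', x_test, A, x_test)
print(var_term.mean(), d * sigma**2 / n)      # both ~ 0.01
```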