1
Linear Regression
2
Simple Linear Regression
● First, we consider only one dimension X_1.
● Regression: we predict a numeric goal Y.
● Linear: we assume a linear relation Y = f(X),
● with intercept and slope: Y ≈ β_0 + β_1 X.
● We have training data (x_1, y_1), …, (x_N, y_N).
● We minimize the least squares criterion (sketch below).
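A minimal sketch in R (synthetic data; the variable names and the true coefficients are illustrative, not from the lecture):

```r
# Simple linear regression on synthetic data
set.seed(1)
x <- runif(100, 0, 10)               # one-dimensional predictor X_1
y <- 2 + 3 * x + rnorm(100, sd = 2)  # true intercept 2, slope 3, plus noise
fit <- lm(y ~ x)                     # least squares fit
coef(fit)                            # estimated intercept and slope
```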
3
Residual Sum of Squares RSS
● Residual: the difference between the true and the predicted value, e_i = y_i − ŷ_i,
● where i is the observation index, i = 1, …, N.
● We minimize RSS = Σ_i e_i² = Σ_i (y_i − ŷ_i)²,
● equivalently MSE(train.data) = RSS/N.
4
Lin. Reg. Coefficient Estimates
● Simple linear regression:
  β̂_1 = Σ_i (x_i − x̄)(y_i − ȳ) / Σ_i (x_i − x̄)²,  β̂_0 = ȳ − β̂_1 x̄.
● Multivariate linear regression:
  β̂ = (XᵀX)⁻¹ Xᵀ y,
● where X denotes the N×(p+1) training matrix ⟨1, x⟩,
● and y the N-vector of the training goal variable (sketch below).
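A sketch checking that the closed-form simple-regression estimates match lm(), on synthetic data:

```r
set.seed(1)
x <- runif(100, 0, 10)
y <- 2 + 3 * x + rnorm(100, sd = 2)
# Closed-form estimates for the one-dimensional case
beta1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0 <- mean(y) - beta1 * mean(x)
c(beta0, beta1)
coef(lm(y ~ x))  # same values
```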
5
Assessing the Accuracy of the Coefficient Estimates
● Different training data lead to different estimates (red – true model, blue – estimated models).
● The dispersion is characterized by the variance: the true variance σ² vs. the sample variance s².
6
Standard Error, Variance
● For data x_1, …, x_n,
● the (sample) variance (Czech: rozptyl) is: s² = (1/n) Σ_i (x_i − x̄)²,
● the (sample) standard deviation (Czech: směrodatná odchylka) is s = √s²;
● it is our estimate of the true value σ.
● The variance of the mean estimate x̄ is Var(x̄) = σ²/n, so SE(x̄) = σ/√n.
● Unbiased estimate: s² = (1/(n−1)) Σ_i (x_i − x̄)² (sketch below).
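A short R illustration of these quantities; note that R's var() and sd() already use the unbiased 1/(n−1) version:

```r
set.seed(1)
x <- rnorm(1000, mean = 5, sd = 2)
var(x)                   # unbiased sample variance (the 1/(n-1) version)
sd(x)                    # sample standard deviation
sd(x) / sqrt(length(x))  # standard error of the mean estimate
```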
7
Standard Error of Parameters
● The standard errors of the parameters are:
  SE(β̂_1)² = σ² / Σ_i (x_i − x̄)²,  SE(β̂_0)² = σ² [1/n + x̄² / Σ_i (x_i − x̄)²],
● where σ² = Var(ε). We estimate σ by the residual standard error RSE = √(RSS/(n−2)).
● Notice that SE(β̂_1) is smaller for x_i more spread out (more leverage).
8
Hypothesis Testing, Confidence Intervals
● There is an approx. 95% chance that the interval β̂_1 ± 2·SE(β̂_1) will contain the true value of β_1.
● Similarly for β_0.
● Hypothesis test:
● We assume the null hypothesis H_0: β_1 = 0 versus the alternative H_a: β_1 ≠ 0.
● What is the probability of the measured t = (β̂_1 − 0) / SE(β̂_1) or higher? – the p-value of the t-test
● with (n−2) degrees of freedom.
● If sufficiently low (< 5%), we reject the null hypothesis (sketch below).
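A sketch of where these quantities appear in R's standard output (synthetic data; summary() reports the t-statistics and p-values, confint() the confidence intervals):

```r
set.seed(1)
x <- runif(100, 0, 10)
y <- 2 + 3 * x + rnorm(100, sd = 2)
fit <- lm(y ~ x)
summary(fit)$coefficients   # Estimate, Std. Error, t value, Pr(>|t|)
confint(fit, level = 0.95)  # approx. beta_hat ± 2*SE intervals
```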
9
Importance of Features
● If Pr(>|t|) is low, the parameter is significant.
● Usually, the significance level 0.05 is taken;
● to be 'really' sure (e.g., in medicine), 0.001.
● A parameter with a p-value higher than 0.05 can be non-zero due to chance.
10
Assessing the Accuracy of the Model
● Residual standard error:
  RSE = √(RSS/(n−2)),
● the average amount that the response will deviate from the true regression line.
● RSE depends on the scale of Y.
● mean(wage) = 111.7036, RSE = 41.64581
● pred.y$fit[7] − pred.y$fit[1] = 8.099244
11
R² Statistics
● The proportion of variance explained:
  R² = (TSS − RSS) / TSS = 1 − RSS/TSS,
● scale independent, always in [0, 1],
● where TSS = Σ_i (y_i − ȳ)² (the total sum of squares) relates to the trivial model – the mean.
● 'Our' wage R² = 0.0043 is very low (sketch below).
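A sketch computing R² from RSS and TSS by hand, against summary()'s value (synthetic data):

```r
set.seed(1)
x <- runif(100)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)
rss <- sum(resid(fit)^2)      # residual sum of squares
tss <- sum((y - mean(y))^2)   # total SS of the trivial (mean) model
1 - rss / tss
summary(fit)$r.squared        # same value
```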
12
Multiple Linear Regression
● Model: Y = β_0 + β_1 X_1 + … + β_p X_p + ε
● p – the number of variables (features).
● Minimizing RSS we get the coefficients β̂_0, …, β̂_p.
● One-dimensional fits vs. the joint fit:
● Is advertisement in newspaper important? (sketch below)
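A sketch mimicking the Advertising example with purely synthetic data (TV, radio, newspaper are simulated here, not the real dataset): a predictor that merely correlates with a useful one can look unimportant in the joint fit.

```r
set.seed(2)
n <- 200
TV        <- runif(n, 0, 300)
radio     <- runif(n, 0, 50)
newspaper <- radio + rnorm(n, sd = 10)           # correlated with radio
sales     <- 3 + 0.05 * TV + 0.2 * radio + rnorm(n)  # no newspaper effect
summary(lm(sales ~ TV + radio + newspaper))      # newspaper: high Pr(>|t|)
```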
13
Linear Regression – Matrix Form
● We look for a function f in the form:
  f(X) = Xβ = β_0 + Σ_{j=1}^{p} X_j β_j,
● that minimizes RSS:
  RSS(β) = (y − Xβ)ᵀ (y − Xβ).
14
Linear Regression - Derivation
● We take the derivative of RSS:
  ∂RSS/∂β = −2 Xᵀ (y − Xβ),
● set it to 0:
  Xᵀ (y − Xβ) = 0,
● and get the solution β̂ = (XᵀX)⁻¹ Xᵀ y
● and the prediction ŷ = X β̂ = X (XᵀX)⁻¹ Xᵀ y (sketch below).
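The same solution written directly via the normal equations, as a sketch in R (synthetic X and y):

```r
set.seed(3)
n <- 100; p <- 2
X <- cbind(1, matrix(rnorm(n * p), n, p))   # N x (p+1) matrix <1, x>
beta_true <- c(1, 2, -1)
y <- X %*% beta_true + rnorm(n)
beta_hat <- solve(t(X) %*% X, t(X) %*% y)   # (X^T X)^{-1} X^T y
y_hat <- X %*% beta_hat                     # the prediction
beta_hat
```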
15
Collinearity
● Extreme collinearity: a non-invertible XᵀX.
16
Correlation of Variables
● Remark 2: with too high a number of predictors p, some are
correlated and show a good F-statistic due to chance.
● Feature selection: Chapter 6, it is on the schedule.
17
Patterns in Residuals – Nonlinearity
18
Qualitative (Discrete) Predictors
● Encoding by 0/1; for a predictor with more values we code each
value (except one) separately.
● Example: ethnicity (sketch below).
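A sketch of the 0/1 encoding in R (hypothetical ethnicity values; R builds the dummy columns automatically for a factor):

```r
ethnicity <- factor(c("African American", "Asian", "Caucasian", "Asian"))
# k levels become k-1 dummy 0/1 columns; the omitted (baseline) level
# is absorbed into the intercept
model.matrix(~ ethnicity)
```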
19
The Estimated Slope is Fixed
20
Non-linear Models
● too many combinations to check,
● if you know what, ADD IT – log, exp, product, … (sketch below)
● simplified ideas of nonlinear models:
● splines – piecewise polynomial functions
● SVM – a trick to check higher-degree polynomials
● basis functions, trees – piecewise 'kernel, constant'
● stacking – LR on trained models
● and others.
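A sketch of adding known nonlinearities by hand while staying inside linear regression (synthetic data; the chosen terms are illustrative):

```r
set.seed(4)
x1 <- runif(100, 1, 10)
x2 <- runif(100)
y <- 1 + 2 * log(x1) + 3 * x1 * x2 + rnorm(100, sd = 0.5)
# log term and a product (interaction): still linear in the coefficients
fit <- lm(y ~ log(x1) + x1:x2)
coef(fit)
```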
21
Non-linear Model
22
Correlated Observations (residuals)
● typical with time series,
● usually it leads to an underestimate of the error.
23
Non-constant Variance of Error Terms
● Remedies: a log transformation of Y, or weighted least squares (sketch below).
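A sketch of weighted least squares in R when the noise grows with x, with weights chosen as 1/variance (synthetic data):

```r
set.seed(5)
x <- runif(200, 1, 10)
y <- 1 + 2 * x + rnorm(200, sd = x)      # error variance grows with x
fit_wls <- lm(y ~ x, weights = 1 / x^2)  # downweight the noisy observations
coef(fit_wls)
```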
24
Outliers (Czech: odlehlá pozorování)
● Error in the dataset or missing predictor?
25
High Leverage (Czech: vzdálená X – distant x-values)
● Leverage statistic: the diagonal of the hat matrix H = X(XᵀX)⁻¹Xᵀ.
● One-dimensional: h_i = 1/n + (x_i − x̄)² / Σ_j (x_j − x̄)² (sketch below).
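A sketch verifying that the diagonal of H equals R's hatvalues(), with one deliberately distant x (synthetic data):

```r
set.seed(6)
x <- c(runif(50, 0, 1), 5)               # one far-away, high-leverage point
X <- cbind(1, x)
H <- X %*% solve(t(X) %*% X) %*% t(X)    # the hat matrix
y <- 2 + x + rnorm(51)
fit <- lm(y ~ x)
diag(H)[51]                              # large leverage for the distant x
hatvalues(fit)[51]                       # same value from R's helper
```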
26
k-NN Regression
27
Comparison of Lin. Reg. and k-NN
● almost linear relation – the linear model is better,
● highly nonlinear relation – k-NN is better (sketch below).
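A sketch comparing a hand-rolled k-NN regression with lm on a clearly nonlinear target (synthetic data; k = 9 is an arbitrary choice):

```r
set.seed(7)
x <- runif(200, 0, 2 * pi)
y <- sin(x) + rnorm(200, sd = 0.2)
# k-NN regression: average the y values of the k nearest training points
knn_predict <- function(x_new, k = 9) {
  sapply(x_new, function(x0) mean(y[order(abs(x - x0))[1:k]]))
}
grid <- seq(0, 2 * pi, length.out = 5)
knn_predict(grid)                         # follows the sine shape
predict(lm(y ~ x), data.frame(x = grid))  # a straight line: poor fit here
```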