ML-02 Linear Regression - IITKGP

Page 1: ML-02 Linear Regression - IITKGP

CS 60050 Machine Learning

Linear Regression

Some slides taken from course materials of Andrew Ng

Page 2: ML-02 Linear Regression - IITKGP

Dataset of living area and price of houses in a city

This is a training set. How can we learn to predict the prices of houses of other sizes in the city, as a function of their living area?

Page 3: ML-02 Linear Regression - IITKGP

Dataset of living area and price of houses in a city

This is an example of a supervised learning problem. When the target variable we are trying to predict is continuous, it is called a regression problem.

Page 4: ML-02 Linear Regression - IITKGP

Dataset of living area and price of houses in a city

m = number of training examples
x's = input variables / features
y's = output variables / "target" variables
(x, y) - a single training example
(x^(i), y^(i)) - the i-th training example, where i is an index into the training set

Page 5: ML-02 Linear Regression - IITKGP

How to use the training set?

Learn a function h(x) so that h(x) is a good predictor for the corresponding value of y. h is called the hypothesis function.

Page 6: ML-02 Linear Regression - IITKGP

How to represent hypothesis h?

hθ(x) = θ0 + θ1 x

θi are the parameters:
- θ0 is the intercept (the value of hθ at x = 0)
- θ1 is the gradient (slope)
θ: vector of all the parameters

We assume y is a linear function of x: univariate linear regression

Page 7: ML-02 Linear Regression - IITKGP

Digression: Multivariate linear regression

Page 8: ML-02 Linear Regression - IITKGP

How to represent hypothesis h?

hθ(x) = θ0 + θ1 x, where θ0 is the intercept and θ1 is the gradient (slope)

We assume y is a linear function of x (univariate linear regression). How do we learn the values of the parameters θi?

Page 9: ML-02 Linear Regression - IITKGP

Intuition of hypothesis function

•  We are attempting to fit a straight line to the data in the training set

•  Values of the parameters decide the equation of the straight line

•  Which is the best straight line to fit the data?

Page 10: ML-02 Linear Regression - IITKGP

Intuition of hypothesis function

•  Which is the best straight line to fit the data?

•  How do we learn the values of the parameters θi?

•  Choose the parameters such that the prediction is close to the actual y-value for the training examples

Page 11: ML-02 Linear Regression - IITKGP

Cost function

•  Measure of how close the predictions are to the actual y-values

•  Average over all the m training instances

•  Squared-error cost function J(θ)

•  Choose the parameters θ so that J(θ) is minimized

Page 12: ML-02 Linear Regression - IITKGP

Hypothesis: hθ(x) = θ0 + θ1 x

Parameters: θ0, θ1

Cost Function: J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))²

Goal: minimize J(θ0, θ1) over θ0, θ1
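As a minimal sketch of this computation in NumPy (the function and variable names are ours, not from the slides):

```python
import numpy as np

def compute_cost(x, y, theta0, theta1):
    """Squared-error cost J(theta0, theta1) for univariate linear regression."""
    m = len(y)                         # number of training examples
    predictions = theta0 + theta1 * x  # h_theta(x^(i)) for every example
    return np.sum((predictions - y) ** 2) / (2 * m)

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(compute_cost(x, y, 0.0, 1.0))  # 0.0: the line y = x fits exactly
print(compute_cost(x, y, 0.0, 0.5))  # > 0: predictions miss the targets
```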

Page 13: ML-02 Linear Regression - IITKGP

(For fixed θ0, θ1, hθ(x) is a function of x; J(θ0, θ1) is a function of the parameters)

[Plot: training data with hypothesis line; x-axis: Size in feet² (0–3000), y-axis: Price ($) in 1000's (0–500)]

Page 14: ML-02 Linear Regression - IITKGP

Contour plot or Contour figure
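A minimal matplotlib sketch of such a contour figure on toy data (the dataset here is illustrative, not the housing data):

```python
import numpy as np
import matplotlib.pyplot as plt

# Contour plot of J(theta0, theta1) on a grid: each contour line joins
# parameter pairs with equal cost; the minimum sits at the innermost ring.
t0, t1 = np.meshgrid(np.linspace(-10, 10, 100), np.linspace(-1, 4, 100))
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])  # toy training set
J = sum((t0 + t1 * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * len(x))
plt.contour(t0, t1, J, levels=30)
plt.xlabel("theta0")
plt.ylabel("theta1")
plt.show()
```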

Page 15: ML-02 Linear Regression - IITKGP

Minimizing a function

•  For now, let us consider some arbitrary function (not necessarily an error function)

•  We will use an algorithm called gradient descent

•  It is used in many applications that involve minimizing functions

Page 16: ML-02 Linear Regression - IITKGP

Have some function J(θ0, θ1)

Want min over θ0, θ1 of J(θ0, θ1)

Outline:

•  Start with some initial θ0, θ1 (e.g., θ0 = 0, θ1 = 0)

•  Keep changing θ0, θ1 to reduce J(θ0, θ1)

until we hopefully end up at a minimum

Page 17: ML-02 Linear Regression - IITKGP

[3D surface plot of J(θ0, θ1) over the (θ0, θ1) plane]

Page 18: ML-02 Linear Regression - IITKGP

[Another 3D surface plot of J(θ0, θ1), with descent paths from different starting points]

If the function has multiple local minima, where one starts can decide which minimum is reached

Page 19: ML-02 Linear Regression - IITKGP

Gradient descent algorithm:

repeat until convergence {
    θj := θj − α (∂/∂θj) J(θ0, θ1)    (for j = 0 and j = 1)
}

α is the learning rate – more on this later

Page 20: ML-02 Linear Regression - IITKGP

Gradient descent algorithm

Correct (simultaneous update):
temp0 := θ0 − α (∂/∂θ0) J(θ0, θ1)
temp1 := θ1 − α (∂/∂θ1) J(θ0, θ1)
θ0 := temp0
θ1 := temp1

Incorrect (sequential update):
temp0 := θ0 − α (∂/∂θ0) J(θ0, θ1)
θ0 := temp0
temp1 := θ1 − α (∂/∂θ1) J(θ0, θ1)    (this uses the already-updated θ0)
θ1 := temp1
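A sketch of the difference, using a toy J(θ0, θ1) = θ0² + θ1² whose partial derivatives are written out by hand (this example function is our illustration, not from the slides):

```python
# Toy function to minimize: J(t0, t1) = t0**2 + t1**2.
# Its hand-derived partial derivatives:
dJ_dt0 = lambda t0, t1: 2 * t0
dJ_dt1 = lambda t0, t1: 2 * t1

alpha = 0.1
theta0, theta1 = 3.0, -4.0

# Correct: simultaneous update -- both partials use the current values.
temp0 = theta0 - alpha * dJ_dt0(theta0, theta1)
temp1 = theta1 - alpha * dJ_dt1(theta0, theta1)
theta0, theta1 = temp0, temp1

# Incorrect: sequential update -- the second partial sees the new theta0,
# so the step no longer follows the gradient of J at a single point.
theta0 = theta0 - alpha * dJ_dt0(theta0, theta1)
theta1 = theta1 - alpha * dJ_dt1(theta0, theta1)
```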

Page 21: ML-02 Linear Regression - IITKGP

For simplicity, let us first consider a function of a single variable

Page 22: ML-02 Linear Regression - IITKGP

The learning rate

•  Gradient descent can converge to a local minimum, even with the learning rate α fixed

•  But the value of α needs to be chosen judiciously

•  If α is too small, gradient descent can be slow to converge

•  If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
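A toy illustration of these three regimes, minimizing J(θ) = θ² (an example function of our choosing) with different values of α:

```python
# Effect of the learning rate on J(theta) = theta**2 (gradient 2*theta).
# The update is theta := theta * (1 - 2*alpha): it converges iff |1 - 2*alpha| < 1.
def run(alpha, steps=20, theta=1.0):
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

print(run(0.01))  # too small: still far from 0 after 20 steps (~0.67)
print(run(0.4))   # well chosen: essentially 0
print(run(1.1))   # too large: overshoots and diverges (|theta| grows)
```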

Page 23: ML-02 Linear Regression - IITKGP

Gradient descent algorithm + Linear regression model

Gradient descent for univariate linear regression

Page 24: ML-02 Linear Regression - IITKGP

Gradient descent for univariate linear regression:

repeat until convergence {
    θ0 := θ0 − α (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))
    θ1 := θ1 − α (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i)) · x^(i)
}

update θ0 and θ1 simultaneously
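Putting the update rules together, a minimal NumPy sketch of the full loop (the function name and synthetic data are illustrative):

```python
import numpy as np

def gradient_descent_univariate(x, y, alpha=0.01, iters=1000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        err = (theta0 + theta1 * x) - y   # h(x^(i)) - y^(i) for all i
        grad0 = err.sum() / m             # dJ/dtheta0
        grad1 = (err * x).sum() / m       # dJ/dtheta1
        # Simultaneous update of both parameters:
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Example: the noiseless line y = 2 + 3x is recovered (approximately).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 + 3 * x
print(gradient_descent_univariate(x, y, alpha=0.1, iters=5000))
```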

Page 25: ML-02 Linear Regression - IITKGP

“Batch” Gradient Descent

“Batch”: Each step of gradient descent uses all the training examples. There are other variations, like “stochastic gradient descent” (used when learning over huge datasets).
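The slides only name stochastic gradient descent; as an assumption about its usual form, here is a sketch of a single-example update for the univariate model:

```python
import numpy as np

def sgd_step(theta, x_i, y_i, alpha):
    """One stochastic-gradient-descent update from a SINGLE example
    (x_i, y_i), instead of averaging the gradient over all m examples."""
    err = theta[0] + theta[1] * x_i - y_i
    return np.array([theta[0] - alpha * err,
                     theta[1] - alpha * err * x_i])

# One pass (epoch) over a shuffled training set:
rng = np.random.default_rng(0)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 + 3 * x
theta = np.zeros(2)
for i in rng.permutation(len(x)):
    theta = sgd_step(theta, x[i], y[i], alpha=0.1)
```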

Page 26: ML-02 Linear Regression - IITKGP

What about multiple local minima?

•  The cost function in linear regression is always a convex function – it always has a single global minimum

•  So, gradient descent will always converge (given a suitable learning rate)

Page 27: ML-02 Linear Regression - IITKGP

Convex cost function

Page 28: ML-02 Linear Regression - IITKGP

Gradient descent in action

Pages 29–37: [Successive snapshots of gradient descent: on the left, the current hypothesis hθ(x) plotted over the data (for fixed θ0, θ1, this is a function of x); on the right, the corresponding point on the contour plot of J(θ0, θ1) (a function of the parameters) stepping toward the minimum.]

Page 38: ML-02 Linear Regression - IITKGP

Linear Regression for multiple variables

Page 39: ML-02 Linear Regression - IITKGP

Multiple features (variables):

Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
2104         | 5                  | 1                | 45                  | 460
1416         | 3                  | 2                | 40                  | 232
1534         | 3                  | 2                | 30                  | 315
852          | 2                  | 1                | 36                  | 178
…            | …                  | …                | …                   | …

Page 40: ML-02 Linear Regression - IITKGP

Multiple features (variables) – same table as Page 39.

Notation:
n = number of features
x^(i) = the input (features) of the i-th training example
x_j^(i) = the value of feature j in the i-th training example

Page 41: ML-02 Linear Regression - IITKGP

Hypothesis:

Previously (one feature): hθ(x) = θ0 + θ1 x

For multivariate linear regression: hθ(x) = θ0 + θ1 x_1 + θ2 x_2 + … + θn x_n

For convenience of notation, define x_0 = 1. Then hθ(x) = θᵀx.
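A small NumPy sketch of this convention: prepending an x_0 = 1 column lets the hypothesis be computed for all examples as a single matrix-vector product (the feature rows reuse the table above; theta's values are placeholders):

```python
import numpy as np

X = np.array([[2104, 5, 1, 45],   # one row of features per training example
              [1416, 3, 2, 40]], dtype=float)
X = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend the x0 = 1 column
theta = np.zeros(X.shape[1])                  # theta0 ... theta_n
predictions = X @ theta                       # h_theta(x^(i)) for all i at once
```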

Page 42: ML-02 Linear Regression - IITKGP

Hypothesis: hθ(x) = θᵀx = θ0 x_0 + θ1 x_1 + … + θn x_n

Parameters: θ = (θ0, θ1, …, θn)

Cost function: J(θ) = (1/2m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))²

Gradient descent:

Repeat {
    θj := θj − α (∂/∂θj) J(θ)
}   (simultaneously update for every j = 0, …, n)

Page 43: ML-02 Linear Regression - IITKGP

Gradient Descent

Previously (n = 1):
θ0 := θ0 − α (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))
θ1 := θ1 − α (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i)) x^(i)

New algorithm (n ≥ 1):

Repeat {
    θj := θj − α (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i)) x_j^(i)
}   (simultaneously update θj for j = 0, …, n)
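A minimal vectorized sketch of one such step (assuming, as above, a design matrix X with the x0 = 1 column prepended; the function name is ours):

```python
import numpy as np

def gd_step(X, y, theta, alpha):
    """One batch-gradient-descent step, vectorized over all n+1 parameters.
    X is the m x (n+1) design matrix with the x0 = 1 column prepended."""
    m = len(y)
    err = X @ theta - y          # h_theta(x^(i)) - y^(i), shape (m,)
    grad = X.T @ err / m         # (1/m) * sum_i err_i * x_j^(i), for each j
    return theta - alpha * grad  # simultaneous update of every theta_j
```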


Page 45: ML-02 Linear Regression - IITKGP

Practical aspects of applying gradient descent

Page 46: ML-02 Linear Regression - IITKGP

Feature Scaling

Idea: Make sure features are on a similar scale.

E.g. x_1 = size (0–2000 feet²), x_2 = number of bedrooms (1–5)

With unscaled features, the contours of J(θ) are highly elongated and gradient descent zig-zags slowly; with scaled features the contours are closer to circular and descent converges faster.

Page 47: ML-02 Linear Regression - IITKGP

Feature Scaling

Idea: Make sure features are on a similar scale. E.g. x_1 = size (0–2000 feet²), x_2 = number of bedrooms (1–5).

Replace x_j with x_j − μ_j to make the features have approximately zero mean (do not apply to x_0 = 1).

Mean normalization: x_j := (x_j − μ_j) / s_j, where μ_j is the mean of feature j and s_j is its range (max − min) or its standard deviation.

Other types of normalization can also be used (e.g., dividing each feature by its maximum value).
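A sketch of mean normalization over the feature columns (using the standard deviation for s_j; the range would work equally well):

```python
import numpy as np

def mean_normalize(X):
    """Mean normalization: x_j := (x_j - mu_j) / s_j, per feature column.
    Apply to the raw feature columns only, NOT to the x0 = 1 column."""
    mu = X.mean(axis=0)
    s = X.std(axis=0)
    # Return mu and s too: future inputs must be scaled identically.
    return (X - mu) / s, mu, s
```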

Page 48: ML-02 Linear Regression - IITKGP

Is gradient descent working properly?

•  Plot how J(θ) changes with every iteration of gradient descent

•  For sufficiently small learning rate, J(θ) should decrease with every iteration

•  If not, the learning rate needs to be reduced

•  However, too small a learning rate means slow convergence

Page 49: ML-02 Linear Regression - IITKGP

When to end gradient descent?

•  Example convergence test:

•  Declare convergence if J(θ) decreases by less than 0.001 in an iteration (assuming J(θ) is decreasing in every iteration)
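Combining the per-iteration check of J(θ) with this threshold test, a minimal sketch (the function name and the 10,000-iteration cap are our choices; 0.001 is the slide's example threshold):

```python
import numpy as np

def converged_descent(X, y, alpha=0.01, tol=1e-3, max_iters=10_000):
    """Run batch gradient descent until J decreases by less than `tol`."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    prev_cost = float("inf")
    for _ in range(max_iters):
        err = X @ theta - y
        theta -= alpha * (X.T @ err) / m     # one gradient-descent step
        cost = (err ** 2).sum() / (2 * m)    # J at the pre-update theta
        if prev_cost - cost < tol:           # converged: J barely moved
            break
        prev_cost = cost                     # (plot these values to monitor J)
    return theta
```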

Page 50: ML-02 Linear Regression - IITKGP

Polynomial Regression

Page 51: ML-02 Linear Regression - IITKGP

Choice of features

[Plot: Price (y) vs. Size (x) – data that a straight line fits poorly, motivating non-linear features of size]
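Polynomial regression can be obtained from linear regression simply by engineering the features; a sketch (the cubic choice is illustrative):

```python
import numpy as np

# Polynomial regression as linear regression over engineered features:
# choose x1 = size, x2 = size**2, x3 = size**3, then fit theta linearly.
size = np.array([852.0, 1416.0, 1534.0, 2104.0])
X = np.column_stack([np.ones_like(size), size, size**2, size**3])
# Feature scaling matters here: size**3 dwarfs size unless normalized.
```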