36
Week 1, video 2: Regressors

Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Embed Size (px)

Citation preview

Page 1: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Week 1, video 2:

Regressors

Page 2: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Prediction

Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other aspects of the data (predictor variables)

Sometimes used to predict the future Sometimes used to make inferences

about the present

Page 3: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Prediction: Examples

A student is watching a video in a MOOC right now. Is he bored or frustrated?

A student has used educational software for the last half hour. How likely is it that she knows the

skill in the next problem? A student has completed three years of

high school. What will be her score on the college

entrance exam?

Page 4: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

What can we use this for?

Improved educational design If we know when students get bored, we

can improve that content Automated decisions by software

If we know that a student is frustrated, let’s offer the student some online help

Informing teachers, instructors, and other stakeholders If we know that a student is frustrated, let’s

tell their teacher

Page 5: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Regression in Prediction

There is something you want to predict (“the label”)

The thing you want to predict is numerical Number of hints student requests How long student takes to answer How much of the video the student will

watch What will the student’s test score be

Page 6: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Regression in Prediction

A model that predicts a number is called a regressor in data mining

The overall task is called regression

Page 7: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Regression

To build a regression model, you obtain a data set where you already know the answer – called the training label

For example, if you want to predict the number of hints the student requests, each value of numhints is a training label

Skill pknow time totalactionsnumhints

ENTERINGGIVEN 0.704 9 10

ENTERINGGIVEN 0.502 10 20

USEDIFFNUM 0.049 6 13

ENTERINGGIVEN 0.967 7 30

REMOVECOEFF 0.792 16 11

REMOVECOEFF 0.792 13 20

USEDIFFNUM 0.073 5 20

….

Page 8: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Regression

Associated with each label are a set of “features”, other variables, which you will try to use to predict the label

Skill pknow time totalactionsnumhints

ENTERINGGIVEN 0.704 9 10

ENTERINGGIVEN 0.502 10 20

USEDIFFNUM 0.049 6 13

ENTERINGGIVEN 0.967 7 30

REMOVECOEFF 0.792 16 11

REMOVECOEFF 0.792 13 20

USEDIFFNUM 0.073 5 20

….

Page 9: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Regression

The basic idea of regression is to determine which features, in which combination, can predict the label’s value

Skill pknow time totalactionsnumhints

ENTERINGGIVEN 0.704 9 10

ENTERINGGIVEN 0.502 10 20

USEDIFFNUM 0.049 6 13

ENTERINGGIVEN 0.967 7 30

REMOVECOEFF 0.792 16 11

REMOVECOEFF 0.792 13 20

USEDIFFNUM 0.073 5 20

….

Page 10: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Linear Regression

The most classic form of regression is linear regression

Numhints = 0.12*Pknow + 0.932*Time – 0.11*Totalactions

Skill pknow timetotalactions numhintsCOMPUTESLOPE 0.544 9 1

?

Page 11: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Quiz

Numhints = 0.12*Pknow + 0.932*Time – 0.11*Totalactions

What is the value of numhints?

A) 8.34B) 13.58C) 3.67D) 9.21E) FNORD

Skill pknow timetotalactions numhintsCOMPUTESLOPE 0.322 15 4

?

Page 12: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Quiz

Numhints = 0.12*Pknow + 0.932*Time –

0.11*Totalactions

Which of the variables has the largest impact on numhints?(Assume they are scaled the same)

A) Pknow

B) Time

C) Totalactions

D) Numhints

E) They are equal

Page 13: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

However…

These variables are unlikely to be scaled the same!

If Pknow is a probability From 0 to 1 We’ll discuss this variable later in the class

And time is a number of seconds to respond From 0 to infinity

Then you can’t interpret the weights in a straightforward fashion You need to transform them first

Page 14: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Transform

When you make a new variable by applying some mathematical function to the previous variable

Xt = X2

Page 15: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Transform: Unitization

Increases interpretability of relative strength of features

Reduces interpretability of individual features

Xt = X – M(X) SD(X)

Page 16: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Linear Regression

Linear regression only fits linear functions…

Except when you apply transforms to the input variables

Which most statistics and data mining packages can do for you

Page 17: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Ln(X)

-15 -10 -5 0 5 10 15

-5

-4

-3

-2

-1

0

1

2

3

Page 18: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Sqrt(X)

-15 -10 -5 0 5 10 150

0.5

1

1.5

2

2.5

3

3.5

Page 19: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

X2

-15 -10 -5 0 5 10 150

20

40

60

80

100

120

Xt

Page 20: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

X3

-15 -10 -5 0 5 10 15

-1500

-1000

-500

0

500

1000

1500

Xt

Page 21: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

1/X

-15 -10 -5 0 5 10 15

-80

-60

-40

-20

0

20

40

60

80

Page 22: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Sin(X)

-15 -10 -5 0 5 10 15

-1.5

-1

-0.5

0

0.5

1

1.5

Page 23: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Linear Regression

Surprisingly flexible… But even without that It is blazing fast It is often more accurate than more complex

models, particularly once you cross-validate Caruana & Niculescu-Mizil (2006)

It is feasible to understand your model(with the caveat that the second feature in your model is in the context of the first feature, and so on)

Page 24: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Example of Caveat

Let’s graph the relationship between number of graduate students and number of papers per year

Page 25: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Data

0 2 4 6 8 10 12 14 160

2

4

6

8

10

12

14

16

Number of graduate students

Papers

per

year

Page 26: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Data

0 2 4 6 8 10 12 14 160

2

4

6

8

10

12

14

16

Number of graduate students

Papers

per

year

Too much time spent filling out personnel action forms?

Page 27: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Model

Number of papers =4 +2 * # of grad students- 0.1 * (# of grad students)2

But does that actually mean that (# of grad students)2 is associated with less publication?

No!

Page 28: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Example of Caveat

(# of grad students)2 is actually positively correlated with publications! r=0.46

0 2 4 6 8 10 12 14 160

2

4

6

8

10

12

14

16

Number of graduate students

Papers

per

year

Page 29: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Example of Caveat

The relationship is only in the negative direction when the number of graduate students is already in the model…

0 2 4 6 8 10 12 14 160

2

4

6

8

10

12

14

16

Number of graduate students

Papers

per

year

Page 30: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Example of Caveat

So be careful when interpreting linear regression models (or almost any other type of model)

Page 31: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Regression Trees

Page 32: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Regression Trees (non-linear; RepTree)

If X>3 Y = 2 else If X<-7

Y = 4 Else Y = 3

Page 33: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Linear Regression Trees (linear; M5’)

If X>3 Y = 2A + 3B else If X< -7

Y = 2A – 3B Else Y = 2A + 0.5B + C

Page 34: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Linear Regression Tree

0 2 4 6 8 10 12 14 160

2

4

6

8

10

12

14

16

Number of graduate students

Papers

per

year

Page 35: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Later Lectures

Other regressors Goodness metrics for comparing

regressors Validating regressors

Page 36: Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other

Next Lecture

Classifiers – another type of prediction model