
BINARY CHOICE MODELS: LINEAR PROBABILITY MODEL

Economists are often interested in the factors behind the decision-making of individuals or enterprises. Some examples:

• Why do some people go to college while others do not?

• Why do some women enter the labor force while others do not?

• Why do some people buy houses while others rent?

• Why do some people migrate while others stay put?


The models that have been developed for this purpose are known as qualitative response or binary choice models, with the outcome, which we will denote Y, being assigned a value of 1 if the event occurs and 0 otherwise.


Models with more than two possible outcomes have also been developed, but we will confine our attention to binary choice models.


The simplest binary choice model is the linear probability model where, as the name implies, the probability of the event occurring, p, is assumed to be a linear function of a set of explanatory variables.


$$p_i = p(Y_i = 1) = \beta_1 + \beta_2 X_i$$

[Figure: the probability p (vertical axis, labelled Y, p, running from 0 to 1) plotted against X as the straight line β1 + β2Xi.]

Graphically, the relationship is as shown, if there is just one explanatory variable.


Of course, p is unobservable. One has data only on the outcome, Y. In the linear probability model, Y is used as the dependent variable, in the same way as a dummy variable.


• Why do some people graduate from high school while others drop out?


As an illustration, we will take the question shown above. We will define a variable GRAD which is equal to 1 if the individual graduated from high school, and 0 otherwise.


. g GRAD = 0

. replace GRAD = 1 if S > 11
(509 real changes made)

. reg GRAD ASVABC

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =   49.59
       Model |  2.46607893     1  2.46607893           Prob > F      =  0.0000
    Residual |  26.7542914   538  .049729166           R-squared     =  0.0844
-------------+------------------------------           Adj R-squared =  0.0827
       Total |  29.2203704   539   .05421219           Root MSE      =    .223

------------------------------------------------------------------------------
        GRAD |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .0070697   .0010039     7.04   0.000     .0050976    .0090419
       _cons |   .5794711   .0524502    11.05   0.000     .4764387    .6825035
------------------------------------------------------------------------------


The Stata output above shows the construction of the variable GRAD. It is first set to 0 for all respondents, and then changed to 1 for those who had more than 11 years of schooling.


The output above shows the result of regressing GRAD on ASVABC. It suggests that every additional point on the ASVABC score increases the probability of graduating by 0.007, that is, 0.7 percentage points.
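As a quick check on this interpretation, the fitted line can be evaluated at a few scores. The sketch below uses the coefficient estimates reported in the regression output. The function name `fitted_probability` and the example scores 40, 50 and 51 are illustrative choices, not part of the original material.

```python
# Coefficient estimates copied from the Stata regression output.
B1 = 0.5794711   # _cons
B2 = 0.0070697   # ASVABC


def fitted_probability(asvabc: float) -> float:
    """Fitted value from the linear probability model for a given score."""
    return B1 + B2 * asvabc


if __name__ == "__main__":
    for score in (40, 50, 51):  # illustrative scores only
        print(f"ASVABC = {score}: fitted probability = {fitted_probability(score):.4f}")
    # One extra point changes the fitted probability by the slope coefficient.
    print(f"Change per point: {fitted_probability(51) - fitted_probability(50):.4f}")
```

Because the model is linear, the change per point is the same at every score, which is what allows the slope to be read as a constant change in probability.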


The intercept has no sensible meaning. Literally, it suggests that a respondent with an ASVABC score of 0 has a 58% probability of graduating. However, a score of 0 is not possible.


Unfortunately, the linear probability model has some serious shortcomings. First, there are problems with the disturbance term.


As usual, the value of the dependent variable Yi in observation i has a nonstochastic component and a random component. The nonstochastic component depends on Xi and the parameters. The random component is the disturbance term.


$$Y_i = E(Y_i) + u_i$$


The nonstochastic component in observation i is its expected value in that observation. This is simple to compute, because Yi can take only two values. It is 1 with probability pi and 0 with probability (1 – pi). The expected value in observation i is therefore β1 + β2Xi.


$$E(Y_i) = 1 \times p_i + 0 \times (1 - p_i) = p_i = \beta_1 + \beta_2 X_i$$


This means that we can rewrite the model as shown.


$$Y_i = \beta_1 + \beta_2 X_i + u_i$$


The probability function is thus also the nonstochastic component of the relationship between Y and X.


$$Y_i = 1 \;\Rightarrow\; u_i = 1 - \beta_1 - \beta_2 X_i$$

$$Y_i = 0 \;\Rightarrow\; u_i = -\beta_1 - \beta_2 X_i$$

In observation i, for Yi to be 1, ui must be (1 – β1 – β2Xi). For Yi to be 0, ui must be (– β1 – β2Xi).
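The two possible values of the disturbance term can be made concrete with a small numerical sketch. The values of β1, β2 and Xi used below are assumptions chosen purely for illustration (they borrow the estimates from the GRAD regression and a score of 50), and the function name `disturbance_values` is hypothetical.

```python
def disturbance_values(beta1: float, beta2: float, x: float) -> tuple[float, float]:
    """Return (u when Y = 1, u when Y = 0) for the linear probability model."""
    p = beta1 + beta2 * x          # nonstochastic component, E(Y) = p
    return 1.0 - p, -p


if __name__ == "__main__":
    # Illustrative values only: estimates from the regression, score of 50.
    beta1, beta2, x = 0.5794711, 0.0070697, 50.0
    u_if_one, u_if_zero = disturbance_values(beta1, beta2, x)
    p = beta1 + beta2 * x
    print(f"u if Y = 1: {u_if_one:.4f}")
    print(f"u if Y = 0: {u_if_zero:.4f}")
    # Weighted by their probabilities, the two values average to zero
    # (up to floating-point rounding).
    print(f"E(u) = {p * u_if_one + (1 - p) * u_if_zero:.1e}")
```

Whatever the value of Xi, the two possible disturbances are always exactly 1 apart, so the disturbance term has a two-point distribution rather than a continuous one.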


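[Figure: the line β1 + β2Xi plotted against X, with Y, p on the vertical axis from 0 to 1. At X = Xi, observation A lies at Y = 1, a distance 1 – β1 – β2Xi above the line, and observation B lies at Y = 0, a distance β1 + β2Xi below it.]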

The two possible values, which give rise to the observations A and B, are illustrated in the diagram. Since u does not have a normal distribution, the standard errors and test statistics are invalid. Its distribution is not even continuous.


Further, it can be shown that the population variance of the disturbance term in observation i is given by (β1 + β2Xi)(1 – β1 – β2Xi). This changes with Xi, and so the disturbance term is heteroscedastic.

$$\sigma_{u_i}^2 = (\beta_1 + \beta_2 X_i)(1 - \beta_1 - \beta_2 X_i)$$
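The variance formula follows directly from the two-point distribution of the disturbance term. A short derivation, writing p_i for β1 + β2Xi and using the fact that the disturbance term has expected value zero, so that its variance is E(u_i^2):

```latex
\begin{aligned}
\sigma_{u_i}^2 &= E(u_i^2) \\
&= p_i (1 - p_i)^2 + (1 - p_i)(-p_i)^2 \\
&= p_i (1 - p_i)\bigl[(1 - p_i) + p_i\bigr] \\
&= p_i (1 - p_i) \\
&= (\beta_1 + \beta_2 X_i)(1 - \beta_1 - \beta_2 X_i)
\end{aligned}
```

Because p_i varies with Xi, so does the variance. It is largest when p_i = 0.5 and shrinks toward zero as p_i approaches 0 or 1.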

[Figure: the line β1 + β2Xi plotted against X, with observations A and B; for large enough X the line lies above 1.]

Yet another shortcoming of the linear probability model is that it may predict probabilities of more than 1, as shown here. It may also predict probabilities less than 0.
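For the fitted line in the GRAD example, the scores at which the prediction reaches 1 or 0 can be worked out directly by solving b1 + b2 × ASVABC = 1 and b1 + b2 × ASVABC = 0. A minimal sketch using the reported estimates (the variable names are illustrative):

```python
# Coefficient estimates copied from the Stata regression output.
B1 = 0.5794711   # _cons
B2 = 0.0070697   # ASVABC

# Solve B1 + B2 * score = 1 (upper bound) and B1 + B2 * score = 0 (lower bound).
score_where_fit_is_one = (1.0 - B1) / B2
score_where_fit_is_zero = (0.0 - B1) / B2

print(f"Fitted value exceeds 1 above a score of about {score_where_fit_is_one:.1f}")
print(f"Fitted value falls below 0 only below a score of about {score_where_fit_is_zero:.1f}")
```

With these estimates the lower threshold is a negative score, so fitted values below 0 would not be expected from this particular regression, whereas any respondent scoring above the upper threshold receives a fitted probability greater than 1.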


The Stata command for saving the fitted values from a regression is predict, followed by the name that you wish to give to the fitted values. We are calling them PROB.


. predict PROB


. tab PROB if PROB > 1

     Fitted |
     values |      Freq.     Percent        Cum.
------------+-----------------------------------
   1.000381 |          6        4.76        4.76
   1.002308 |          9        7.14       11.90
   1.004236 |          7        5.56       17.46
   1.006163 |          3        2.38       19.84
   *********************************************
   1.040855 |         11        8.73       93.65
   1.042783 |          3        2.38       96.03
    1.04471 |          2        1.59       97.62
   1.046638 |          3        2.38      100.00
------------+-----------------------------------
      Total |        126      100.00


tab is the Stata command for tabulating the values of a variable, and for cross-tabulating two or more variables. We see that there are 126 observations where the fitted value is greater than 1. (The middle rows of the table have been omitted.)


In this example there were no fitted values of less than 0.


. tab PROB if PROB < 0
no observations
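The same check can be sketched outside Stata. The example below is a standard-library Python illustration on simulated data, not the data set used in these slides: the sample design, the true probability function and the names `fit_lpm` and `count_out_of_range` are all assumptions made for the sake of the example. It fits the linear probability model by ordinary least squares, saves the fitted values, and counts how many lie outside the range 0 to 1.

```python
import random


def fit_lpm(x, y):
    """OLS for a single regressor: returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def count_out_of_range(fitted):
    """Return (number of fitted values above 1, number below 0)."""
    return sum(f > 1 for f in fitted), sum(f < 0 for f in fitted)


# Simulated data, not the data set used above: scores are uniform on [20, 70]
# and the true probability rises with the score but is capped at 1.
rng = random.Random(0)
x = [rng.uniform(20, 70) for _ in range(540)]
y = [1 if rng.random() < min(1.0, 0.4 + 0.012 * xi) else 0 for xi in x]

b1, b2 = fit_lpm(x, y)
fitted = [b1 + b2 * xi for xi in x]            # analogue of `predict PROB`
n_above, n_below = count_out_of_range(fitted)  # analogue of the two `tab` commands
print(f"intercept = {b1:.4f}, slope = {b2:.4f}")
print(f"fitted > 1: {n_above}, fitted < 0: {n_below}")
```

The counts depend entirely on the simulated sample, so they should not be compared with the 126 observations reported above.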


The main advantage of the linear probability model over logit and probit analysis, the alternatives considered in the next two sequences, is that it is much easier to fit. For this reason it used to be recommended for initial, exploratory work.


However, this consideration is no longer relevant, now that computers are so fast and powerful, and logit and probit are typically standard features of regression applications.
