36
Limited Limited Dependent Dependent Variables Variables Ciaran S. Phibbs Ciaran S. Phibbs

Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Embed Size (px)

Citation preview

Page 1: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

LimitedLimitedDependent VariablesDependent Variables

Ciaran S. PhibbsCiaran S. Phibbs

Page 2: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Limited Dependent VariablesLimited Dependent Variables

0-1, small number of options, small 0-1, small number of options, small counts, etc.counts, etc.

Non-linear in this case really Non-linear in this case really means that the dependent variable means that the dependent variable is not continuous, or even close to is not continuous, or even close to continuous. continuous.

Page 3: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

OutlineOutline

Binary ChoiceBinary Choice Multinomial ChoiceMultinomial Choice CountsCounts Most models in general framework Most models in general framework

of probability modelsof probability models– Prob (event/occurs)Prob (event/occurs)

Page 4: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Basic ProblemsBasic Problems

Heteroscedastic error termsHeteroscedastic error termsPredictions not constrained Predictions not constrained

to match actual outcomesto match actual outcomes

Page 5: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Yi = βo + βX + εi

Yi=0 if lived, Yi=1 if died

Prob (Yi=1) = F(X, ) Prob (Yi=0) = 1 – F(X,)

OLS, also called a linear probability modeli i is heteroscedastic, depends on is heteroscedastic, depends on

Predictions not constrained to (0,1)Predictions not constrained to (0,1)

Page 6: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Binary Outcomes Binary Outcomes Common in Health CareCommon in Health Care

MortalityMortality Other outcomeOther outcome

– InfectionInfection

– Patient safety eventPatient safety event

– Rehospitalization <30 daysRehospitalization <30 days Decision to seek medical careDecision to seek medical care

Page 7: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Standard Approaches Standard Approaches to Binary Choice-1to Binary Choice-1

Logistic regressionLogistic regression

( 1)1

X

X

eprob Y

e

Page 8: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Advantages of Logistic RegressionAdvantages of Logistic Regression

Designed for relatively rare eventsDesigned for relatively rare events Commonly used in health care; most Commonly used in health care; most

readers can interpret an odds ratioreaders can interpret an odds ratio

Page 9: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Standard Approaches Standard Approaches to Binary Choice-2to Binary Choice-2

Probit regression (classic Probit regression (classic example is decision to make a example is decision to make a large purchase)large purchase)

y* = y* = X +

y=1 if y* >0y=0 if y* ≤0

Page 10: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Binary ChoiceBinary Choice

There are other methods, using There are other methods, using other distributions.other distributions.

In general, logistic and probit give In general, logistic and probit give about the same answer.about the same answer.

It used to be a lot easier to It used to be a lot easier to calculate marginal effects with calculate marginal effects with probit, not so any moreprobit, not so any more

Page 11: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Odds Ratios vs. Relative RisksOdds Ratios vs. Relative Risks

Standard method of interpreting Standard method of interpreting logistic regression is odds ratios. logistic regression is odds ratios.

Convert to % effect, really relative Convert to % effect, really relative riskrisk

This approximation starts to break This approximation starts to break down at 10% outcome incidencedown at 10% outcome incidence

Page 12: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,
Page 13: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Can Convert OR to RRCan Convert OR to RR

Zhang J, Yu KF. What’s the Relative Risk? Zhang J, Yu KF. What’s the Relative Risk? A Method of Correcting the Odds Ratio in A Method of Correcting the Odds Ratio in Cohort Studies of Common Outcomes. Cohort Studies of Common Outcomes. JAMA 1998;280(19):1690-1691.JAMA 1998;280(19):1690-1691.

RR = RR = OR . OR .

(1-P(1-P00) + (P) + (P00 x OR) x OR)

Where PWhere P00 is the sample probability of the is the sample probability of the

outcomeoutcome

Page 14: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Effect of Correction for RREffect of Correction for RRFrom Phibbs et al., NEJM 5/24/2007, From Phibbs et al., NEJM 5/24/2007, 20% mortality20% mortality

Odds RatioOdds Ratio Calculated RRCalculated RR

2.722.72 2.082.08

2.392.39 1.911.91

1.781.78 1.561.56

1.511.51 1.381.38

1.081.08 1.061.06

Page 15: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

ExtensionsExtensions

Panel data, can now estimate both Panel data, can now estimate both random effects and fixed effects random effects and fixed effects models. The Stata manual lists 34 models. The Stata manual lists 34 related estimation commandsrelated estimation commands

All kinds of variations. All kinds of variations. – Panel dataPanel data

– Grouped dataGrouped data

Page 16: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

ExtensionsExtensions

Goodness of fit tests. Several tests. Goodness of fit tests. Several tests. Probably the most commonly reported Probably the most commonly reported

statistics are:statistics are:– Area under ROC curve, c-statistic in SAS Area under ROC curve, c-statistic in SAS

output. Range 0.50 to 1.0. output. Range 0.50 to 1.0.

– Hosmer-Lemeshow testHosmer-Lemeshow test

– NEJM paper, c=0.86, H-L p=0.34NEJM paper, c=0.86, H-L p=0.34

Page 17: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

More on Hosmer-Lemeshow TestMore on Hosmer-Lemeshow Test The H-L test breaks the sample up into n (usually The H-L test breaks the sample up into n (usually

10, some programs (Stata) let you vary this) equal 10, some programs (Stata) let you vary this) equal groups and compares the number of observed and groups and compares the number of observed and expected events in each group.expected events in each group.

If your model predicts well, the events will be If your model predicts well, the events will be concentrated in the highest risk groups; most can concentrated in the highest risk groups; most can be in the highest risk group.be in the highest risk group.

Alternate specification, divide the sample so that Alternate specification, divide the sample so that the events are split into equal groups. the events are split into equal groups.

Page 18: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Multinomial ChoiceMultinomial Choice

What if more than one choice or What if more than one choice or outcome?outcome?

Options are more limitedOptions are more limited– Multivariable Probit (multiple decisions, Multivariable Probit (multiple decisions,

each with two alternatives)each with two alternatives)

– Several logit models (single decision, Several logit models (single decision, multiple alternatives)multiple alternatives)

Page 19: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Logit Models for Multiple ChoicesLogit Models for Multiple Choices

Conditional Logit Model (McFadden)Conditional Logit Model (McFadden)– Unordered choicesUnordered choices

Multinomial Logit ModelMultinomial Logit Model– Choices can be ordered.Choices can be ordered.

Page 20: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Examples of Health Care Uses for Examples of Health Care Uses for Logit Models for Multiple ChoicesLogit Models for Multiple Choices

Choice of what hospital to use, among Choice of what hospital to use, among those in market areathose in market area

Choice of treatment among several Choice of treatment among several optionsoptions

Page 21: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Conditional Logit ModelConditional Logit Model

1

( )ij

ij

X

JX

j

eprob Yi j

e

Page 22: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Conditional logit modelConditional logit model Also known as the random utility modelAlso known as the random utility model Is derived from consumer theoryIs derived from consumer theory How consumers choose from a set of optionsHow consumers choose from a set of options Model driven by the characteristics of the Model driven by the characteristics of the

choices.choices. Individual characteristics “cancel out” but Individual characteristics “cancel out” but

can be included. For example, in hospital can be included. For example, in hospital choice, can interact with distance to hospitalchoice, can interact with distance to hospital

Can express the results as odds ratios. Can express the results as odds ratios.

Page 23: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Estimation of McFadden’s ModelEstimation of McFadden’s Model Some software packages (e.g. SAS) Some software packages (e.g. SAS)

require that the number of choices be require that the number of choices be equal across all observations.equal across all observations.

LIMDEP, allows a “NCHOICES” LIMDEP, allows a “NCHOICES” options that lets you set the number of options that lets you set the number of choices for each observation. This is a choices for each observation. This is a very useful feature. May be able to do very useful feature. May be able to do this in Stata (clogit) with “group”this in Stata (clogit) with “group”

Page 24: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Example of Conditional Logit Example of Conditional Logit EstimatesEstimates

Study I did looking at elderly service-Study I did looking at elderly service-connected veterans choice of VA or non-connected veterans choice of VA or non-VA hospitalVA hospital

Log distanceLog distance 0.660.66p<0.001p<0.001

Population densityPopulation density 0.9996 0.9996 p<0.001p<0.001

VAVA 2.802.80 p<0.001p<0.001

Page 25: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Multinomial Logit ModelMultinomial Logit Model

1

1

( )1

1( 0)

1

j i

j i

j i

X

JX

j

JX

j

eprob Y j

e

prob Ye

Page 26: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Multinomial Logit ModelMultinomial Logit Model

Must identify a reference choice, model Must identify a reference choice, model yields set of parameter estimates for yields set of parameter estimates for each of the other choiceseach of the other choices

Allows direct estimation of parameters Allows direct estimation of parameters for individual characteristics. Model for individual characteristics. Model can (should) also include parameters can (should) also include parameters for choice characteristicsfor choice characteristics

Page 27: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Example of a Multinomial Logit ModelExample of a Multinomial Logit Model

Effect on VLBW delivery at hospital if Effect on VLBW delivery at hospital if nearby hospital opens mid-level NICU.nearby hospital opens mid-level NICU.

Hosp w/ no NICUHosp w/ no NICU -0.65-0.65 Hosp w/ high-level NICUHosp w/ high-level NICU -0.70-0.70

Page 28: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Independence of Irrelevant Independence of Irrelevant AlternativesAlternatives

Results should be robust to varying the Results should be robust to varying the number of alternative choices number of alternative choices – Can re-estimate model after deleting some of Can re-estimate model after deleting some of

the choices.the choices.– McFadden, regression based test. Regression-McFadden, regression based test. Regression-

Based Specification Tests for the Multinomial Based Specification Tests for the Multinomial Logit Model. J Econometrics 1987;34(1/2):63-Logit Model. J Econometrics 1987;34(1/2):63-82.82.

If fail IIA, may need to estimate a nested If fail IIA, may need to estimate a nested logit modellogit model

Page 29: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Independence of Irrelevant Independence of Irrelevant Alternatives - 2Alternatives - 2

McFadden test is fairly weak, likely to McFadden test is fairly weak, likely to pass. Note, this test can also be used to pass. Note, this test can also be used to test for omitted variables.test for omitted variables.

For many health applications, doesn’t For many health applications, doesn’t matter, the models are very robust (e.g. matter, the models are very robust (e.g. hospital choice models driven by hospital choice models driven by distance).distance).

Page 30: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Count Data (integers)Count Data (integers)

Continuation of the same problemContinuation of the same problem Problem diminishes as counts increaseProblem diminishes as counts increase Rule of Thumb. Need to use count Rule of Thumb. Need to use count

data models for counts under 30data models for counts under 30

Page 31: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Count DataCount Data

Some examples of where count data models Some examples of where count data models are needed in health careare needed in health care– Dependent variable is number of outpatient Dependent variable is number of outpatient

visitsvisits– Number of times a prescription of a chronic Number of times a prescription of a chronic

disease medication is refilled in a yeardisease medication is refilled in a year– Number of adverse events in a unit (or hospital) Number of adverse events in a unit (or hospital)

over a period of timeover a period of time

Page 32: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Count DataCount Data Poisson distribution. A distribution for Poisson distribution. A distribution for

counts.counts.– Problem: very restrictive assumption that Problem: very restrictive assumption that

mean and variance are equalmean and variance are equal

( )!

i iyi

ii

eprob Yi y

y

Page 33: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Count DataCount Data In general, negative binomial is a better choice. In general, negative binomial is a better choice.

Stata, test for what distribution is part of the Stata, test for what distribution is part of the package. Other distributions can also be used.package. Other distributions can also be used.

( )( , )

!

i i iu yi i

i i ii

e uf y x u

y

Page 34: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Other ModelsOther Models

New models are being introduced all of New models are being introduced all of the time. More and better ways to the time. More and better ways to address the problems of limited address the problems of limited dependent variables. dependent variables.

Includes semi-parametric and non-Includes semi-parametric and non-parameteric methods. parameteric methods.

Page 35: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Reference TextsReference Texts

Greene. Greene. Econometric Analysis, Ch. 19 Econometric Analysis, Ch. 19 and 20and 20. .

Maddala. Maddala. Limited-Dependent and Limited-Dependent and Qualitative Variables in EconometricsQualitative Variables in Econometrics

Page 36: Limited Dependent Variables Ciaran S. Phibbs. Limited Dependent Variables 0-1, small number of options, small counts, etc. 0-1, small number of options,

Journal ReferencesJournal References McFadden D. Specification Tests for McFadden D. Specification Tests for

the Multinomial Logit Model. J the Multinomial Logit Model. J Econometrics 1987;34(1/2):63-82.Econometrics 1987;34(1/2):63-82.

Zhang J, Yu KF. What’s the Relative Zhang J, Yu KF. What’s the Relative Risk? A Method of Correctingthe Risk? A Method of Correctingthe Odds Ratio in Cohort Studies of Odds Ratio in Cohort Studies of Common Outcomes. JAMA Common Outcomes. JAMA 1998;280(19):1690-1691.1998;280(19):1690-1691.