Upload
vincent-fitzgerald
View
220
Download
0
Embed Size (px)
Citation preview
LimitedLimitedDependent VariablesDependent Variables
Ciaran S. PhibbsCiaran S. Phibbs
Limited Dependent VariablesLimited Dependent Variables
0-1, small number of options, small 0-1, small number of options, small counts, etc.counts, etc.
Non-linear in this case really Non-linear in this case really means that the dependent variable means that the dependent variable is not continuous, or even close to is not continuous, or even close to continuous. continuous.
OutlineOutline
Binary ChoiceBinary Choice Multinomial ChoiceMultinomial Choice CountsCounts Most models in general framework Most models in general framework
of probability modelsof probability models– Prob (event/occurs)Prob (event/occurs)
Basic ProblemsBasic Problems
Heteroscedastic error termsHeteroscedastic error termsPredictions not constrained Predictions not constrained
to match actual outcomesto match actual outcomes
Yi = βo + βX + εi
Yi=0 if lived, Yi=1 if died
Prob (Yi=1) = F(X, ) Prob (Yi=0) = 1 – F(X,)
OLS, also called a linear probability modeli i is heteroscedastic, depends on is heteroscedastic, depends on
Predictions not constrained to (0,1)Predictions not constrained to (0,1)
Binary Outcomes Binary Outcomes Common in Health CareCommon in Health Care
MortalityMortality Other outcomeOther outcome
– InfectionInfection
– Patient safety eventPatient safety event
– Rehospitalization <30 daysRehospitalization <30 days Decision to seek medical careDecision to seek medical care
Standard Approaches Standard Approaches to Binary Choice-1to Binary Choice-1
Logistic regressionLogistic regression
( 1)1
X
X
eprob Y
e
Advantages of Logistic RegressionAdvantages of Logistic Regression
Designed for relatively rare eventsDesigned for relatively rare events Commonly used in health care; most Commonly used in health care; most
readers can interpret an odds ratioreaders can interpret an odds ratio
Standard Approaches Standard Approaches to Binary Choice-2to Binary Choice-2
Probit regression (classic Probit regression (classic example is decision to make a example is decision to make a large purchase)large purchase)
y* = y* = X +
y=1 if y* >0y=0 if y* ≤0
Binary ChoiceBinary Choice
There are other methods, using There are other methods, using other distributions.other distributions.
In general, logistic and probit give In general, logistic and probit give about the same answer.about the same answer.
It used to be a lot easier to It used to be a lot easier to calculate marginal effects with calculate marginal effects with probit, not so any moreprobit, not so any more
Odds Ratios vs. Relative RisksOdds Ratios vs. Relative Risks
Standard method of interpreting Standard method of interpreting logistic regression is odds ratios. logistic regression is odds ratios.
Convert to % effect, really relative Convert to % effect, really relative riskrisk
This approximation starts to break This approximation starts to break down at 10% outcome incidencedown at 10% outcome incidence
Can Convert OR to RRCan Convert OR to RR
Zhang J, Yu KF. What’s the Relative Risk? Zhang J, Yu KF. What’s the Relative Risk? A Method of Correcting the Odds Ratio in A Method of Correcting the Odds Ratio in Cohort Studies of Common Outcomes. Cohort Studies of Common Outcomes. JAMA 1998;280(19):1690-1691.JAMA 1998;280(19):1690-1691.
RR = RR = OR . OR .
(1-P(1-P00) + (P) + (P00 x OR) x OR)
Where PWhere P00 is the sample probability of the is the sample probability of the
outcomeoutcome
Effect of Correction for RREffect of Correction for RRFrom Phibbs et al., NEJM 5/24/2007, From Phibbs et al., NEJM 5/24/2007, 20% mortality20% mortality
Odds RatioOdds Ratio Calculated RRCalculated RR
2.722.72 2.082.08
2.392.39 1.911.91
1.781.78 1.561.56
1.511.51 1.381.38
1.081.08 1.061.06
ExtensionsExtensions
Panel data, can now estimate both Panel data, can now estimate both random effects and fixed effects random effects and fixed effects models. The Stata manual lists 34 models. The Stata manual lists 34 related estimation commandsrelated estimation commands
All kinds of variations. All kinds of variations. – Panel dataPanel data
– Grouped dataGrouped data
ExtensionsExtensions
Goodness of fit tests. Several tests. Goodness of fit tests. Several tests. Probably the most commonly reported Probably the most commonly reported
statistics are:statistics are:– Area under ROC curve, c-statistic in SAS Area under ROC curve, c-statistic in SAS
output. Range 0.50 to 1.0. output. Range 0.50 to 1.0.
– Hosmer-Lemeshow testHosmer-Lemeshow test
– NEJM paper, c=0.86, H-L p=0.34NEJM paper, c=0.86, H-L p=0.34
More on Hosmer-Lemeshow TestMore on Hosmer-Lemeshow Test The H-L test breaks the sample up into n (usually The H-L test breaks the sample up into n (usually
10, some programs (Stata) let you vary this) equal 10, some programs (Stata) let you vary this) equal groups and compares the number of observed and groups and compares the number of observed and expected events in each group.expected events in each group.
If your model predicts well, the events will be If your model predicts well, the events will be concentrated in the highest risk groups; most can concentrated in the highest risk groups; most can be in the highest risk group.be in the highest risk group.
Alternate specification, divide the sample so that Alternate specification, divide the sample so that the events are split into equal groups. the events are split into equal groups.
Multinomial ChoiceMultinomial Choice
What if more than one choice or What if more than one choice or outcome?outcome?
Options are more limitedOptions are more limited– Multivariable Probit (multiple decisions, Multivariable Probit (multiple decisions,
each with two alternatives)each with two alternatives)
– Several logit models (single decision, Several logit models (single decision, multiple alternatives)multiple alternatives)
Logit Models for Multiple ChoicesLogit Models for Multiple Choices
Conditional Logit Model (McFadden)Conditional Logit Model (McFadden)– Unordered choicesUnordered choices
Multinomial Logit ModelMultinomial Logit Model– Choices can be ordered.Choices can be ordered.
Examples of Health Care Uses for Examples of Health Care Uses for Logit Models for Multiple ChoicesLogit Models for Multiple Choices
Choice of what hospital to use, among Choice of what hospital to use, among those in market areathose in market area
Choice of treatment among several Choice of treatment among several optionsoptions
Conditional Logit ModelConditional Logit Model
1
( )ij
ij
X
JX
j
eprob Yi j
e
Conditional logit modelConditional logit model Also known as the random utility modelAlso known as the random utility model Is derived from consumer theoryIs derived from consumer theory How consumers choose from a set of optionsHow consumers choose from a set of options Model driven by the characteristics of the Model driven by the characteristics of the
choices.choices. Individual characteristics “cancel out” but Individual characteristics “cancel out” but
can be included. For example, in hospital can be included. For example, in hospital choice, can interact with distance to hospitalchoice, can interact with distance to hospital
Can express the results as odds ratios. Can express the results as odds ratios.
Estimation of McFadden’s ModelEstimation of McFadden’s Model Some software packages (e.g. SAS) Some software packages (e.g. SAS)
require that the number of choices be require that the number of choices be equal across all observations.equal across all observations.
LIMDEP, allows a “NCHOICES” LIMDEP, allows a “NCHOICES” options that lets you set the number of options that lets you set the number of choices for each observation. This is a choices for each observation. This is a very useful feature. May be able to do very useful feature. May be able to do this in Stata (clogit) with “group”this in Stata (clogit) with “group”
Example of Conditional Logit Example of Conditional Logit EstimatesEstimates
Study I did looking at elderly service-Study I did looking at elderly service-connected veterans choice of VA or non-connected veterans choice of VA or non-VA hospitalVA hospital
Log distanceLog distance 0.660.66p<0.001p<0.001
Population densityPopulation density 0.9996 0.9996 p<0.001p<0.001
VAVA 2.802.80 p<0.001p<0.001
Multinomial Logit ModelMultinomial Logit Model
1
1
( )1
1( 0)
1
j i
j i
j i
X
JX
j
JX
j
eprob Y j
e
prob Ye
Multinomial Logit ModelMultinomial Logit Model
Must identify a reference choice, model Must identify a reference choice, model yields set of parameter estimates for yields set of parameter estimates for each of the other choiceseach of the other choices
Allows direct estimation of parameters Allows direct estimation of parameters for individual characteristics. Model for individual characteristics. Model can (should) also include parameters can (should) also include parameters for choice characteristicsfor choice characteristics
Example of a Multinomial Logit ModelExample of a Multinomial Logit Model
Effect on VLBW delivery at hospital if Effect on VLBW delivery at hospital if nearby hospital opens mid-level NICU.nearby hospital opens mid-level NICU.
Hosp w/ no NICUHosp w/ no NICU -0.65-0.65 Hosp w/ high-level NICUHosp w/ high-level NICU -0.70-0.70
Independence of Irrelevant Independence of Irrelevant AlternativesAlternatives
Results should be robust to varying the Results should be robust to varying the number of alternative choices number of alternative choices – Can re-estimate model after deleting some of Can re-estimate model after deleting some of
the choices.the choices.– McFadden, regression based test. Regression-McFadden, regression based test. Regression-
Based Specification Tests for the Multinomial Based Specification Tests for the Multinomial Logit Model. J Econometrics 1987;34(1/2):63-Logit Model. J Econometrics 1987;34(1/2):63-82.82.
If fail IIA, may need to estimate a nested If fail IIA, may need to estimate a nested logit modellogit model
Independence of Irrelevant Independence of Irrelevant Alternatives - 2Alternatives - 2
McFadden test is fairly weak, likely to McFadden test is fairly weak, likely to pass. Note, this test can also be used to pass. Note, this test can also be used to test for omitted variables.test for omitted variables.
For many health applications, doesn’t For many health applications, doesn’t matter, the models are very robust (e.g. matter, the models are very robust (e.g. hospital choice models driven by hospital choice models driven by distance).distance).
Count Data (integers)Count Data (integers)
Continuation of the same problemContinuation of the same problem Problem diminishes as counts increaseProblem diminishes as counts increase Rule of Thumb. Need to use count Rule of Thumb. Need to use count
data models for counts under 30data models for counts under 30
Count DataCount Data
Some examples of where count data models Some examples of where count data models are needed in health careare needed in health care– Dependent variable is number of outpatient Dependent variable is number of outpatient
visitsvisits– Number of times a prescription of a chronic Number of times a prescription of a chronic
disease medication is refilled in a yeardisease medication is refilled in a year– Number of adverse events in a unit (or hospital) Number of adverse events in a unit (or hospital)
over a period of timeover a period of time
Count DataCount Data Poisson distribution. A distribution for Poisson distribution. A distribution for
counts.counts.– Problem: very restrictive assumption that Problem: very restrictive assumption that
mean and variance are equalmean and variance are equal
( )!
i iyi
ii
eprob Yi y
y
Count DataCount Data In general, negative binomial is a better choice. In general, negative binomial is a better choice.
Stata, test for what distribution is part of the Stata, test for what distribution is part of the package. Other distributions can also be used.package. Other distributions can also be used.
( )( , )
!
i i iu yi i
i i ii
e uf y x u
y
Other ModelsOther Models
New models are being introduced all of New models are being introduced all of the time. More and better ways to the time. More and better ways to address the problems of limited address the problems of limited dependent variables. dependent variables.
Includes semi-parametric and non-Includes semi-parametric and non-parameteric methods. parameteric methods.
Reference TextsReference Texts
Greene. Greene. Econometric Analysis, Ch. 19 Econometric Analysis, Ch. 19 and 20and 20. .
Maddala. Maddala. Limited-Dependent and Limited-Dependent and Qualitative Variables in EconometricsQualitative Variables in Econometrics
Journal ReferencesJournal References McFadden D. Specification Tests for McFadden D. Specification Tests for
the Multinomial Logit Model. J the Multinomial Logit Model. J Econometrics 1987;34(1/2):63-82.Econometrics 1987;34(1/2):63-82.
Zhang J, Yu KF. What’s the Relative Zhang J, Yu KF. What’s the Relative Risk? A Method of Correctingthe Risk? A Method of Correctingthe Odds Ratio in Cohort Studies of Odds Ratio in Cohort Studies of Common Outcomes. JAMA Common Outcomes. JAMA 1998;280(19):1690-1691.1998;280(19):1690-1691.