Ordinal Logistic Regression
“Good, better, best; never let it rest till your good is better and your better is
best” (Anonymous)
Ordinal Logistic Regression
• Also known as the “ordinal logit,” “ordered polytomous logit,” “constrained cumulative logit,” “proportional odds,” “parallel regression,” or “grouped continuous model”
• Generalization of binary logistic regression to an ordinal DV
• When applied to a dichotomous DV, identical to binary logistic regression
Ordinal Variables
• Three or more ordered categories
• Sometimes called “ordered categorical” or “ordered polytomous” variables
Ordinal DVs
• Job satisfaction: very dissatisfied, somewhat dissatisfied, neutral, somewhat satisfied, or very satisfied
• Severity of child abuse injury: none, mild, moderate, or severe
• Willingness to foster children with emotional or behavioral problems: least acceptable, willing to discuss, or most acceptable
Single (Dichotomous) IV Example
• DV = satisfaction with foster care agencies
  • (1) dissatisfied; (2) neither satisfied nor dissatisfied; (3) satisfied
• IV = agencies provided sufficient information about the role of foster care workers
  • 0 (no) or 1 (yes)
• N = 300 foster mothers
Single (Dichotomous) IV Example (cont’d)
• Are foster mothers who report that they were provided sufficient information about the role of foster care workers more satisfied with their foster care agencies?
Crosstabulation
Table 4.1
Relationship between information and satisfaction is statistically significant [χ²(2, N = 300) = 23.52, p < .001]
Cumulative Probability
• Ordinal logistic regression focuses on cumulative probabilities of the DV, and on odds and ORs based on cumulative probabilities.
• By cumulative probability we mean the probability that the DV is less than or equal to a particular value (e.g., 1, 2, or 3 in our example).
Cumulative Probabilities
• Dissatisfied
  • Insufficient Info: .2857
  • Sufficient Info: .1151
• Dissatisfied or neutral
  • Insufficient Info: .5590 (.2857 + .2733)
  • Sufficient Info: .2878 (.1151 + .1727)
• Dissatisfied, neutral, or satisfied
  • Insufficient Info: 1.00 (.2857 + .2733 + .4410)
  • Sufficient Info: 1.00 (.1151 + .1727 + .7121)
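The running sums above can be reproduced with a short sketch. The proportions are the ones reported on this slide; the variable and function names are illustrative only. Because the reported proportions are rounded, the final sum may differ from 1 in the fourth decimal place.

```python
from itertools import accumulate

# Observed category proportions (dissatisfied, neutral, satisfied),
# copied from the slide above.
proportions = {
    "insufficient": [0.2857, 0.2733, 0.4410],
    "sufficient": [0.1151, 0.1727, 0.7121],
}


def cumulative_probabilities(values):
    """Return running sums: P(DV <= 1), P(DV <= 2), P(DV <= 3)."""
    return [round(total, 4) for total in accumulate(values)]


cumulative = {
    group: cumulative_probabilities(values)
    for group, values in proportions.items()
}

for group, values in cumulative.items():
    print(group, values)
```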
Cumulative Odds
• Probability that the DV is less than or equal to a particular value is compared to (divided by) the probability that it is greater than that value
• Reverse of what you do in binary and multinomial logistic regression
• Probability that the DV is 1 (dissatisfied) vs. the probability that it is either 2 or 3 (neutral or satisfied); probability that the DV is 1 or 2 (dissatisfied or neutral) vs. the probability that it is 3 (satisfied)
Cumulative Odds & Odds Ratios
• Odds of being dissatisfied (vs. neutral or satisfied)
  • Insufficient Info: .4000 (.2857 / [1 - .2857])
  • Sufficient Info: .1301 (.1151 / [1 - .1151])
  • OR = .33 (.1301 / .4000) (-67%)
• Odds of being dissatisfied or neutral (vs. satisfied)
  • Insufficient Info: 1.2676 (.5590 / [1 - .5590])
  • Sufficient Info: .4041 (.2878 / [1 - .2878])
  • OR = .32 (.4041 / 1.2676) (-68%)
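As a check on the arithmetic, the following sketch converts the observed cumulative probabilities into cumulative odds and then into odds ratios (sufficient vs. insufficient information). The input values come from the slides; the names `odds`, `cum_p`, and `odds_ratios` are illustrative.

```python
def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)


# Observed cumulative probabilities: P(DV <= 1) and P(DV <= 2).
cum_p = {
    "insufficient": [0.2857, 0.5590],
    "sufficient": [0.1151, 0.2878],
}
labels = [
    "dissatisfied vs. neutral/satisfied",
    "dissatisfied/neutral vs. satisfied",
]

odds_ratios = []
for i, label in enumerate(labels):
    odds_insufficient = odds(cum_p["insufficient"][i])
    odds_sufficient = odds(cum_p["sufficient"][i])
    # Odds ratio: sufficient information relative to insufficient.
    ratio = odds_sufficient / odds_insufficient
    odds_ratios.append(ratio)
    print(f"{label}: odds {odds_insufficient:.4f} vs. "
          f"{odds_sufficient:.4f}, OR = {ratio:.2f}")
```

The two observed odds ratios are close but not identical; the ordinal model replaces them with a single common estimate.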
Question & Answer
Are foster mothers who report that they were provided sufficient information about the role of foster care workers more satisfied with their foster care agencies?
The odds of being dissatisfied (vs. being neutral or satisfied) are .33 times as large (i.e., 67% smaller) for mothers who received sufficient information. The odds of being dissatisfied or neutral (vs. being satisfied) are .32 times as large (i.e., 68% smaller) for mothers who received sufficient information.
Ordinal Logistic Regression
• Set of binary logistic regression models estimated simultaneously (like multinomial logistic regression)
• Number of non-redundant binary logistic regression equations equals the number of categories of the DV minus one
• Focus on cumulative probabilities and odds, and ORs are computed from cumulative odds (unlike multinomial logistic regression)
Threshold
Suppose our three-point variable is a rough measure of an underlying continuous satisfaction variable. At certain points on this continuous variable, called population thresholds (symbolized by τ, the Greek letter tau), a person’s level of satisfaction goes from one value to another on the ordinal measure of satisfaction.
For example, the first threshold (τ1) is the point at which the level of satisfaction goes from dissatisfied to neutral (i.e., 1 to 2), and the second threshold (τ2) is the point at which the level of satisfaction goes from neutral to satisfied (i.e., 2 to 3).
Threshold (cont’d)
The number of thresholds is always one fewer than the number of values of the DV.
Usually thresholds are of little interest except in the calculation of estimated values.
Thresholds typically are used in place of the intercept to express the ordinal logistic regression model
Estimated Cumulative Logits
• L(Dissatisfied vs. Neutral/Satisfied) = τ1 – BX
• L(Dissatisfied/Neutral vs. Satisfied) = τ2 – BX
• Table 4.2
  • L(Dissatisfied vs. Neutral/Satisfied) = -.912 – 1.139X
  • L(Dissatisfied/Neutral vs. Satisfied) = .235 – 1.139X
Estimated Cumulative Logits (cont’d)
• Each equation has a different threshold (e.g., τ1 and τ2)
• One common slope (B)
  • It is assumed that the effect of the IVs is the same for different values of the DV (“parallel regression” assumption)
• Slope is multiplied by a value of the IV and subtracted from, not added to, the threshold.
Statistical Significance
Table 4.2
H0: β(Info) = 0
• Reject
Estimated Cumulative Logits (X = 1)
L (Dissatisfied vs. Neutral/Satisfied) = -2.051 = -.912 – (1.139)(1)
L (Dissatisfied/Neutral vs. Satisfied) = -.904 = .235 – (1.139)(1)
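A minimal sketch of the threshold-minus-slope calculation, using the Table 4.2 estimates quoted above. The function and constant names are illustrative and not part of any statistical package.

```python
# Estimates quoted from Table 4.2 on the slides.
THRESHOLDS = (-0.912, 0.235)  # first and second thresholds
SLOPE = 1.139                 # common slope for Info


def cumulative_logits(x, thresholds=THRESHOLDS, slope=SLOPE):
    """Return one cumulative logit per threshold: threshold - slope * x."""
    return [t - slope * x for t in thresholds]


for x in (0, 1):
    dissatisfied, dissatisfied_or_neutral = cumulative_logits(x)
    print(f"X = {x}: {dissatisfied:.3f}, {dissatisfied_or_neutral:.3f}")
```

Note that the slope is subtracted, so a positive slope lowers the logit of falling in the lower categories.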
Effect of Information on Satisfaction (Cumulative Logits)
[Figure: cumulative logits (y-axis) by information (x-axis). Plotted values:]
                        (0) Insufficient   (1) Sufficient
Dissatisfied                 -0.91             -2.05
Dissatisfied/Neutral          0.23             -0.90
Cumulative Logits to Cumulative Odds (X = 1)
Odds(Dissatisfied vs. Neutral/Satisfied) = e^{-2.051} = .129
Odds(Dissatisfied/Neutral vs. Satisfied) = e^{-.904} = .405
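Exponentiating a cumulative logit gives the corresponding cumulative odds. The sketch below applies this to the Table 4.2 estimates for both values of the IV; the function name and default arguments are illustrative.

```python
import math


def cumulative_odds(x, thresholds=(-0.912, 0.235), slope=1.139):
    """Cumulative odds for each threshold: exp(threshold - slope * x)."""
    return [math.exp(t - slope * x) for t in thresholds]


for x in (0, 1):
    dissatisfied, dissatisfied_or_neutral = cumulative_odds(x)
    print(f"X = {x}: {dissatisfied:.3f}, {dissatisfied_or_neutral:.3f}")
```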
Effect of Information on Satisfaction (Cumulative Odds)
[Figure: cumulative odds (y-axis) by information (x-axis). Plotted values:]
                        (0) Insufficient   (1) Sufficient
Dissatisfied                  0.40              0.13
Dissatisfied/Neutral          1.26              0.40
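Cumulative Logits to Cumulative Probabilities (X = 1) (cont’d)
$$\hat{p}_{\text{(Dissatisfied vs. Neutral/Satisfied)}} = \frac{e^{-2.051}}{1 + e^{-2.051}} = .114$$
$$\hat{p}_{\text{(Dissatisfied/Neutral vs. Satisfied)}} = \frac{e^{-.904}}{1 + e^{-.904}} = .288$$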
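A cumulative logit L converts to a cumulative probability as e^L / (1 + e^L). The sketch below applies that conversion to the Table 4.2 estimates and, as an added illustration not shown on the slides, differences the cumulative probabilities to recover the estimated probability of each individual category. All names are illustrative.

```python
import math


def inverse_logit(logit):
    """Convert a logit to a probability; equals e^L / (1 + e^L)."""
    return 1.0 / (1.0 + math.exp(-logit))


def cumulative_probabilities(x, thresholds=(-0.912, 0.235), slope=1.139):
    """Estimated P(DV <= 1) and P(DV <= 2) for a given value of the IV."""
    return [inverse_logit(t - slope * x) for t in thresholds]


def category_probabilities(x):
    """Estimated P(DV = 1), P(DV = 2), P(DV = 3) by differencing."""
    cumulative = cumulative_probabilities(x) + [1.0]
    previous = [0.0] + cumulative[:-1]
    return [c - p for c, p in zip(cumulative, previous)]


for x in (0, 1):
    print(x, [round(p, 3) for p in cumulative_probabilities(x)],
          [round(p, 3) for p in category_probabilities(x)])
```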
Effect of Information on Satisfaction (Cumulative Probabilities)
[Figure: cumulative probabilities (y-axis) by information (x-axis). Plotted values:]
                        (0) Insufficient   (1) Sufficient
Dissatisfied                  0.29              0.11
Dissatisfied/Neutral          0.56              0.29
Odds Ratio
• Reverse the sign of the slope and exponentiate it.
• e.g., OR equals .32, calculated as e^{-1.139}
• In contrast to binary logistic regression, in which odds are calculated as a ratio of probabilities for higher to lower values of the DV (odds of 1 vs. 0), in ordinal logistic regression odds are calculated for lower to higher values of the DV.
Odds Ratio (cont’d)
• SPSS reports the exponentiated slope (e^{1.139} = 3.123); the sign of the slope is not reversed before it is exponentiated. Reversing the sign first gives the OR (e^{-1.139} = .320).
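The relationship between the two exponentiated values can be sketched as follows. The slope is the Table 4.2 estimate; the variable names are illustrative. The two values are reciprocals of each other, so either can be recovered from the other.

```python
import math

slope = 1.139  # slope for Info, quoted from Table 4.2

# Odds ratio for the lower categories: reverse the sign, then exponentiate.
odds_ratio = math.exp(-slope)

# Exponentiated slope without the sign reversal.
exponentiated_slope = math.exp(slope)

# Percentage change in the odds implied by the odds ratio.
percent_change = (odds_ratio - 1.0) * 100.0

print(f"OR = {odds_ratio:.3f}")
print(f"exp(B) = {exponentiated_slope:.3f}")
print(f"change in odds = {percent_change:.0f}%")
```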
Question & Answer
Are foster mothers who report that they were provided sufficient information about the role of foster care workers more satisfied with their foster care agencies?
The odds of being dissatisfied (vs. neutral or satisfied) are .32 times as large (i.e., 68% smaller) for mothers who received sufficient information. Similarly, the odds of being dissatisfied or neutral (vs. satisfied) are .32 times as large (i.e., 68% smaller) for mothers who received sufficient information.
Single (Quantitative) IV Example
• DV = satisfaction with foster care agencies
  • (1) dissatisfied; (2) neither satisfied nor dissatisfied; (3) satisfied
• IV = available time to foster (Available Time Scale); higher scores indicate more time to foster
  • Converted to z-scores
• N = 300 foster mothers
Single (Quantitative) IV Example (cont’d)
• Are foster mothers with more time to foster more satisfied with their foster care agencies?
Statistical Significance
Table 4.3
H0: β(zTime) = 0
• Reject
Odds Ratio
• OR equals .76 (e^{-.281})
• For a one-standard-deviation increase in available time, the odds of being dissatisfied (vs. neutral or satisfied) are multiplied by .76 (a 24% decrease). Similarly, for a one-standard-deviation increase in available time, the odds of being dissatisfied or neutral (vs. satisfied) are multiplied by .76 (a 24% decrease).
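Because the IV is in z-score units, the same slope gives the odds ratio for an increase of any number of standard deviations. The sketch below uses the slope of .281 quoted on the slide; the function name is illustrative.

```python
import math

SLOPE_ZTIME = 0.281  # slope for zTime, quoted from the slide


def odds_ratio_for_increase(sd_units, slope=SLOPE_ZTIME):
    """Odds ratio for the lower categories after an increase of sd_units SDs."""
    return math.exp(-slope * sd_units)


for k in (1, 2):
    ratio = odds_ratio_for_increase(k)
    print(f"+{k} SD: OR = {ratio:.2f}, change = {(ratio - 1) * 100:.0f}%")
```

Under the model, the odds ratio for a two-standard-deviation increase is simply the one-standard-deviation odds ratio squared.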
Figures
zATS.xls
Estimated Cumulative Logits
• L(Dissatisfied vs. Neutral/Satisfied) = τ1 – BX
• L(Dissatisfied/Neutral vs. Satisfied) = τ2 – BX
• Table 4.3
  • L(Dissatisfied vs. Neutral/Satisfied) = -1.365 – .281X
  • L(Dissatisfied/Neutral vs. Satisfied) = -.269 – .281X
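The Table 4.3 equations can be evaluated across a range of z-scores to obtain estimated cumulative logits, odds, and probabilities. The sketch below uses z-scores from -3 to 3 as an illustrative range; the names are not from any package.

```python
import math

# Estimates quoted from Table 4.3 on the slides.
THRESHOLDS = (-1.365, -0.269)
SLOPE = 0.281


def estimates(z):
    """Return (logits, odds, probabilities) for one z-score."""
    logits = [t - SLOPE * z for t in THRESHOLDS]
    odds = [math.exp(value) for value in logits]
    probabilities = [o / (1.0 + o) for o in odds]
    return logits, odds, probabilities


table = {z: estimates(z) for z in range(-3, 4)}

for z, (logits, odds, probabilities) in table.items():
    print(z,
          [round(v, 2) for v in logits],
          [round(v, 2) for v in odds],
          [round(v, 2) for v in probabilities])
```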
Effect of Time on Satisfaction (Cumulative Logits)
[Figure: cumulative logits (y-axis) by available time to foster in z-scores (x-axis). Plotted values:]
Available Time (z)      -3     -2     -1      0      1      2      3
Dissatisfied          -0.52  -0.80  -1.08  -1.36  -1.65  -1.93  -2.21
Dissatisfied/Neutral   0.57   0.29   0.01  -0.27  -0.55  -0.83  -1.11
Effect of Time on Satisfaction (Cumulative Odds)
[Figure: cumulative odds (y-axis) by available time to foster in z-scores (x-axis). Plotted values:]
Available Time (z)      -3     -2     -1      0      1      2      3
Dissatisfied           0.59   0.45   0.34   0.26   0.19   0.15   0.11
Dissatisfied/Neutral   1.77   1.34   1.01   0.76   0.58   0.44   0.33
Effect of Time on Satisfaction (Cumulative Probabilities)
[Figure: cumulative probabilities (y-axis) by available time to foster in z-scores (x-axis). Plotted values:]
Available Time (z)      -3     -2     -1      0      1      2      3
Dissatisfied           0.37   0.31   0.25   0.20   0.16   0.13   0.10
Dissatisfied/Neutral   0.64   0.57   0.50   0.43   0.37   0.30   0.25
Question & Answer
Are foster mothers with more time to foster more satisfied with their foster care agencies?
For a one-standard-deviation increase in available time, the odds of being dissatisfied (vs. neutral or satisfied) are multiplied by .76 (a 24% decrease). Similarly, for a one-standard-deviation increase in available time, the odds of being dissatisfied or neutral (vs. satisfied) are multiplied by .76 (a 24% decrease).
Multiple IV Example
• DV = satisfaction with foster care agencies
  • (1) dissatisfied; (2) neither satisfied nor dissatisfied; (3) satisfied
• IV = available time to foster (Available Time Scale); higher scores indicate more time to foster
  • Converted to z-scores
• IV = agencies provided sufficient information about the role of foster care workers
  • 0 (no) or 1 (yes)
• N = 300 foster mothers
Multiple IV Example (cont’d)
Are foster mothers who receive sufficient information about the role of foster care workers more satisfied with their foster care agencies, controlling for available time to foster?
Statistical Significance
Table 4.4
H0: β(Info) = β(zTime) = 0
• Reject
Table 4.5
H0: β(Info) = 0
• Reject
H0: β(zTime) = 0
• Reject
Table 4.6
H0: β(Info) = 0
• Reject
H0: β(zTime) = 0
• Reject
Odds Ratio: Information
• OR equals .33 (e^{-1.116})
• The odds of being dissatisfied (vs. neutral or satisfied) are .33 times as large (i.e., 67% smaller) for mothers who received sufficient information, when controlling for available time to foster. Similarly, the odds of being dissatisfied or neutral (vs. satisfied) are .33 times as large (i.e., 67% smaller) for mothers who received sufficient information, when controlling for time.
Odds Ratio: Time
• OR equals .77 (e^{-.260})
• For a one-standard-deviation increase in available time, the odds of being dissatisfied (vs. neutral or satisfied) are multiplied by .77 (a 23% decrease), when controlling for information. Similarly, for a one-standard-deviation increase in available time, the odds of being dissatisfied or neutral (vs. satisfied) are multiplied by .77 (a 23% decrease), when controlling for information.
Estimated Cumulative Logits
Table 4.6
• L(Dissatisfied vs. Neutral/Satisfied) = -.941 – [(1.116)(XInfo) + (.260)(XzTime)]
• L(Dissatisfied/Neutral vs. Satisfied) = .222 – [(1.116)(XInfo) + (.260)(XzTime)]
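With two IVs, the weighted sum of both predictors is subtracted from each threshold. The sketch below evaluates the Table 4.6 equations for a few combinations of information and available time. The chosen combinations and all names are illustrative, and the output is not claimed to match the layout of any table in the source.

```python
import math

# Estimates quoted from Table 4.6 on the slides.
THRESHOLDS = (-0.941, 0.222)
B_INFO = 1.116
B_ZTIME = 0.260


def cumulative_logits(info, ztime):
    """Threshold minus the weighted sum of both IVs, for each threshold."""
    linear = B_INFO * info + B_ZTIME * ztime
    return [t - linear for t in THRESHOLDS]


def cumulative_probabilities(info, ztime):
    """Estimated P(DV <= 1) and P(DV <= 2)."""
    return [1.0 / (1.0 + math.exp(-value))
            for value in cumulative_logits(info, ztime)]


for info in (0, 1):
    for ztime in (-1, 0, 1):
        probabilities = cumulative_probabilities(info, ztime)
        print(info, ztime, [round(p, 3) for p in probabilities])
```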
Estimated Odds as a Function of Available Time and Information
See Table 4.7
Estimated Probabilities as a Function of Available Time and Information
See Table 4.9
Question & Answer
Are foster mothers who receive sufficient information about the role of foster care workers more satisfied with their foster care agencies, controlling for available time to foster?
The odds of being dissatisfied (vs. neutral or satisfied) are .33 times as large (i.e., 67% smaller) for mothers who received sufficient information, when controlling for available time to foster. Similarly, the odds of being dissatisfied or neutral (vs. satisfied) are .33 times as large (i.e., 67% smaller) for mothers who received sufficient information, when controlling for time.
Assumptions Necessary for Testing Hypotheses
• Assumptions discussed in GZLM lecture
• Effect of the IVs is the same for all values of the DV (“parallel lines assumption”)
  • L(Dissatisfied vs. Neutral/Satisfied) = τ1 – (BInfoXInfo + BzTimeXzTime)
  • L(Dissatisfied/Neutral vs. Satisfied) = τ2 – (BInfoXInfo + BzTimeXzTime)
  • Ordinal logistic regression assumes that BInfo is the same for both equations, and BzTime is the same for both equations
  • See Table 4.10
Model Evaluation
• Create a set of binary DVs from the polytomous DV
  recode Satisfaction (1=1) (2=0) (3=0) into SatisfactionLessThan2.
  recode Satisfaction (1=1) (2=1) (3=0) into SatisfactionLessThan3.
• Run separate binary logistic regressions
• Use binary logistic regression methods to detect outliers and influential observations
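The same dichotomization can be sketched outside SPSS. The example below uses plain Python lists and a small hypothetical set of satisfaction scores (not data from the study); each binary DV is 1 when satisfaction falls below the cutpoint and 0 otherwise.

```python
# Hypothetical satisfaction scores (1 = dissatisfied, 2 = neutral,
# 3 = satisfied); these are illustrative, not the study data.
satisfaction = [1, 2, 3, 3, 2, 1, 3]


def below_cutpoint(values, cutpoint):
    """Return 1 where the value is below the cutpoint, else 0."""
    return [1 if value < cutpoint else 0 for value in values]


# One binary DV per cutpoint: number of categories minus one.
satisfaction_less_than_2 = below_cutpoint(satisfaction, 2)
satisfaction_less_than_3 = below_cutpoint(satisfaction, 3)

print(satisfaction_less_than_2)
print(satisfaction_less_than_3)
```

Each binary DV can then be passed to whatever binary logistic regression routine is available in order to inspect residuals and influence diagnostics.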
Model Evaluation (cont’d)
• Index plots
  • Leverage values
  • Standardized or unstandardized deviance residuals
  • Cook’s D
• Graph and compare observed and estimated counts
Analogs of R²
• None in standard use and each may give different results
• Typically much smaller than R² values in linear regression
• Difficult to interpret
Multicollinearity
• SPSS GZLM doesn’t compute multicollinearity statistics
• Use SPSS linear regression
• Problematic levels
  • Tolerance < .10 or VIF > 10
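Tolerance for an IV is 1 minus the R² from regressing that IV on the other IVs, and VIF is the reciprocal of tolerance. With only two IVs, that R² is the squared correlation between them, which the sketch below exploits. The data are hypothetical and the names illustrative; with more than two IVs a full auxiliary regression would be needed.

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical values for the two IVs (not the study data).
info = [0, 1, 1, 0, 1, 0, 1, 1]
ztime = [-1.2, 0.4, 1.1, -0.3, 0.8, -0.9, 0.2, -0.1]

r = pearson_r(info, ztime)
tolerance = 1.0 - r ** 2   # share of variance not explained by the other IV
vif = 1.0 / tolerance      # variance inflation factor

# Rule of thumb from the slide.
problematic = tolerance < 0.10 or vif > 10

print(f"tolerance = {tolerance:.3f}, VIF = {vif:.2f}, "
      f"problematic = {problematic}")
```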
Additional Topics
• Polytomous IVs
• Curvilinear relationships
• Interactions
Additional Regression Models for Polytomous DVs
• Ordinal probit regression
  • Substantive results essentially indistinguishable from ordinal logistic regression
  • Choice between this and ordinal logistic regression largely one of convenience and discipline-specific convention
  • Many researchers prefer ordinal logistic regression because it provides odds ratios whereas ordinal probit regression does not, and ordinal logistic regression comes with a wider variety of fit statistics
Additional Regression Models for Polytomous DVs (cont’d)
• Adjacent-category logistic model
  • Compares each value of the DV to the next higher value
• Continuation-ratio logistic model
  • Compares each value of the DV to all lower values
• Generalized ordered logit model
  • Relaxes the parallel lines assumption
Additional Regression Models for Polytomous DVs (cont’d)
• Complementary log-log link (also known as clog-log)
  • Useful when higher categories more probable
• Negative log-log link
  • Useful when lower categories more probable
• Cauchit link
  • Useful when DV has a number of extreme values