Modeling Categorical Dependent Variables
What will happen if we have categorical dependent variables?
A Dichotomous Dependent Variable
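$$y_i = \begin{cases} 1 & \text{if Yes} \\ 0 & \text{if No} \end{cases}$$
According to the regression model
$$y_i = \beta_0 + x_i\beta_1 + e_i$$
$$\hat{y}_i = E(y_i) = \beta_0 + x_i\beta_1$$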
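How Do Choice Probabilities Fit In?
We define
$$p_{i1} = \Pr[y_i = 1] = \Pr[\text{Yes}]$$
$$p_{i2} = \Pr[y_i = 0] = \Pr[\text{No}]$$
From the definition of Expectation of a Discrete Variable
$$E(y_i) = (1)\,p_{i1} + (0)\,p_{i2} = p_{i1}$$
so that
$$p_{i1} = \beta_0 + x_i\beta_1$$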
Two Requirements for a Probability
Logical Consistency
$$0 \le p_{i1} \le 1$$
Sum Constraint
$$p_{i1} + p_{i2} = 1$$
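A quick numeric check of the logical-consistency requirement may help here. The sketch below evaluates the linear model for the probability at a few values of x. The coefficients `beta0` and `beta1` are hypothetical values chosen only for illustration; they are not estimates from these slides.

```python
# Linear probability model: p_i1 = beta0 + x_i * beta1.
# beta0 and beta1 are hypothetical values chosen for illustration only.
beta0, beta1 = -0.5, 0.02

def linear_probability(x):
    """Fitted 'probability' under the linear model; nothing bounds it to [0, 1]."""
    return beta0 + x * beta1

for x in (10, 50, 90):
    p = linear_probability(x)
    status = "ok" if 0.0 <= p <= 1.0 else "violates 0 <= p <= 1"
    print(f"x = {x:3d}  p = {p:+.2f}  {status}")
```

With a straight line, small or large enough values of x always push the fitted value outside the unit interval, which is the motivation for a bounded link function.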
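A Requirement for Regression
Gauss-Markov Assumption
$$V(e) = \sigma^2 I$$
Two possibilities exist
$$e_i = \begin{cases} 1 - (\beta_0 + x_i\beta_1) & \text{for } y_i = 1 \\ 0 - (\beta_0 + x_i\beta_1) & \text{for } y_i = 0 \end{cases}$$
Since E(e_i) = 0
$$V(e_i) = E[e_i - E(e_i)]^2 = E(e_i^2)$$
by the Definition of E(·)
$$E(e_i^2) = p_{i1}(1 - \beta_0 - x_i\beta_1)^2 + p_{i2}(-\beta_0 - x_i\beta_1)^2$$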
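Heteroskedasticity Rears Its Head
$$\begin{aligned}
E(e_i^2) &= p_{i1}(1 - \beta_0 - x_i\beta_1)^2 + p_{i2}(-\beta_0 - x_i\beta_1)^2 \\
&= p_{i1}(1 - p_{i1})^2 + (1 - p_{i1})(p_{i1})^2 \\
&= p_{i1}(1 - p_{i1}) \\
&= (\beta_0 + x_i\beta_1)(1 - \beta_0 - x_i\beta_1)
\end{aligned}$$
Note that the subscript i appears on the right hand side!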
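To see the heteroskedasticity numerically, the sketch below evaluates the error variance p(1 - p) of the linear probability model at several values of x. The coefficients are hypothetical and serve only to show that the variance changes from one observation to the next.

```python
# Error variance in the linear probability model: V(e_i) = p_i1 * (1 - p_i1),
# with p_i1 = beta0 + x_i * beta1. Coefficients are hypothetical.
beta0, beta1 = 0.1, 0.01

def error_variance(x):
    p = beta0 + x * beta1          # choice probability at this x
    return p * (1.0 - p)           # variance of a 0/1 outcome with mean p

variances = {x: error_variance(x) for x in (0, 20, 40, 60, 80)}
for x, v in variances.items():
    print(f"x = {x:2d}  V(e) = {v:.4f}")
```

The variance peaks where the probability is 0.5 and shrinks toward either end, so a single constant variance cannot describe all observations.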
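The Logit Model
$$p_{i1} = F_L(\beta_0 + x_i\beta_1) = \frac{e^{\beta_0 + x_i\beta_1}}{1 + e^{\beta_0 + x_i\beta_1}}$$
[Figure: S-shaped logistic curve of the probability (0.0 to 1.0) against x_i]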
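The logistic function above can be sketched in a few lines. The coefficients used in the loop are hypothetical; the point is only that the output stays strictly between 0 and 1 for any x.

```python
import math

def logistic(u):
    """F_L(u) = e^u / (1 + e^u), written to avoid overflow for large |u|."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

# Hypothetical coefficients, for illustration only.
beta0, beta1 = -4.0, 0.1
for x in (0, 20, 40, 60, 80):
    print(f"x = {x:2d}  p = {logistic(beta0 + x * beta1):.3f}")
```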
Example
Age and Cash Discount Approval

Age  CD     Age  CD     Age  CD
22   0      40   0      54   0
23   0      41   1      55   1
24   0      46   0      58   1
27   0      47   0      60   1
28   0      48   0      60   0
30   0      49   1      62   1
30   0      49   0      65   1
32   0      50   1      67   1
33   0      51   0      71   1
35   1      51   1      77   1
38   0      52   0      81   1
How can we analyse these data?
Compare mean age of people having cash discount and non-CD
Non-CD: 38.6 years
CD: 58.7 years (p
[Figure: dot-plot of Cash Discount (No / Yes) against AGE (years), axis from 0 to 100]
Logistic regression
Prevalence (%) of CD according to age group

Age group   # in group   CD #   CD %
20 - 29          5         0       0
30 - 39          6         1      17
40 - 49          7         2      29
50 - 59          7         4      57
60 - 69          5         4      80
70 - 79          2         2     100
80 - 89          1         1     100
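The mean-age comparison and the prevalence table can both be reproduced from the raw example data. The sketch below transcribes the 33 (age, CD) pairs from the example and groups them by decade; the variable and function names are illustrative.

```python
# (age, CD) pairs transcribed from the Age and Cash Discount example.
data = [
    (22, 0), (23, 0), (24, 0), (27, 0), (28, 0), (30, 0), (30, 0), (32, 0),
    (33, 0), (35, 1), (38, 0), (40, 0), (41, 1), (46, 0), (47, 0), (48, 0),
    (49, 1), (49, 0), (50, 1), (51, 0), (51, 1), (52, 0), (54, 0), (55, 1),
    (58, 1), (60, 1), (60, 0), (62, 1), (65, 1), (67, 1), (71, 1), (77, 1),
    (81, 1),
]

def mean(values):
    return sum(values) / len(values)

mean_non_cd = mean([age for age, cd in data if cd == 0])
mean_cd = mean([age for age, cd in data if cd == 1])
print(f"Non-CD: {mean_non_cd:.1f} years, CD: {mean_cd:.1f} years")

# Prevalence of CD within each ten-year age group.
groups = {}
for age, cd in data:
    lower = (age // 10) * 10
    n, k = groups.get(lower, (0, 0))
    groups[lower] = (n + 1, k + cd)

for lower in sorted(groups):
    n, k = groups[lower]
    print(f"{lower} - {lower + 9}: n = {n}, CD = {k}, {100 * k / n:.0f}%")
```

Grouping trades detail for a readable trend; logistic regression uses the individual ages directly instead.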
[Figure: dot-plot of CD % (0 to 100) against Age group]
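Logistic function
[Figure: logistic curve, Probability of CD (0.0 to 1.0) against x]
$$P(y \mid x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$$
Logistic transformation
$$\ln\left[\frac{P(y \mid x)}{1 - P(y \mid x)}\right] = \alpha + \beta x$$
The left-hand side is the logit of P(y|x).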
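The Expression for Not Buying
$$p_{i1} = \frac{e^{u_i}}{1 + e^{u_i}}$$
$$p_{i2} = 1 - p_{i1} = \frac{1 + e^{u_i}}{1 + e^{u_i}} - \frac{e^{u_i}}{1 + e^{u_i}} = \frac{1}{1 + e^{u_i}}$$
where $u_i = \beta_0 + x_i\beta_1$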
The Logit Is a Special Case of Bell, Keeney and Little's (1975) Market Share Theorem
$$p_{ij} = \frac{a_{ij}}{\sum_{m=1}^{J} a_{im}}$$
For J = 2 and the logit model, $a_{i1} = e^{u_i}$ and $a_{i2} = 1$.
My Share of the Market Is My Share of the Attraction
$$\Pr(1) = \frac{a_1}{a_1 + a_2}$$
where a_1 is a function of the Marketing Variables brought to bear on behalf of brand 1
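The special-case claim can be checked numerically: with attractions e^u and 1, the attraction share of the first alternative equals the binary logit probability. The value of `u` below is hypothetical.

```python
import math

def attraction_shares(attractions):
    """Share of each alternative = its attraction / sum of all attractions."""
    total = sum(attractions)
    return [a / total for a in attractions]

u = 0.8                                           # hypothetical value of u_i
p1, p2 = attraction_shares([math.exp(u), 1.0])    # a_i1 = e^u, a_i2 = 1
logit_p1 = math.exp(u) / (1.0 + math.exp(u))      # binary logit probability
print(f"share of brand 1 = {p1:.4f}, logit probability = {logit_p1:.4f}")
```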
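The Model Can Be Linearized for Least Squares
$$p_{i1} = \frac{e^{u_i}}{1 + e^{u_i}}$$
$$1 - p_{i1} = p_{i2} = \frac{1}{1 + e^{u_i}}$$
$$\frac{p_{i1}}{p_{i2}} = \frac{e^{u_i}}{1 + e^{u_i}} \cdot \frac{1 + e^{u_i}}{1} = e^{u_i}$$
$$\ln\left(\frac{p_{i1}}{p_{i2}}\right) = u_i = \beta_0 + x_i\beta_1$$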
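The linearization can be verified with a short sketch: compute the two choice probabilities from u, then confirm that the log of their ratio returns u. The coefficients are hypothetical and the function names are illustrative.

```python
import math

def choice_probabilities(u):
    """Return (p_i1, p_i2) for the binary logit with index u."""
    p1 = math.exp(u) / (1.0 + math.exp(u))
    return p1, 1.0 - p1

def log_odds(p1):
    """ln(p_i1 / p_i2), the quantity that is linear in x."""
    return math.log(p1 / (1.0 - p1))

# Hypothetical coefficients, for illustration only.
beta0, beta1 = -2.0, 0.5
for x in (0, 2, 4, 6, 8):
    u = beta0 + x * beta1
    p1, p2 = choice_probabilities(u)
    print(f"x = {x}  p1 = {p1:.3f}  p2 = {p2:.3f}  ln(p1/p2) = {log_odds(p1):+.3f}")
```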
Odds and Odds Ratios
- Odds is the probability of an event occurring divided by the probability of the event not occurring
- An odds ratio is the ratio of the odds for two different groups
- An odds ratio = 1 implies equal risk in the two groups
Probability and Odds
We begin with a frequency distribution for the variable "Buying insurance due to Mortality Risk":

Reason for buying      Count   Percent
Mortality Risk (MR)       50       34%
Investment (Inv)          97       66%
Total                    147      100%

- The probability of buying insurance to cover MR is 0.34 or 34% (50/147)
- The odds of buying insurance due to MR = MR/Inv = 50/97 = 0.5155
Interpreting Odds
The odds of 0.5155 can be stated in different ways:
- Insurance companies can expect to win about one customer with a good MR scheme for every two won with a good money-return scheme
- Winning a customer with a good MR policy is about half as likely as winning one with a good investment-return policy
- Or, inverting the odds, winning a customer with a good investment-return policy is about twice as likely as winning one with a good MR policy
Impact of an Independent Variable
If an independent variable has a relationship to a dependent variable, it will change the odds of being in the key dependent variable group, e.g. buying the insurance policy.
The following table shows the relationship between age and buying behaviour:

                      Age < 40   Age >= 40   Total
Mortality Risk (MR)       28         22         50
Investment (Inv)          45         52         97
Total                     73         74        147
Odds for Independent Variable Groups
We can compute the odds of buying an insurance policy for each of the groups:
- The odds of buying an MR policy if the customer is aged < 40 = 28/45 = 0.6222
- The odds of buying an MR policy if the customer is aged >= 40 = 22/52 = 0.4231
The Odds Ratio Measures the Effect
The impact of age on buying an insurance policy is measured by the odds ratio, which equals:
(the odds if age < 40) / (the odds if age >= 40) = 0.6222 / 0.4231 = 1.47
Which we interpret as:
- The odds of buying an MR policy are 1.47 times as high for young customers as for old customers
- The odds of buying an MR policy are 47% higher for young customers than for old customers (1.47 - 1.00)
- A one-unit change in the independent variable age (old to young) increases the odds of buying an MR policy by a factor of 1.47
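The odds and the odds ratio follow directly from the cell counts of the age-by-policy table. A minimal sketch, with illustrative names:

```python
# Counts from the age-by-policy table: (MR buyers, Inv buyers) per age group.
counts = {"age < 40": (28, 45), "age >= 40": (22, 52)}

def odds(events, non_events):
    """Odds = count in the key group / count outside it."""
    return events / non_events

odds_young = odds(*counts["age < 40"])       # 28 / 45
odds_old = odds(*counts["age >= 40"])        # 22 / 52
odds_ratio = odds_young / odds_old

overall_mr = 28 + 22
overall_inv = 45 + 52
print(f"P(MR) = {overall_mr / (overall_mr + overall_inv):.2f}")
print(f"overall odds = {odds(overall_mr, overall_inv):.4f}")
print(f"odds young = {odds_young:.4f}, odds old = {odds_old:.4f}, OR = {odds_ratio:.2f}")
```

Note that the odds ratio (1.47) is larger than the ratio of the two probabilities (28/73 against 22/74, about 1.29), which is why the result is best read as a statement about odds.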
Odds & Odds ratios

                  Bankruptcy
Delinquency     Yes     No
Yes              75    175     ?
No               20    180     ?
                  ?      ?     ?

Delinquency   P(ibk)   P(nibk)   Odds ibk   Odds ratio
Yes             0.3      0.7       0.43
No              0.1      0.9       0.11
                                               3.86
Advantages of Logit Transform
- Probabilities range between zero and one
- Odds = P/(1-P)
- Odds range between zero and infinity
- Logit = ln(P/(1-P))
- The logit transform ranges between negative infinity and infinity
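The three scales can be compared side by side. The sketch below converts a few probabilities to odds and to logits; the probabilities are arbitrary illustration values.

```python
import math

def to_odds(p):
    """Odds = P / (1 - P); ranges from 0 to infinity."""
    return p / (1.0 - p)

def to_logit(p):
    """Logit = ln(P / (1 - P)); ranges over the whole real line."""
    return math.log(to_odds(p))

for p in (0.01, 0.25, 0.50, 0.75, 0.99):
    print(f"P = {p:.2f}  odds = {to_odds(p):8.3f}  logit = {to_logit(p):+.3f}")
```

The logit is symmetric around P = 0.5, so P and 1 - P give logits of equal size and opposite sign.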
Logistic Regression
- Model the logarithm of the odds of an outcome as a linear combination of predictor variables
- Logit = ln(P/(1-P)) = b0 + b1X1 + b2X2 + . . .
- Estimate the coefficients b0, b1, b2 based on a random sample of subjects' data
- Determine which of the predictors are good
- Assess model fit
- Use the model to predict future cases
[Figure: two panels, Logit against Age and Probability against Age]
Estimating & Interpreting Logistic Regression
Logistic Regression Model
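The general model for Logistic Regression is
$$\ln\left[\frac{U(x)}{1 - U(x)}\right] = \alpha + \beta_1 R + \beta_2 x + \beta_3 xR$$
Re-write to define U(x)
$$U(x) = \frac{\exp(\alpha + \beta_1 R + \beta_2 x + \beta_3 xR)}{1 + \exp(\alpha + \beta_1 R + \beta_2 x + \beta_3 xR)}$$
Terms

Term                 Definition
U(x)                 Logistic Regression Function
R                    Categorical Variable
x                    Continuous Variable
α, β1, β2, β3        Parameters to be estimated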
Properties of Logistic Regression
- The dependent variable takes on a value of 0 or 1
- Therefore, Pr(Y = 1 | x) = U(x)
- Y is transformed into odds, i.e., the probability an event occurs relative to its converse
- Odds of 1.0 indicate equal probability of an event and its converse (p = 0.5)
- The natural log of the odds is the logit transformation
How to Estimate the Parameters
Logistic Regression Uses Maximum Likelihood Estimation
Maximum Likelihood Estimates
Fit the Likelihood Function
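$$L = \prod_{i=1}^{n} U(x_i)^{u_i}\,[1 - U(x_i)]^{1 - u_i}$$
The first factor is the Probability an Event Occurred and the second is the Probability an Event Did Not Occur, with u_i the observed 0/1 outcome for observation i.
$$L = \prod_{i=1}^{n} \left[\frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)}\right]^{u_i} \left[\frac{1}{1 + \exp(\beta_0 + \beta_1 x_i)}\right]^{1 - u_i}$$
We seek those values of β0 & β1 that maximize the likelihood function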
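As an illustration of what maximizing the likelihood involves, the sketch below fits the two-parameter model to the Age and Cash Discount example data with Newton-Raphson iterations in plain Python. This is one common way to maximize a logistic likelihood, not necessarily the routine any particular package uses; the function names and the fixed iteration count are assumptions made for the example.

```python
import math

# (age, CD) pairs from the Age and Cash Discount example.
data = [
    (22, 0), (23, 0), (24, 0), (27, 0), (28, 0), (30, 0), (30, 0), (32, 0),
    (33, 0), (35, 1), (38, 0), (40, 0), (41, 1), (46, 0), (47, 0), (48, 0),
    (49, 1), (49, 0), (50, 1), (51, 0), (51, 1), (52, 0), (54, 0), (55, 1),
    (58, 1), (60, 1), (60, 0), (62, 1), (65, 1), (67, 1), (71, 1), (77, 1),
    (81, 1),
]

def fit_logit(data, iterations=25):
    """Maximise the log-likelihood over (b0, b1) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p              # score with respect to b0
            g1 += (y - p) * x        # score with respect to b1
            h00 += w                 # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step: information^-1 * score
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def log_likelihood(data, b0, b1):
    total = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

b0, b1 = fit_logit(data)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"log-likelihood = {log_likelihood(data, b0, b1):.3f}")
```

At the maximum the score is zero, so the fitted probabilities sum to the observed number of events.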
Interpreting Results
How to interpret the results?
- A 1-unit change in X is associated with a b-unit change in the value of the latent preference variable (Y*), which determines the dichotomous value of the observed variable (Y)
Alternative interpretations?
- P/(1-P) = exp{a+bX} is the ODDS of Y=1, the ratio of two mutually exclusive probabilities
- When X changes (from X0 to X1), the ODDS RATIO between the two values is:
$$\frac{\exp\{a + bX_1\}}{\exp\{a + bX_0\}} = \exp\{a + bX_1 - a - bX_0\} = \exp\{b(X_1 - X_0)\}$$
- A 1-unit change in X (= X1 - X0) is associated with a b-unit change in the log of the ODDS (logit) for Y=1
- But what about a change in PROBABILITY? (marginal effect)
What is the marginal effect on the PROBABILITY?
- Logit is a non-linear relationship, thus b is not an independent effect
- The effect of dX on P(Y) is a function of Z=a+bX (depends on all Xs)
- Thus, the effect of dX has to be evaluated at each value of Z
What can we infer about how the probability changes?
- The b-coefficient shows the sign (direction) of the relationship, but does not by itself determine the magnitude of the effect; the size of the effect depends on the values of all the model parameters
- Hence, logit allows a specification in which changes in the observed behaviour (outcome) due to a change in an exogenous variable (characteristic) are conditioned on the values and effects of all other characteristics
- However, the underlying relationship (the effect of each characteristic on the preference over the observed outcome) is still linear (i.e., the impact of each characteristic on preferences is independent of the other characteristics)
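For a single predictor, differentiating P = 1/(1 + exp{-(a + bX)}) gives the marginal effect dP/dX = b P (1 - P). The sketch below evaluates it at several values of X; the coefficients `a` and `b` are hypothetical, chosen so that Z = 0 at X = 50.

```python
import math

def probability(a, b, x):
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def marginal_effect(a, b, x):
    """dP/dX = b * P * (1 - P), evaluated at Z = a + b*x."""
    p = probability(a, b, x)
    return b * p * (1.0 - p)

a, b = -5.0, 0.1       # hypothetical coefficients, for illustration only
for x in (20, 50, 80):
    print(f"X = {x}  P = {probability(a, b, x):.3f}  "
          f"dP/dX = {marginal_effect(a, b, x):.4f}")
```

The same b produces a different change in probability at each X: largest (b/4) where P = 0.5 and close to zero in the tails.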
Case Study
Predicting default behavior
Data: customer-level information
- Credit quality
- Interest rate premium

Sno       CS    Premium   Default
Cust 1    637   0.63      0
Cust 2    653   1.68      1
Cust 3    556   0.63      0
Cust 4    664   1.905     0
Cust 5    544   1.98      0
Cust 6    632   0.78      0
Cust 7    595   2.63      0
Cust 8    557   3.83      0
Cust 9    651   2.38      0
Cust 10   666   0.73      0
Cust 11   686   0.23      0
Cust 12   712   0.88      0
[Figure: Default values by CS; Default (0 to 1.2) against CS (400 to 850)]
[Figure: Default values by Premium; Default (0 to 1.2) against Premium (-2 to 10)]
[Figure: Default Rate by CS; Default Rate (0% to 35%) for CS bins < 560, 560 -< 620, 620 -< 700, 700+]
[Figure: Default Rate by Premium bins; Default Rate (0% to 35%) for Premium bins < 1, 1 -< 2, 2 -< 3, 3+]
Parameter   DF   Estimate   Std Err    Wald ChiSq   Pr > ChiSq
Intercept   1    -0.0448    0.561      0.0064       0.9363
CS          1    -0.00466   0.000835   31.1981
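The Wald chi-square column can be recomputed from the printed estimates and standard errors as (estimate / std err) squared. The sketch below does this and also exponentiates each estimate to put it on the odds scale. The small gap for CS (about 31.15 against the printed 31.1981) is presumably due to rounding in the printed estimate and standard error.

```python
import math

# (estimate, standard error) as printed in the parameter table.
params = {"Intercept": (-0.0448, 0.561), "CS": (-0.00466, 0.000835)}

def wald_chi_square(estimate, std_err):
    """Wald statistic for a single coefficient."""
    return (estimate / std_err) ** 2

for name, (estimate, std_err) in params.items():
    print(f"{name}: Wald = {wald_chi_square(estimate, std_err):.4f}, "
          f"exp(estimate) = {math.exp(estimate):.4f}")
```

On this reading, each additional point of CS multiplies the odds of default by exp(-0.00466), a little under 1, so higher CS goes with lower odds of default.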
Computing Cutoff
[Figure: distributions of Bads and Goods by % Default, with the Cutoff marked; labels 10% and 35%]
Logistic Regression Predicted Probabilities and Classification with 0.30 cutoff
(D = default, ND = no default)

Sno       CS    Premium   Actual Default   Logistic P   Pred Default   Classify
Cust 1    637   0.63      0                0.01155      0              ND
Cust 2    653   1.68      1                0.95213      1              D
Cust 3    556   0.63      0                0.89124      1              D
Cust 4    664   1.905     0                0.84625      1              D
Cust 5    544   1.98      0                0.67182      1              D
Cust 6    632   0.78      0                0.78328      1              D
Cust 7    595   2.63      0                0.61989      1              D
Cust 8    557   3.83      0                0.00001      0              ND
Cust 9    651   2.38      0                0.95435      1              D
Cust 10   666   0.73      0                0.85686      1              D
Cust 11   686   0.23      0                0.83464      1              D
Cust 12   712   0.88      0                0.65759      1              D
Cust 13   632   2.73      0                0.17796      0              ND
Cust 14   619   5.32      0                0.36792      1              D
Cust 15   664   1.13      0                0.23750      0              ND
Cust 16   575   2.47      0                0.12322      0              ND
Cust 17   750   0.92      1                0.11146      0              ND
Cust 18   645   3.13      0                0.05473      0              ND
Cust 19   644   2.97      1                0.03869      0              ND
Cust 20   678   1.17      0                0.03869      0              ND
Sensitivity & Specificity
- Sensitivity: power to identify positives. Sensitivity = TP / (TP + FN)
- Specificity: power to identify negatives. Specificity = TN / (TN + FP)

             Reality
Model      P       N
P          TP      FP
N          FN      TN
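These definitions can be applied to the 20 scored customers from the case study. The sketch below transcribes the (actual default, logistic P) pairs, classifies each customer at a cutoff, and counts the four cells; the function names are illustrative.

```python
# (actual default, logistic P) for Cust 1..20, transcribed from the case study table.
scored = [
    (0, 0.01155), (1, 0.95213), (0, 0.89124), (0, 0.84625), (0, 0.67182),
    (0, 0.78328), (0, 0.61989), (0, 0.00001), (0, 0.95435), (0, 0.85686),
    (0, 0.83464), (0, 0.65759), (0, 0.17796), (0, 0.36792), (0, 0.23750),
    (0, 0.12322), (1, 0.11146), (0, 0.05473), (1, 0.03869), (0, 0.03869),
]

def confusion(scored, cutoff):
    """Return (TP, FP, FN, TN) when P >= cutoff is classified as a default."""
    tp = fp = fn = tn = 0
    for actual, p in scored:
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

tp, fp, fn, tn = confusion(scored, cutoff=0.30)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")
```

Re-running `confusion` over a grid of cutoffs gives the points for a sensitivity/specificity plot against the probability cutoff.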
[Figure: Sensitivity and Specificity (0.00 to 1.00) against Probability cutoff (0.00 to 1.00)]
Logistic regression for binary response variables
Basic Syntax:

    proc logistic data=two outest=parms descending;
      class x3 (ref='1') c4 (ref='F') / param=ref;
      model y = x1 x2 x3 c4 / rsquare lackfit selection=stepwise
                              ctable pprob=(0 to 1 by 0.1) outroc=roc1;
    run;

    proc score data=chdage1 score=parms out=scored type=parms;
      var age;
    run;

- rsquare requests a generalized R2 measure for the fitted model.
- lackfit performs the Hosmer and Lemeshow goodness-of-fit test.
- ctable classifies the input response observations according to whether the predicted probability of (Y=1) is above or below some cutpoint value, for a number of cutpoint values in the range (0,1). An observation is predicted as an event, that is, in our case, 1, if the predicted probability of (Y=1) exceeds the cutpoint value. The table helps assess the ability of the model to discriminate between the two groups of cases, Y=1 and Y=0.
Classification Table: The model classifies an observation as an event if its estimated probability is greater than or equal to a given probability cutpoint.

          Correct             Incorrect           Percentages (%)
Prob.     Event   Non Event   Event   Non Event   Correct   Sensitivity   Specificity   FALSE POS   FALSE NEG
Level
0         57      0           43      0           57        100           0             43          .
0.1       57      1           42      0           58        100           2.3           42.4        0
0.2       55      7           36      2           62        96.5          16.3          39.6        22.2
0.3       51      19          24      6           70        89.5          44.2          32          24
0.4       50      25          18      7           75        87.7          58.1          26.5        21.9
0.5       45      27          16      12          72        78.9          62.8          26.2        30.8
0.6       41      32          11      16          73        71.9          74.4          21.2        33.3
0.7       32      36          7       25          68        56.1          83.7          17.9        41
0.8       24      39          4       33          63        42.1          90.7          14.3        45.8
0.9       6       42          1       51          48        10.5          97.7          14.3        54.8
1         0       43          0       57          43        0             100           .           57

Item: a = Correct Event, b = Correct Non Event, c = Incorrect Event, d = Incorrect Non Event
- Correct = Tot Correct / Total = (a+b) / (a+b+c+d)
- Sensitivity = Correct Event / Tot Event = a / (a+d)
- Specificity = Correct N.Event / Tot N.Event = b / (b+c)
- FALSE POS = F.Pos / (F.Pos+Pos) = c / (a+c)
- FALSE NEG = F.Neg / (F.Neg+Neg) = d / (b+d)
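The percentage columns follow from the four counts a, b, c, d by the item formulas given with the table. A minimal sketch that reproduces the row for probability level 0.3 (the function name is illustrative):

```python
def classification_rates(a, b, c, d):
    """a, b = correct events / non-events; c, d = incorrect events / non-events."""
    return {
        "correct": 100 * (a + b) / (a + b + c + d),
        "sensitivity": 100 * a / (a + d),
        "specificity": 100 * b / (b + c),
        "false_pos": 100 * c / (a + c),
        "false_neg": 100 * d / (b + d),
    }

rates = classification_rates(51, 19, 24, 6)   # counts at Prob. Level 0.3
for name, value in rates.items():
    print(f"{name}: {value:.1f}")
```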
Interpretation of SAS output - continued
Model Selection Criteria:
- Convergence: the difference in parameter estimates is small enough.
Model Fit Statistics Criteria:
Likelihood Function:
$$L = \prod_{i=1}^{n} p_i^{y_i}(1 - p_i)^{1 - y_i}$$
- -2 * log(likelihood)
- AIC = -2 * log(max likelihood) + 2 * k
- SIC = -2 * log(max likelihood) + log(N) * k
where k is the number of estimated parameters and N is the number of observations.
Testing Global Null Hypothesis: BETA=0
- Likelihood ratio: -2 * [ln(L intercept) - ln(L intercept + covariates)]
- Score: based on the 1st and 2nd derivatives of Log(L)
- Wald: (coefficient / std error)^2
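The fit statistics are simple functions of the maximized log-likelihood. The sketch below computes them for a hypothetical log-likelihood value, parameter count and sample size; none of these numbers come from the SAS output discussed here.

```python
import math

def fit_statistics(log_likelihood, k, n):
    """Return (-2 log L, AIC, SIC) for k estimated parameters and n observations."""
    minus_2_log_l = -2.0 * log_likelihood
    aic = minus_2_log_l + 2 * k
    sic = minus_2_log_l + math.log(n) * k
    return minus_2_log_l, aic, sic

# Hypothetical maximised log-likelihood, for illustration only.
m2ll, aic, sic = fit_statistics(log_likelihood=-53.7, k=2, n=100)
print(f"-2 log L = {m2ll:.1f}, AIC = {aic:.1f}, SIC = {sic:.2f}")
```

Smaller values indicate a better trade-off between fit and complexity; SIC penalizes each extra parameter more heavily than AIC once N exceeds about 7, since log(N) is then greater than 2.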
Analysis of Maximum Likelihood Estimates
- Parameter estimates and significance test
Odds Ratio Estimates
- Odds:
$$O_i = \exp\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}\right)$$
- Odds ratio: O_i / O_j per unit change in covariate.
Association of Predicted Probabilities and Observed Responses
- Pairs: 43 (event) * 57 (non event) = 2451
- Concordant (0 - lower prob vs. 1 - higher prob)
- Discordant (0 - higher prob vs. 1 - lower prob)
- Tie: all other
ROC is used to visualize the model's prediction strength.
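The pair counts can be illustrated on a small sample. The sketch below forms every (event, non-event) pair, labels it concordant, discordant or tied, and summarizes the result as c = (concordant + 0.5 * tied) / pairs, which equals the area under the ROC curve. The seven observations are hypothetical.

```python
def pair_association(scored):
    """Count concordant, discordant and tied (event, non-event) pairs."""
    events = [p for y, p in scored if y == 1]
    non_events = [p for y, p in scored if y == 0]
    concordant = discordant = tied = 0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1      # event has the higher predicted probability
            elif p_event < p_non_event:
                discordant += 1      # event has the lower predicted probability
            else:
                tied += 1
    pairs = len(events) * len(non_events)
    c = (concordant + 0.5 * tied) / pairs
    return concordant, discordant, tied, c

# Hypothetical (outcome, predicted probability) pairs, for illustration only.
sample = [(1, 0.9), (1, 0.6), (1, 0.4), (0, 0.7), (0, 0.4), (0, 0.2), (0, 0.1)]
concordant, discordant, tied, c = pair_association(sample)
print(f"concordant={concordant} discordant={discordant} tied={tied} c={c:.3f}")
```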