Modeling Categorical Dependent Variables

  • Modeling Categorical Dependent Variables

  • What will happen if we have categorical dependent variables?

  • A Dichotomous Dependent Variable

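    We define

    $$y_i = \begin{cases} 1 & \text{if Yes} \\ 0 & \text{if No} \end{cases}$$

    According to the regression model

    $$y_i = \beta_0 + x_i\beta_1 + e_i$$

    $$\hat{y}_i = E(y_i) = \beta_0 + x_i\beta_1$$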

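  • How Do Choice Probabilities Fit In?

    $$p_{i1} = \Pr[y_i = 1] = \Pr[\text{Yes}]$$

    $$p_{i2} = \Pr[y_i = 0] = \Pr[\text{No}]$$

    From the definition of Expectation of a Discrete Variable

    $$E(y_i) = (1)\,p_{i1} + (0)\,p_{i2} = p_{i1}$$

    $$p_{i1} = \beta_0 + x_i\beta_1$$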

  • Two Requirements for a Probability

    Logical Consistency: $0 \le p_{i1} \le 1$

    Sum Constraint: $p_{i1} + p_{i2} = 1$

  • A Requirement for Regression

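    Gauss-Markov Assumption

    $$V(e) = \sigma^2 I$$

    Two possibilities exist

    $$e_i = \begin{cases} 1 - \beta_0 - x_i\beta_1 & \text{for } y_i = 1 \\ 0 - \beta_0 - x_i\beta_1 & \text{for } y_i = 0 \end{cases}$$

    Since $E(e_i) = 0$

    $$V(e_i) = E[e_i - E(e_i)]^2 = E(e_i^2)$$

    by the Definition of E()

    $$E(e_i^2) = p_{i1}(1 - \beta_0 - x_i\beta_1)^2 + p_{i2}(-\beta_0 - x_i\beta_1)^2$$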

  • Heteroskedasticity Rears Its Head

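    $$\begin{aligned}
    E(e_i^2) &= p_{i1}(1 - \beta_0 - x_i\beta_1)^2 + p_{i2}(-\beta_0 - x_i\beta_1)^2 \\
             &= p_{i1}(1 - p_{i1})^2 + (1 - p_{i1})(-p_{i1})^2 \\
             &= p_{i1}(1 - p_{i1}) \\
             &= (\beta_0 + x_i\beta_1)(1 - \beta_0 - x_i\beta_1)
    \end{aligned}$$

    Note that the subscript i appears on the right hand side!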
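A small numerical sketch may make the problem concrete. It assumes hypothetical coefficients (`beta0 = -0.4`, `beta1 = 0.02`, which are not taken from the slides) and checks two things: the error variance equals $p_{i1}(1 - p_{i1})$, so it changes with $x_i$, and the linear "probability" can leave the [0, 1] interval.

```python
# Hypothetical coefficients, chosen only for illustration (not from the slides).
beta0, beta1 = -0.4, 0.02

def lpm_probability(x):
    """Linear probability model: p_i1 = beta0 + x_i * beta1."""
    return beta0 + x * beta1

def error_variance(x):
    """Closed form derived above: V(e_i) = p_i1 * (1 - p_i1)."""
    p = lpm_probability(x)
    return p * (1.0 - p)

def expected_squared_error(x):
    """E(e_i^2) computed directly from the two possible error values."""
    p = lpm_probability(x)
    return p * (1.0 - p) ** 2 + (1.0 - p) * (0.0 - p) ** 2

for x in (25, 45, 65, 85):
    p = lpm_probability(x)
    in_range = 0.0 <= p <= 1.0
    print(x, round(p, 2), round(error_variance(x), 4), in_range)
```

The variance differs from one x to the next, which violates the constant-variance assumption, and the last x yields a fitted value above 1.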

  • The Logit Model

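    $$p_{i1} = F_L(\beta_0 + x_i\beta_1) = \frac{e^{\beta_0 + x_i\beta_1}}{1 + e^{\beta_0 + x_i\beta_1}}$$

    [Figure: logistic curve of $p_{i1}$, ranging from 0.0 to 1.0, plotted against $x_i$]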

  • Example

    Age and Cash Discount Approval

    Age  CD     Age  CD     Age  CD
    22   0      40   0      54   0
    23   0      41   1      55   1
    24   0      46   0      58   1
    27   0      47   0      60   1
    28   0      48   0      60   0
    30   0      49   1      62   1
    30   0      49   0      65   1
    32   0      50   1      67   1
    33   0      51   0      71   1
    35   1      51   1      77   1
    38   0      52   0      81   1

  • How can we analyse these data?

    Compare mean age of people having cash discount and non-CD

    Non-CD: 38.6 years

    CD: 58.7 years (p ...)
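The two group means can be reproduced directly from the table. The sketch below is only a check of that arithmetic; the pairs are copied from the "Age and Cash Discount Approval" table, and the helper name `mean_age` is made up for the illustration.

```python
# (age, CD) pairs copied from the "Age and Cash Discount Approval" table.
data = [
    (22, 0), (40, 0), (54, 0), (23, 0), (41, 1), (55, 1),
    (24, 0), (46, 0), (58, 1), (27, 0), (47, 0), (60, 1),
    (28, 0), (48, 0), (60, 0), (30, 0), (49, 1), (62, 1),
    (30, 0), (49, 0), (65, 1), (32, 0), (50, 1), (67, 1),
    (33, 0), (51, 0), (71, 1), (35, 1), (51, 1), (77, 1),
    (38, 0), (52, 0), (81, 1),
]

def mean_age(pairs, cd_value):
    """Average age of the customers whose CD flag equals cd_value."""
    ages = [age for age, cd in pairs if cd == cd_value]
    return sum(ages) / len(ages)

mean_non_cd = mean_age(data, 0)
mean_cd = mean_age(data, 1)
print(round(mean_non_cd, 1), round(mean_cd, 1))  # prints 38.6 58.7
```

A difference in means shows that age matters, but it does not say how the probability of CD changes with age, which is what the following slides model.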

  • Dot-plot

    [Figure: dot plot of Cash Discount (No / Yes) against AGE (years), horizontal axis from 0 to 100]

  • Logistic regression

    Prevalence (%) of CD according to age group

    Age group   # in group   # with CD   % with CD
    20 - 29     5            0           0
    30 - 39     6            1           17
    40 - 49     7            2           29
    50 - 59     7            4           57
    60 - 69     5            4           80
    70 - 79     2            2           100
    80 - 89     1            1           100
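The grouped table can be rebuilt from the raw observations. This is a minimal sketch, assuming ten-year bands that start at multiples of ten (as the table's labels suggest); the function name `prevalence_by_decade` is hypothetical.

```python
# (age, CD) pairs copied from the "Age and Cash Discount Approval" table.
data = [
    (22, 0), (40, 0), (54, 0), (23, 0), (41, 1), (55, 1),
    (24, 0), (46, 0), (58, 1), (27, 0), (47, 0), (60, 1),
    (28, 0), (48, 0), (60, 0), (30, 0), (49, 1), (62, 1),
    (30, 0), (49, 0), (65, 1), (32, 0), (50, 1), (67, 1),
    (33, 0), (51, 0), (71, 1), (35, 1), (51, 1), (77, 1),
    (38, 0), (52, 0), (81, 1),
]

def prevalence_by_decade(pairs):
    """Map each decade (20, 30, ...) to (group size, number with CD)."""
    groups = {}
    for age, cd in pairs:
        decade = (age // 10) * 10
        n, cases = groups.get(decade, (0, 0))
        groups[decade] = (n + 1, cases + cd)
    return {decade: groups[decade] for decade in sorted(groups)}

table = prevalence_by_decade(data)
for decade, (n, cases) in table.items():
    percent = round(100 * cases / n)
    print(f"{decade} - {decade + 9}: n={n}, CD={cases}, {percent}%")
```

The percentages rise steadily with age, which is the pattern the logistic curve is meant to capture.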

  • Dot-plot

    [Figure: dot plot of CD % (0 to 100) against age group]

  • Logistic function

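    $$P(y \mid x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$$

    [Figure: S-shaped logistic curve; vertical axis Probability of CD from 0.0 to 1.0, horizontal axis x]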

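  • Logistic transformation

    $$P(y \mid x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$$

    $$\ln\left[\frac{P(y \mid x)}{1 - P(y \mid x)}\right] = \alpha + \beta x$$

    The left-hand side is the logit of P(y|x).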

  • The Expression for Not Buying

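    $$p_{i1} = \frac{e^{u_i}}{1 + e^{u_i}}$$

    $$p_{i2} = 1 - p_{i1} = \frac{1 + e^{u_i}}{1 + e^{u_i}} - \frac{e^{u_i}}{1 + e^{u_i}} = \frac{1}{1 + e^{u_i}}$$

    where $u_i = \beta_0 + x_i\beta_1$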

  • The Logit Is a Special Case of Bell, Keeney and Little's (1975) Market Share Theorem

    $$p_{ij} = \frac{a_{ij}}{\sum_{m=1}^{J} a_{im}}$$

    For J = 2 and the logit model, $a_{i1} = e^{u_i}$ and $a_{i2} = 1$

  • My Share of the Market Is My Share of the Attraction

    $$\Pr(1) = \frac{a_1}{a_1 + a_2}$$

    where $a_1$ is a function of Marketing Variables brought to bear on behalf of brand 1
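The link between the attraction form and the logit can be checked numerically. The sketch below assumes an arbitrary value `u = 0.7` for the linear index (a made-up number) and uses hypothetical helper names; it sets the attractions to $e^{u}$ and 1, as the J = 2 case describes.

```python
import math

def attraction_share(attractions, j):
    """Share of alternative j: its attraction divided by the total attraction."""
    return attractions[j] / sum(attractions)

def logit_probability(u):
    """Binary logit probability e^u / (1 + e^u)."""
    return math.exp(u) / (1.0 + math.exp(u))

u = 0.7  # hypothetical value of the linear index, for illustration only
attractions = [math.exp(u), 1.0]  # a_1 = e^u, a_2 = 1

share_1 = attraction_share(attractions, 0)
share_2 = attraction_share(attractions, 1)
print(share_1, logit_probability(u), share_1 + share_2)
```

The first two printed values agree, and the two shares sum to one, so the sum constraint holds by construction.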

  • The Model Can Be Linearized for Least Squares

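    $$p_{i1} = \frac{e^{u_i}}{1 + e^{u_i}}$$

    $$\frac{p_{i1}}{p_{i2}} = \frac{e^{u_i} / (1 + e^{u_i})}{1 / (1 + e^{u_i})} = e^{u_i}$$

    $$\ln\left(\frac{p_{i1}}{p_{i2}}\right) = u_i = \beta_0 + x_i\beta_1$$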

  • Odds and Odds Ratios

    Odds is the probability of an event occurring divided by the probability of the event not occurring

    An odds ratio is the ratio of the odds for two different groups

    An odds ratio = 1 implies equal risk in the two groups

  • Probability and Odds

    We begin with a frequency distribution for the variable "Buying insurance due to Mortality Risk"

                           Count   Percent
    Mortality Risk (MR)    50      34%
    Investment (Inv)       97      66%
    Total                  147     100%

    The probability of buying insurance to cover MR is 0.34 or 34% (50/147)

    The odds of buying insurance due to MR = MR/Inv = 50/97 = 0.5155

  • Interpreting Odds

    The odds of 0.5155 can be stated in different ways:

    Insurance companies can expect to win a customer with a good MR scheme, rather than with a good money return scheme, in about half as many cases

    Winning a customer with a good MR policy is half as likely as winning with a good investment return policy

    Or, inverting the odds, winning a customer with a good investment return policy is twice as likely as winning with a good MR policy

  • Impact of an Independent Variable

    If an independent variable impacts or has a relationship to a dependent variable, it will change the odds of being in the key dependent variable group, e.g. buying the insurance policy.

    The following table shows the relationship between age and buying behaviour:

                           Age < 40   Age >= 40   Total
    Mortality Risk (MR)    28         22          50
    Investment (Inv)       45         52          97
    Total                  73         74          147

  • Odds for Independent Variable Groups

    We can compute the odds of buying an insurance policy for each of the groups:

    The odds of buying a MR policy for customers aged < 40 = 28/45 = 0.6222

    The odds of buying a MR policy for customers aged >= 40 = 22/52 = 0.4231

                           Age < 40   Age >= 40   Total
    Mortality Risk (MR)    28         22          50
    Investment (Inv)       45         52          97
    Total                  73         74          147

  • The Odds Ratio Measures the Effect

    The impact of age on buying an insurance policy is measured by the odds ratio, which equals:

    odds ratio = (the odds if age < 40) / (the odds if age >= 40) = 0.6222 / 0.4231 = 1.47

    Which we interpret as:

    The odds of buying a MR policy are 1.47 times as high for young customers as for old customers

    The odds of buying a MR policy for young customers are 47% higher than the odds for old customers (1.47 - 1.00)

    A one unit change in the independent variable age (old to young) increases the odds of buying a MR policy by a factor of 1.47.
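The odds and the odds ratio quoted above follow from the four cell counts of the age table. A minimal sketch of that arithmetic, with a hypothetical helper named `odds`:

```python
def odds(events, non_events):
    """Odds = count in the key group divided by count outside it."""
    return events / non_events

# Counts from the age-by-policy table: (MR buyers, Inv buyers) per age group.
odds_young = odds(28, 45)   # age < 40
odds_old = odds(22, 52)     # age >= 40
odds_overall = odds(50, 97) # all customers

odds_ratio = odds_young / odds_old
print(round(odds_young, 4), round(odds_old, 4), round(odds_ratio, 2))
# prints 0.6222 0.4231 1.47
```

Note that the ratio compares odds, not probabilities: the corresponding probabilities are 28/73 and 22/74, whose ratio is smaller than 1.47.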

  • Odds & Odds ratios

                     Bankruptcy
    Delinquency      Yes    No     Total
    Yes              75     175    ?
    No               20     180    ?
    Total            ?      ?      ?

  • Odds & Odds ratios

    Delinquency   P(ibk)   P(nibk)   Odds ibk   Odds ratio
    Yes           0.3      0.7       0.43
    No            0.1      0.9       0.11
    Yes vs. No                                  3.86
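The figures in this small table can be derived from the delinquency-by-bankruptcy counts. The sketch below fills in the row totals and then computes the probabilities, odds and odds ratio; reading "ibk" as "in bankruptcy" is an assumption, and the variable names are made up for the illustration.

```python
# Cell counts from the Delinquency x Bankruptcy table:
# delinquency status -> (bankrupt, not bankrupt)
counts = {"yes": (75, 175), "no": (20, 180)}

summary = {}
for status, (bankrupt, not_bankrupt) in counts.items():
    row_total = bankrupt + not_bankrupt
    p_bankrupt = bankrupt / row_total            # P(ibk), assumed meaning
    p_not_bankrupt = not_bankrupt / row_total    # P(nibk)
    summary[status] = {
        "total": row_total,
        "p": p_bankrupt,
        "odds": p_bankrupt / p_not_bankrupt,
    }

odds_ratio = summary["yes"]["odds"] / summary["no"]["odds"]
for status, row in summary.items():
    print(status, row["total"], round(row["p"], 2), round(row["odds"], 2))
print(round(odds_ratio, 2))  # prints 3.86
```

The same ratio can be read straight from the counts as (75 x 180) / (175 x 20).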

  • Advantages of Logit Transform

    Probabilities range between zero and one

    Odds = P/(1-P)

    Odds range between zero and infinity

    Logit = ln(P/(1-P))

    The logit transform ranges between negative infinity and infinity
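The three ranges can be illustrated with a pair of helper functions. This is a minimal sketch; `logit` and `inverse_logit` are names chosen here, and the probabilities in the loop are arbitrary examples.

```python
import math

def logit(p):
    """Log odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Map any real number z back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

for p in (0.001, 0.1, 0.5, 0.9, 0.999):
    odds = p / (1.0 - p)
    print(p, round(odds, 4), round(logit(p), 4))
```

Because the logit is unbounded in both directions, a linear function of the predictors can be set equal to it without ever implying a probability outside (0, 1).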

  • Logistic Regression

    Model the logarithm of the odds of an outcome as a linear combination of predictor variables

    Logit = ln(P/(1-P)) = b0 + b1X1 + b2X2 + ...

    Estimate the coefficients b0, b1, b2 based on a random sample of subjects' data

    Determine which of the predictors are good

    Assess model fit

    Use the model to predict future cases

  • [Figure: Logit plotted against Age]

  • [Figure: Probability plotted against Age]

  • Estimating & Interpreting Logistic Regression

  • Logistic Regression Model

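    The general model for Logistic Regression is

    $$\ln\left[\frac{U(x)}{1 - U(x)}\right] = \beta_1 R + \beta_2 x + \beta_3 xR$$

  • Re-write to define U(x)

    $$U(x) = \frac{\exp(\beta_1 R + \beta_2 x + \beta_3 xR)}{1 + \exp(\beta_1 R + \beta_2 x + \beta_3 xR)}$$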

  • Terms

    Term                           Definition
    U(x)                           Logistic Regression Function
    R                              Categorical Variable
    x                              Continuous Variable
    $\beta_1, \beta_2, \beta_3$    Parameters to be estimated

  • Properties of Logistic Regression

    Dependent Variable takes on value of 0 or 1

    Therefore, Pr(Y = 1 | x) = U(x)

    Y is transformed into odds, i.e., the probability an event occurs relative to its converse

    Odds of 1.0 indicate equal probability of an event and its converse (p = 0.5)

    Natural log of the odds is the logit transformation

  • How to Estimate the Parameters

    Logistic Regression Uses Maximum Likelihood Estimation

  • Maximum Likelihood Estimates

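    Fit the Likelihood Function

    $$L = \prod_{i=1}^{n} U(x_i)^{u_i}\,[1 - U(x_i)]^{1 - u_i}$$

    The first factor, $U(x_i)^{u_i}$, is the Probability an Event Occurred; the second, $[1 - U(x_i)]^{1 - u_i}$, is the Probability an Event Did Not Occur. Here $u_i$ is the observed outcome (1 if the event occurred, 0 otherwise).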

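  • We seek those values of $\beta_0$ & $\beta_1$ that maximize the likelihood function

    $$L = \prod_{i=1}^{n} \left[\frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)}\right]^{u_i} \left[\frac{1}{1 + \exp(\beta_0 + \beta_1 x_i)}\right]^{1 - u_i}$$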
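One way to carry out this maximization for a single predictor is Newton-Raphson on the log-likelihood. The slides do not say which algorithm is used, so the sketch below is an assumption about method; it fits the age and cash discount data shown earlier, and the function names are made up for the illustration.

```python
import math

# (age, CD) pairs copied from the "Age and Cash Discount Approval" table.
data = [
    (22, 0), (40, 0), (54, 0), (23, 0), (41, 1), (55, 1),
    (24, 0), (46, 0), (58, 1), (27, 0), (47, 0), (60, 1),
    (28, 0), (48, 0), (60, 0), (30, 0), (49, 1), (62, 1),
    (30, 0), (49, 0), (65, 1), (32, 0), (50, 1), (67, 1),
    (33, 0), (51, 0), (71, 1), (35, 1), (51, 1), (77, 1),
    (38, 0), (52, 0), (81, 1),
]
xs = [age for age, _ in data]
ys = [cd for _, cd in data]

def probability(b0, b1, x):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def log_likelihood(b0, b1):
    """Sum over i of u_i * z_i - ln(1 + exp(z_i)), with z_i = b0 + b1 * x_i."""
    return sum(y * (b0 + b1 * x) - math.log(1.0 + math.exp(b0 + b1 * x))
               for x, y in zip(xs, ys))

def fit_logistic(max_iterations=50, tolerance=1e-10):
    """Newton-Raphson: repeatedly solve (X'WX) step = X'(y - p)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = probability(b0, b1, x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient with respect to b0
            g1 += (y - p) * x      # gradient with respect to b1
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) < tolerance and abs(step1) < tolerance:
            break
    return b0, b1

b0_hat, b1_hat = fit_logistic()
fitted = [probability(b0_hat, b1_hat, x) for x in xs]
print(b0_hat, b1_hat, log_likelihood(b0_hat, b1_hat))
```

At the maximum the fitted probabilities sum to the observed number of events, which is a convenient check that the iteration has converged.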

  • Interpreting Results

    How to interpret the results?

    - A 1-unit change in X is associated with a b-unit change in the value of the latent preference variable (Y*), which determines the (dichotomous) value of the observed variable (Y)

  • Interpreting Results

    Alternative interpretations?

    - P/(1-P) = exp{a+bX} is the ODDS: the ratio of the probabilities of two mutually exclusive outcomes

    - When X changes (from X0 to X1), the odds change by the factor (the ODDS RATIO):

      $$\frac{\exp\{a + bX_1\}}{\exp\{a + bX_0\}} = \exp\{a + bX_1 - a - bX_0\} = \exp\{b(X_1 - X_0)\}$$

    - A 1-unit change in X (= X1 - X0) is associated with a b-unit change in the log of the ODDS (logit) for Y=1

    But what about a change in PROBABILITY??? (marginal effect)

  • What is the marginal effect on the PROBABILITY?

    - Logit is a non-linear relationship, thus b is not an independent effect

    - The effect of dX on P(Y) is a function of Z=a+bX (depends on all Xs)

    - Thus, the effect of dX has to be evaluated at each value of Z
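For the logit, differentiating P = exp{Z} / (1 + exp{Z}) with Z = a + bX gives dP/dX = b P (1 - P), which is why the effect depends on where it is evaluated. The sketch below assumes hypothetical coefficients (`a = -6.0`, `b = 0.12`, not estimates from the slides) and compares the formula with a numerical derivative.

```python
import math

a, b = -6.0, 0.12  # hypothetical coefficients, for illustration only

def probability(x):
    """P(Y = 1 | X = x) under the logit model with Z = a + b * x."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def marginal_effect(x):
    """Analytic derivative dP/dX = b * P * (1 - P), evaluated at x."""
    p = probability(x)
    return b * p * (1.0 - p)

def numerical_derivative(x, h=1e-5):
    """Central finite difference, used only to check the formula."""
    return (probability(x + h) - probability(x - h)) / (2.0 * h)

for x in (20, 35, 50, 65, 80):
    print(x, round(probability(x), 4), round(marginal_effect(x), 5))
```

The effect is largest where P = 0.5 (here at x = 50) and shrinks toward zero in both tails, even though b is the same everywhere.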

  • What can we infer about how the probability changes?

    - The b-coefficient shows the sign (direction) of the relationship, but does not by itself determine the magnitude of the effect; the size of the effect depends on the values of all the model parameters

    - Hence, logit allows a specification where changes in the observed behaviour (outcome) that are due to the change in an exogenous variable (characteristic) are conditioned on the values & effects of all other characteristics

    - However, the underlying relationship (effect of each characteristic on preference over observed outcome) is still linear (i.e., impact of each characteristic on preferences is independent of other characteristics)

  • Case Study

    Predicting default behavior

  • Data

    Customer level information

    Credit quality

    Interest rate premium

    Sno       CS    Premium   Default
    Cust 1    637   0.63      0
    Cust 2    653   1.68      1
    Cust 3    556   0.63      0
    Cust 4    664   1.905     0
    Cust 5    544   1.98      0
    Cust 6    632   0.78      0
    Cust 7    595   2.63      0
    Cust 8    557   3.83      0
    Cust 9    651   2.38      0
    Cust 10   666   0.73      0
    Cust 11   686   0.23      0
    Cust 12   712   0.88      0

  • Default values by CS

    [Figure: plot of Default (0 or 1) against CS, horizontal axis from 400 to 850]

    Default values by Premium

    [Figure: plot of Default (0 or 1) against Premium, horizontal axis from -2 to 10]

  • Default Rate by CS

    [Figure: Default Rate (axis from 0% to 35%) by CS band: < 560, 560 -< 620, 620 -< 700, 700+]

  • Default Rate by Premium bins

    [Figure: Default Rate (axis from 0% to 35%) by Premium bin: < 1, 1 -< 2, 2 -< 3, 3+]

  • Parameter   DF   Estimate   Std Err    Wald ChiSq   Pr > ChiSq
    Intercept   1    -1.7501    0.0518     1140.3188    ...

  • Parameter   DF   Estimate   Std Err    Wald ChiSq   Pr > ChiSq
    Intercept   1    3.6161     0.4638     60.7754      ...

  • Parameter   DF   Estimate   Std Err    Wald ChiSq   Pr > ChiSq
    Intercept   1    -0.0448    0.561      0.0064       0.9363
    CS          1    -0.00466   0.000835   31.1981      ...

  • Computing Cutoff

    [Figure: Bads and Goods plotted against % Default, with the Cutoff marked; values shown: 10% and 35%]

  • Logistic Regression Predicted Probabilities and Classification with 0.30 cutoff

    Sno       CS    Premium   Actual Default   Logistic P   Pred Default   Classify
    Cust 1    637   0.63      0                0.01155      0              ND
    Cust 2    653   1.68      1                0.95213      1              D
    Cust 3    556   0.63      0                0.89124      1              D
    Cust 4    664   1.905     0                0.84625      1              D
    Cust 5    544   1.98      0                0.67182      1              D
    Cust 6    632   0.78      0                0.78328      1              D
    Cust 7    595   2.63      0                0.61989      1              D
    Cust 8    557   3.83      0                0.00001      0              ND
    Cust 9    651   2.38      0                0.95435      1              D
    Cust 10   666   0.73      0                0.85686      1              D
    Cust 11   686   0.23      0                0.83464      1              D
    Cust 12   712   0.88      0                0.65759      1              D
    Cust 13   632   2.73      0                0.17796      0              ND
    Cust 14   619   5.32      0                0.36792      1              D
    Cust 15   664   1.13      0                0.23750      0              ND
    Cust 16   575   2.47      0                0.12322      0              ND
    Cust 17   750   0.92      1                0.11146      0              ND
    Cust 18   645   3.13      0                0.05473      0              ND
    Cust 19   644   2.97      1                0.03869      0              ND
    Cust 20   678   1.17      0                0.03869      0              ND

  • Sensitivity & Specificity

    Sensitivity: Power to identify positives. Sensitivity = TP / (TP + FN)

    Specificity: Power to identify negatives. Specificity = TN / (TN + FP)

                    Reality
                    P      N
    Model    P      TP     FP
             N      FN     TN
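These definitions can be applied to the 20 scored customers from the classification table with the 0.30 cutoff. The sketch below copies the actual default flags and logistic probabilities from that table; the helper name `confusion_counts` is hypothetical, and treating a probability equal to the cutoff as a predicted default is an assumption.

```python
# (actual default, logistic P) for Cust 1..20, copied from the scored table.
scored = [
    (0, 0.01155), (1, 0.95213), (0, 0.89124), (0, 0.84625), (0, 0.67182),
    (0, 0.78328), (0, 0.61989), (0, 0.00001), (0, 0.95435), (0, 0.85686),
    (0, 0.83464), (0, 0.65759), (0, 0.17796), (0, 0.36792), (0, 0.23750),
    (0, 0.12322), (1, 0.11146), (0, 0.05473), (1, 0.03869), (0, 0.03869),
]

def confusion_counts(rows, cutoff):
    """Return (TP, FN, FP, TN) when P >= cutoff is classified as a default."""
    tp = fn = fp = tn = 0
    for actual, p in rows:
        predicted = 1 if p >= cutoff else 0
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn

tp, fn, fp, tn = confusion_counts(scored, 0.30)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(tp, fn, fp, tn, round(sensitivity, 3), round(specificity, 3))
```

Calling `confusion_counts` over a grid of cutoffs gives the points needed for a sensitivity and specificity plot against the probability cutoff.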

  • [Figure: Sensitivity and Specificity (0.00 to 1.00) plotted against Probability cutoff (0.00 to 1.00)]

  • Logistic regression for binary response variables

    Basic Syntax:

    /* Fit the model; descending makes the higher response level the event */
    proc logistic data=two outest=parms descending;
      class x3 (ref='1') c4 (ref='F') / param=ref;
      model y = x1 x2 x3 c4 / rsquare lackfit selection=stepwise
                              ctable pprob=(0 to 1 by 0.1) outroc=roc1;
    run;

    /* Score a data set with the saved parameter estimates */
    proc score data=chdage1 score=parms out=scored type=parms;
      var age;
    run;

    rsquare requests a generalized R2 measure for the fitted model.

    lackfit performs the Hosmer and Lemeshow goodness-of-fit test.

    ctable classifies the input response observations according to whether the predicted probability of (Y=1) is above or below some cutpoint value, for a number of cutpoint values in the range (0,1). An observation is predicted as an event, that is, in our case, 1, if the predicted probability of (Y=1) exceeds the cutpoint value. The table allows one to assess the ability of the model to discriminate between the two groups of cases, Y=1 and Y=0.

  • Classification Table

    The model classifies an observation as an event if its estimated probability is greater than or equal to a given probability cutpoint.

                  Correct             Incorrect           Percentages (%)
    Prob. Level   Event   Non Event   Event   Non Event   Correct   Sensitivity   Specificity   FALSE POS   FALSE NEG
    0             57      0           43      0           57        100           0             43          .
    0.1           57      1           42      0           58        100           2.3           42.4        0
    0.2           55      7           36      2           62        96.5          16.3          39.6        22.2
    0.3           51      19          24      6           70        89.5          44.2          32          24
    0.4           50      25          18      7           75        87.7          58.1          26.5        21.9
    0.5           45      27          16      12          72        78.9          62.8          26.2        30.8
    0.6           41      32          11      16          73        71.9          74.4          21.2        33.3
    0.7           32      36          7       25          68        56.1          83.7          17.9        41
    0.8           24      39          4       33          63        42.1          90.7          14.3        45.8
    0.9           6       42          1       51          48        10.5          97.7          14.3        54.8
    1             0       43          0       57          43        0             100           .           57

    With a = correct events, b = correct non-events, c = incorrect events, d = incorrect non-events:

    Correct       Tot Correct / Total             (a+b) / (a+b+c+d)
    Sensitivity   Correct Event / Tot Event       a / (a+d)
    Specificity   Correct N.Event / Tot N.Event   b / (b+c)
    FALSE POS     F.Pos / (F.Pos+Pos)             c / (a+c)
    FALSE NEG     F.Neg / (F.Neg+Neg)             d / (b+d)
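The five percentage columns follow from the four counts in each row. As a check, the sketch below recomputes the row for probability level 0.3 (a = 51, b = 19, c = 24, d = 6); the function name `classification_rates` is made up for the illustration.

```python
def classification_rates(a, b, c, d):
    """Percentages for one row of the classification table.

    a: correct events, b: correct non-events,
    c: incorrect events, d: incorrect non-events.
    """
    return {
        "correct": 100 * (a + b) / (a + b + c + d),
        "sensitivity": 100 * a / (a + d),
        "specificity": 100 * b / (b + c),
        "false_pos": 100 * c / (a + c),
        "false_neg": 100 * d / (b + d),
    }

row_03 = classification_rates(a=51, b=19, c=24, d=6)
for name, value in row_03.items():
    print(name, round(value, 1))
```

The rows for levels 0 and 1 show "." where the denominator is zero, so a production version would need to guard against division by zero.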

  • Interpretation of SAS output - continued

    Model Selection Criteria: Convergence - difference in parameter estimates is small enough.

    Model Fit Statistics Criteria:

    Likelihood Function:

    $$L = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}$$

    -2 * log(likelihood)

    AIC = -2 * log(max likelihood) + 2 * k

    SIC = -2 * log(max likelihood) + log(N) * k

    Testing Global Null Hypothesis: BETA=0

    Likelihood ratio: -2 * [ln(L intercept) - ln(L intercept + covariates)]

    Score: 1st and 2nd derivative of Log(L)

    Wald: (coefficient / std error)^2
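The Wald statistic and the two information criteria are simple enough to compute by hand. The sketch below checks the Wald formula against two rows of the case-study parameter tables (small differences come from rounding of the printed estimates); the log-likelihood value passed to `aic` and `sic` is a made-up number, since the slides do not report one.

```python
import math

def wald_chi_square(estimate, std_err):
    """Wald statistic: (coefficient / standard error) squared."""
    return (estimate / std_err) ** 2

def aic(log_likelihood, k):
    """AIC = -2 * log(max likelihood) + 2 * k, with k estimated parameters."""
    return -2.0 * log_likelihood + 2.0 * k

def sic(log_likelihood, k, n):
    """SIC = -2 * log(max likelihood) + log(N) * k, with N observations."""
    return -2.0 * log_likelihood + math.log(n) * k

# Estimates and standard errors copied from the case-study output.
print(round(wald_chi_square(-0.0448, 0.561), 4))   # table reports 0.0064
print(round(wald_chi_square(3.6161, 0.4638), 2))   # table reports 60.7754

# Hypothetical log-likelihood, only to show how the criteria are used.
example_log_likelihood = -10.0
print(aic(example_log_likelihood, k=2), sic(example_log_likelihood, k=2, n=100))
```

Lower AIC or SIC indicates a better trade-off between fit and the number of parameters; SIC penalizes extra parameters more heavily once N exceeds about 7.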

  • Interpretation of SAS output - continued

    Analysis of Maximum Likelihood Estimates: Parameter estimates and significance test

    Odds Ratio Estimates

    Odds:

    $$O_i = \exp\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}\right)$$

    Odds ratio: O_i / O_j per unit change in covariate.

    Association of Predicted Probabilities and Observed Responses

    Pairs: 57 (event) * 43 (non event) = 2451

    Concordant (0 - lower prob vs. 1 - higher prob)

    Discordant (0 - higher prob vs. 1 - lower prob)

    Tie - all other

    ROC used to visualize model prediction strength.
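A consequence of the odds formula is that a one-unit change in a covariate multiplies the odds by exp(beta_j), whatever the starting value. The sketch below illustrates this with the Intercept and CS estimates printed in the case study; that table is truncated in the source, so these two numbers are used purely as an illustration and not as the full fitted model.

```python
import math

def odds(beta0, betas, xs):
    """O_i = exp(beta0 + sum_j beta_j * x_ij)."""
    return math.exp(beta0 + sum(b * x for b, x in zip(betas, xs)))

# Illustrative values: Intercept and CS estimates from the case-study output.
beta0 = -0.0448
beta_cs = -0.00466

odds_600 = odds(beta0, [beta_cs], [600])
odds_601 = odds(beta0, [beta_cs], [601])
odds_700 = odds(beta0, [beta_cs], [700])

ratio_per_point = odds_601 / odds_600    # equals exp(beta_cs)
ratio_per_100 = odds_700 / odds_600      # equals exp(100 * beta_cs)
print(round(ratio_per_point, 5), round(ratio_per_100, 4))
```

Because the per-unit ratio is so close to 1 for a variable measured on a scale of hundreds, reporting the ratio for a larger step (here 100 points) is often easier to interpret.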
