29
Logistic Regression Applications Hu Lunchao

Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

Embed Size (px)

Citation preview

Page 1: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

Logistic Regression Applications

Hu Lunchao

Page 2: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

2

Contents

11 What Is Logistic Regression?

22 Modeling Categorical Responses

33 Modeling Ordinal Variables

44 Matched Case-Control Studies

Page 3: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

An Example

Page 4: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables
Page 5: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

An Example

Page 6: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

What Is Logistic Regression?

Logistic Function:

Logit transformation:

The variance:

))(,0(~

)(

xB

xy

0 1

0 1( )

1

x

x

ex

e

( )( ) ln

1 ( )

xg x

x

Page 7: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

Modeling Categorical Responses

Logistic regression model for binary response variables. For instance, logistic regression can model

A voter's choice in a presidential election , with predictor variables political ideology, annual income,education level, and religious affiliation .

Whether a person uses illegal drugs (yes or no), with predictors education level, whether employed, religiosity, marital status, and annual income.

Page 8: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

Modeling Categorical Responses

Income and Having Travel Credit Cards

Page 9: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

Modeling Categorical Responses

Let x = annual income and y = whether have a travel credit card (1 = yes, 0 = no).

Software provides results shown in Table Fit of Logistic Regression Model for Italian Credit Card Data

beta S.E. Exp(beta)

income .1054 .0262 1.111

Constant -3.5179 .7103

Page 10: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

Modeling Categorical Responses

The logistic prediction equation is

log [ ( 1)] 3.518 0.105it P y x

Page 11: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

Modeling Categorical Responses

Interpretation for beta:The sign of beta tells us whether it is increasing or decreasing as x increases;

The curve has slope beta*P(y = 1)[1-P(y = 1)],where P(y = 1) is the probability at that point.

( 1)1

x

x

eP y

e

-3.52 0.105

-3.52 0.105( 1)

1

x

x

eP y

e

Page 12: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

This exponential relationship implies that every unitincrease in x has a multiplicative effect of ebeta on the odds.

Interpretation Using the Odds and Odds Ratio

( 1)( )

1 ( 1)x xP y

e e eP y

Page 13: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

Modeling for Ordinal Variables

Many applications have a categorical response variable with more than two categories.For instance, the General Social Survey recently asked subjects whether government spending on the environment should increase,remain the same, or decrease

Page 14: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

Cumulative Probabilities

Let y denote an ordinal response variable. Let P(y <=j) denote the probability that the response falls in category j or below.With four categories, for example, the cumulative probabilities are

( 1), ( 2) ( 1) ( 2),

( 3) ( 1) ( 2) ( 3)

and P( 4)=1

P y P y P y P y

P y P y P y P y

y

Page 15: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

Cumulative Logits

A c-category response has c cumulative probabilities

With c = 4, for example, the logits are

( 1) ( 2) ( )P y P y P y c

( 1) ( 1)log [ ( 1)] log[ ] log[ ]

( 1) ( 2) ( 3) ( 4)

( 2) ( 1) ( 2)log [ ( 2)] log[ ] log[ ]

( 2) ( 3) ( 4)

( 3) ( 1) ( 2) ( 3)log [ ( 3)] log[ ] log[ ]

( 3) ( 4)

log [ ( 4)]

P y P yit P y

P y P y P y P y

P y P y P yit P

P y P y P y

P y P y P y P yit P

P y P y

it P

0

Page 16: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

Cumulative Logit Models for an Ordinal Response

The model requires a separate interceptparameter j for each cumulative probability. Since the cumulative probabilities increase as j increases, so do .The effect of an explanatory variableon all the cumulative probabilities for y should be the same, so beta keeps unchanged.

log [ ( )] , 1, 2, , 1jit P y j x j c

{ }j

Page 17: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

Software estimates the parameters using all the cumulative probabilities at once.

Page 18: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

Do Republicans tend to be more conservative than Democrats?

Let x be a dummy variable for party affiliation, with x = 1 for Democrats and x = 0 for Republicans.

log [ ( )] , 1, 2, , 1jit P y j x j c

Page 19: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

Democrats tending to be more liberal than Republicans

Page 20: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

Matched Case-Control Studies

Case-control studies are used to identify factors that may contribute to a medical condition by comparing subjects who have that condition (the 'cases') with patients who do not have the condition but are otherwise similar.Within each stratum, samples of cases(y=1) and controls(y=0) are chosen. The number of cases andcontrols need not be constant across strata, but the most common matched designs include one case and form one to five controls per stratum and are thus referred to as 1-M matched studies.

Page 21: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

In a case–control study, the logistic regression outcome variable Y is binary; here Y = 1 for cases and Y = 0 for controls. Then, it is written as:

0 1( )

1( 1) ( )

1 xP y x

e

Page 22: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

ML Approaches to Estimate beta: unconditional and conditional

The unconditional likelihood is

where n is the total number of individuals.

0 1

0 1

0 1

( )( ) 1

( )1

[1 ][1 ]

j j

j j

j j

x ynx y

U x yj

eL e

e

Page 23: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

ML Approaches to Estimate beta: unconditional and conditional

The conditional likelihood is

where m is the number of cases, and the sum in the denominator is over the Cm

n possibilities of dividing the numbers from 1 to n into one group {1, 2, . . . ,m} of size m and its complement {lm+1, . . . ,ln}.

1

1

1

1

j

j

m x

jC m x

j

eL

e

Page 24: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

What Factors Attribute to Low-birth-weight

Dataset: 189 women, of whom 59 had low-birth-weight babies and 130 had normal-weight babiesRisk factors: weight in pounds at the last menstrual period (LWT), presence of hypertension (HT),smoking status during pregnancy (Smoke), presence of uterine irritability (UI). The woman’s age (Age),used as the matching variable.

Page 25: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

Data Input

The variable Low is used to determine whether the subject is a case (Low=1, low-birth-weight baby) or a control (Low=0, normal-weight baby). The dummy time variable Time takes the value 1 for cases and 2 for controls

Page 26: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

Conditional logistic regression Result

The variable Time is the response, and Low is the censoring variable. The matching variable Age is used in the STRATA statement so that each unique age value defines a stratum.

Page 27: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

27

Page 28: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

Results indicate that women were more likely to have low-birth-weight babies if they were underweight in the last menstrual cycle, were hypertensive, smoked during pregnancy, or suffered uterine irritability

Page 29: Logistic Regression Applications Hu Lunchao. 2 Contents 1 1 What Is Logistic Regression? 2 2 Modeling Categorical Responses 3 3 Modeling Ordinal Variables

THANKS FOR YOUR ATTENTION!