C. Logit model, logistic regression, and log-linear model: A comparison


Models of counts: log-linear model

Rows i, columns j; variables A and B.

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$

or, as a regression on dummy variables,

$$\ln \lambda = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots$$

with A = TIME [early = 0; late = 1] and B = SEX [female = 0; male = 1].

EARLY is the reference category.

Leaving home

Model 1: null model

$\mu = 4.887$; $\lambda_{ij} = 132.5$ for all i and j (= 530/4)

Model 2: + TIME

$\mu = 4.649$

$\mu_2^A = 0.4291$ (late; early is the reference category)

$\lambda_i = \exp[4.649 + 0.4291\,t]$: 104.5 for 'early' (t = 0) and 160.5 for 'late' (t = 1)

or

$\lambda_1 = \exp[4.649] = 104.5$ for early

$\lambda_2 = \exp[4.649 + 0.4291] = 160.5$ for late

Leaving home

Model 3: TIME and SEX

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$$

$\mu = 4.697$; $\mu_2^A = 0.4291$; $\mu_2^B = -0.0982$

Reference categories: 'early' [$\mu_1^A = 0$] and 'females' [$\mu_1^B = 0$]

Table: Predicted number of young adults leaving home by age and sex (unsaturated log-linear model)

Age      Females   Males   Total
< 20       109.6    99.4     209
≥ 20       168.4   152.6     321
Total      278     252       530

Leaving home

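Model 3: Time and Sex (unsaturated log-linear model)

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B \quad\Longleftrightarrow\quad \lambda_{ij} = \exp[\mu + \mu_i^A + \mu_j^B]$$

The first index is timing (1 = early, 2 = late), the second is sex (1 = female, 2 = male).

$\lambda_{11} = \exp[4.697] = 109.6$

$\lambda_{21} = \exp[4.697 + 0.4291] = 168.4$

$\lambda_{12} = \exp[4.697 - 0.0982] = 99.4$

$\lambda_{22} = \exp[4.697 + 0.4291 - 0.0982] = 152.6$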

Leaving home
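The fitted counts of the unsaturated model can be checked by hand. The sketch below assumes the standard result that, without an interaction term, the fitted count in each cell is (row total × column total) / n, and it reads the parameters off the fitted table using 'early' and 'females' as reference categories. The function name `fit_independence` and the dictionary layout are illustrative choices, not part of the slides.

```python
import math

# Observed counts from the leaving-home table: (timing, sex) -> count
observed = {
    ("early", "female"): 135, ("early", "male"): 74,
    ("late", "female"): 143, ("late", "male"): 178,
}

def fit_independence(table):
    """Fitted counts and parameters of the log-linear model without interaction."""
    n = sum(table.values())
    row, col = {}, {}
    for (timing, sex), count in table.items():
        row[timing] = row.get(timing, 0) + count
        col[sex] = col.get(sex, 0) + count
    # Fitted count = row total * column total / grand total
    fitted = {(t, s): row[t] * col[s] / n for (t, s) in table}
    params = {
        "mu": math.log(fitted[("early", "female")]),         # reference cell
        "time_late": math.log(row["late"] / row["early"]),   # effect of 'late'
        "sex_male": math.log(col["male"] / col["female"]),   # effect of 'male'
    }
    return fitted, params

fitted, params = fit_independence(observed)
for cell, value in fitted.items():
    print(cell, round(value, 1))
print({name: round(value, 4) for name, value in params.items()})
```

Exponentiating sums of the printed parameters reproduces the fitted counts cell by cell, which is the calculation shown on the slide.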

Model 4: TIME and SEX and TIME*SEX interaction (saturated log-linear model)

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$

$\mu = 4.905$ (overall effect); $\mu_2^A = 0.0576$ (TIME); $\mu_2^B = -0.6012$ (GENDER); $\mu_{22}^{AB} = 0.8201$ (TIME*GENDER)

or

$$\ln \lambda_{ij} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i}$$

$x_{1i}$ = 0 for < 20; $x_{1i}$ = 1 for ≥ 20

$x_{2i}$ = 0 for females; $x_{2i}$ = 1 for males

$x_{3i}$ = 0 for < 20 and females; $x_{3i}$ = 0 for < 20 and males; $x_{3i}$ = 0 for ≥ 20 and females; $x_{3i}$ = 1 for ≥ 20 and males

The saturated model predicts perfectly.

Leaving home

Model 4: TIME and SEX and TIME*SEX interaction

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$

$\mu = 4.905$ (overall effect); $\mu_2^A = 0.05757$, TIME(2); $\mu_2^B = -0.6012$, SEX(2); $\mu_{22}^{AB} = 0.8201$, TIME(2)*SEX(2)

Table: Predicted number of young adults leaving home by age and sex (saturated log-linear model)

Age      Females   Males   Total
< 20         135      74     209
≥ 20         143     178     321
Total        278     252     530

Leaving home

Model 4: TIME and SEX and TIME*SEX interaction

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB} \quad\Longleftrightarrow\quad \lambda_{ij} = \exp[\mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}]$$

$\lambda_{11} = \exp[4.905] = 135$

$\lambda_{21} = \exp[4.905 + 0.0576] = 143$

$\lambda_{12} = \exp[4.905 - 0.6012] = 74$

$\lambda_{22} = \exp[4.905 + 0.0576 - 0.6012 + 0.8201] = 178$

Leaving home
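Because the saturated model reproduces the observed table exactly, its parameters can be read directly from the four counts. The following is a minimal sketch under the dummy coding used above ('early' and 'females' as reference categories); the function name `saturated_params` is hypothetical.

```python
import math

def saturated_params(n11, n21, n12, n22):
    """Parameters of the saturated 2x2 log-linear model.

    n_ij: i = timing (1 early, 2 late), j = sex (1 female, 2 male).
    Reference categories: early and female.
    """
    mu = math.log(n11)                                  # reference cell
    time = math.log(n21 / n11)                          # late vs early, females
    sex = math.log(n12 / n11)                           # males vs females, early
    interaction = math.log((n22 * n11) / (n21 * n12))   # log odds ratio
    return mu, time, sex, interaction

mu, time, sex, interaction = saturated_params(135, 143, 74, 178)
print(round(mu, 3), round(time, 4), round(sex, 4), round(interaction, 4))

# Exponentiating the sum of all four terms gives back the late/male count.
lambda_22 = math.exp(mu + time + sex + interaction)
print(round(lambda_22, 6))
```

The interaction term is the log of the cross-product ratio of the table, which is why it reappears later as the slope of the logit model.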


Log-linear and logit model

Log-linear model:

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$

Select one variable as the dependent (response) variable, e.g. does voting behaviour differ by sex?

Are females more likely to vote conservative than males?

Logit model:

$$\ln \frac{\lambda_{1j}}{\lambda_{2j}} = \gamma + \gamma_j^B$$

Political attitudes

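A = Party (1 = conservative, 2 = labour); B = Sex (1 = males, 2 = females)

Males voting conservative rather than labour:

$$\ln \frac{\lambda_{11}}{\lambda_{21}} = (\mu + \mu_1^A + \mu_1^B + \mu_{11}^{AB}) - (\mu + \mu_2^A + \mu_1^B + \mu_{21}^{AB})$$

Females voting conservative rather than labour:

$$\ln \frac{\lambda_{12}}{\lambda_{22}} = (\mu + \mu_1^A + \mu_2^B + \mu_{12}^{AB}) - (\mu + \mu_2^A + \mu_2^B + \mu_{22}^{AB})$$

Are females more likely to vote conservative than males?

Log-odds = logit

Effect coding (1): $\mu_2^A = -\mu_1^A$ and $\mu_{2j}^{AB} = -\mu_{1j}^{AB}$, so that

$$\ln \frac{\lambda_{11}}{\lambda_{21}} = \mu_1^A - \mu_2^A + \mu_{11}^{AB} - \mu_{21}^{AB} = 2\mu_1^A + 2\mu_{11}^{AB}$$

$$\ln \frac{\lambda_{12}}{\lambda_{22}} = \mu_1^A - \mu_2^A + \mu_{12}^{AB} - \mu_{22}^{AB} = 2\mu_1^A + 2\mu_{12}^{AB}$$

Writing $\theta_j^B$ for the odds of voting conservative rather than labour for sex j, with $\gamma = 2\mu_1^A$ and $\gamma_j^B = 2\mu_{1j}^{AB}$:

$$\ln \theta_1^B = \gamma + \gamma_1^B$$

$$\ln \theta_2^B = \gamma + \gamma_2^B$$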

Political attitudes

Are women more conservative than men? Do women vote more conservative than men? The odds ratio:

$$\ln \frac{\theta_2^B}{\theta_1^B} = (\gamma + \gamma_2^B) - (\gamma + \gamma_1^B) = \gamma_2^B - \gamma_1^B$$

If the log odds ratio is positive (odds ratio greater than 1), then the odds of voting conservative rather than labour are larger for women than for men. In that case, women vote more conservative than men.

$$\ln \theta_1^B = \gamma + \gamma_1^B + (\gamma_2^B - \gamma_1^B) \cdot 0$$

$$\ln \theta_2^B = \gamma + \gamma_1^B + (\gamma_2^B - \gamma_1^B) \cdot 1$$

Logit model:

$$\eta = \operatorname{logit}(p) = \ln \frac{p}{1-p} = \ln \frac{p_1}{p_2} = a + bx$$

with $a = \gamma + \gamma_1^B$: log odds of reference category (males)

and $b = \gamma_2^B - \gamma_1^B$: log odds ratio (odds females / odds males)

with x = 0, 1

Political attitudes


The logit model as a regression model


• Select a response variable proportion

• Dependent variable of logit model is the log of (odds of) being in one category rather than in another.

• Number of observations in each subpopulation (males, females) is assumed to be fixed.

• Intercept (a) = log odds of reference category

• Slope (b) = log odds ratio

DATA

Party          Male   Female   Total
Conservative    279      352     631
Labour          335      291     626
Total           614      643    1257

Logit model: descriptive statistics. Counts in terms of odds and odds ratio.

Reference categories: Labour; Males

Odds of voting conservative rather than labour, by sex:

        Male     Female   Total
Odds    0.8328   1.2096   1.0080
Odds ratio (ref. cat.: males): 1.4524

Odds of being female rather than male, by party:

Party          Odds     Odds ratio
Conservative   1.2616
Labour         0.8687
Total          1.0472   1.4524

F11 = 279

F21 = 335 = 279 * 335/279 = 279 / 0.8328

F12 = 352 = 279 * 352/279 = 279 * 1.2616

F22 = 291 = 279 * 352/279 * 291/352 = 279 * 1.2616 * [1/1.2096]

Political attitudes

LOGIT MODEL

Proportion voting conservative, and odds of voting conservative rather than labour, by sex:

              Male     Female
Proportion    0.454    0.547
Odds          0.8328   1.2096

Are females more likely to vote conservative than males?

Logit model: logit(p) = a + bX (males reference category)

          v = ln(odds)   exp(v) = odds   p
a           -0.18292        0.8328       0.454   Males: 0.833/(1 + 0.833)
b            0.37323        1.4524               Odds ratio
a + b        0.19031        1.2096       0.547   Females: 1.2096/(1 + 1.2096)

logit(p) = -0.18292 + 0.37323 X (with X = 0 for males and X = 1 for females)

If the number of males and the number of females are known, the counts can be calculated.

Political attitudes
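The intercept and slope of this logit model follow directly from the 2x2 table of votes. The sketch below is an illustration only: the names `counts`, `logit_params` and `inv_logit` are hypothetical, and the counts are those of the DATA table (males 279 conservative and 335 labour; females 352 and 291).

```python
import math

# Votes by sex, from the DATA table
counts = {
    "male": {"conservative": 279, "labour": 335},
    "female": {"conservative": 352, "labour": 291},
}

def inv_logit(v):
    """Convert a log odds into a probability."""
    return 1.0 / (1.0 + math.exp(-v))

def logit_params(table, reference="male", other="female"):
    """Intercept a (log odds of the reference group) and slope b (log odds ratio)."""
    odds_ref = table[reference]["conservative"] / table[reference]["labour"]
    odds_other = table[other]["conservative"] / table[other]["labour"]
    return math.log(odds_ref), math.log(odds_other / odds_ref)

a, b = logit_params(counts)
p_male = inv_logit(a)          # X = 0
p_female = inv_logit(a + b)    # X = 1
print(round(a, 5), round(b, 5), round(math.exp(b), 4))
print(round(p_male, 3), round(p_female, 3))

# With the group sizes known, the counts are recovered from the probabilities.
n_female = sum(counts["female"].values())
print(round(n_female * p_female, 6))
```

The last line illustrates the closing remark of the slide: multiplying the predicted proportion by the fixed group size gives back the observed count.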

Logistic regression SPSS

Reference category: females (X = 1 for males and X = 0 for females)

Variable    Param    S.E.    Exp(param)
SEX(1)      .3732    .1133   1.4524
Constant   -.1903    .0792

Females voting labour: 1/[1 + exp[-(-0.1903)]] = 45% (= 291/643; females ref. cat.)

Males voting labour: 1/[1 + exp[-(-0.1903 + 0.3732)]] = 55% (= 335/614)

Different parameter coding: X = -0.5 for males and X = 0.5 for females

Variable    Param    S.E.    Exp(param)
SEX(1)     -.3732    .1133   0.6885
Constant   -.0037    .0567

Females voting labour: 1/[1 + exp[-(-0.0037 + 0.5 * (-0.3732))]] = 45% (= 291/643)

Males voting labour: 1/[1 + exp[-(-0.0037 - 0.5 * (-0.3732))]] = 55% (= 335/614)

Political attitudes
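The standard errors in the first SPSS table can be reproduced from the counts. The sketch assumes the usual large-sample formulas for a 2x2 table: the variance of a log odds is the sum of the reciprocals of its two counts, and the variance of a log odds ratio is the sum of the reciprocals of all four counts. The function names are hypothetical.

```python
import math

def se_log_odds(n_event, n_other):
    """Large-sample standard error of ln(n_event / n_other)."""
    return math.sqrt(1 / n_event + 1 / n_other)

def se_log_odds_ratio(n11, n12, n21, n22):
    """Large-sample standard error of the log odds ratio of a 2x2 table."""
    return math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)

# Dependent variable: voting labour; reference category: females.
constant = math.log(291 / 352)                 # log odds of labour, females
sex_effect = math.log(335 / 279) - constant    # log odds ratio, males vs females

se_constant = se_log_odds(291, 352)
se_sex = se_log_odds_ratio(279, 335, 352, 291)

print(round(constant, 4), round(se_constant, 4))
print(round(sex_effect, 4), round(se_sex, 4), round(math.exp(sex_effect), 4))
```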

The logit model and the logistic regression

Observation from a binomial distribution with parameter p and index m

Leaving parental home

Logit model and logistic regression

Number of young adults leaving home early: 209

Total number of young adults leaving home: 530

Probability of leaving home early: 209/530 = 0.394

REFERENCE CATEGORY: leaving home late (late = 0; early = 1)

ODDS of leaving home early versus late: 209/(530 - 209) = 0.6511

Logit of leaving home early: ln 0.6511 = -0.4291

Specify a model:

Logit model

$$\operatorname{Logit}(p) = \ln \frac{p}{1-p} = \ln \frac{0.394}{1-0.394} = -0.4291$$

Leaving Home

Logistic regression

$$p = \frac{1}{1 + \exp[-(-0.4291)]} = 0.394$$

Standard error:

$$\sqrt{\frac{1}{209} + \frac{1}{321}} = 0.0889$$

Confidence interval: -0.4291 ± 1.96 * 0.0889 = (-0.603, -0.255) ON LOGIT SCALE

and

$$\left( \frac{1}{1 + \exp[-(-0.6033)]},\ \frac{1}{1 + \exp[-(-0.2549)]} \right) = (0.3536,\ 0.4366)$$

ON PROBABILITY SCALE

Leaving home
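The interval can be recomputed step by step: build it on the logit scale, then transform each endpoint to the probability scale. This is a minimal sketch using the counts 209 (early) and 321 (late) and the normal quantile 1.96; the variable names are illustrative.

```python
import math

def inv_logit(v):
    """Convert a log odds into a probability."""
    return 1.0 / (1.0 + math.exp(-v))

early, late = 209, 321

logit = math.log(early / late)             # log odds of leaving early
se = math.sqrt(1 / early + 1 / late)       # standard error on the logit scale
z = 1.96                                   # 95% normal quantile

lower, upper = logit - z * se, logit + z * se
p_lower, p_upper = inv_logit(lower), inv_logit(upper)

print(round(logit, 4), round(se, 4))
print(round(lower, 4), round(upper, 4))
print(round(p_lower, 4), round(p_upper, 4))
```

Transforming the endpoints, rather than building a symmetric interval around 0.394, keeps the interval inside (0, 1) and makes it slightly asymmetric on the probability scale.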

Relation logit and log-linear model: the unsaturated model

Log-linear model:

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$$

with $\mu_i^A$ the effect of timing and $\mu_j^B$ the effect of sex

Odds of leaving parental home late rather than early, females:

$$\mathrm{ODDS}_{21} = \frac{\lambda_{21}}{\lambda_{11}} = \frac{168.4}{109.6} = 1.536$$

$$\mathrm{ODDS}_{21} = \frac{\lambda_{21}}{\lambda_{11}} = \frac{\exp[\mu + \mu_2^A + \mu_1^B]}{\exp[\mu + \mu_1^A + \mu_1^B]} = \exp[\mu_2^A - \mu_1^A] = \exp[0.4291 - 0] = 1.536$$

Leaving home

Relation logit and log-linear model: the unsaturated model

Odds of leaving parental home late rather than early, males:

$$\mathrm{ODDS}_{21} = \frac{\lambda_{22}}{\lambda_{12}} = \frac{152.6}{99.4} = 1.536$$

$$\mathrm{ODDS}_{21} = \frac{\lambda_{22}}{\lambda_{12}} = \frac{\exp[\mu + \mu_2^A + \mu_2^B]}{\exp[\mu + \mu_1^A + \mu_2^B]} = \exp[\mu_2^A - \mu_1^A] = \exp[0.4291 - 0] = 1.536$$

$$\operatorname{Logit} = \ln \frac{p_{late}}{p_{early}} = 0.4291 \text{ for females and males.}$$

Output of the logit model gives the same result (s.e. 0.0889).

Leaving home

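Relation logit and log-linear model: the saturated model

Log-linear model:

$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$

with $\mu_i^A$ the effect of timing, $\mu_j^B$ the effect of sex, and $\mu_{ij}^{AB}$ the effect of the interaction between timing and sex

Odds of leaving parental home late rather than early, females (ref):

$$\mathrm{ODDS}_{21} = \frac{\lambda_{21}}{\lambda_{11}} = \frac{143}{135} = 1.059$$

$$\mathrm{ODDS}_{21} = \frac{\lambda_{21}}{\lambda_{11}} = \frac{\exp[\mu + \mu_2^A + \mu_1^B + \mu_{21}^{AB}]}{\exp[\mu + \mu_1^A + \mu_1^B + \mu_{11}^{AB}]} = \exp[(\mu_2^A - \mu_1^A) + (\mu_{21}^{AB} - \mu_{11}^{AB})] = \exp[(0.0576 - 0) + (0 - 0)] = 1.059$$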

Leaving home

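Relation logit and log-linear model: the saturated model

Odds of leaving parental home late rather than early, males:

$$\mathrm{ODDS}_{22} = \frac{\lambda_{22}}{\lambda_{12}} = \frac{178}{74} = 2.405$$

$$\mathrm{ODDS}_{22} = \frac{\lambda_{22}}{\lambda_{12}} = \frac{\exp[\mu + \mu_2^A + \mu_2^B + \mu_{22}^{AB}]}{\exp[\mu + \mu_1^A + \mu_2^B + \mu_{12}^{AB}]} = \exp[(\mu_2^A - \mu_1^A) + (\mu_{22}^{AB} - \mu_{12}^{AB})] = \exp[(0.0576 - 0) + (0.8201 - 0)] = 2.405$$

ln 1.059 = 0.0573 is the overall effect of the logit model (females ref. cat.)

ln 2.405 = 0.8775 is the log odds for males

0.8775 - 0.0573 = 0.8201 is the log ODDS RATIO (odds males / odds females [ref])

Logit model: logit(p) = 0.0573 + 0.8201 X (with X = 0 for females and 1 for males)

The values 0.0573 and 0.8775 are computed from the rounded odds; computed from the counts they are 0.0576 and 0.8777.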

Leaving home

Logit model:

$$\operatorname{Logit}(p) = \ln \frac{p}{1-p} = 0.8777 - 0.8201\,X$$

X = 0 for males

X = 1 for females

Logistic regression: probability of leaving home late

$$\text{males: } p = \frac{1}{1 + \exp[-(0.8777)]} = 0.706 = \frac{178}{252}$$

$$\text{females: } p = \frac{1}{1 + \exp[-(0.8777 - 0.8201)]} = 0.514 = \frac{143}{278}$$

Leaving home
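The step from the saturated log-linear parameters to the logit model with males as reference category can be traced numerically. The sketch below starts from the two quoted log-linear parameters (timing effect 0.05757 and interaction 0.8201, with 'early' and 'females' as reference) and only re-expresses them; the variable names are illustrative.

```python
import math

def inv_logit(v):
    """Convert a log odds into a probability."""
    return 1.0 / (1.0 + math.exp(-v))

# Saturated log-linear parameters quoted in the text
mu_time_late = 0.05757      # late vs early, females
mu_interaction = 0.8201     # additional late-vs-early effect for males

# Log odds of leaving late rather than early, by sex
log_odds_female = mu_time_late
log_odds_male = mu_time_late + mu_interaction

# Logit model with males as reference: X = 0 for males, X = 1 for females
intercept = log_odds_male
slope = log_odds_female - log_odds_male

p_late_male = inv_logit(intercept)
p_late_female = inv_logit(intercept + slope)

print(round(intercept, 4), round(slope, 4))
print(round(p_late_male, 3), round(p_late_female, 3))
```

Switching the reference category changes the intercept and flips the sign of the slope, but the two predicted probabilities are unchanged.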

Table: Number of young adults leaving home by age and sex

Age      Females   Males   Total
< 20         135      74     209
≥ 20         143     178     321
Total        278     252     530

Dummy coding: reference category: (i) females; (ii) leaving home late

Logit model:

$$\operatorname{Logit}(p_i) = \ln \frac{p_i}{1-p_i} = \beta_0 + \beta_1 x_i = -0.05757 - 0.8201\,x_i$$

$x_i$ is 0 for females and 1 for males

LOGIT p is -0.05757 for females and -0.05757 - 0.8201 = -0.8777 for males

ODDS

Females (reference): exp[-0.05757] = 0.9440 = 135/143

Males: exp[-0.8777] = 0.4157 = 74/178

ODDS RATIO

ODDS males / ODDS females = exp[-0.8201] = 0.4404 = 0.4157/0.9440

Are males more likely to leave home early than females?

Leaving home

Logistic regression

Dummy coding: ref. cat.: females, late

$$\operatorname{Logit}(p_i) = \ln \frac{p_i}{1-p_i} = \beta_0 + \beta_1 x_i = -0.05757 - 0.8201\,x_i$$

$$p_f = \frac{1}{1 + \exp[-(-0.05757)]} = 0.486$$

$$p_m = \frac{1}{1 + \exp[-(-0.05757 - 0.8201)]} = 0.294$$

Effect coding or marginal coding: females +1; males -1

$$\operatorname{Logit}(p_i) = \ln \frac{p_i}{1-p_i} = \beta_0 + \beta_1 x_i = -0.4676 + 0.4101\,x_i$$

$x_i$ is 1 for females and -1 for males

Logit p is -0.4676 + 0.4101 = -0.0576 for females and -0.4676 + 0.4101 * (-1) = -0.8777 for males

Leaving home
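Dummy coding and effect coding describe the same two logits; only the meaning of the intercept and slope changes. As a sketch, the hypothetical helper `two_group_logit` below solves logit = b0 + b1 * x exactly for two groups, given the code assigned to each group and its observed logit of leaving home early.

```python
import math

def two_group_logit(logit_by_code):
    """Solve logit = b0 + b1 * x for two groups, given {x_code: logit}."""
    (x1, y1), (x2, y2) = logit_by_code.items()
    b1 = (y2 - y1) / (x2 - x1)
    b0 = y1 - b1 * x1
    return b0, b1

# Observed logits of leaving home early rather than late
logit_female = math.log(135 / 143)
logit_male = math.log(74 / 178)

# Dummy coding: females 0, males 1
dummy_b0, dummy_b1 = two_group_logit({0: logit_female, 1: logit_male})
# Effect coding: females +1, males -1
effect_b0, effect_b1 = two_group_logit({1: logit_female, -1: logit_male})

print(round(dummy_b0, 5), round(dummy_b1, 4))
print(round(effect_b0, 4), round(effect_b1, 4))
```

Under effect coding the intercept is the average of the two logits and the slope is half their difference, which is why it is half the size of the dummy-coded slope and has the opposite sign here.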


The logistic regression in SPSS

Micro data and tabulated data

SPSS: Micro-data

• Micro-data: age at leaving home in months

• Crosstabs: Number leaving home by reason (row) and sex (column)

• Create variable: Age in years

• Age = TRUNC[(month-1)/12]

• Create variable: TIMING2 based on MONTH:

• TIMING2 = 1 (early) if month ≤ 240 & reason < 4

• TIMING2 = 2 (late) if month > 240 & reason < 4

• For analysis: select cases that are NOT censored: SELECT CASES with reason < 4
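The recoding steps listed above can be sketched outside SPSS as well. The Python below is an illustration, not the SPSS syntax: the function names are hypothetical, the cut-off for 'early' is assumed to be month ≤ 240 (the complement of the 'late' rule), and cases with reason ≥ 4 are treated as censored and excluded, following the selection rule reason < 4.

```python
import math

def age_in_years(month):
    """Age in completed years: TRUNC[(month - 1) / 12]."""
    return math.trunc((month - 1) / 12)

def timing2(month, reason):
    """1 = early, 2 = late; None for censored cases (reason >= 4, assumed)."""
    if reason >= 4:
        return None
    return 1 if month <= 240 else 2

# Hypothetical micro-data records: (month of leaving home, reason code)
records = [(230, 1), (240, 2), (241, 3), (300, 4)]

# Keep only the cases that are not censored, as in SELECT CASES with reason < 4
recoded = [
    (age_in_years(month), timing2(month, reason))
    for month, reason in records
    if reason < 4
]
print(recoded)
```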

SPSS: tabulated data

• Number of observations: WEIGHT cases (in data)

• No difference between model for tabulated data and micro-data

The logistic regression in SPSS

SPSS: regression/logistic

Note: Dependent variable: TIMING2 (p = probability of leaving home LATE)

Covariate: sex (CATEGORICAL)

Logit(p) = ln[p/(1-p)] = 0.8777 - 0.8201 X, with males as reference category. Males are coded 0; hence X is 1 for females.

OUTPUT SPSS:

---------------------- Variables in the Equation -----------

Variable    B        S.E.    Wald      df   Sig     R        Exp(B)
SEX(1)     -.8201    .1831   20.0598   1    .0000   -.1594   .4404
Constant    .8777    .1383   40.2681   1    .0000

Leaving home
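Most columns of this output can be reproduced from the 2x2 table of counts. The sketch assumes the large-sample standard errors for a log odds and a log odds ratio (square root of the sum of reciprocal counts) and the Wald statistic (B / S.E.)². The R column is not reproduced, and the variable names are illustrative.

```python
import math

# Counts: leaving home late / early, by sex
late_male, early_male = 178, 74
late_female, early_female = 143, 135

# Dependent variable: leaving home late; males are the reference category
constant = math.log(late_male / early_male)
b_sex = math.log(late_female / early_female) - constant

se_constant = math.sqrt(1 / late_male + 1 / early_male)
se_sex = math.sqrt(
    1 / late_male + 1 / early_male + 1 / late_female + 1 / early_female
)

wald_constant = (constant / se_constant) ** 2
wald_sex = (b_sex / se_sex) ** 2
exp_b = math.exp(b_sex)

print("SEX(1)  ", round(b_sex, 4), round(se_sex, 4), round(wald_sex, 2), round(exp_b, 4))
print("Constant", round(constant, 4), round(se_constant, 4), round(wald_constant, 2))
```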


Related models

• Poisson distribution: counts have Poisson distribution (total number not fixed)

• Poisson regression

• Log-linear model: model of count data (log of counts)

• Binomial and multinomial distributions: counts follow multinomial distribution (total number is fixed)

• Logit model: model of proportions [and odds (log of odds)]

• Logistic regression

• Log-rate model: log-linear model with OFFSET (constant term)

Parameters of these models are related