Comparing MLE, MUE and Firth Estimates for Logistic Regression

Nitin R Patel, Chairman & Co-founder, Cytel Inc.

Research Affiliate, MIT

[email protected]

Copyright © Cytel Inc. 2013. All rights reserved.

Acknowledgements

This presentation is based on joint work with:

•  Pralay Senchaudhuri, Cytel Inc.
•  Hrishikesh Kulkarni, Cytel Inc.

JSM, Montreal, Aug 2013


Outline

•  Separation and Maximum Likelihood Estimates
•  Firth’s Method of Maximum Penalized Likelihood Estimation
•  Numerical experiments comparing MUE with FirthE when there is separation
•  Near separation and problems with MLE
•  Numerical experiments comparing MLE with FirthE when there is near separation
•  Conclusions


Maximum Likelihood Estimation

•  Almost universally used method for logistic regression models.
•  ML estimates are asymptotically unbiased and asymptotically attain minimum variance, but neither property is guaranteed in finite samples.
•  MLE’s can have serious shortcomings when applied to datasets with the following characteristics:
   –  Small/moderate in size
   –  Unbalanced responses (rare outcomes)
   –  Unequally spaced covariate values
   –  Many parameters relative to the number of observations.


Separation

•  Example 1

seq#  x1  x2  y
1     10  10  1
2     11  11  1
3     12  12  1
4     13  13  1
5     14  14  1
6     15  15  1
7     16  16  1
8     17  17  1
9     19  19  1
10    10  16  0
11    11  17  0
12    12  18  0
13    13  19  0
14    14  20  0
15    15  21  0
16    16  22  0
17    17  23  0
18    18  18  0
19    18  24  0
20    19  25  0

[Figure: covariate plot of data; horizontal axis x1 (5 to 20), vertical axis x2 (5 to 30)]


MLE’s and Separation

•  When separation occurs one or more MLE’s do not exist.

•  In other words, one or more MLE’s are unbounded (and so are their standard errors).

•  This means that the maximum likelihood method fails to provide either point or interval estimates.
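As a minimal sketch of why the MLE does not exist under separation, the code below evaluates the logistic loglikelihood for the Example 2 data used later in these slides (y = 0 for x = 5, ..., 45 and y = 1 for x = 50, ..., 100) along a ray of parameter values. The cut point 47.5, the scale values and the function names are illustrative assumptions, not part of the original presentation. The loglikelihood keeps increasing towards 0 as the scale grows, so no finite maximizer exists.

```python
import math

# Example 2 data from these slides: y = 0 for x = 5..45, y = 1 for x = 50..100.
xs = list(range(5, 105, 5))
ys = [0 if x <= 45 else 1 for x in xs]


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def loglik(beta0, beta1):
    """Logistic loglikelihood for logit(pi_i) = beta0 + beta1 * x_i."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = beta0 + beta1 * x
        total += log_sigmoid(eta) if y == 1 else log_sigmoid(-eta)
    return total


# Move along the ray beta = c * (-47.5, 1). The cut point 47.5 is an
# illustrative choice between the largest x with y = 0 and the smallest
# x with y = 1.
scales = [0.1, 0.5, 1.0, 5.0, 10.0]
values = [loglik(-47.5 * c, c) for c in scales]
for c, v in zip(scales, values):
    print(f"c = {c:5.1f}  loglik = {v:.6g}")
```

Every printed value is larger than the one before it, which is the numerical signature of an unbounded ML estimate.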


A useful characterization of separation

•  Separation occurs if and only if the observed vector of sufficient statistics is on the boundary of the convex hull of the (finite) set of possible sufficient statistics vectors.


Example 2: Simple Logistic Regression (one covariate, two parameters)

Response $Y_i$, covariate $x_i$ for observation $i$

Model: $\operatorname{logit}(\pi_i) = \beta_0 + \beta_1 x_i$, where $\pi_i = P(Y_i = 1)$

Sufficient statistics vector is $(T_0, T_1)$, where $T_0 = \sum_i Y_i$ and $T_1 = \sum_i x_i Y_i$


Example 2: Simple Logistic Regression (contd.)

Covariate values x: 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100

[Figure: horizontal axis "t0: sufficient stat. for beta0" (0 to 25), vertical axis "t1: sufficient stat. for beta1" (0 to 1200)]

Sufficient statistics vector is $(T_0, T_1)$, where $T_0 = \sum_i Y_i$ and $T_1 = \sum_i x_i Y_i$


Example 2 (contd.)

•  If we observe:
   yi = 0 for xi = 5, 10, 15, 20, 25, 30, 35, 40, 45
   yi = 1 for xi = 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100.
•  The observed sufficient statistics vector is (t0 = 11, t1 = 825).
•  The MLE for β1 does not exist since (11, 825) is on the boundary of (T0, T1) space.
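The arithmetic behind (t0 = 11, t1 = 825) can be checked with a short sketch. It assumes the one-covariate model of Example 2, where for a fixed t0 the extreme values of T1 are reached by assigning the responses to the t0 smallest or the t0 largest covariate values. The variable names are illustrative.

```python
# Covariate values 5, 10, ..., 100 and the responses observed in Example 2.
xs = list(range(5, 105, 5))
ys = [0 if x <= 45 else 1 for x in xs]

# Sufficient statistics T0 = sum_i Y_i and T1 = sum_i x_i * Y_i.
t0 = sum(ys)
t1 = sum(x * y for x, y in zip(xs, ys))

# For fixed t0, T1 is smallest when the t0 smallest x values carry the
# responses and largest when the t0 largest x values do.
sorted_xs = sorted(xs)
t1_min = sum(sorted_xs[:t0])
t1_max = sum(sorted_xs[len(sorted_xs) - t0:])
on_boundary = t1 in (t1_min, t1_max)

print(t0, t1, t1_min, t1_max, on_boundary)  # 11 825 330 825 True
```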


Firth’s Penalized Likelihood Method

•  The MLE is the root obtained when the score function (derivative of the loglikelihood) is equated to zero.

•  Firth’s method removes the $O(n^{-1})$ term from the bias of the MLE by modifying the score function with a penalty term.

•  The root obtained when the modified score function is set to zero is Firth’s Penalized Likelihood Estimate (FirthE).


Logistic Regression

•  The loglikelihood has the form $l(\beta) = \beta^{\top} t - K(\beta)$, where $t$ is the observed sufficient statistic vector

•  The score function is therefore $U(\beta) = l'(\beta) = t - K'(\beta)$

•  Firth’s modified score function is

   $U^{*}(\beta_j) = U(\beta_j) + \tfrac{1}{2}\,\operatorname{trace}\left[ I(\beta)^{-1} \frac{\partial I(\beta)}{\partial \beta_j} \right]$

   where $I(\beta)$ is Fisher’s information matrix

•  Firth’s modification shrinks the MLE estimate towards zero
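The following is a minimal sketch, not the LogXact implementation, of how the modified score equations can be solved for the two-parameter model of Example 2. It relies on two things not stated on the slide: the standard identity that, for logistic regression, the modified score can be written as the sum over observations of (y_i - pi_i + h_i (1/2 - pi_i)) x_i, where h_i is the i-th diagonal element of the hat matrix, and the equivalent view of the estimate as the maximizer of the loglikelihood plus one half of log det I(β). The Fisher-scoring iteration with step-halving and all function names are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def information(xs, beta):
    """Entries (a, b, d) of the 2x2 Fisher information [[a, b], [b, d]]."""
    a = b = d = 0.0
    for x in xs:
        p = sigmoid(beta[0] + beta[1] * x)
        w = p * (1.0 - p)
        a, b, d = a + w, b + w * x, d + w * x * x
    return a, b, d


def penalized_loglik(xs, ys, beta):
    """Loglikelihood plus 0.5 * log det I(beta)."""
    ll = sum(y * (beta[0] + beta[1] * x) - log1pexp(beta[0] + beta[1] * x)
             for x, y in zip(xs, ys))
    a, b, d = information(xs, beta)
    det = a * d - b * b
    return ll + 0.5 * math.log(det) if det > 0 else float("-inf")


def firth_fit(xs, ys, max_iter=200, tol=1e-8):
    """Solve the modified score equations by Fisher scoring with step-halving."""
    beta = [0.0, 0.0]
    for _iteration in range(max_iter):
        a, b, d = information(xs, beta)
        det = a * d - b * b
        u0 = u1 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(beta[0] + beta[1] * x)
            # Hat-matrix diagonal: w_i * x_i' I^{-1} x_i with x_i = (1, x).
            h = p * (1.0 - p) * (d - 2.0 * b * x + a * x * x) / det
            r = y - p + h * (0.5 - p)
            u0 += r
            u1 += r * x
        # Scoring step I^{-1} U*, using the explicit 2x2 inverse.
        step = [(d * u0 - b * u1) / det, (a * u1 - b * u0) / det]
        current = penalized_loglik(xs, ys, beta)
        trial = beta
        for _halving in range(30):
            trial = [beta[0] + step[0], beta[1] + step[1]]
            if penalized_loglik(xs, ys, trial) >= current:
                break
            step = [0.5 * s for s in step]
        beta = trial
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Separated Example 2 data: the MLE does not exist, the Firth estimate does.
xs = list(range(5, 105, 5))
ys = [0 if x <= 45 else 1 for x in xs]
b0, b1 = firth_fit(xs, ys)
print(f"beta0 = {b0:.4f}, beta1 = {b1:.4f}, ED50 = {-b0 / b1:.2f}")
```

Step-halving is included only as a safeguard against overshooting; other solvers would serve equally well for this small problem.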


Boundary points of Sufficient Statistics space

[Figure: Boundary Points in space of sufficient stats; horizontal axis t0 (0 to 25), vertical axis t1 (0 to 1200)]

t_0   t_1 (boundary values)
0     0
1     5, 100
2     15, 195
3     30, 285
4     50, 370
5     75, 450
6     105, 525
7     140, 595
8     180, 660
9     225, 720
10    275, 775
11    330, 825
12    390, 870
13    455, 910
14    525, 945
15    600, 975
16    680, 1000
17    765, 1020
18    855, 1035
19    950, 1045
20    1050

There are 40 points on the boundary of the set of possible values of (t0 , t1)
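The count of 40 boundary points can be reproduced with a short sketch, assuming the Example 2 covariate values 5, 10, ..., 100. For each t0 it records the smallest and largest attainable t1; the variable names are illustrative.

```python
# Covariate values of Example 2, in increasing order.
xs = sorted(range(5, 105, 5))
n = len(xs)

boundary = set()
for t0 in range(n + 1):
    t1_min = sum(xs[:t0])        # responses on the t0 smallest x values
    t1_max = sum(xs[n - t0:])    # responses on the t0 largest x values
    boundary.add((t0, t1_min))
    boundary.add((t0, t1_max))

points = sorted(boundary)
print(len(points))  # 40
```

The two extremes coincide for t0 = 0 and t0 = 20, which is why the total is 2 × 19 + 2 = 40 rather than 42.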


Comparison of MUE with FirthE when MLE does not exist

•  Several numerical experiments with one covariate models and a limited number with two covariate models.

•  Used exhaustive enumeration of t-vectors as well as Monte Carlo simulations with sample sizes of 1000.

•  We will illustrate with Example 2 data

[Figure: Bias Comparison for MUE with FirthE for ED50 = 52.5. Based on complete enumeration]

[Figure: MSE Comparison for MUE with FirthE for ED50 = 52.5. Based on complete enumeration]

Findings from numerical experiments

•  Our numerical experiments with one covariate, and some with two covariates, suggest that Firth’s method gives better estimates than MUE in terms of both bias and Mean Square Error when there is complete separation.

•  Additional advantages of Firth’s method are:
   –  Unlike MUE, it does not depend on the conditional distribution of the sufficient statistic, so it does not have the problems associated with having few support points (e.g. with continuous covariates).
   –  It is much faster to compute.


A real dataset

•  Two hundred rats were treated with a toxic agent at four levels of dose; the binary response examined was development of an intestinal tumor. The covariates were levels of dose (as factor variables) and a binary survival variable to control for death. (Data from US Toxicology Program Tech Report 405, 1991; the LogXact manual gives details.)

•  There was separation in this dataset.

•  The output is from the current beta version of LogXact, which provides Firth’s method as an option.


LogXact Results

             Point Estimate                 95% Conf. Interval                  2*1-sided
Model Term   Type     Beta      SE(Beta)    Type         Lower      Upper      P-Value
%Const       FirthE   -3.861    2.108       Asymptotic   -7.993     0.2713     0.0671
dose_0       FirthE   -2.873    1.937       Asymptotic   -6.67      0.9241     0.1381
             MUE      -1.053    NA          Exact        -INF       1.909      0.4824
dose_150     FirthE   -1.24     1.438       Asymptotic   -4.057     1.578      0.3886
             CMLE     -1.444    1.667       Exact        -6.437     2.471      0.9367
dose_300     FirthE   -2.733    1.656       Asymptotic   -5.978     0.5116     0.0988
             MUE      -1.677    NA          Exact        -INF       0.869      0.2068
survival     FirthE   0.09387   0.1402      Asymptotic   -0.1808    0.3686     0.5030
             CMLE     0.1246    0.174       Exact        -0.2128    0.5058     0.5345


Near Separation

•  MLE is unstable – small shift in data leads to huge change in ML estimate of coefficients

Example 1 (x2 for seq# 18 is k)

seq#  x1  x2  y
1     10  10  1
2     11  11  1
3     12  12  1
4     13  13  1
5     14  14  1
6     15  15  1
7     16  16  1
8     17  17  1
9     19  19  1
10    10  16  0
11    11  17  0
12    12  18  0
13    13  19  0
14    14  20  0
15    15  21  0
16    16  22  0
17    17  23  0
18    18  k   0
19    18  24  0
20    19  25  0

[Figure: covariate plot of data; horizontal axis x1 (5 to 20), vertical axis x2 (5 to 30); label k on the plot]

MLE and Near separation: Example 1 (contd.)

[Figure: MLE coefficients vs k; horizontal axis k (0 to 20), vertical axis beta (-3 to 2); series: beta1, beta2]

Interior Points grouped into Layers by closeness to the boundary

[Figure: Interior Point Layers; horizontal axis t0 (0 to 20), vertical axis t1 (0 to 1200); series: Layer 1, Layer 5, Layer 10, Layer 20, Layer 40, Layer 50]

[Figure: Bias Comparison of MLE to FirthE, ED50 = 52.5. Based on complete enumeration]

[Figure: Bias Comparison of MLE to FirthE, ED50 = 5. Based on complete enumeration]

[Figure: Bias Comparison of MLE to FirthE, ED50 = 100. Based on complete enumeration]

[Figure: Significant Models (pval < 0.05), Bias Comparison of MLE to FirthE, ED50 = 52.5. Based on complete enumeration]

[Figure: MSE Comparison of MLE to FirthE, ED50 = 52.5. Based on complete enumeration]

[Figure: MSE Comparison of MLE to FirthE, ED50 = 5. Based on complete enumeration]

[Figure: MSE Comparison of MLE to FirthE, ED50 = 100. Based on complete enumeration]

[Figure: Significant Models (pval < 0.05), MSE Comparison of MLE to FirthE, ED50 = 52.5. Based on complete enumeration]

Conclusions from Experiments

•  Our numerical experiments and simulations suggest that FirthE reduces bias as well as Mean Square Error in comparison to MLE when the maximum slope of the logistic curve is not very high.

•  However, when the maximum slope is high, the FirthE correction for bias produces excessive shrinkage and the MLE is superior.

•  In many data sets that arise in practice we don’t expect large changes in response for small changes in the covariate values, so FirthE will be superior.

•  We conjecture that this conclusion will also hold when we compare conditional MLE and conditional FirthE.


Detecting near separation in data sets

•  We have a research project to create an index to signal near separation in data sets to alert LogXact users about the bias in MLE.

•  Please let us know if you have datasets you can share which seem to exhibit near separation

•  Experiments suggest that we can use Confidence Intervals based on the Firth Profile Likelihood to detect near separation.

•  The ratio of the Upper CI width to the Lower CI width appears to have promise as an index of near separation.
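A minimal sketch of the proposed index follows. It assumes that the upper CI width is measured from the point estimate to the upper limit and the lower CI width from the lower limit to the point estimate; the slides give only Ratio = UCIwidth/LCIwidth, so this reading, the function name and the numbers in the example are assumptions.

```python
def ci_width_ratio(estimate, lower, upper):
    """Ratio of upper CI width (upper - estimate) to lower CI width (estimate - lower)."""
    upper_width = upper - estimate
    lower_width = estimate - lower
    if lower_width <= 0 or upper_width < 0:
        raise ValueError("expected lower < estimate <= upper")
    return upper_width / lower_width


# Hypothetical interval, not taken from the slides: (3 - 1) / (1 - 0.5).
print(ci_width_ratio(1.0, 0.5, 3.0))  # 4.0
```

A symmetric interval gives a ratio of 1, so values far from 1 indicate an asymmetric profile likelihood interval.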


Example 2: Simple Logistic Regression (contd.)

Covariate values x: 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100

[Figure: horizontal axis "t0: sufficient stat. for beta0" (0 to 25), vertical axis "t1: sufficient stat. for beta1" (0 to 1200)]

Sufficient statistics vector is $(T_0, T_1)$, where $T_0 = \sum_i Y_i$ and $T_1 = \sum_i x_i Y_i$

Interior Points grouped into Layers by closeness to the boundary

Ratios were calculated for each interior point

Ratio of Firth Profile Likelihood 95% CI widths

Ratio = UCIwidth/LCIwidth

[Figure: Ratio versus # Layers from boundary, with fitted polynomial; horizontal axis "# Layers from boundary" (0 to 60), vertical axis "Ratio" (0 to 4.5)]

Thank you! [email protected]
