
Accurate Prediction of Rare Events with Firth’s Penalized Likelihood Approach

Angelika Geroldinger, Daniela Dunkler, Rainer Puhr, Rok Blagus, Lara Lusa, Georg Heinze

12th Applied Statistics

September 2015

Overview

1. Introduction: Bias, bias reduction

2. Firth’s method: Definition and properties

3. PREMA: Accurate prediction and Firth’s method

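Example: Bias in logistic regression

Consider a model containing only an intercept and no regressors:

$\operatorname{logit}(P(Y = 1)) = \alpha.$

With $n$ observations and $k$ events, the ML estimator of $\alpha$ is given by:

$\hat{\alpha} = \operatorname{logit}(k/n).$

Since $k/n$ is an unbiased estimator of $P(Y = 1)$ and the logit is a nonlinear function, $\hat{\alpha}$ is biased by Jensen’s inequality.

(Conversely, if $\hat{\alpha}$ were unbiased, $\operatorname{expit}(\hat{\alpha})$ would be biased.)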

Introduction


The bias of the ML estimator $\hat{\alpha}$ points away from 0.
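As a numerical illustration of the intercept-only example, the following minimal sketch computes the exact expectation of logit(k/n) for k ~ Binomial(n, p). Samples with k = 0 or k = n, where the ML estimate is infinite, are excluded; this conditioning is an assumption of the sketch and not part of the slides, and the function names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def expected_ml_intercept(n, p):
    """E[logit(k/n) | 0 < k < n] for k ~ Binomial(n, p).

    The cases k = 0 and k = n are dropped because logit(k/n) is
    infinite there (assumption made for this illustration).
    """
    numerator = 0.0
    total_weight = 0.0
    for k in range(1, n):
        weight = math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        numerator += weight * logit(k / n)
        total_weight += weight
    return numerator / total_weight


# Compare the true intercept with the expected ML estimate.
for p in (0.2, 0.5, 0.8):
    print(p, logit(p), expected_ml_intercept(30, p))
```

For p below 0.5 the expected estimate should lie below the true (negative) intercept, and for p above 0.5 above the true (positive) intercept, which is the "away from 0" direction.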

Bias and consistency

The bias of an estimator $\hat{\theta}$ for a true value $\theta$ is defined as

$\operatorname{bias}(\hat{\theta}) = E(\hat{\theta}) - \theta.$

Example: Let $Y_1, \dots, Y_n$ be i.i.d. normally distributed. Then the sample mean $\bar{Y}_n$ is an unbiased estimator for the mean.

An estimator $\hat{\theta}$ is called consistent if it converges in probability to $\theta$.


Bias reduction

For ML estimates in regular models one can show that

$\operatorname{bias}(\hat{\theta}) = \frac{b_1(\theta)}{n} + \frac{b_2(\theta)}{n^2} + \dots$

Some approaches to a bias-reduced estimate $\hat{\theta}_{bc}$:

Bias-corrective:

• jackknife,

• bootstrap,

• explicitly determine the function $b_1$ and set $\hat{\theta}_{bc} = \hat{\theta} - b_1(\hat{\theta})/n$.

Bias-preventive:

• Firth-type penalization.
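For the intercept-only example above, the leading bias term can be worked out with a second-order Taylor expansion (delta method). This derivation is an illustration added here, not taken from the slides. It writes $\pi = \operatorname{expit}(\alpha)$ for the true event probability and ignores samples with $k = 0$ or $k = n$, in which $\hat{\alpha}$ is infinite.

$$
g(\pi) = \operatorname{logit}(\pi), \qquad
g''(\pi) = \frac{2\pi - 1}{\pi^2 (1 - \pi)^2}, \qquad
\operatorname{Var}(k/n) = \frac{\pi (1 - \pi)}{n}
$$

$$
E(\hat{\alpha}) - \alpha \;\approx\; \tfrac{1}{2}\, g''(\pi)\, \operatorname{Var}(k/n)
\;=\; \frac{2\pi - 1}{2 n\, \pi (1 - \pi)}
$$

The leading term is of order $1/n$, negative for $\pi < 1/2$ (where $\alpha < 0$) and positive for $\pi > 1/2$, so the bias points away from 0.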


Firth type penalization

In exponential family models with canonical parametrization the Firth-type penalized likelihood is given by

$L^*(\theta) = L(\theta)\, \det(I(\theta))^{1/2},$

where $I(\theta)$ is the Fisher information matrix.

This removes the first-order bias of the ML estimates.

Software:

• logistic regression: R (logistf, brglm, pmlr), SAS, Stata…

• Cox regression: R (coxphf), SAS…

Firth’s method


The penalty factor $\det(I(\theta))^{1/2}$ is Jeffreys invariant prior.

We are interested in logistic regression. Here the penalized likelihood is given by

$L^*(\theta) = L(\theta)\, \det(X^t W X)^{1/2}$

with

$W = \operatorname{diag}\big(\operatorname{expit}(X_i\theta)\,(1 - \operatorname{expit}(X_i\theta))\big).$

• Each diagonal element of $W$ is maximised at $\theta = 0$, so the penalty shrinks the ML estimates towards zero,

• for a 2 × 2 table (logistic regression with one binary regressor), Firth’s bias correction amounts to adding 1/2 to each cell.
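The add-1/2 rule can be checked directly. The derivation below is a sketch added for illustration; the symbols $n_g$, $k_g$ and $\pi_g$ (number of observations, number of events and event probability in group $g$) are notation introduced here. With an intercept and one binary regressor, $\det(X^t W X) = \prod_g n_g \pi_g (1 - \pi_g)$, so the penalized log-likelihood separates by group:

$$
l^*(\theta) = \sum_{g} \Big[ k_g \log \pi_g + (n_g - k_g) \log(1 - \pi_g)
+ \tfrac{1}{2} \log\big(n_g \pi_g (1 - \pi_g)\big) \Big]
$$

$$
= \sum_{g} \Big[ \big(k_g + \tfrac{1}{2}\big) \log \pi_g
+ \big(n_g - k_g + \tfrac{1}{2}\big) \log(1 - \pi_g) \Big] + \text{const}
$$

$$
\hat{\pi}_g = \frac{k_g + 1/2}{n_g + 1}
$$

This is the ML estimate for the table in which 1/2 has been added to the event and non-event cell of each group.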


Example: 2 × 2 table

Two groups with event probabilities 0.9 and 0.1:

     A     B
0    0.1   0.9
1    0.9   0.1

[Figure: ML estimates]

[Figure: ML estimates and Firth estimates for the two groups with event probabilities 0.9 and 0.1]

Bias towards zero??

Separation

(Quasi-)complete separation: a combination of the explanatory variables (nearly) perfectly predicts the outcome

– frequently encountered with small samples,

– “monotone likelihood”,

– some of the ML estimates are infinite,

– but Firth estimates do exist!

Example: complete separation

Observed counts:

      A     B
0     0    10
1    10     0

With 1/2 added to each cell (Firth):

      A      B
0     0.5   10.5
1    10.5    0.5

Example: quasi-complete separation

Observed counts:

      A     B
0     0     7
1    10     3

With 1/2 added to each cell (Firth):

      A      B
0     0.5    7.5
1    10.5    3.5
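The existence of Firth estimates under complete separation can be checked numerically. The sketch below is a minimal illustration, not the implementation used in logistf or any other package: it solves the modified score equations of Heinze and Schemper (2002), $X^t (y - \pi + h(1/2 - \pi)) = 0$ with $h$ the diagonal of the hat matrix, by plain Newton-type steps without step-halving. The function names `expit` and `firth_logistic` and the coding of group A as x = 0 and group B as x = 1 are assumptions made here.

```python
import numpy as np


def expit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-eta))


def firth_logistic(X, y, max_iter=50, tol=1e-10):
    """Firth-penalized logistic regression via the modified score.

    Minimal sketch: Newton-type iterations with the Fisher information,
    no step-halving or step-size limit (an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = expit(X @ beta)
        w = pi * (1.0 - pi)
        info = X.T @ (w[:, None] * X)      # Fisher information X^t W X
        info_inv = np.linalg.inv(info)
        # h_i = w_i * x_i^t (X^t W X)^{-1} x_i, diagonal of the hat matrix
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        score = X.T @ (y - pi + h * (0.5 - pi))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Complete separation table: group A has 10 events, group B has none.
X = np.array([[1.0, 0.0]] * 10 + [[1.0, 1.0]] * 10)
y = np.array([1.0] * 10 + [0.0] * 10)

beta = firth_logistic(X, y)
prob_a = expit(beta[0])
prob_b = expit(beta[0] + beta[1])
print(prob_a, 10.5 / 11)  # fitted probability vs. add-1/2 value, group A
print(prob_b, 0.5 / 11)   # fitted probability vs. add-1/2 value, group B
```

The fitted probabilities should agree with the add-1/2 tables, whereas an unpenalized ML fit on the same data would diverge.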

We are interested in accurate prediction of rare events, in particular in the presence of high-dimensional data.

What can we expect from Firth’s method?

Accurate prediction

Recall the example:

     A     B
0    0.1   0.9
1    0.9   0.1

Now we take a closer look at n=30…

Firth’s method aims at removing the bias of the coefficients. However, this results in biased event probabilities: the predicted probabilities are biased towards 0.5.
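To see where the bias towards 0.5 comes from in the simplest case, consider a single group with $n$ observations, $k$ events and true event probability $\pi$ (symbols introduced here for illustration). Using the add-1/2 rule for 2 × 2 tables stated earlier, the Firth estimate of the event probability and its expectation are:

$$
\hat{\pi} = \frac{k + 1/2}{n + 1}, \qquad
E(\hat{\pi}) = \frac{n\pi + 1/2}{n + 1} = \pi + \frac{1/2 - \pi}{n + 1}
$$

The second term pulls the expected prediction towards 1/2. For rare events ($\pi$ small) the event probability is therefore overestimated on average, and the effect vanishes as $n$ grows.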

Accurate prediction PREMA

Approaches for unbiased event probabilities:

• Puhr R and Heinze G: adjust the intercept such that the mean predicted probability is equal to the proportion of events

• Elgmati E et al.: weaken the Firth-type penalty (replace 0.5 by a factor < 0.5), for instance $\tau = 0.1$
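The effect of weakening the penalty can be sketched for a single group. Under the assumption (derived here for the one-group case, not stated on the slides) that the penalty $\det(I(\theta))^{\tau}$ is equivalent to adding $\tau$ to the event and non-event cell, the estimate is $(k + \tau)/(n + 2\tau)$, with $\tau = 0$ giving ML and $\tau = 0.5$ giving Firth. The function names and the illustrative values n = 45, p = 0.05 are hypothetical choices.

```python
import math


def binom_pmf(n, k, p):
    """Binomial probability mass function."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def penalized_probability(k, n, tau):
    """Estimated event probability when tau is added to both cells.

    tau = 0 is ML, tau = 0.5 is Firth, 0 < tau < 0.5 is a weakened
    penalty (single-group assumption of this sketch).
    """
    return (k + tau) / (n + 2.0 * tau)


def expected_probability(n, p, tau):
    """Exact expectation of the estimate for k ~ Binomial(n, p)."""
    return sum(
        binom_pmf(n, k, p) * penalized_probability(k, n, tau)
        for k in range(n + 1)
    )


# Expected predicted probability for a rare event, by penalty strength.
for tau in (0.0, 0.1, 0.5):
    print(tau, expected_probability(45, 0.05, tau))
```

The expected prediction moves from the true value at $\tau = 0$ towards 0.5 as $\tau$ increases, so a smaller $\tau$ trades part of the penalty for less bias in the event probability.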


With rare events:

• group A: 45% events, N=15

• group B: 5% events, N=45

→ 15% events in total, separation in ~10% of scenarios

     A (N=15)   B (N=45)
0    0.55       0.95
1    0.45       0.05
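The quoted separation frequency can be reproduced with a short calculation. The sketch below assumes fixed group sizes with independent binomial event counts and treats any 2 × 2 table with an empty cell as separated; both are assumptions made here, and the function names are hypothetical.

```python
def prob_no_empty_cell(n, p):
    """P(0 < k < n) for k ~ Binomial(n, p): both outcomes occur in the group."""
    return 1.0 - (1.0 - p) ** n - p**n


def separation_probability(groups):
    """Probability that at least one group has an empty cell.

    groups is a list of (group size, event probability) pairs.
    """
    prob_all_cells_filled = 1.0
    for n, p in groups:
        prob_all_cells_filled *= prob_no_empty_cell(n, p)
    return 1.0 - prob_all_cells_filled


# Group A: N=15 with 45% events, group B: N=45 with 5% events.
p_sep = separation_probability([(15, 0.45), (45, 0.05)])
print(p_sep)
```

Almost all of this probability comes from group B having no events at all, which is the typical way separation arises with rare events.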

High dimensional data

If 𝑛 ≪ 𝑝 then, in general, the sample outcome can be perfectly predicted.

Just another case of complete separation?

Unfortunately, Firth estimates are not unique for $n < p$, for the same reason that ML estimates are not unique.

However, ridge and LASSO estimates give “reasonable” results for 𝑛 < 𝑝.

Combine ridge or LASSO with Firth?

Combination of Firth’s method and ridge

Shen and Gao (2008), for $n > p$:

$l^*(\theta) = l(\theta) + \underbrace{\tfrac{1}{2}\log\det(I(\theta))}_{\text{Firth penalty}} - \underbrace{\lambda\,\lVert\theta\rVert^2}_{\text{ridge penalty}}$

Motivation: to deal with multicollinearity AND separation

Conclusion: reduces MSE but introduces bias of coefficients
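As a minimal sketch of this objective for logistic regression, the function below evaluates the log-likelihood plus the Firth penalty minus the ridge penalty. It is an illustration, not Shen and Gao's implementation: the name `firth_ridge_loglik` is hypothetical, and the ridge term is applied to all coefficients including the intercept, which is an assumption of this sketch.

```python
import numpy as np


def firth_ridge_loglik(theta, X, y, lam):
    """Double-penalized log-likelihood for logistic regression.

    l(theta) + 0.5 * log det(X^t W X) - lam * ||theta||^2
    (all coefficients are ridge-penalized; assumption of this sketch).
    """
    theta = np.asarray(theta, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    eta = X @ theta
    # Bernoulli log-likelihood, written in a numerically stable form.
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))

    pi = 1.0 / (1.0 + np.exp(-eta))
    w = pi * (1.0 - pi)
    info = X.T @ (w[:, None] * X)          # Fisher information X^t W X
    _, logdet = np.linalg.slogdet(info)

    firth_penalty = 0.5 * logdet
    ridge_penalty = lam * np.sum(theta**2)
    return loglik + firth_penalty - ridge_penalty


# Small usage example with an intercept and one regressor.
X_demo = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y_demo = np.array([0.0, 0.0, 1.0, 1.0])
print(firth_ridge_loglik([0.0, 0.0], X_demo, y_demo, lam=0.5))
print(firth_ridge_loglik([-1.0, 1.0], X_demo, y_demo, lam=0.5))
```

In practice this function would be handed to a numerical optimizer, with $\lambda$ chosen by some tuning procedure; neither step is shown here.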

Conclusions

• Other modifications of Firth’s penalty favouring accurate prediction of event probabilities?

• Performance of these modifications in combination with weighting, tuning? In the situation of rare events?

• Combination of Firth’s method and ridge for high-dimensional data?

• Combination of Firth’s method and LASSO? In high-dimensional data?

• Tuning Firth’s penalty?

QUESTIONS

Literature

• Elgmati E, Fiaccone RL, Henderson R and Matthews JNS. Penalised logistic regression and dynamic prediction for discrete-time recurrent event data. Lifetime Data Analysis 2015; doi: 10.1007/s10985-015-9321-4.

• Firth D. Bias reduction of maximum likelihood estimates. Biometrika 1993; 80(1): 27-38.

• Heinze G and Schemper M. A solution to the problem of separation in logistic regression. Statistics in Medicine 2002; 21(16): 2409-2419.

• Puhr R and Heinze G. Predicting rare events with penalized logistic regression. Work in progress.

• Shen J and Gao S. A solution to separation and multicollinearity in multiple logistic regression. Journal of Data Science 2008; 6(4): 515-531.
