Gerhard-Wilhelm Weber *, Kasırga Yıldırak and Efsun Kürüm; Institute of Applied Mathematics, Middle East Technical University, Ankara, Turkey; Faculty of Economics, Management and Law, University of Siegen, Germany; Center for Research on Optimization and Control, University of Aveiro, Portugal; Universiti Teknologi Malaysia, Skudai, Malaysia. A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve. 5th International Summer School "Achievements and Applications of Contemporary Informatics, Mathematics and Physics", National University of Technology of the Ukraine, Kiev, Ukraine, August 3-15, 2010.

A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

DESCRIPTION

AACIMP 2010 Summer School lecture by Gerhard-Wilhelm Weber. "Applied Mathematics" stream. "Modern Operational Research and Its Mathematical Methods with a Focus on Financial Mathematics" course. Part 7. More info at http://summerschool.ssa.org.ua


Page 1: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

Gerhard-Wilhelm Weber *

Kasırga Yıldırak and Efsun Kürüm

Institute of Applied Mathematics, Middle East Technical University, Ankara, Turkey

* Faculty of Economics, Management and Law, University of Siegen, Germany

Center for Research on Optimization and Control, University of Aveiro, Portugal

Universiti Teknologi Malaysia, Skudai, Malaysia

A Classification Problem of Credit Risk Rating

Investigated and Solved by

Optimization of the ROC Curve

5th International Summer School

Achievements and Applications of Contemporary Informatics,

Mathematics and Physics

National University of Technology of the Ukraine

Kiev, Ukraine, August 3-15, 2010

Page 2: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

Outline

• Main Problem from Credit Default

• Logistic Regression and Performance Evaluation

• Cut-Off Values and Thresholds

• Classification and Optimization

• Nonlinear Regression

• Numerical Results

• Outlook and Conclusion

Page 3: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

Main Problem from Credit Default

Problem: whether a credit application should be approved or rejected.

Solution: learning about the default probability of the applicant.

Page 5: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

Logistic Regression

$$\log\left(\frac{P(Y = 1 \mid X = x_l)}{P(Y = 0 \mid X = x_l)}\right) = \beta_0 + \beta_1 x_{l1} + \beta_2 x_{l2} + \ldots + \beta_p x_{lp} \qquad (l = 1, 2, \ldots, N)$$
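As a minimal sketch of how the logistic model turns a firm's variables into a default probability, the following inverts the log-odds relation. The coefficient values and the two-variable firm are hypothetical and chosen only for illustration; they are not estimates from the lecture's data.

```python
import math

def default_probability(beta, x):
    """P(Y = 1 | X = x) under the logistic model: the log-odds are linear in x."""
    log_odds = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients (beta_0, beta_1, beta_2) and one firm's two ratios.
beta = [-2.0, 0.8, -1.5]
x_l = [1.2, 0.4]

p = default_probability(beta, x_l)
# Taking the log of the odds recovers beta_0 + beta_1 * x_l1 + beta_2 * x_l2.
log_odds = math.log(p / (1.0 - p))
print(f"default probability = {p:.4f}, log-odds = {log_odds:.4f}")
```

The round trip from probability back to log-odds is a quick consistency check on the formula.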

Page 6: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

Goal

We have two problems to solve here:

• To distinguish the defaults from the non-defaults.

• To put the non-default firms in an order based on their credit quality and classify them into (sub)classes.

Our study is based on one of the Basel II criteria, which recommends that the bank divide corporate firms into 8 rating grades, one of them being the default class.

Page 7: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

Data

The data were collected by a bank from firms operating in the manufacturing sector in Turkey.

They cover the period between 2001 and 2006.

Originally there are 54 qualitative variables and 36 quantitative variables.

The quantitative variables are formed from the balance sheets submitted by the firms' accountants. Essentially, they are the well-known financial ratios.

The data set covers 3150 firms, of which 92 are in the state of default.

As the number of defaults is small, we downsize the number to 551 to overcome possible statistical problems, keeping all the default cases in the set.

Page 8: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

We evaluate performance of the model

[Figure: non-default cases and default cases plotted against the test result value, separated by a cut-off value; alongside, the ROC curve of TPF (sensitivity) against FPF (1-specificity).]

Page 9: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

Model outcome versus truth

                     truth: d                          truth: n
model outcome: dı    True Positive Fraction (TPF)      False Positive Fraction (FPF)
model outcome: nı    False Negative Fraction (FNF)     True Negative Fraction (TNF)
total                1                                 1

Page 10: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

Definitions

• sensitivity (TPF) := P( Dı | D)

• specificity := P( NDı | ND )

• 1-specificity (FPF) := P( Dı | ND )

• points (FPF(c), TPF(c)) constitute the ROC curve

• c := cut-off value

• c takes values between -∞ and ∞

• TPF(c) := P( z>c | D )

• FPF(c) := P( z>c | ND )
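The definitions of TPF(c) and FPF(c) can be illustrated with empirical fractions. This is a minimal sketch on made-up scores; the data and the function name `tpf_fpf` are hypothetical, not taken from the lecture.

```python
def tpf_fpf(scores, is_default, c):
    """Empirical TPF(c) = P(z > c | D) and FPF(c) = P(z > c | ND)."""
    d = [z for z, y in zip(scores, is_default) if y]
    nd = [z for z, y in zip(scores, is_default) if not y]
    tpf = sum(z > c for z in d) / len(d)
    fpf = sum(z > c for z in nd) / len(nd)
    return tpf, fpf

# Hypothetical scores z and default flags (True = default).
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2, 0.1, 0.05]
is_default = [True, True, True, False, False, False, False, False]

# Lowering the cut-off moves the (FPF, TPF) point from (0, 0) towards (1, 1).
roc_points = [tpf_fpf(scores, is_default, c)[::-1] for c in (1.0, 0.7, 0.5, 0.25, 0.0)]
for fpf, tpf in roc_points:
    print(f"FPF={fpf:.2f}  TPF={tpf:.2f}")
```

Each cut-off yields one point; the collection of points over all cut-offs is the empirical ROC curve.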

Page 11: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

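$$a := \frac{\mu_n - \mu_s}{\sigma_s}, \qquad b := \frac{\sigma_n}{\sigma_s}$$

$$\mathrm{FPF}(c_i) := \Phi(c_i), \qquad \mathrm{TPF}(c_i) := \Phi(a + b\,c_i)$$

[Figure: ROC curve of TPF against FPF, and the same curve on normal-deviate axes, Normal Deviate (TPF) against Normal Deviate (FPF).]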
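A short sketch of why normal-deviate axes are useful, assuming the parametrization FPF(c) = Φ(c) and TPF(c) = Φ(a + b c) that the later slides use: after applying Φ^{-1} to both coordinates, the ROC curve becomes the straight line y = a + b x. The pair (a, b) = (1, 0.95) is one of the initial parameter pairs from the numerical results; the cut-offs are arbitrary.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.cdf is Φ, PHI.inv_cdf is Φ^{-1}

def binormal_point(a, b, c):
    """Return (FPF, TPF) at cut-off c, assuming FPF = Φ(c) and TPF = Φ(a + b c)."""
    return PHI.cdf(c), PHI.cdf(a + b * c)

a, b = 1.0, 0.95
for c in (-2.0, -1.0, 0.0, 1.0):
    fpf, tpf = binormal_point(a, b, c)
    # On normal-deviate axes the point lies on the line y = a + b x.
    x, y = PHI.inv_cdf(fpf), PHI.inv_cdf(tpf)
    print(f"c={c:+.1f}  FPF={fpf:.4f}  TPF={tpf:.4f}  deviates=({x:+.3f}, {y:+.3f})")
```

The intercept and slope of that line are exactly the two parameters a and b that the optimization later adjusts.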

Page 13: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

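Classification

[Figure (example of cut-off values): actually non-default cases and actually default cases along the c axis, divided by cut-off values into class I, class II, class III, class IV and class V.]

To assess the discriminative power of such a model, we calculate the Area Under the (ROC) Curve:

$$\mathrm{AUC} := \int_{-\infty}^{\infty} \Phi(a + b\,c)\; d\Phi(c).$$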
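The AUC integral can be evaluated numerically. The sketch below applies the midpoint rule to the integral of Φ(a + b c) with respect to Φ(c), and compares it with the closed form Φ(a / sqrt(1 + b²)), a standard identity for this integral that is not stated on the slides. The values of a and b are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist()

def auc_binormal(a, b, lo=-8.0, hi=8.0, steps=4000):
    """Midpoint-rule value of the integral of Φ(a + b c) dΦ(c) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        c = lo + (k + 0.5) * h
        total += PHI.cdf(a + b * c) * PHI.pdf(c) * h  # dΦ(c) = φ(c) dc
    return total

a, b = 1.2, 0.9  # hypothetical parameters
numeric = auc_binormal(a, b)
closed_form = PHI.cdf(a / math.sqrt(1.0 + b * b))
print(f"numeric AUC = {numeric:.6f}, closed form = {closed_form:.6f}")
```

Truncating the range to [-8, 8] is harmless here because the normal density is negligible outside it.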

Page 14: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

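Relationship between thresholds and cut-off values

$$\Phi(c) = t \quad\Longleftrightarrow\quad c = \Phi^{-1}(t)$$

[Figure (example, R = 5): ROC curve of TPF against FPF with thresholds t0, t1, t2, t3, t4, t5.]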
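A brief sketch of the conversion between thresholds t and cut-off values c, assuming the relation Φ(c) = t, that is c = Φ^{-1}(t). The threshold values are the initial ones listed in the numerical results of this lecture.

```python
from statistics import NormalDist

PHI = NormalDist()

# Initial threshold values t_1, ..., t_7 from the numerical results.
thresholds = [0.0006, 0.0015, 0.0035, 0.01, 0.035, 0.11, 0.35]

cutoffs = [PHI.inv_cdf(t) for t in thresholds]  # c = Φ^{-1}(t)
recovered = [PHI.cdf(c) for c in cutoffs]       # t = Φ(c)

for t, c in zip(thresholds, cutoffs):
    print(f"t={t:<7}  c={c:+.4f}")
```

Because Φ is strictly increasing, ordered thresholds always map to ordered cut-off values.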

Page 15: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

Optimization in Credit Default

Problem:

Simultaneously to obtain the thresholds and the parameters a and b that maximize AUC, while balancing the size of the classes (regularization) and guaranteeing a good accuracy.

Page 16: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

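Optimization Problem

$$\max_{a,\,b,\,\tau}\;\; \alpha_1 \int_0^1 \Phi\big(a + b\,\Phi^{-1}(t)\big)\,dt \;-\; \alpha_2 \sum_{i=0}^{R-1} \big(t_{i+1} - t_i - \ldots\big)^2$$

subject to

$$\int_{t_i}^{t_{i+1}} \Phi\big(a + b\,\Phi^{-1}(t)\big)\,dt \;\ge\; \delta_i \qquad (i = 0, 1, \ldots, R-1),$$

$$\tau := (t_1, t_2, \ldots, t_{R-1})^T, \qquad t_0 = 0, \quad t_R = 1.$$

[The term marked … in the regularization sum involves n and is illegible in the source slide.]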

Page 17: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

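Optimization Problem

$$\max_{a,\,b,\,\tau}\;\; \alpha_1 \int_0^1 \Phi\big(a + b\,\Phi^{-1}(t)\big)\,dt \;-\; \alpha_2 \sum_{i=0}^{R-1} \big(t_{i+1} - t_i - \ldots\big)^2$$

subject to

$$\int_{t_i}^{t_{i+1}} \Phi\big(a + b\,\Phi^{-1}(t)\big)\,dt \;\ge\; \delta_i \qquad (i = 0, 1, \ldots, R-1),$$

$$\tau := (t_1, t_2, \ldots, t_{R-1})^T, \qquad t_0 = 0, \quad t_R = 1.$$

[The term marked … in the regularization sum involves n and is illegible in the source slide, as is an additional annotation to the constraint that involves t_{i+1} and t_i.]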

Page 18: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

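Over the ROC Curve

$$\mathrm{AOC} := \int_0^1 \Big(1 - \Phi\big(a + b\,\Phi^{-1}(t)\big)\Big)\,dt$$

[Figure: ROC curve of TPF against FPF with thresholds t0, t1, t2, t3, t4, t5; the region under the curve is labeled AUC and the region over it 1-AUC.]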

Page 19: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

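New Version of the Optimization Problem

$$\min_{a,\,b,\,\tau}\;\; \alpha_2 \sum_{i=0}^{R-1} \big(t_{i+1} - t_i - \ldots\big)^2 \;+\; \alpha_1 \int_0^1 \Big(1 - \Phi\big(a + b\,\Phi^{-1}(t)\big)\Big)\,dt$$

subject to

$$\int_{t_j}^{t_{j+1}} \Big(1 - \Phi\big(a + b\,\Phi^{-1}(t)\big)\Big)\,dt \;\le\; t_{j+1} - t_j - \delta_j \qquad (j = 0, 1, \ldots, R-1).$$

[The term marked … in the regularization sum involves n and is illegible in the source slide.]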
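To make the structure of this problem concrete, here is a rough sketch that evaluates an objective of this shape and the slack of each constraint for given a, b and thresholds. The regularization target inside the square is not legible on the slide, so the sketch takes it as a caller-supplied list `targets`, and the equal-width choice used below is purely an assumption. All function names, weights and numbers are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist()

def integral_one_minus_roc(a, b, lo, hi, steps=2000):
    """Midpoint rule for the integral of 1 - Φ(a + b Φ^{-1}(t)) over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(
        (1.0 - PHI.cdf(a + b * PHI.inv_cdf(lo + (k + 0.5) * h))) * h
        for k in range(steps)
    )

def objective(a, b, tau, alpha1, alpha2, targets):
    """alpha2 * sum_i (t_{i+1} - t_i - targets[i])^2 + alpha1 * AOC, with t_0 = 0, t_R = 1."""
    t = [0.0, *tau, 1.0]
    reg = sum((t[i + 1] - t[i] - targets[i]) ** 2 for i in range(len(t) - 1))
    return alpha2 * reg + alpha1 * integral_one_minus_roc(a, b, 0.0, 1.0)

def constraint_slack(a, b, tau, delta):
    """(t_{j+1} - t_j - delta_j) minus the class-wise integral; feasible when all are >= 0."""
    t = [0.0, *tau, 1.0]
    return [
        (t[j + 1] - t[j] - delta[j]) - integral_one_minus_roc(a, b, t[j], t[j + 1])
        for j in range(len(t) - 1)
    ]

tau = [0.2, 0.5]            # hypothetical thresholds, R = 3 classes
targets = [1.0 / 3.0] * 3   # ASSUMPTION: equal class widths as regularization target
value = objective(1.0, 0.9, tau, alpha1=1.0, alpha2=1.0, targets=targets)
slack = constraint_slack(1.0, 0.9, tau, delta=[0.0, 0.0, 0.0])
print(f"objective = {value:.4f}, slack = {[round(s, 4) for s in slack]}")
```

A solver would vary a, b and tau to reduce `value` while keeping every entry of `slack` nonnegative.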

Page 20: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

Regression in Credit Default

Optimization problem:

Simultaneously to obtain the thresholds and the parameters a and b that maximize AUC, while balancing the size of the classes (regularization) and guaranteeing a good accuracy.

• discretization of integral

• nonlinear regression problem

Page 21: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

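Discretization of the Integral

Riemann-Stieltjes integral:

$$\mathrm{AUC} = \int_{-\infty}^{\infty} \Phi(a + b\,c)\; d\Phi(c)$$

Riemann integral:

$$\mathrm{AUC} = \int_0^1 \Phi\big(a + b\,\Phi^{-1}(t)\big)\,dt$$

Discretization:

$$\mathrm{AUC} \approx \sum_{k=1}^{R} \Phi\big(a + b\,\Phi^{-1}(t_k)\big)\,\Delta t_k$$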
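The discretization step can be sketched as a right-endpoint Riemann sum over a grid t_0 < t_1 < ... < t_R. The sketch assumes Δt_k = t_k - t_{k-1}, which the slide does not spell out, and extends the integrand by its limits at t = 0 and t = 1. The parameters and grids are hypothetical, and the closed form used for comparison is a standard identity rather than something stated in the lecture.

```python
import math
from statistics import NormalDist

PHI = NormalDist()

def roc(a, b, t):
    """Φ(a + b Φ^{-1}(t)), extended by its limits 0 and 1 at t = 0 and t = 1 (b > 0)."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    return PHI.cdf(a + b * PHI.inv_cdf(t))

def auc_riemann(a, b, grid):
    """Sum over k = 1..R of roc(t_k) * (t_k - t_{k-1}) for grid = [t_0, ..., t_R]."""
    return sum(roc(a, b, grid[k]) * (grid[k] - grid[k - 1]) for k in range(1, len(grid)))

a, b = 1.2, 0.9  # hypothetical parameters
coarse = auc_riemann(a, b, [0.0, 0.01, 0.1, 0.35, 1.0])
fine = auc_riemann(a, b, [k / 20000 for k in range(20001)])
exact = PHI.cdf(a / math.sqrt(1.0 + b * b))  # closed form of the same integral
print(f"coarse = {coarse:.4f}, fine = {fine:.4f}, exact = {exact:.4f}")
```

With only a few thresholds the sum is a crude approximation; since the integrand is increasing, the right-endpoint sum overestimates the integral, and the gap shrinks as the grid is refined.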

Page 22: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

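Optimization Problem with Penalty Parameters

$$\Pi_{\Theta}(a, b, \tau) := \alpha_2 \sum_{i=0}^{R-1} \big(t_{i+1} - t_i - \ldots\big)^2 + \alpha_1 \int_0^1 \Big(1 - \Phi\big(a + b\,\Phi^{-1}(t)\big)\Big)\,dt + \alpha_3 \sum_{j=0}^{R-1} \theta_j \Big(\delta_j - \int_{t_j}^{t_{j+1}} \Phi\big(a + b\,\Phi^{-1}(t)\big)\,dt\Big)$$

$$\Theta := (\theta_1, \theta_2, \ldots, \theta_{R-1})^T, \qquad \theta_j \ge 0 \quad (j = 0, 1, \ldots, R-1)$$

[The term marked … in the regularization sum involves n and is illegible in the source slide; the exact form of the penalty term is only partly legible.]

If any one of these constraints is violated, we introduce penalty parameters. As a penalty is increased, the iterates are forced towards the feasible set of the optimization problem.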
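The effect of increasing a penalty can be seen on a one-dimensional toy problem, which is not the credit-rating problem itself: minimize (x - 2)^2 subject to x <= 1, with the quadratic penalty theta * max(0, x - 1)^2 added to the objective. The problem and the function name are hypothetical and serve only as illustration.

```python
def penalized_minimizer(theta):
    """Minimizer of (x - 2)^2 + theta * max(0, x - 1)^2, in closed form.

    For x > 1 the derivative 2(x - 2) + 2*theta*(x - 1) vanishes at
    x = (2 + theta) / (1 + theta), which stays above 1 for every theta >= 0.
    """
    return (2.0 + theta) / (1.0 + theta)

# Toy problem: minimize (x - 2)^2 subject to x <= 1; the constrained optimum is x = 1.
for theta in (0.0, 1.0, 10.0, 100.0, 1000.0):
    x = penalized_minimizer(theta)
    print(f"theta={theta:7.1f}  x={x:.4f}  violation={x - 1.0:.4f}")
```

The violation equals 1 / (1 + theta), so it shrinks towards zero as the penalty parameter grows, which is the behaviour described for the iterates.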

Page 23: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

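Optimization Problem further discretized

[Equation, only partly legible in the source: the penalized objective Π_Θ(a, b, τ) with its integrals replaced by finite sums. The integral over [0, 1] becomes a sum of terms (1 - Φ(a + b Φ^{-1}(t_j))) Δt_j, and the integral in each constraint j becomes an inner sum over nodes η indexed by ν and bounded by n_j, compared against δ_j.]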

Page 25: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

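Nonlinear Regression

$$\min_{\beta}\; f(\beta) = \sum_{j=1}^{N} \big(d_j - g(x_j, \beta)\big)^2 =: \sum_{j=1}^{N} f_j^2(\beta)$$

$$F(\beta) := \big(f_1(\beta), \ldots, f_N(\beta)\big)^T$$

$$\min_{\beta}\; f(\beta) = F^T(\beta)\, F(\beta)$$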
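As a small illustration of the least-squares form, the sketch below builds the residual vector F with entries f_j = d_j - g(x_j, beta) and evaluates f = F^T F. The exponential model g and the data are hypothetical stand-ins, not the credit-rating model.

```python
import math

def residuals(beta, xs, ds, g):
    """F(beta) = (f_1, ..., f_N) with f_j = d_j - g(x_j, beta)."""
    return [d - g(x, beta) for x, d in zip(xs, ds)]

def sum_of_squares(F):
    """f(beta) = F^T F, the sum of the squared residuals."""
    return sum(fj * fj for fj in F)

def g(x, beta):
    """Hypothetical model g(x, beta) = beta_0 * exp(beta_1 * x)."""
    return beta[0] * math.exp(beta[1] * x)

xs = [0.0, 0.5, 1.0, 1.5]
ds = [g(x, (2.0, -1.0)) for x in xs]  # noiseless data generated at beta = (2, -1)

f_true = sum_of_squares(residuals((2.0, -1.0), xs, ds, g))
f_off = sum_of_squares(residuals((1.5, -0.5), xs, ds, g))
print(f"f at generating beta = {f_true}, f at another beta = {f_off:.4f}")
```

The objective vanishes at the generating parameters and is positive elsewhere, which is what the minimization exploits.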

Page 26: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

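Nonlinear Regression

• Gauss-Newton method:

$$\nabla F(\beta)\,\nabla^T F(\beta)\; q = -\nabla F(\beta)\, F(\beta)$$

• Levenberg-Marquardt method:

$$\big(\nabla F(\beta)\,\nabla^T F(\beta) + \lambda I_p\big)\, q = -\nabla F(\beta)\, F(\beta), \qquad \lambda \ge 0$$

$$\beta_{k+1} := \beta_k + q_k$$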
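A minimal sketch of the Levenberg-Marquardt iteration on a hypothetical two-parameter model g(x, beta) = beta_0 * exp(beta_1 * x) with noiseless data. Here J denotes the Jacobian of F, so J^T J plays the role of the matrix product in the normal equations and J^T F that of the right-hand side; the fixed value of lam, the starting point and the data are all assumptions made for illustration.

```python
import math

def model(x, beta):
    return beta[0] * math.exp(beta[1] * x)

def residuals_and_jacobian(beta, xs, ds):
    """F with f_j = d_j - g(x_j, beta), and the rows (df_j/dbeta_0, df_j/dbeta_1) of J."""
    F, J = [], []
    for x, d in zip(xs, ds):
        e = math.exp(beta[1] * x)
        F.append(d - beta[0] * e)
        J.append((-e, -beta[0] * x * e))
    return F, J

def lm_step(beta, xs, ds, lam):
    """Solve (J^T J + lam * I) q = -J^T F for two parameters by Cramer's rule."""
    F, J = residuals_and_jacobian(beta, xs, ds)
    a11 = sum(r[0] * r[0] for r in J) + lam
    a12 = sum(r[0] * r[1] for r in J)
    a22 = sum(r[1] * r[1] for r in J) + lam
    g1 = -sum(r[0] * f for r, f in zip(J, F))
    g2 = -sum(r[1] * f for r, f in zip(J, F))
    det = a11 * a22 - a12 * a12
    return ((g1 * a22 - a12 * g2) / det, (a11 * g2 - a12 * g1) / det)

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ds = [model(x, (2.0, -1.0)) for x in xs]  # hypothetical noiseless data

beta = (1.0, -0.3)  # starting guess
for k in range(50):
    q = lm_step(beta, xs, ds, lam=1e-3)      # lam = 0 would give the Gauss-Newton step
    beta = (beta[0] + q[0], beta[1] + q[1])  # beta_{k+1} := beta_k + q_k
print(f"estimated beta = ({beta[0]:.6f}, {beta[1]:.6f})")
```

Practical implementations adapt lam from step to step; keeping it fixed keeps the sketch short.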

Page 27: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

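Nonlinear Regression

alternative solution: conic quadratic programming

$$\min_{t,\,q}\; t, \quad \text{subject to} \quad \Big\| \big(\nabla F(\beta)\,\nabla^T F(\beta) + \lambda I_p\big)\, q - \big(-\nabla F(\beta)\, F(\beta)\big) \Big\|_2 \le t, \quad t \ge 0, \quad \| L q \|_2 \le M$$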

Page 28: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

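Nonlinear Regression

alternative solution: conic quadratic programming

$$\min_{t,\,q}\; t, \quad \text{subject to} \quad \Big\| \big(\nabla F(\beta)\,\nabla^T F(\beta) + \lambda I_p\big)\, q - \big(-\nabla F(\beta)\, F(\beta)\big) \Big\|_2 \le t, \quad t \ge 0, \quad \| L q \|_2 \le M$$

interior point methods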

Page 29: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

Numerical Results

Initial Parameters

a       b       Threshold values (t)
1       0.95    0.0006  0.0015  0.0035  0.01   0.035    0.11  0.35
1.5     0.85    0.0006  0.0015  0.0035  0.01   0.035    0.11  0.35
0.80    0.95    0.0006  0.0015  0.0035  0.01   0.035    0.11  0.35
2       0.70    0.0006  0.0015  0.0035  0.01   0.035    0.11  0.35

Optimization Results

a       b       Threshold values (t)                                      AUC
0.9999  0.9501  0.0004  0.0020  0.0032  0.012  0.03537  0.09  0.3400      0.8447
1.4999  0.8501  0.0003  0.0017  0.0036  0.011  0.03537  0.10  0.3500      0.9167
0.7999  0.9501  0.0004  0.0018  0.0032  0.011  0.03400  0.10  0.3300      0.8138
2.0001  0.7001  0.0004  0.0020  0.0031  0.012  0.03343  0.11  0.3400      0.9671

Page 30: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

Numerical Results

Accuracy Error in Each Class

I       II      III     IV      V       VI      VII     VIII
0.0000  0.0000  0.0000  0.0001  0.0001  0.0010  0.0010  0.0075
0.0000  0.0000  0.0000  0.0001  0.0001  0.0010  0.0018  0.0094
0.0000  0.0000  0.0000  0.0000  0.0001  0.0002  0.0018  0.0059
0.0000  0.0000  0.0000  0.0001  0.0001  0.0006  0.0018  0.0075

Number of Firms in Each Class

I       II      III     IV      V       VI      VII     VIII
4       56      27      133     115     102     129     61
2       42      52      120     119     111     120     61
4       43      40      129     114     116     120     61
4       56      24      136     106     129     111     61

Number of firms in each class at the beginning: 10, 26, 58, 106, 134, 121, 111, 61

Page 31: A Classification Problem of Credit Risk Rating Investigated and Solved by Optimization of the ROC Curve

Generalized Additive Models
