


The Economic Value of Reject Inference in Credit Scoring†

G. Gary Chen
Thomas Astebro*

Department of Management Sciences
University of Waterloo
CANADA, N2L 3G1

*corresponding author
[email protected]
Tel (519) 888 4567, ext. 2521

Fax (519) 746 7252

June, 2001

Abstract

We use data with complete information on both rejected and accepted bank loan applicants to estimate the value of sample bias correction using Heckman’s two-stage model with partial observability. In the credit scoring domain such correction is called reject inference. We validate model performance with and without the correction for sample bias using various measurements. Results show that it is prohibitively costly not to control for sample selection bias due to the accept/reject decision. However, we also find that the Heckman procedure is unable to appropriately control for the selection bias.

† Data contained in this study were produced on site at the Carnegie-Mellon Census Research Data Center. Research results and conclusions are those of the authors and do not necessarily indicate concurrence by the Bureau of the Census or the Carnegie-Mellon Census Research Data Center. Åstebro acknowledges financial support from the Natural Sciences and Engineering Research Council of Canada and the Social Sciences and Humanities Research Council of Canada’s joint program in Management of Technological Change as well as support from the Canadian Imperial Bank of Commerce.


1. Introduction

Inference for non-randomly selected samples is of crucial importance in all data

analysis. Non-random sample selection appears with certainty in the area of credit

scoring where complete data is typically only available for those that have gone through a

screening process and have been accepted. It is of great importance to correct for the bias

that sample selection may cause in credit scoring applications as a model developed on a

non-random (screened) sample is likely to be inappropriate for selecting credit applicants

(Jacobson and Roszbach 1999). Comprising a number of statistical/mathematical

techniques, reject inference can be used to infer performance on cases that have been

rejected.

A credit scoring model that is used to make accept/reject decisions will gradually

deteriorate over time, and will eventually have to be replaced (Hand 1998). If the credit

scoring model (called a scorecard) is not updated to reflect population shift and variable

effect changes, the original scorecard will lose its predictive power. If, on the other hand,

only data on accepted applicants are used to update the model, sample selection bias will

put into question the validity of the new model. Reject inference might be the answer to

this dilemma.

Research has shown that constructing scorecards based only on accepted

applicants is likely to lead to inaccuracies when the scorecards are applied to the entire

population of applicants (e.g. Hand 1998, Greene 1998). Researchers have therefore

developed statistical methodologies to test the degree to which the sample selection bias

affects the accuracy of the model (e.g. Copas and Li 1997, Vella 1992). Researchers are

continuing to improve reject inference techniques (e.g. Copas and Li 1997, Greene 1998,

Feelders 1999).

The benefit of reject inference depends on the data sampling and population

distributions, and the degree to which the underlying statistical assumptions for reject

inference are satisfied. For example, on some portfolios such as mortgages not many

applications are rejected. Reject inference may then be unimportant since the sub-population of rejected applications is small in comparison with the whole population and

the bias due to missing data from those rejected may be ignorable. However, with higher

risk portfolios such as loans for small businesses the reject rate may be in excess of 50%,


and bias due to screening is typically not ignorable. However, it has not yet been

established under which conditions systematic screening is non-ignorable for

parameter estimation. General principles are difficult to establish as the bias is data

dependent.

Some statisticians argue that reject inference can solve the non-random sample

selection problem (e.g. Copas and Li 1997, Joanes 1993/4, Donald 1995, Greene 1998).

Reject inference techniques have already been widely implemented by developers of

credit scorecards and a general statistical toolkit for reject inference was recently

introduced by SAS Institute. However, Hand and Henley (1993/4) show that the methods

typically employed in industry are problematic as they typically rest on very tenuous

assumptions. They argue that reliable reject inference is impossible and that the only

robust approach to reject inference is to accept a sample of rejected applications and

observe their behavior as well as their credit outcomes.

If one trusts these statisticians and uses reject inference methods, the

question arises of how to validate the method and evaluate its potential improvements. We

have seen little relevant empirical research on this problem as most data-sets on which

reject inference methods have been tested are not complete or are simulated (e.g. Donald

1995, Feelders 1999, Manning et al. 1987).

In this paper we are able to explore the reliability and prediction accuracy gains

from reject inference in credit scoring by using a data set that contains complete

information on both rejected and accepted bank loan applicants. Our approach is based on

Heckman (1979), Copas and Li (1997) and Greene (1998). These studies show that it is

beneficial to use reject inference, even though the estimated models may be unstable. We

compare the prediction accuracy of a model developed on a complete sample with that of

a model developed from data on accepted applicants only, and with that of a model developed on data on

accepted applicants but adjusted for sampling bias. We perform various tests for sample selection

bias, reliability and usability.

The remainder of this paper is organized as follows. Section two reviews reject

inference for credit scoring from the angle of sampling distributions. Section three

describes the model. Section four describes the data and reports some descriptive

statistics. Section five presents results, followed by sensitivity analysis and conclusions.


2. Review of Reject Inference Techniques

Sample selection bias is usually caused by a difference between the sampling

distribution and the population distribution.1 Reject inference is a general methodology to

solve the problem when one observes outcomes only for a selected sample. Specifically,

this problem appears in credit scoring (and many other business applications, e.g.,

marketing) where performance is observed for accepted bank loan applicants but

not for the population as a whole, nor for rejected applicants. If the

distribution of accepted is different from rejected, which is likely since a systematic

rejection rule is applied, inference about the population is necessary. Reject inference

techniques can be grouped into three classes. The first class is that of the ideal technique

when the sample is representative of the whole population. In the second class, although

the sample is drawn only from accepted applicants, it is assumed that the distribution

pattern in the accepted region can be extended to the rejected region through either

observation or assumption. Thus, information on rejected applicants can be integrated

into modeling via some known function of information on accepted applicants. For the

third class, the sample is drawn from the sub-population of accepted applicants, and it is

assumed that the distribution of the accepted applicant population is different from that of

the rejected applicant population. In such a case statistical inference for rejected cases

directly from the accepted sub-population is unreliable.

Reject inference techniques grouped into the first class are ideal but expensive to

implement. These techniques are robust since the sampling data are representative of the

true population. One straightforward method is to accept all applicants, which is fairly

self-explanatory. Over time the bad accounts profile will become clear, and then these

bad accounts can be rejected. For higher risk portfolios this is a very expensive strategy.

Sometimes a random acceptance technique can be adopted to partially avoid this high

cost. A small proportion of the normally rejected cases is accepted in order to observe

their behavior. Over time the outcomes of these observations will be identified, and the

stratified random sample may be used to construct an unbiased prediction model for the

1 For detailed reviews of reject inference methodologies see Hand and Henley (1993/4), and Hand

(1998).


whole population. This technique is commercially sensible if the loss due to the increased

number of delinquent accounts is compensated by the increased accuracy in classification

(Hand and Henley, 1998). Another way to reduce the cost of this technique is to obtain

information on rejected applicants from other credit suppliers who did grant them credit.2

For the second class, some reject inference implementations directly assume that

the proportion of goods is the same for the rejects as for the accepted.3 For example, the

augmentation method described by Hsia (1978) suffers from this problematic assumption.

It is unlikely that the original scorecard has insignificant power to separate good and bad

risks by the accept/reject decisions. If it could not separate good from bad accounts it

would not be used.4 In order to avoid this unrealistic assumption, Hand (1998) concludes

that reject inference by extrapolating the accepted patterns over the rejected region may

be valid only when the proposed new scorecard uses the same set of characteristics as the

original scorecard. However, extrapolation methods are still based on the

untestable assumption that the form of the model where it can be observed also extends

over the unobserved region. It is not known how much bias is reduced and prediction

power increased by using these techniques. It is typically not possible to test what the

gains are since the techniques build on an assumption of homogeneity across accepted

and rejected which is not testable.

One example of extrapolation methods is called parcelling. It assumes that the

distribution of good and bad in the rejected region shifts proportionally along the

observed distribution in the accepted region. The shift distance for good/bad odds can be

estimated or assumed by model analysts based on experience. After that the rejects can be

included with the known goods and bads, and the regression procedure can be re-run to

infer a new score/odds relationship. Theoretically, if one assumes that distributions of

characteristics on both good and bad regions belong to a particular family of

distributions, then one can estimate the parameters using both the classified cases (the

accepts) and the unclassified cases (the rejects) using the EM algorithm. Based on this

2 Such information sharing is unlikely both for competitive and privacy reasons. Recent regulation in Canada, California and Vermont prohibits such information sharing.

3 A “good” account is typically defined as one in good standing. A bad account is one that is past due (by some date), bankrupt, or written off. The definition varies.

4 Interested readers can refer to Hsia (1978) and Hand and Henley (1993/4) for a more detailed discussion.


argument Hand and Henley (1993/4) suggest that the classical linear discriminant

analysis can be applied for reject inference. However, the gain of this method is

questionable in a credit scoring context because the common covariance assumption for

discriminant analysis is not likely to hold since the reject decision, which parses

applicants into accepted and rejected, is correlated with the underlying characteristics of

the applicants.
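The parcelling step described above can be sketched as follows. This is a hypothetical illustration: the band counts and the odds-shift factor are invented, and in practice the shift is estimated or assumed by the analyst based on experience.

```python
# Hypothetical sketch of parcelling within one score band: rejects are
# allocated fractionally to good/bad using the accepted odds, scaled by an
# analyst-chosen shift factor (all numbers below are illustrative).

def parcel_rejects(accepted_good, accepted_bad, n_rejects, odds_shift):
    """Fractionally allocate rejected cases in a score band to good/bad.

    Assumes the rejects' good/bad odds equal the accepted odds scaled
    by `odds_shift` (< 1 means rejects are assumed riskier).
    """
    odds_accepted = accepted_good / accepted_bad      # observed good:bad odds
    odds_rejected = odds_accepted * odds_shift        # assumed shifted odds
    p_good = odds_rejected / (1.0 + odds_rejected)    # implied P(good | rejected)
    return n_rejects * p_good, n_rejects * (1.0 - p_good)

# One band: 180 good and 20 bad among accepted, 50 rejects,
# rejects assumed to have half the accepted odds.
good_r, bad_r = parcel_rejects(180, 20, 50, odds_shift=0.5)
```

The fractional counts `good_r` and `bad_r` would then be pooled with the known goods and bads before re-running the regression, as described above.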

It is widely understood that the distributions over the accepted and rejected

regions are different. Reject inference techniques that accept this premise can be

summarized into a third class. These methods need further theoretical and practical

validation. Hand and Henley (1993/4) proposed three methods to use supplementary

information (they call it a “calibration sample”) for reject inference. Feelders (1999)

proposed a new reject inference method based on mixture modeling. He argues that

reject inference can be treated as a missing data imputation problem using the EM

algorithm. He assumes that some vector of variables X is completely observed for both

accepted and rejected applicants, and that the vector of variables Y is observed only for

accepted applicants. In credit scoring we can think of X as including behavior variables and

the score, and of Y as including risk characteristics that are only available for accepted

applicants such as default risk and repayment behavior. Using the assumption of missing

at random (MAR), Feelders proposed two approaches to reject inference. However, MAR

implies that the distribution of the vector of variables X for accepted applicants is identical

to that of rejected applicants. It also implies that the variables in X are not correlated with

variables in Y. This assumption is quite unrealistic since it implies that the repayment

behavior is not related to the behavior characteristics or to the original score.

Some reject inference methods specific to the application of logistic regression in

credit scoring have been proposed. Joanes (1993/4) derived posterior probabilities

adjusted to reflect prior probabilities of assignment to each group (good vs. bad) and the

differential costs of misclassification. A reject inference procedure based on iterative

reclassification is adapted to this framework that produces a modified set of parameter

estimates reflecting the fractional allocation of the rejects. However, his method is based

on the augmentation procedure that requires the assumption of the same proportion of

good and bad for both accepted and rejected regions, though the iteration may relax this


restriction to some degree. It is still unknown what the relative advantage of this bias

adjustment is as it is heretofore based on an untested assumption.

Heckman’s (1979) two-stage bivariate probit model has also been proposed for

reject inference. This model does not assume that the samples for the accepted and

rejected regions are similar. Technically, the loan granting decision and the default model

can be described as a two-stage model with partial observability, discussed by Poirier

(1980) and formalized by Van de Ven and Van Praag (1981). Maddala (1983) presents an

excellent overview. Meng and Schmidt (1985) discuss the cost of partial observability in

this model. Copas and Li (1997) conducted further analysis on inference for non-random

samples by extending this technique. Other researchers (e.g. Boyes et al. 1989, Greene

1998, Jacobson and Roszbach 1999) have applied this model. These studies show that

there is a significant sample selection bias due to the loan granting decision. However, the

applicability of Heckman’s model hinges crucially on the assumption that the granting

and default equations are fully specified.

In this paper we focus on investigating the performance of reject

inference in classes 1 and 3. Given the unrealistic assumption in class 2 that the proportion

of goods is the same for the rejects as for the accepted, the reject inference methods in

this class amount to only minor refinements of work on censored samples.

Therefore the methods in class 2 do not fundamentally solve the non-random sampling

problem that concerns us.

In the next section we summarize the methodology of this reject inference

technique for logistic regression, and use real data to validate the method.

3. The Model

As discussed in Section 2, the problem of sample selection bias in credit scoring

can be modeled as a two-stage procedure, as displayed in Figure 1. In the first stage, the

bank decides whether a loan should be granted to an applicant or not. A selection

equation is specified to capture this decision. In the second stage, the good/bad risk status

is observed conditional on both intended and unintended selection reasons and only for

those accepted. A default equation is specified to describe how applicant characteristics

affect the probability that a borrower can be classified as in good standing. This equation,


when correctly specified, can then be used to identify future expected good applicants in

the accept/reject stage.

The solution is a bivariate probit model with sample selection. The model

assumes that there exists an underlying relationship (latent equation)

$$y_{2i}^{*} = x_i\beta_2 + u_{2i} \qquad (1)$$

such that we observe only the binary outcome (default equation)

$$y_{2i}^{\text{default}} = 1(y_{2i}^{*} > 0). \qquad (2)$$

The outcome in the default equation, i.e., good risk vs. bad risk, is only observed

if (selection equation)

$$y_{1i}^{\text{select}} = 1(z_i\gamma + u_{1i} > 0), \qquad (3)$$

where

$$u_1 \sim N(0,1), \quad u_2 \sim N(0,1), \quad \mathrm{corr}(u_1, u_2) = \rho. \qquad (4)$$

The coefficients in γ specify the degree to which lending officers select

applicants based on observed applicant characteristics. The correlation ρ reflects the

degree to which lending officers systematically select applicants based on variables

that are not observed. The selection equation can always be estimated separately since it

[Figure 1: The two-stage structure. Stage 1 (banks): Accept (observed) vs. Reject (observed). Stage 2 (applicants): among the accepted, Good (observed) vs. Bad (observed); among the rejected, Good (unobserved) vs. Bad (unobserved).]


is fully observed. However, this will be inefficient unless ρ = 0 (Meng and Schmidt

1985). Similarly, when ρ ≠ 0, a standard probit or logit technique directly applied to the

default equation yields a biased set of coefficient estimates. Therefore, ρ corrects for

potential unobserved and systematic sample selection bias that could be incurred in the

separate estimation of the default equation (Boyes et al. 1989). Meng and Schmidt (1985)

find the cost of partial observability in the bivariate probit model to be fairly high, and

suggest that extra information may be worth collecting, if possible. Therefore, in the area

of credit scoring it seems unsafe to initially assume that ρ = 0. A better way is to find

some rule used in an early development stage to judge the cost of partial observability.

Unfortunately, quantifying the efficiency loss is not possible without reference to a

particular data set (Poirier, 1980). Hence, the cautious way is to first apply the bivariate

probit model instead of separate estimations to see if the correlation is significant.
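A small Monte Carlo sketch may make this caution concrete. The data below are simulated, not the paper's; all coefficients, the correlation, and the variable names are illustrative assumptions. It shows that when ρ ≠ 0, a probit default model fit only on the accepted sub-sample is biased, here most visibly in the intercept.

```python
# Illustrative simulation (not the paper's data): with correlated selection
# and default errors (rho != 0), a probit fit on accepted cases is biased.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n, rho = 20_000, 0.8

w = rng.normal(size=n)                                 # characteristic in both equations
v = rng.normal(size=n)                                 # selection-only variable
u1 = rng.normal(size=n)                                # selection error
u2 = rho * u1 + np.sqrt(1 - rho**2) * rng.normal(size=n)  # correlated default error

accepted = 0.5 * w + 0.5 * v + u1 > 0                  # selection equation
y = (1.0 * w + u2 > 0).astype(float)                   # default eq: intercept 0, slope 1

def probit_fit(X, y):
    """Probit MLE: minimize the negative log-likelihood numerically."""
    nll = lambda b: -(y * norm.logcdf(X @ b) + (1 - y) * norm.logcdf(-(X @ b))).sum()
    return minimize(nll, np.zeros(X.shape[1]), method="BFGS").x

X = np.column_stack([np.ones(n), w])
b_full = probit_fit(X, y)                              # infeasible: all applicants
b_naive = probit_fit(X[accepted], y[accepted])         # feasible: accepted only
```

Because acceptance here depends on u1, and u1 is correlated with u2, the accepted sub-sample has systematically favorable default errors, and the naive estimates drift away from the truth.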

For this problem there are three types of observations: no loans, bad loans and

good loans.5 The corresponding log likelihood function is:

$$\ln L = \sum_{i=1}^{N}(1-y_{1i})\ln[1-\phi(z_i\gamma)] + \sum_{i=1}^{N} y_{1i}(1-y_{2i})\ln[\phi(z_i\gamma)-\Phi(z_i\gamma, x_i\beta_2;\rho)] + \sum_{i=1}^{N} y_{1i}y_{2i}\ln\Phi(z_i\gamma, x_i\beta_2;\rho) \qquad (5)$$

where φ(·) and Φ(·, ·; ρ) represent the univariate and bivariate standard normal c.d.f., respectively.
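A direct transcription of the likelihood into code makes the three observation types concrete. This is a sketch, with the convention (an assumption of the sketch) that y2 is coded 0 for rejected cases, since the y1 factor removes their contribution anyway.

```python
# Log-likelihood (5) for the bivariate probit with partial observability.
# zg = z_i @ gamma, xb = x_i @ beta_2; y1 = 1 if accepted, y2 = 1 if good
# (code y2 = 0 for rejected cases: the y1 factor zeroes their terms).
import numpy as np
from scipy.stats import norm, multivariate_normal

def loglik(y1, y2, zg, xb, rho):
    # Bivariate standard normal c.d.f. at each point (z_i*gamma, x_i*beta_2).
    biv = multivariate_normal(mean=[0.0, 0.0],
                              cov=[[1.0, rho], [rho, 1.0]]).cdf(np.column_stack([zg, xb]))
    biv = np.atleast_1d(biv)
    ll = (1 - y1) * np.log(1 - norm.cdf(zg))               # no loan
    ll = ll + y1 * (1 - y2) * np.log(norm.cdf(zg) - biv)   # loan granted, bad
    ll = ll + y1 * y2 * np.log(biv)                        # loan granted, good
    return float(ll.sum())
```

With ρ = 0 the bivariate c.d.f. factors, and (5) reduces to a selection probit plus a default probit on the accepted sub-sample, which is a convenient correctness check.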

A direct measure of default risk is the marginal predicted probability of the

default equation. However, Boyes et al. (1989) propose a method to calculate the

expected probability of default. The expected probability of default is supposed to adjust

for an upward bias of the marginal predicted default probability. This bias occurs because

the predicted probability of default is to be used as an uncertain estimate for a class of

applicants with identical characteristics, rather than a point estimate for one applicant.

We are concerned with comparing three different model structures using predicted

probabilities to classify observations as accepted or rejected. It is not obvious what effect the

adjustment proposed by Boyes et al. will have on classifications when comparing

different models. So as not to confound results with the potential differential impact that

computing expected probabilities might have, we use the marginal predicted probability

for the estimated default risks. Then the default risk based acceptance rule will be:

5 We are assuming that loans are classifiable into two categories: good and bad. In reality there are

several complicating issues which we for simplicity ignore in this setting.


loan not granted if the marginal predicted probability ≥ δ′;

loan granted if the marginal predicted probability < δ′,

where δ′ is the threshold parameter selected by policy makers. Once the estimated

probability of default is known one can compute expected loss rates for each observation

given some additional assumptions. The classification accuracy of different models can

then be compared.
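As a minimal sketch of this acceptance rule (the threshold, exposure, and loss-given-default figures are invented for illustration):

```python
# The acceptance rule above: reject when the marginal predicted default
# probability reaches the policy threshold delta'. The per-loan exposure and
# loss-given-default used in the expected-loss line are hypothetical.
import numpy as np

def grant(p_default, delta):
    """True where the loan is granted: predicted default probability < delta."""
    return np.asarray(p_default) < delta

p = np.array([0.05, 0.30, 0.18])
decisions = grant(p, delta=0.20)        # -> [True, False, True]
expected_loss = p * 10_000 * 0.5        # exposure 10,000 and LGD 50% are assumed
```

Comparing such decisions and expected losses across the three model structures is what the classification tests below amount to.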

4. Data

We use the 1987 U.S. CBO database in this analysis. The database is described in

detail by Nucci (1992). The 1987 CBO survey contains information about businesses

operating during 1987. Businesses included in the survey were started in 1987 or earlier.

CBO owner and business information was assembled from questionnaire responses

provided in 1991 by all owners of the sampled businesses. Other business-level

information was supplied by the Internal Revenue Service. The database has two special

features for the purpose of this research: (1) pooled information about whether a small

business obtained a business loan from a commercial bank in 1987; (2) information about

whether a small business is in operation or not in 1991. Based on these data we are able

to construct a hyper scorecard to measure the survival probability of new small

businesses during a four-year period. Complete information on both accepted and

rejected cases allows us to explore the sample selection problem for reject inference.

However, our hyper scorecard does not measure the loan default probability directly but

the business survival probability.

The 1987 CBO contains approximately 126,000 observations split evenly into

five groups: white males, white females, Hispanics, Asian-Pacifics and African-Americans.

We selected the white male sub-sample to avoid problems caused by specific lending

programs available in the U.S. for minority groups, as well as potential ethnic and sex

discrimination, which may distort the true performance of startups under perfectly

competitive market conditions. To focus on startups we further limited our sample to

those companies that were started in 1987, including businesses that might have existed

before 1987, but that had completely new ownership in 1987. To limit the sample to

possible candidates for bank loans we also deleted startups with zero capital.


There are some missing data in the CBO. The unit response rate among all white

males in the 1987 CBO was approximately 74% (Nucci 1992, Table 1). Unit non-

response occurs when an owner fails to return a questionnaire. This is attributable, in

part, to a difference of approximately three years between the year of business tax filing

and receipt of a CBO questionnaire. Unit non-response and business survival are likely to

be correlated due to owner deaths, for example. We were able to use information about

the survival of the business supplied by responding owners where data were missing from

non-respondents in multi-owner businesses. As well, the survey contains weights that

adjust for unit non-response in both single owner and multiple owner businesses. That is,

the Census Bureau determined the incidence of non-response according to business size,

location and industry. The inverses of these response frequencies by stratum are

employed as weights when computing parameter estimates. This method is supported by

Holt et al. (1980). Weights are also used when presenting some descriptive statistics. We

clearly indicate when weights are used.
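The weighting scheme can be illustrated as follows. The strata, response rates, and counts below are invented for illustration, not CBO figures; the actual strata are by business size, location, and industry.

```python
# Inverse response-frequency weights by stratum, as described above.
# All numbers here are hypothetical illustrations, not CBO figures.
response_rate = {"small/urban": 0.80, "small/rural": 0.65, "large/urban": 0.74}
weights = {s: 1.0 / r for s, r in response_rate.items()}   # e.g. 1/0.80 = 1.25

# Weighted survival rate: strata with poor response are up-weighted.
# (survival rate, respondent count) per stratum -- hypothetical:
survival = {"small/urban": (0.75, 400), "small/rural": (0.70, 130), "large/urban": (0.80, 270)}
num = sum(weights[s] * rate * n for s, (rate, n) in survival.items())
den = sum(weights[s] * n for s, (rate, n) in survival.items())
weighted_survival = num / den
```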

Item non-response occurs when an owner opts not to answer a particular question

on the questionnaire even though the question is applicable. Item non-response varies by

survey item. Item response is above 86% for all variables except college concentration.6

We imputed values for item non-responses using the Bayesian imputation method

described by Rubin (1987). In short, we generated a complete data set where missing data

were randomly replaced conditional on observed data and survey structure. For details

see Astebro and Chen (2000).

The final sample contains 924 startups, which represent 1,126 owners. Among these

firms, only 304 have at least one commercial loan. That is, only about one third of new

startups obtained financial support from commercial banks. Table 1 shows the

distributions of banks’ decisions on loan granting and startups’ survival. The “bad” rate

among those accepted is 18.1%, which is only 65.8% of that among those rejected. As only 33.2% of

applicants are accepted whereas a typical acceptance rate among small businesses is above

85%, and as 72.5% of the rejected startups survive over a four-year period,

6 The low item response for college concentration is due to a questionnaire design error: respondents to the question regarding highest level of education were not asked to skip the college concentration question if they did not attend college. After classifying as ineligible for response all those owners reporting their highest level of education as below college attendance, the item response on college concentration is 93.1%.


there is potentially large room for banks to improve their selection criteria for startup

loan applications. To simplify the problem, we assume that all startups without

commercial bank loans are rejected by banks.7 The number without parentheses in Table

1 is the number of observations for the cell. The percentage in parentheses is the

proportion of a row, and the percentage in brackets is the proportion of a column. Table 1

shows that 18.1% of the startups with loans failed within a four-year period, compared to

27.5% for the startups without loans. On the other hand, 36.0% of surviving startups were

granted bank loans, but 24.7% of failed startups were granted bank loans.

Table 1: Loans Granted and Startup Survival.

                            Survival
                   Yes                    No                     Total
Loan    Yes        244 (81.9%) [36.0%]    54 (18.1%) [24.7%]     298 [33.2%]
        No         434 (72.5%) [64.0%]    165 (27.5%) [75.3%]    599 [66.8%]
        Total      678 (75.6%)            219 (24.4%)            897 (100%)
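The percentages in Table 1 follow directly from the cell counts, and recomputing them is a quick consistency check:

```python
# Recomputing the Table 1 percentages from the cell counts.
loan_good, loan_bad = 244, 54          # granted a loan: survived / failed
noloan_good, noloan_bad = 434, 165     # no loan: survived / failed

loan_total = loan_good + loan_bad                      # 298
noloan_total = noloan_good + noloan_bad                # 599
bad_rate_loan = loan_bad / loan_total                  # 18.1%
bad_rate_noloan = noloan_bad / noloan_total            # 27.5%
loan_share = loan_total / (loan_total + noloan_total)  # 33.2%
```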

5. Results

Three basic models are constructed: (1) a default model for those selected by

banks (no correction for bias), (2) a default model for those selected by banks using a

sampling bias corrected default equation, and (3) a default model for the entire

population. Models 1 and 3 are standard applications of the probit model, and model 2 is

the bivariate probit model with sample selection.

The dependent variable of the selection equation is straightforward:

$y_{1i} = 0$ if the startup has a commercial loan $(y_{1i}^{*} \le 0)$;

$y_{1i} = 1$ if the startup does not have a commercial loan $(y_{1i}^{*} > 0)$.

7 In reality there are several reasons for not having a bank loan. Some may apply for a bank loan and are rejected. Others may obtain sufficient financial support from other resources. However, research shows that financial support from commercial banks is the major source of funds for startups, and that obtaining a bank loan is, ceteris paribus, a positive predictor of startup business survival (Astebro and Bernhardt 2001).


Loan defaults are not measured in the CBO database. Instead, we use as an

indicator of the default risk whether the startup survives during a 4-year period. We

define:

$y_{2i} = 0$ if the startup failed in the four-year period $(y_{2i}^{*} \le 0)$;

$y_{2i} = 1$ if the startup survives in the four-year period $(y_{2i}^{*} > 0)$.

Note that the data allow for complete observability. That is, survival is observed both for

those rejected and those accepted. However, when estimating model (2) we assume that

survival is not observed for those rejected.
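The likelihood behind model 2, a bivariate probit with sample selection, can be sketched as follows. This is an illustrative reconstruction on simulated data, not the authors' estimation code (which applied weighted maximum likelihood to the CBO data); it uses the conventional coding s_i = 1 for a selected (loan-granted) observation, whereas the paper codes y_1i = 1 for startups without a loan. All variable names here are hypothetical.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def censored_probit_loglik(gamma, beta, rho, X, W, s, y):
    """Log-likelihood of a bivariate probit with sample selection.

    s: 1 if selected (outcome observed), 0 otherwise.
    y: binary outcome, used only where s == 1.
    """
    a = X @ gamma                      # selection index
    b = W @ beta                       # outcome (survival) index
    biv = lambda r: multivariate_normal(mean=[0.0, 0.0],
                                        cov=[[1.0, r], [r, 1.0]])
    ll = 0.0
    # Not selected: contributes P(s = 0) = Phi(-a)
    ll += norm.logcdf(-a[s == 0]).sum()
    # Selected and y = 1: contributes Phi2(a, b; rho)
    sel1 = (s == 1) & (y == 1)
    ll += np.log(biv(rho).cdf(np.column_stack([a[sel1], b[sel1]]))).sum()
    # Selected and y = 0: contributes Phi2(a, -b; -rho)
    sel0 = (s == 1) & (y == 0)
    ll += np.log(biv(-rho).cdf(np.column_stack([a[sel0], -b[sel0]]))).sum()
    return ll

# Simulate data consistent with the model
rng = np.random.default_rng(0)
n, rho_true = 400, -0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
W = X
gamma_true, beta_true = np.array([0.3, 1.0]), np.array([-0.2, 0.8])
errs = rng.multivariate_normal([0, 0], [[1, rho_true], [rho_true, 1]], size=n)
s = (X @ gamma_true + errs[:, 0] > 0).astype(int)
y = (W @ beta_true + errs[:, 1] > 0).astype(int)
y[s == 0] = 0  # outcome is unobserved for the non-selected; value never used

ll_true = censored_probit_loglik(gamma_true, beta_true, rho_true, X, W, s, y)
ll_null = censored_probit_loglik(np.zeros(2), np.zeros(2), 0.0, X, W, s, y)
print(ll_true > ll_null)  # the true parameters fit better than a null model
```

Maximizing this function over (γ, β, ρ) yields the estimates reported in Tables 2 and 3 for the real data.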

When adjusting for sample selection bias we need to specify a selection

equation that mimics the procedures banks use to decide whether to grant a loan to a

small startup. Greene (1998) shows that credit scoring vendors rely heavily on credit

reporting agencies for their decisions. He summarized the variables used for consumer

credit selection into three groups: (1) basic cardholder specifications, (2) credit

bureau variables, and (3) credit reference variables. On the small business side, credit

assessment still relies heavily on the assessment of owners' personal

credit behavior. The CBO does not have such information. However, it does contain

owners' demographic and socio-economic status indicators, which to some degree reflect

their credit behavior. The indicator we used was a county code. We were able to merge in

data from the 1990 Census of Population on various demographic and socio-economic

characteristics that to some degree capture banks' credit decisions (Greene 1998). The

reason is that banks are concerned about an applicant's creditworthiness, which is gauged

by the applicant's wealth. Wealth is correlated with aggregate residential and

neighborhood characteristics. Accordingly, we classify the variables for the selection

equation into three groups: firm data, owner characteristics, and credit reference proxies.

A full description of the variables is presented in Appendix A. Table 2 displays

estimates of the parameters of the model corrected for sample selection. Table 3 shows

estimation results for the three default models. We applied weighted maximum likelihood

to estimate all three models (White, 1982).


Table 2: Selection Equation for Sample Selection Bias Corrected Model

Variables                 Coefficient   Std. err.   P value
Firm characteristics
  Lacsr1                      0.418       0.069      0.000
  Sleep                      -1.308       0.595      0.028
  Asset82n                    4.832       1.232      0.000
  Indassn                    -0.067       0.030      0.026
  Trad                       -0.139       0.183      0.446
  Scale20                     1.788       2.730      0.513
  Scale50                    -2.430       5.028      0.629
Owners characteristics
  Age3                       -0.103       0.197      0.603
  Age4                       -0.378       0.229      0.099
  Age5                       -0.217       0.285      0.446
  Workexp3                   -0.906       0.303      0.003
  Workexp4                   -0.477       0.294      0.104
  Workexp5                   -0.643       0.242      0.008
  Workexp6                   -0.738       0.261      0.005
  Edu3                       -0.429       0.294      0.145
  Edu4                       -0.973       0.292      0.001
  Edu5                       -0.684       0.322      0.034
  Edu6                       -1.193       0.343      0.001
  Manager                    -0.437       0.205      0.033
Credit reference
  Inherit                    -0.856       0.329      0.009
  Homeloan                   -0.781       0.348      0.025
  Othloan                    -0.080       0.195      0.681
  Toteq45                    -0.345       0.262      0.188
  Medincn                    -3.093       1.099      0.005
Constant                     -1.875       2.989      0.530
Athrho                       -1.997       1.229      0.104
Rho                          -0.964       0.087

Wald chi2(22) = 88.64; Log likelihood = -.4669244; Prob > chi2 = 0.0000

In the maximum likelihood estimation, ρ is not estimated directly. What is directly

estimated is atanh ρ:

    atanh ρ = (1/2) ln[(1 + ρ)/(1 − ρ)].

The chi-square of the Wald test for ρ = 0 is

2.642 and is not significant at the 90% confidence level. Therefore we are unable to reject

the null hypothesis that ρ = 0. The result implies that sample selection

based on unobservable variables is weak.
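As a check, the reported ρ and the Wald statistic for ρ = 0 can be reproduced (up to rounding) from the Athrho row of Table 2:

```python
import math

athrho, se = -1.997, 1.229   # estimate of atanh(rho) and its std. err. (Table 2)

rho = math.tanh(athrho)      # invert the transform: rho = tanh(atanh rho)
wald = (athrho / se) ** 2    # Wald chi-square statistic for H0: rho = 0

print(round(rho, 3))         # -0.964, as reported in Table 2
print(round(wald, 2))        # about 2.64, matching the reported 2.642 up to rounding
```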

It is not altogether surprising that there is no strong sample selection bias based on

unobservable variables, since we know that banks were (and still are) likely to assess loan

applications of small startups based primarily on owners' personal credit situation

(Caouette et al. 1998, Ch. 13). Assessing personal creditworthiness is different from

assessing business worthiness, and the correlation between the two performance criteria

may not be high. Banks may be close to making a random selection with respect to

business survival when assessing personal credit, which would explain why ρ ≈ 0.

There is, however, still a fair degree of observable sample selection, because the

failure rate of those granted a bank loan (18.1%) is much lower than that of those not

granted a bank loan (27.5%). This is partly because obtaining a bank loan affects

business survival positively (Astebro and Bernhardt 2001), partly a function of bank

selection, and partly a function of self-selection into bank loans, as shown by the

coefficient estimates in Table 2.

The negative signs of the coefficients for human capital (age, education and work

experience) in Table 2 imply that there is self-selection. That is, those with higher

observable human capital are less likely to seek bank loans. However, individuals with

high human capital have higher business survival rates (Astebro and Bernhardt 2001,

Bates 1989, Cressy 1996). This result reveals that banks' risk management procedures

do not lead to risk minimization. Boyes et al. (1989) suggest that such results may

reflect lending policies designed to seek out accounts that may carry substantially

higher balances, despite higher default risk. It is not obvious, however, why individuals

with lower human capital would carry higher balances.

The positive sign of predicted sales indicates that banks typically select larger

firms. The negative signs of HOMELOAN, OTHLOAN and INHERIT may indicate a

substitution effect between commercial bank loans and other sources of capital. The

coefficients for the industry variables show that startups are more likely to have a bank

loan in industries with historically larger average assets for small surviving startups. On

the other hand, we find that startups are more likely not to have a bank loan in industries

with larger average assets for all companies. Together, these two estimates indicate that

banks seem to favor startups in "startup-friendly" industries. Finally, of all the county-

level indicators at our disposal, only one had a significant effect: owners in counties

with higher median household incomes are less likely to obtain a startup business bank

loan. This result is not consistent with Greene's (1998). The estimates for this equation

proved fairly unstable; slight changes in the inclusion/exclusion of variables often

resulted in dramatic changes in the coefficient estimates for the remaining variables.

Table 3 displays the three estimated default equations: (1) a default model for those

selected by banks with no correction for bias, (2) a default model for those selected by

banks using a sampling bias corrected default equation, and (3) a default model for the

entire population.

Table 3: Default Models (Survival Equation)

                 Model 1*                   Model 2**                  Model 3***
            Coef.   Std.Err.  P>|z|     Coef.   Std.Err.  P>|z|     Coef.   Std.Err.  P>|z|
Edu3        0.933    0.426    0.029     1.002    0.355    0.005    -0.054    0.299    0.857
Edu4        1.898    0.611    0.002     2.227    0.431    0.000     0.032    0.296    0.913
Edu5        0.184    0.525    0.727     0.719    0.413    0.082     0.290    0.330    0.380
Edu6        0.618    0.647    0.339     1.300    0.448    0.004     0.821    0.343    0.017
Workexp4    1.817    0.565    0.001     1.475    0.431    0.001     0.477    0.276    0.084
Workexp5    1.358    0.416    0.001     1.323    0.405    0.001     0.062    0.226    0.784
Workexp6    2.058    0.476    0.000     1.953    0.437    0.000     0.474    0.236    0.045
Denovo     -0.674    0.362    0.063    -0.541    0.347    0.119    -0.143    0.214    0.504
Franchi     0.861    0.681    0.207     0.811    0.568    0.153    -0.514    0.430    0.232
Toteq45     0.101    0.437    0.818     0.178    0.319    0.578     0.151    0.307    0.622
Homeloan    0.551    0.681    0.418     1.377    0.660    0.037     0.427    0.340    0.210
Othloan     0.394    0.461    0.393     0.622    0.365    0.089    -0.088    0.205    0.667
Lacsr1      0.101    0.150    0.502    -0.127    0.092    0.169     0.401    0.087    0.000
Asset82    -1.630    2.182    0.455    -3.309    1.785    0.064     0.270    1.473    0.855
Indass     -0.089    0.052    0.088    -0.007    0.067    0.912    -0.078    0.032    0.015
Trad        0.669    0.313    0.033     0.579    0.253    0.022     0.213    0.181    0.240
Scale20    12.810    6.354    0.044     4.982    4.837    0.303     3.756    3.341    0.261
Scale50   -28.096   13.345    0.035   -12.708   10.084    0.208   -11.189    6.267    0.074
Constant   14.208    7.693    0.065     9.035    6.337    0.154     3.875    3.441    0.260
Pseudo R-square      0.401                 -                        0.176

* Default model for those selected by banks (no correction for bias).
** Default model for those selected by banks, using a sampling bias corrected survival equation based on the entire population.
*** Unconditional default model for the entire population.

Table 3 shows that the structures of model 1 and model 2 are quite similar. One

reason might be that the selection and default equations are not significantly correlated.

Though most signs of the coefficient estimates in model 3 are similar to those in models 1

and 2, some differ, including the signs for EDU3, FRANCHI,

OTHLOAN, and ASSET82. The coefficient for predicted sales (LACSR1) in model 3 is

significant and positive. This is reasonable, since higher sales produce greater cash flow that

can support a firm's operations longer. However, LACSR1 is not significant in models 1

and 2, and its sign is even negative in model 2. This example clearly illustrates both the

problem of using a selected sample when the selection process is based on predictors that

drive the performance variable and the problem of running an unconditional

survival equation. Banks clearly select firms with higher predicted sales (Table 2).8

Therefore, in the selected sample, there is apparently little remaining correlation between

sales and survival. However, as model 3 illustrates, there is a clear association between

the two. The bivariate probit model with sample selection separates the selection

effect from the true association and indicates that the majority of the association is due to

banks selecting large firms. Since the unconditional survival equation does not control for

whether a bank loan was granted, the coefficient for sales in model 3 is upward

biased.

The Hausman specification test compares an estimator that is known to be

consistent with an estimator that is efficient under the assumption being tested. If we

assume that model 2 is more efficient than model 1, the Hausman test is χ2=32.75, which

is not significant at the 95% confidence level. This means that the Heckman sample

selection correction does not produce estimates that are significantly different from those

of the model based on the selected sample. On the other hand, assuming that model 3 is more

efficient than model 1, the Hausman test is highly significant (χ2=108.73). This

supports the view that some form of reject inference is indeed necessary to obtain

an appropriate model for the accept/reject decision.
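The Hausman statistic is H = (b_c − b_e)′ [V_c − V_e]⁻¹ (b_c − b_e), where b_c, V_c come from the estimator assumed consistent and b_e, V_e from the one assumed efficient. A generic sketch with made-up inputs, not the paper's estimates:

```python
import numpy as np

def hausman(b_cons, V_cons, b_eff, V_eff):
    """Hausman statistic H = d' (V_cons - V_eff)^{-1} d, with d = b_cons - b_eff.

    The difference of covariance matrices can fail to be positive definite in
    finite samples, which is how a negative "chi-square" value can arise."""
    d = b_cons - b_eff
    return float(d @ np.linalg.pinv(V_cons - V_eff) @ d)

# Toy inputs (illustrative only)
b1 = np.array([1.0, 2.0]); V1 = np.diag([0.10, 0.10])
b2 = np.array([1.1, 1.9]); V2 = np.diag([0.05, 0.05])
print(hausman(b1, V1, b2, V2))  # 0.4 = 0.01/0.05 + 0.01/0.05
```

The resulting H is compared against a chi-square distribution with degrees of freedom equal to the number of coefficients tested.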

It is noteworthy that the R2 is significantly larger for the uncorrected default

model on the selected sample than for the default model on the entire population. Judging

solely by statistical power it thus seems appropriate to use the uncorrected default model.

However, according to the Hausman test this would be a highly inappropriate conclusion.

The ultimate success of a credit scoring model is not measured by the statistical power of

the model but by the delinquency and losses appearing on the credit portfolio. Jacobson

and Roszbach (1999) suggest using the concept of total credit losses λ to measure

scorecard performance. However, this method is questionable for comparing the

usefulness of different scoring models. This is because, fundamentally, simple credit

8 An additional problem is the potential positive effect that granting a loan has on sales.


scoring models produce only relative risk rankings, not portfolio loss estimates.9 To

measure how effective the classification rule is in assigning an object to the correct class,

we use the bad rate, computed simply as the proportion of observed defaults

among the accepted obligors. As we observe all defaults (failures), the bad rate can easily

be computed for any hypothetical portfolio selection method.10 We use the three

different survival models to predict firm survival, and select all firms with predicted

survival above a cutoff δ chosen such that the acceptance rate is the same as that in the sample

selected by banks. However, the within-sample bad rate is expected to be an

optimistically biased estimate of future out-of-sample performance (Hand 1998). Hence,

for the two-class case (e.g., survive vs. fail), the Brier score of equation (7) and the

logarithmic score of equation (8) are used to measure out-of-sample accuracy:11

    (2/N) · Σ_{i=1}^{N} [c_i − f̂(1|x_i)]²                                   (7), and

    −(1/N) · Σ_{i=1}^{N} [c_i ln f̂(1|x_i) + (1 − c_i) ln(1 − f̂(1|x_i))]    (8).

Here c_i ∈ {0, 1}, where 1 indicates that a firm survives and 0 otherwise, and f̂(1|x_i) is the

estimated probability that the ith object belongs to the class of survivors. The bad rate,

Brier and logarithmic scores are compared across the three models. Table 4 displays

results. Lower values of Brier and logarithmic scores imply higher classification

accuracy.
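The cutoff selection described above and the two accuracy scores of equations (7) and (8) can be sketched as follows; the data below are made-up illustrative values, not the CBO sample:

```python
import numpy as np

def select_at_rate(p_survive, accept_rate):
    """Grant 'loans' to the accept_rate share with the highest predicted survival."""
    cutoff = np.quantile(p_survive, 1.0 - accept_rate)
    return p_survive > cutoff

def bad_rate(c, accepted):
    """Proportion of observed failures (c == 0) among accepted obligors."""
    return float(np.mean(c[accepted] == 0))

def brier_score(c, p):
    """Equation (7): (2/N) * sum of (c_i - p_i)^2."""
    return float(2.0 * np.mean((c - p) ** 2))

def log_score(c, p):
    """Equation (8): -(1/N) * sum of [c_i ln p_i + (1 - c_i) ln(1 - p_i)]."""
    return float(-np.mean(c * np.log(p) + (1 - c) * np.log(1 - p)))

# Tiny illustration: c = 1 for survivors, p = predicted survival probability
c = np.array([1, 1, 0, 1, 0, 0])
p = np.array([0.9, 0.8, 0.3, 0.7, 0.4, 0.2])
accepted = select_at_rate(p, accept_rate=0.5)
print(bad_rate(c, accepted))        # 0.0 -- the top half here are all survivors
print(round(brier_score(c, p), 3))
print(round(log_score(c, p), 3))
```

Lower Brier and logarithmic scores indicate higher accuracy, consistent with Table 4.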

Table 4: Bad Rates, Brier Score and Logarithmic Score

                    Obligors       Obligors       Obligors       Obligors
                    selected       selected       selected       selected
                    by banks       using model 1  using model 2  using model 3
Bad rate            18.12%         15.44%         19.12%         11.07%
Brier Score          -             0.429          0.425          0.338
Logarithmic Score    -             0.790          1.075          0.547

9 If one wants to build models for controlling dollar portfolio losses, a more complex set of equations needs to be estimated, including a model for expected losses in the event of default.

10 If defaults were not observed for those that were initially rejected, one would use the expected default probability derived by Boyes et al. (1989).

11 Hand (1997) provides a good review of aspects of the evaluation of classification rules. Hand (2000) suggests that common measures of precision are the Brier score and the logarithmic score. In the two-class case the major measure for comparing classification rules is precision. Besides the misclassification rate, the Brier score and the logarithmic score, another common measure of performance is the AUC (area under the curve), which we have not explored in this paper.


First, we observe from Table 4 that the bad rate among those selected using model

1 is moderately better than that for those granted loans by banks. The bad rate for those

selected using model 2 with sample selection bias correction does not improve when

compared to the overall performance of banks. Also, model 2 does not have higher

prediction accuracy than model 1 based on the values of the Brier score and the

logarithmic score. As suggested by Hand and Henley (1993/4), these results indicate that

reliable reject inference is impossible without performance data on those rejected.

Second, the model developed on the whole population (model 3) significantly improves

portfolio performance with respect to the bad rate, the Brier score and the logarithmic

score. Even though model 3 has a much lower pseudo-R2 value, the bad rate decreases

39%, from 18.1% for those granted loans by banks to 11.1% for those granted loans using

model 3. If this improvement holds in real situations, large savings are possible. This

evidence supports the claim by Meng and Schmidt (1985) that the cost of partial

observability in the bivariate probit model is fairly high. That is, there are substantial

benefits from collecting performance data on rejected applicants. The results above

imply two rules for reject inference using the bivariate probit model with sample

selection:

(1) if the selection equation is not significantly correlated with the default

equation, and if selection by observable variables is "less severe", the

prediction gains from using the Heckman reject inference technique are limited;

(2) under all circumstances the loss of observability due to selection seems very

high, and information on outcomes for those typically unobserved seems

worth collecting.

6. Sensitivity analysis

The first implication in the previous section is that if the selection equation and

default equation are not significantly correlated, and if selection by observable variables

is “less severe”, then the gain from reject inference is possibly limited. It is of value to

test under which circumstances this conclusion holds.


For this purpose we used a scorecard, RiskPro, which has been implemented in

some major North American banks, to generate selected samples.12 The RiskPro

scorecard eschews financial data and concentrates on the business skills of the owners,

the viability of the business, and industry characteristics. The risk assessment

procedure based on RiskPro is therefore assumed to be highly correlated with the default models

presented in Section 5. The procedure for generating data for this sensitivity test is:

(1) generate a RiskPro score for each observation;

(2) select the cutoff point by setting the acceptance rate identical to that of banks’

actual acceptance rate;

(3) if the RiskPro score of an observation is higher than the cutoff, grant it a

(hyper) bank loan;

(4) follow the steps in section 5 to re-estimate the three models, using hyper bank

loans instead of true bank loans as the dependent variable in the selection

equation.
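Steps (1)-(3) above amount to thresholding the scorecard score at the quantile implied by the acceptance rate. A minimal sketch, with a random placeholder standing in for the proprietary RiskPro score in step (1):

```python
import numpy as np

def grant_hyper_loans(scores, acceptance_rate):
    """Steps (2)-(3): pick the cutoff so that approximately the top
    `acceptance_rate` share of observations receives a hyper loan."""
    cutoff = np.quantile(scores, 1.0 - acceptance_rate)
    return scores > cutoff

# Placeholder for step (1): random scores stand in for the RiskPro output
rng = np.random.default_rng(1)
scores = rng.normal(size=897)   # one score per observation in the sample
hyper_loan = grant_hyper_loans(scores, acceptance_rate=298 / 897)

print(hyper_loan.sum())         # about 298 hyper loans granted
```

The resulting `hyper_loan` indicator then replaces the true bank-loan indicator as the dependent variable in the selection equation of step (4).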

Table 5 displays the performance of the RiskPro scorecard on the analysis

sample. Using the RiskPro scorecard, while maintaining the acceptance rate, the bad

rate decreases from 18.1% to 11.4%, which represents a 37% decrease.

Table 5: Loans Granted and Startup Survival Using the RiskPro Scorecard

                               Survival
                     Yes                    No                     Total
Hyper  Yes     264 (88.6%) [38.9%]     34 (11.4%) [15.5%]    298 [33.2%]
loan   No      414 (69.1%) [61.1%]    185 (30.9%) [84.5%]    599 [66.8%]
       Total   678 (75.6%)            219 (24.4%)            897 [100%]

Table 6 contains the results of the sensitivity analysis corresponding to Table 2 in

Section 5. Similar to Table 3, Table 7 presents the sensitivity analysis for the default equations of the

three models, given a selection of hyper bank loans using RiskPro.

12 Interested readers may refer to the web site http://www.abRiskSolutions.com.


Table 6: Selection Equation for Sample Selection Bias Corrected Model Based on Hyper Bank Loans

Variables                 Coefficient   Std. err.   P value
Firm characteristics
  Lacsr1                      0.645       0.102      0.000
  Sleep                      -1.579       1.222      0.196
  Asset82n                    0.585       2.248      0.795
  Indassn                     0.033       0.033      0.311
  Trad                        0.547       0.265      0.039
  Scale20                     8.005       4.097      0.051
  Scale50                    -7.019       7.039      0.319
Owners characteristics
  Age3                        0.442       0.312      0.157
  Age4                        0.372       0.435      0.392
  Age5                        0.253       0.469      0.590
  Workexp3                   -0.322       0.592      0.587
  Workexp4                    0.817       0.504      0.105
  Workexp5                   -0.417       0.518      0.422
  Workexp6                    1.083       0.596      0.069
  Edu3                        1.279       0.591      0.030
  Edu4                        0.414       0.547      0.450
  Edu5                        1.057       0.589      0.073
  Edu6                        5.426       0.598      0.000
  Manager                     0.297       0.269      0.270
Credit reference
  Inherit                    -0.068       0.566      0.904
  Homeloan                    0.299       0.373      0.423
  Othloan                     1.435       0.310      0.000
  Toteq45                     2.508       0.324      0.000
  Medincn                     2.467       1.541      0.110
Constant                    -12.850       3.909      0.001
Athrho                       -2.027       0.337      0.000
Rho                          -0.966       0.025

Wald chi2(22) = 40.68; Log likelihood = -.1327258; Prob > chi2 = 0.0017

Notice that the coefficient estimates are considerably different from the values

in Table 2. Furthermore, the Wald test for ρ = 0 is χ2=29.00, which is significant below

the 0.001 level. Therefore, we are able to reject the null hypothesis that ρ = 0. The value of the

correlation coefficient is -0.966, which is quite close to that estimated by Jacobson and

Roszbach (1999). A negative correlation is also commonly found in other studies (e.g.,

Boyes et al. 1989, Greene 1998).


Table 7: Sensitivity Analysis of Default Models (Survival Equation)

                 Model 1*                   Model 2**                  Model 3***
            Coef.   Std.Err.  P>|z|     Coef.   Std.Err.  P>|z|     Coef.   Std.Err.  P>|z|
Edu3        2.151    0.719    0.003     0.550    1.534    0.720    -0.054    0.299    0.857
Edu4        3.040    0.808    0.000     1.514    1.774    0.393     0.032    0.296    0.913
Edu5        1.802    0.758    0.018     0.466    1.410    0.741     0.290    0.330    0.380
Edu6        2.590    0.787    0.001     0.416    1.363    0.760     0.821    0.343    0.017
Workexp4    0.529    0.540    0.328     0.219    0.494    0.657     0.477    0.276    0.084
Workexp5   -0.033    0.529    0.951     0.031    0.491    0.950     0.062    0.226    0.784
Workexp6    0.405    0.546    0.458     0.049    0.403    0.903     0.474    0.236    0.045
Denovo     -0.607    0.333    0.068    -0.448    0.297    0.131    -0.143    0.214    0.504
Franchi     1.595    0.535    0.003     1.292    0.517    0.012    -0.514    0.430    0.232
Toteq45    -0.422    0.373    0.258    -0.704    0.332    0.034     0.151    0.307    0.622
Homeloan    1.613    0.710    0.023     1.135    0.600    0.059     0.427    0.340    0.210
Othloan    -0.119    0.444    0.788     0.055    0.388    0.888    -0.088    0.205    0.667
Lacsr1      0.127    0.135    0.348    -0.004    0.121    0.970     0.401    0.087    0.000
Asset82    -3.167    2.876    0.271    -2.481    3.141    0.430     0.270    1.473    0.855
Indass      0.034    0.054    0.528     0.021    0.058    0.713    -0.078    0.032    0.015
Trad        0.565    0.414    0.172     0.537    0.364    0.141     0.213    0.181    0.240
Scale20     3.225    6.053    0.594     3.271    5.920    0.581     3.756    3.341    0.261
Scale50    -0.529   12.091    0.965    -1.734   11.943    0.885   -11.189    6.267    0.074
Constant   -4.853    7.293    0.506     0.028    7.028    0.997     3.875    3.441    0.260
Pseudo R-square      0.267                 -                        0.176

* Default model for those selected by banks (no correction for bias).
** Default model for those selected by banks, using a sampling bias corrected survival equation based on the entire population.
*** Unconditional default model for the entire population.

Table 7 shows that survival model 1 (for the hyper-selected population only)

and model 2 (for the hyper-selected population with reject inference) are stable and

reasonably similar, except for the size of the coefficients for education. The coefficient

signs of the franchise variable (FRANCHI) in these two models differ from

that of the unconditional model based on the entire population. Other variables with

different coefficient signs include EDU3, WORKEXP5, TOTEQ45, OTHLOAN,

ASSET82 and LACSR1. Again, the pseudo-R2 is higher for model 1 than for model 3. The

Hausman test comparing models 1 and 2 (χ2=51.49), as well as models 1 and 3 (χ2= -4528),

shows that the model based only on the censored sample (model 1) is not efficient.


Table 8: Sensitivity Analysis for Bad Rates and Average Credit Loss Rate

                    Obligors       Obligors       Obligors       Obligors
                    selected       selected       selected       selected
                    by RiskPro     using model 1  using model 2  using model 3
Bad rate            11.4%          14.43%         20.07%         11.07%
Brier Score          -             0.440          0.436          0.338
Logarithmic Score    -             0.767          0.916          0.547

The prediction power of models 1, 2 and 3 when the RiskPro scorecard is used to

select obligors is presented in Table 8. Astonishingly, the prediction power with sample

selection correction is significantly worse than that of the model based on the censored sample

only. The bad rate of model 2 is 39% higher than that of model 1. The values of the Brier

score and the logarithmic score do not favor model 2 either. Under all circumstances the

best solution is still the model based on the whole distribution, which again confirms the

claim by Meng and Schmidt (1985) that the cost of partial observability is high. Also note that

the bad rate of model 2 is even worse than that of the same model in the previous section.

It is clear that the sample selection correction model (model 2) is extremely sensitive

to the data and, for practical purposes, apparently useless.

7. Conclusion

We explored the use of a bivariate probit model with partial observability for

reject inference in credit scoring. This model has recently been applied in some credit

scoring applications. Although most data sets generated in credit scoring applications

contain strong selection bias, we found that the marginal gain this reject inference

technique brings to credit classification does not justify its use. Interestingly, estimating a performance

equation on the complete sample produced a much lower R2 than the performance

equation based on the selected sample. Judging by this measure of statistical prediction

power alone, one would be content with models based on the selected sample and not correct for

selection bias. However, other specification tests showed that performance equations

specified on the selected sample do not capture the correlations that appear across

the complete distribution.

We tested three different performance models for their ability to correctly classify

outcomes: a probit model based on banks' censored sample only; a bivariate probit model

with sample selection correction; and a probit model based on the whole sample. We find

that the ability of the model based on the whole distribution to classify credit risks is

significantly greater than that of the other two models. This result confirms

the claim by Meng and Schmidt (1985) that the cost of partial observability is very high.

Therefore, collecting outcome information on rejected applicants will be greatly rewarded.

We also find that the potential improvements offered by this reject inference

technique are not reliable. Under both weak and strong correlation between the selection

and default equations, the prediction accuracy of the model with sample selection

correction is no better than that of the model built only on the censored sample. In the

circumstance of higher correlation between the selection and default equations, there

appears to be over-compensation by the sample selection correction method, which

significantly reduces the prediction power of the model. These results support the

conclusion of Hand and Henley (1993/4) that reliable reject inference is impossible, even

though the bivariate probit model with partial observability is theoretically a sound reject

inference technique.


References

1. Asch, Latimer, (1995): "How the RMA/Fair, Isaac Credit-Scoring Model Was Built," The Journal of

Commercial Lending, June, 12-16.

2. Astebro, Thomas and Irwin Bernhardt, (2001): "Bank Loans as Predictors of Small Start-up Business

Survival,"

3. Astebro, Thomas and Gongyue G. Chen, (2000): “Missing data analysis for single choice and multiple

choice survey questions when data are sparse,” presented at 2000 Academy of Management

Conference, Symposium entitled "Much ado about missing data", Toronto, Canada.

4. Bates, Timothy, (1989):

5. Boyes, William, Dennis L. Hoffman and Stuart A. Low, (1989): "An econometric analysis of the bank

credit scoring problem,” Journal of Econometrics 40, 3-14.

6. Caouette et al. (1998):

7. Copas, J. B. and H. G. Li, (1997): “Inference for non-random samples (with discussion),” Journal of

the Royal Statistical Society, B, 59, 55-95.

8. Cressy, Robert, (1996): “Are Business Startups Debt-Rationed?”, The Economic Journal, 106

September, pp1253-1270.

9. Donald, Stephen G., (1995): “Two-step estimation of heteroskedastic sample selection models,”

Journal of Econometrics 65, 347-380.

10. Feelders, A. J., (1999): “Credit scoring and reject inference with mixture models,” International

Journal of Intelligent System in Accounting, Finance and Management, 8, 271-279.

11. Greene, William, (1998): “Sample selection in credit-scoring models,” Japan and the World Economy

10, 299-316.

12. Hand, D. J. and W. E. Henley, (1993/4): “Can reject inference ever work?” IMA Journal of

Mathematics Applied in Business & Industry 5, 45-55.

13. Hand, D. J. and W. E. Henley, (1997): “Statistical classification methods in consumer credit scoring: a

review," Journal of the Royal Statistical Society, Series A, 160, Part 3, 523-541.

14. Hand, David J. (1997): Construction and Assessment of Classification Rules, Chichester: Wiley.

15. Hand, David J. (1998): “Reject inference in credit operations,” in Credit Risk Modeling: Design and

Application (ed. E. Mays), 181-190, AMACOM.

16. Hand, David J. (2000): "Measuring Diagnostic Accuracy of Statistical Prediction Rules," Statistica

Neerlandica, forthcoming.

17. Heckman, James J., (1979): “Sample selection bias as a specification error,” Econometrica, 47, 153-

161.

18. Holt, D., Smith T. M. F. and P. D. Winter, (1980): “Regression analysis of data from complex

surveys,” Journal of the Royal Statistical Society, Vol 143 (series A).

19. Hsia, D. C., (1978): “Credit scoring and the equal credit opportunity act,” The Hastings Law Journal,

30, November, 371-448.

Page 26: The Economic Value of Reject Inference in Credit Scoring · credit scorecards and a general statistical toolkit for reject inference was recently introduced by SAS Institute. However,

- 25 -

25

20. Jacobson, Tor, and Kasper F. Roszbach, (2000): “Evaluating bank lending policy and consumer credit

risk,” in Computational Finance 1999 (edited by Yaser S. Abu-Mostafa et al.) the MIT Press, 2000.

21. Joanes, Derrick N., (1993/4): “Reject inference applied to logistic regression for credit scoring,” IMA

Journal of Mathematics Applied in Business & Industry, 5, 35-43.

22. Leonard, Kevin J., (1992): “Credit-scoring models for the evaluation of small-business loan

applications,” IMA Journal of Mathematics Applied in Business & Industry, 4, 89-95.

23. Meng, Chun-Lo, and Peter Schmidt, (1985): “On the cost of partial observation in the bivariate probit

model,” International Economic Review, Vol. 26, No. 1, February, 71-85.

24. Nucci, Alfred R., (1992): "The Characteristics of Business Owners Database," Discussion Paper

92-7, Center for Economic Studies, U.S. Bureau of the Census.

25. Poirier, Dale J., (1980): “Partial observability in bivariate probit model,” Journal of Econometrics, 12,

209-217.

26. Rubin, Donald B, (1987): Multiple Imputation for Nonresponse in Surveys, John Wiley & Sons.

27. Vella, F, (1992): “Simple tests for sample selection bias in censored and discrete choice models,”

Journal of Applied Econometrics, 7, 413-421.

28. White, Halbert , (1982): “Maximum likelihood estimation of misspecified models,” Econometrica, 50,

1-25.


Appendix A: Definition of Variables

Name        Definition

LACSR1 Natural logarithm of predicted sales in the first year of operations. Established using a

production function model

Sleep Proportion of owners in the firm who are sleeping partners

Asset82n Industry median assets 80-82 survivors

Indassn Industry average assets of surviving firms

Scale20 Proportion of firms in 2-digit industry with 1-19 employees

Scale50 Proportion of firms in 2-digit industry with 1-49 employees

Age3 Proportion of owners whose ages were 35-44

Age4 Proportion of owners whose ages were 45-54

Age5 Proportion of owners whose ages were above 55

Workexp3 Proportion of owners with 2-5 years of work experience

Workexp4 Proportion of owners with 6-9 years of work experience

Workexp5 Proportion of owners with 10-19 years of work experience

Workexp6 Proportion of owners with at least 20 years of work experience

Edu3 Proportion of owners with high school diploma

Edu4 Proportion of owners who started but did not finish college/university

Edu5 Proportion of owners with college/university undergraduate degree

Edu6 Proportion of owners with graduate degree (e.g., Master's degree)

Manager Proportion of owners in the firm with managerial or executive work experience

Inherit =1 if the firm is inherited or given, else 0

Homeloan =1 if any owner in the firm had a home mortgage loan to finance start-up, else 0

Othloan =1 if any owner in the firm had a start-up loan from either friends, family, spouse, or

former owner, else 0

Toteq =1 if total equity from all owners is at least U. S. $25,000, else 0

Medincn Median income of households

Denovo =1 if newly formed business in 1987, else 0

Franchi =1 if franchise, else 0

Trad =1 if the business operates in wholesale trade, retail trade, or services, defined by the

U.S. Bureau of the Census as the following SIC 2-digit, else 0