
European Journal of Operational Research 234 (2014) 119–126


journal homepage: www.elsevier.com/locate/ejor

Stochastics and Statistics

Updating a credit-scoring model based on new attributes without realization of actual data

Yong Han Ju, So Young Sohn ⇑

Department of Information & Industrial Engineering, Yonsei University, 134 Shinchon-dong, Seoul 120-749, South Korea

⇑ Corresponding author. Tel.: +82 2 2123 4014; fax: +82 2 364 7807. E-mail address: [email protected] (S.Y. Sohn).

0377-2217/$ - see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.ejor.2013.02.030

Article info

Article history: Available online 28 February 2013

Keywords: Finance; Credit-scoring model; Exploratory factor analysis (EFA); Logistic regression analysis; ANOVA; Small and medium enterprise

Abstract

Funding small and medium-sized enterprises (SMEs) to support technological innovation is critical for national competitiveness. Technology credit scoring models are required for the selection of appropriate funding beneficiaries. Typically, a technology credit-scoring model consists of several attributes, and new models must be derived every time these attributes are updated. However, it is not feasible to develop new models until sufficient historical evaluation data based on these new attributes have accumulated. To resolve this limitation, we suggest a framework for updating the technology credit-scoring model. This framework constructs a new technology credit-scoring model by comparing alternative scenarios for various relationships between existing and new attributes, based on exploratory factor analysis, analysis of variance, and logistic regression. Our approach can help find the optimal scenario for updating a scoring model.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

Small and medium-sized enterprises (SMEs) occupy a large portion of all industries (Ebrahim et al., 2011). According to 2007 statistics reported by the Korean Federation of Small and Medium Business, SMEs account for 99.5% of all enterprises and 76.9% of total employment (Kim et al., 2011). There are a number of government policies intended to structurally and financially support these SMEs. One of these policies is a credit guarantee for SMEs that is awarded on the basis of technology. The credit guarantee policy provides financial support to SMEs suffering from insufficient investment from private financial institutions due to a lack of collateral, and aims to increase SMEs' access to private financing sources (Oh et al., 2009).

The government has encouraged the creation of new businesses and supported these SMEs via technology credit guarantee schemes to help accelerate economic growth and to decrease the unemployment rate, especially during the current economic downturn (Kang and Heshimati, 2008). However, this financial support must be selective to prevent wasteful expenditures. In order to select promising SMEs, a technology credit scoring model is used. Since the first technology credit scoring model was developed by Sohn et al. (2005), many studies have been published that focus on more accurate default prediction by adding behavioral characteristics or the economic environment to existing credit scoring models (Kočenda and Vojtek, 2009; Moon and Sohn, 2010; Paleologo et al., 2010). However, these studies have not addressed the issue of updating existing attributes in the credit scoring model. This is a very important issue, because technology credit-scoring models often need to be updated to reflect changes due to mergers, separations, and deletions of existing attributes.

This paper proposes a method to update a credit-scoring model with new attributes. A new model can be fitted only after collecting data based on these new attributes. However, a new credit scoring model is needed to select SMEs even before new data are observed and used for fitting. For the case in which such data are unavailable, we propose approaches to find a new technology credit scoring model based on potential relationships between new attributes and existing attributes. Several scenarios are formed to create new attributes from their potential relationships with existing attributes. Exploratory factor analysis (EFA) is used to reduce the multi-collinearity in the new attributes. A logistic regression of loan default on the resulting factors then yields a new credit scoring model. Analysis of variance (ANOVA) is used to compare the performances of the new credit scoring models created according to different scenarios regarding the relationship between existing and new attributes. From the ANOVA results, we can find the optimal scenario in terms of prediction accuracy.

This paper is organized as follows: Section 2 reviews the related literature, Section 3 explains the proposed methodology, and Section 4 applies the proposed approach to the evaluation of SMEs in Korea. Section 5 summarizes the results of our study and suggests further areas of study.

2. Literature review

Credit guarantee schemes are an important part of enterprise financing, especially for small and medium-sized enterprises, which often face difficulties in the flow of private accounts. In order to support these SMEs, various credit guarantee schemes have been actively used for corporate financing by the Korea Credit Guarantee Fund (KCGF) and the Korea Technology Credit Guarantee Fund (KTCGF) in Korea (Shim, 2006), and the credit guarantee amount keeps increasing over time (Moskovitch and Kim, 2008). In particular, KTCGF was established to help SMEs obtain loans based on their technology. Therefore, evaluation of SMEs' technology is very important to reduce the risk involved in lending.

Currently, adverse selection and moral hazard problems are critical issues in lending to SMEs. Although the guarantee agencies sense the risk of SME default, they tend to give SMEs chances to innovate by allowing lending (Oh et al., 2009; Navajas, 2001; Lee et al., 2006; Stiglitz and Weiss, 1981). Moral hazard is high because borrowers are aware that the guarantee fund will cover losses even if they go bankrupt (Oh et al., 2009). In order to reduce such risk, the technology credit scoring model was introduced.

Since the first introduction of the technology credit scoring model by Sohn et al. (2005), many related studies have been published (Sohn and Kim, 2007; Sohn et al., 2007, 2012; Kim and Sohn, 2007, 2010; Moon and Sohn, 2008a,b). These previous studies attempted to improve technology credit-scoring models within the context of existing evaluation attributes, which can be matched with the default/non-default status of fund recipient SMEs.

However, previous investigators did not consider issues related to updating a technology credit scoring model with new attributes that have not yet been applied to lending decisions. Established credit-scoring models should be updated to reflect changes in attributes. Often, new attributes are modified forms of existing attributes. In the updating process, we consider two different situations: (1) multiple existing attributes are merged into a new attribute and (2) existing attributes are redistributed to become part of several new attributes. In this paper, we suggest ways to deal with these two situations.

3. Proposed methodology

In order to support small and medium enterprises, a technology credit guarantee fund has been established in Korea. This fund gives a credit guarantee to SMEs that score highly in a technology evaluation based on 16 attributes. The left side of Fig. 1 shows the 16 attributes originally used in the scorecard when deciding whether to guarantee applicant firms (Sohn et al., 2005). In this study, we consider several potential scenarios describing the relationships between existing and new attributes and identify an optimal scenario that assigns the most appropriate weights to the existing attributes for the new attributes.

The technology credit scorecard includes a total of 16 attributes (Fig. 1), which can be sorted into four categories: management, technology, marketability, and profitability. Management attributes describe the CEO's ability in various areas, such as knowledge management, technology experience, management, funding supply, and human resources. These individual attributes are assigned a maximum of five points each. Technology attributes, which include superiority, technology commercialization, product competitiveness, and sales schedule, are assigned 10 points each. However, it is difficult to distinguish the CEO's technology experience from knowledge management, while funding supply should be classified under profitability rather than management. Other possibly misclassified attributes are listed in the right-hand column of Fig. 1.

Fig. 1. Differences between existing and new scorecards.

To update the credit-scoring model with new attributes, we considered two alternative cases of change. In the first case, multiple attributes are merged into a new attribute (e.g., P1&P2 (G1), P8&P9 (G7), and P4&P15 (G13)). In the second case, an existing attribute is redistributed to become part of one or several new attributes (e.g., P9 (G6, G7, G8)). When such changes are made, the scoring model may be updated by either approach (1), retaining the values of the original attributes for the first case but identifying split ratios for the second case, or approach (2), identifying weights for existing attributes for the first case while retaining the values of the original attributes for the second case.

Table 1
Results of ANOVA for c-statistics.

| Factor (weight ratio for each attribute) | DF | Sum of squares | Mean square | F value | p Value |
|---|---|---|---|---|---|
| G1*** | 2 | 0.03170439 | 0.01585219 | 102177 | <.0001 |
| G7*** | 2 | 0.00027454 | 0.00013727 | 884.79 | <.0001 |
| G13*** | 2 | 0.00320823 | 0.00160411 | 10339.5 | <.0001 |
| G1 × G7*** | 4 | 0.00075426 | 0.00018856 | 1215.42 | <.0001 |
| G1 × G13*** | 4 | 0.00125317 | 0.00031329 | 2019.37 | <.0001 |
| G7 × G13*** | 4 | 0.00089375 | 0.00022344 | 1440.19 | <.0001 |
| G1 × G7 × G13*** | 8 | 0.00152385 | 0.00019048 | 1227.77 | <.0001 |

*** p Value < 0.01.

In approach (1), for the first case, the value of a new attribute created by merging existing attributes is the simple sum of those existing attributes. For the second case, on the other hand, weights are assigned to the existing attributes. However, some new attributes have a duplication problem. For instance, G6 and G8 are both derived from P9, so their values are redundant. To address this problem, noise can be added using a Bernoulli process, which keeps the scores ordinal.

In approach (2), when existing attributes become part of new attributes, it is necessary for the first case to identify appropriate weights for the existing attributes in P1&P2 (G1), P8&P9 (G7), and P4&P15 (G13). For the second case, the redundancy problem for G6 and G8 is resolved by generating random noise with a Bernoulli process to distinguish G6 from G8.

In approach (1), this kind of random noise must be generated as well, but weighting the original attributes is not necessary. In addition, variation generated by noise alone is not expected to lead to substantially different results in the performance of new technology credit scoring models.
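The weighted merge used in approach (2) can be sketched as follows. This is a minimal illustration, assuming both existing attributes are already on the same scale and that the new attribute is a convex combination of the two; the function name `merge_attributes` and the example scores are hypothetical and do not come from the paper.

```python
def merge_attributes(first_score, second_score, weight_first):
    """Combine two existing attribute scores into one new attribute.

    weight_first is the share given to the first attribute (e.g. 0.9 for a
    90/10 ratio); the second attribute receives the remainder.
    """
    if not 0.0 <= weight_first <= 1.0:
        raise ValueError("weight_first must lie in [0, 1]")
    return weight_first * first_score + (1.0 - weight_first) * second_score


# Hypothetical firm: knowledge management (P1) = 4, technology experience (P2) = 2.
g1_90_10 = merge_attributes(4, 2, 0.9)  # 90% P1, 10% P2
g1_50_50 = merge_attributes(4, 2, 0.5)  # equal weights
g1_10_90 = merge_attributes(4, 2, 0.1)  # 10% P1, 90% P2
```

The same helper would apply to P8&P9 (G7) and P4&P15 (G13); only the weight ratio changes between scenarios.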

Therefore, we conducted a comparison study based on approach (2) by considering several scenarios for weighting existing attributes. As shown in Fig. 1, let us consider a case in which the 16 attributes used in a previous credit scorecard are reduced to 15 attributes in a new scorecard. By merging existing attributes, new attributes are created: technology experience and knowledge management, technology completion, and business application ability. Additionally, an old attribute, technology superiority, is divided to create three new attributes: technology innovation, completion, and expansion. We sought to identify the optimal scenario describing their relationship using the following approach.

We considered several scenarios based on different weighting schemes to convert existing attributes into the new attributes G1, G7, and G13, as shown in Fig. 1. For each scenario, exploratory factor analysis (EFA) is applied to produce new factors that are used as explanatory variables in a logistic regression meant to predict credit default (West, 2000; Jung, 2000; Flint, 1997). EFA can be used to reduce the dimensionality of multivariate data and to detect patterns of association among variables (Gopinath, 1995; Blankson et al., 2007; Ghosh et al., 2011). Factors obtained by EFA for the new attributes are used in logistic regression, and the resulting c-statistic, which is the area under the ROC curve, is used to compare individual scenarios. In order to identify the optimal weighting scenario, ANOVA is used to compare the performances of all scenarios (Sinha and Zhao, 2008; Amini et al., 2012). The flow chart for the proposed methodology is displayed in Fig. 2.

Fig. 2. Flow chart for the proposed methodology.

4. Empirical study

In order to build a new technology credit-scoring model based on the new attributes introduced in Section 3, we utilize an empirical data set consisting of information from 3618 start-up firms that obtained credit guarantees based on technology credit scores produced by the technology attributes conventionally used in Korea during 1999–2004. Among these start-up firms, 1259 went on to default on the loan within 5 years. We randomly selected 1259 matching control cases (firms that did not default). For the non-default cases, we did not treat the successfully finished cases as censored at their fund return life term. These firms are small and medium enterprises in various industries and were evaluated in terms of the 16 existing attributes given on the left-hand side of Fig. 1.

Fig. 3. ROC curve of logistic regression.
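The balanced sample described in this section (all 1259 defaulted firms plus an equal number of randomly chosen non-defaulted firms) can be sketched as below. The record layout, the `matched_sample` helper, and the fixed seed are assumptions made for illustration; the paper does not state how the random draw was implemented.

```python
import random


def matched_sample(firms, seed=0):
    """Keep every defaulted firm and draw an equally sized random set of controls."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    defaults = [f for f in firms if f["default"]]
    non_defaults = [f for f in firms if not f["default"]]
    controls = rng.sample(non_defaults, k=len(defaults))  # sampling without replacement
    return defaults + controls


# Hypothetical stand-in for the 3618 firm records, 1259 of which defaulted.
firms = [{"id": i, "default": i < 1259} for i in range(3618)]
sample = matched_sample(firms)
```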

This section considers several weighting scenarios for a new scorecard to identify the best option for the new attributes. The changed attributes, namely technology experience and knowledge management, technology completion, and business application ability, are combinations of existing attributes. We examined the effects of three different weighting schemes for each of them (Appendix A). Before the analysis, all attributes were adjusted to a five-point scale.

Table 2
Results of Tukey's test of weighting plans for G1 × G7 × G13.

| Tukey grouping | Mean | N | Scenario number | G1: weight for knowledge management (%) | G1: weight for technology experience (%) |
|---|---|---|---|---|---|
| A | 0.6833 | 10 | 3 | 90 | 10 |
| A | 0.6833 | 10 | 6 | 90 | 10 |
| A | 0.6832 | 10 | 9 | 90 | 10 |
| B | 0.6822 | 10 | 2 | 90 | 10 |
| B | 0.6821 | 10 | 5 | 90 | 10 |
| C | 0.6805 | 10 | 8 | 90 | 10 |
| D | 0.6785 | 10 | 27 | 10 | 90 |
| D | 0.6785 | 10 | 1 | 90 | 10 |
| D | 0.6784 | 10 | 4 | 90 | 10 |
| D | 0.678 | 10 | 18 | 50 | 50 |
| D | 0.678 | 10 | 7 | 90 | 10 |
| D | 0.678 | 10 | 12 | 50 | 50 |
| D | 0.678 | 10 | 15 | 50 | 50 |
| E | 0.676 | 10 | 14 | 50 | 50 |
| E | 0.676 | 10 | 17 | 50 | 50 |
| E | 0.6759 | 10 | 11 | 50 | 50 |
| F | 0.672 | 10 | 16 | 50 | 50 |
| F | 0.672 | 10 | 10 | 50 | 50 |
| F | 0.672 | 10 | 13 | 50 | 50 |
| G | 0.6576 | 10 | 21 | 10 | 90 |
| G | 0.6575 | 10 | 24 | 10 | 90 |
| H | 0.6547 | 10 | 26 | 10 | 90 |
| H | 0.6547 | 10 | 23 | 10 | 90 |
| I | 0.6536 | 10 | 20 | 10 | 90 |
| J | 0.6487 | 10 | 19 | 10 | 90 |
| J | 0.6483 | 10 | 25 | 10 | 90 |
| J | 0.6481 | 10 | 22 | 10 | 90 |

Appendix A shows that a total of 27 (3^3) combinations of weighting schemes are possible. The optimal scenario must therefore be identified by comparing these 27 scenarios using a three-way ANOVA.
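The 27 scenarios can be enumerated as the Cartesian product of the three weight ratios over G1, G7, and G13. The sketch below assumes the numbering convention visible in Appendix B, where the ratio for G13 changes fastest and the ratio for G1 changes slowest; the names `RATIOS` and `enumerate_scenarios` are illustrative only.

```python
from itertools import product

# Weight ratios (first attribute, second attribute) from Appendix A.
RATIOS = {"R1": (0.9, 0.1), "R2": (0.5, 0.5), "R3": (0.1, 0.9)}


def enumerate_scenarios():
    """Return {scenario number: ratio label for each merged attribute}."""
    scenarios = {}
    # product() varies the last position fastest, so G13 cycles R1, R2, R3 first.
    for number, (g1, g7, g13) in enumerate(product(RATIOS, repeat=3), start=1):
        scenarios[number] = {"G1": g1, "G7": g7, "G13": g13}
    return scenarios


scenarios = enumerate_scenarios()
```

Under this convention scenario 6 is (R1, R2, R3), which agrees with the weighting ratios reported for scenario 6 in Table 3.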

In order to avoid the redundancy problem caused by splitting P9 into (G6, G7, G8), random variables x, y, and z that follow independent Bernoulli processes are created with event probability p equal to 0.5. A Bernoulli process is a finite or infinite sequence of binary random variables, that is, a discrete-time stochastic process that takes only two values, such as 0 and 1. First, the new attribute technology innovation (G6) is created by adding x to the value of the existing attribute technology superiority (P9) if y is 1. Conversely, if y is 0, G6 is created by subtracting x from the value of P9. Likewise, technology expansion (G8) is created from technology superiority (P9) by using z. If z is 1, G8 is created by adding x to the value of P9. Conversely, if z is 0, G8 is created by subtracting x from the value of P9. This process is repeated 10 times. Values that fall outside the range from 1 to 5 are converted to the closest value within the range.
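The perturbation described above can be sketched in a few lines. This is an illustration under stated assumptions: x, y, and z are drawn independently for each firm with p = 0.5, the helper names (`bernoulli`, `clip_to_scale`, `split_p9`) are hypothetical, and the seed is fixed only to make the sketch repeatable.

```python
import random


def bernoulli(rng, p=0.5):
    """Draw a single 0/1 value with success probability p."""
    return 1 if rng.random() < p else 0


def clip_to_scale(value, low=1, high=5):
    """Move out-of-range scores to the closest value on the five-point scale."""
    return max(low, min(high, value))


def split_p9(p9, rng):
    """Create G6 and G8 from technology superiority (P9) using Bernoulli noise."""
    x, y, z = bernoulli(rng), bernoulli(rng), bernoulli(rng)
    g6 = p9 + x if y == 1 else p9 - x  # y decides the direction of the shift for G6
    g8 = p9 + x if z == 1 else p9 - x  # z decides the direction of the shift for G8
    return clip_to_scale(g6), clip_to_scale(g8)


rng = random.Random(2013)
# Ten replications for one hypothetical firm whose P9 score is 4.
replications = [split_p9(4, rng) for _ in range(10)]
```

Because x is either 0 or 1, each new score differs from P9 by at most one point, so the scores remain ordinal.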

4.1. Results of exploratory factor analysis

Sohn et al. (2005) suggested a technology credit-scoring model that uses logistic regression based on factor analysis of 16 technology-related attributes. This was the first attempt to improve technology credit scoring by removing the multi-collinearity of various attributes. In this paper, EFA was performed on the 15 newly created attributes for each of the 27 scenarios. EFA facilitates the removal of potential multi-collinearity among these 15 attributes. EFA was repeated as many times as the Bernoulli process was applied to each scenario. However, the random noise generated by the Bernoulli process did not affect the factor pattern.

As shown in Appendix B, 12 factors identified by EFA explain more than 90% of the variation in the 15 new attributes in all scenarios. Different factors are obtained for each scenario. We performed a logistic regression analysis based on the EFA results.

Table 2 (continued)

| Scenario number | G7: weight for new technology (%) | G7: weight for technology superiority (%) | G13: weight for fund supply (%) | G13: weight for business progress (%) |
|---|---|---|---|---|
| 3 | 90 | 10 | 10 | 90 |
| 6 | 50 | 50 | 10 | 90 |
| 9 | 10 | 90 | 10 | 90 |
| 2 | 90 | 10 | 50 | 50 |
| 5 | 50 | 50 | 50 | 50 |
| 8 | 10 | 90 | 50 | 50 |
| 27 | 10 | 90 | 10 | 90 |
| 1 | 90 | 10 | 90 | 10 |
| 4 | 50 | 50 | 90 | 10 |
| 18 | 10 | 90 | 10 | 90 |
| 7 | 10 | 90 | 90 | 10 |
| 12 | 90 | 10 | 10 | 90 |
| 15 | 50 | 50 | 10 | 90 |
| 14 | 50 | 50 | 50 | 50 |
| 17 | 10 | 90 | 50 | 50 |
| 11 | 90 | 10 | 50 | 50 |
| 16 | 10 | 90 | 50 | 50 |
| 10 | 90 | 10 | 90 | 10 |
| 13 | 50 | 50 | 90 | 10 |
| 21 | 90 | 10 | 10 | 90 |
| 24 | 50 | 50 | 10 | 90 |
| 26 | 10 | 90 | 50 | 50 |
| 23 | 50 | 50 | 50 | 50 |
| 20 | 90 | 10 | 50 | 50 |
| 19 | 90 | 10 | 90 | 10 |
| 25 | 10 | 90 | 90 | 10 |
| 22 | 50 | 50 | 90 | 10 |

Table 3
Identification of the best set of evaluation attributes.

| New attribute | Existing attribute | Weighting ratio based on scenario 3 (%) | Weighting ratio based on scenario 6 (%) | Weighting ratio based on scenario 9 (%) |
|---|---|---|---|---|
| Technology experience and knowledge management (G1) | Knowledge management | 90 | 90 | 90 |
| | Technology experience | 10 | 10 | 10 |
| Technology completion (G7) | New technology | 90 | 50 | 10 |
| | Technology superiority | 10 | 50 | 90 |
| Business application ability (G13) | Fund supply | 10 | 10 | 10 |
| | Business progress | 90 | 90 | 90 |

Table 4
Maximum likelihood estimates of a logistic regression model for non-defaulting firms based on scenario 6.

| Parameter | Estimate | Standard error | Wald chi-square | p-Value |
|---|---|---|---|---|
| Intercept | 0.00721 | 0.0422 | 0.0293 | 0.8641 |
| Factor 1 | 0.0592 | 0.0424 | 1.9527 | 0.1623 |
| Factor 2 | 0.045 | 0.0422 | 1.1388 | 0.2859 |
| Factor 3*** | 0.6357 | 0.0443 | 206.263 | <.0001 |
| Factor 4 | 0.0248 | 0.0421 | 0.3458 | 0.5565 |
| Factor 5 | -0.0307 | 0.0422 | 0.5285 | 0.4673 |
| Factor 6*** | 0.1187 | 0.0425 | 7.8015 | 0.0052 |
| Factor 7 | -0.0615 | 0.0425 | 2.0965 | 0.1476 |
| Factor 8 | 0.0619 | 0.0425 | 2.1254 | 0.1449 |
| Factor 9** | 0.0861 | 0.0421 | 4.181 | 0.0409 |
| Factor 10*** | -0.1187 | 0.0423 | 7.8627 | 0.005 |
| Factor 11*** | 0.2152 | 0.0426 | 25.4785 | <.0001 |
| Factor 12 | -0.0258 | 0.0422 | 0.376 | 0.5398 |

** p-Value < 0.05.
*** p-Value < 0.01.

Appendix A1
Weighting schemes for the new scorecard.

| Original attribute | Proposal 1 (%) | Proposal 2 (%) | Proposal 3 (%) | New attribute |
|---|---|---|---|---|
| Knowledge management (P1) | 90 | 50 | 10 | Technology experience and knowledge management (G1) |
| Technology experience (P2) | 10 | 50 | 90 | |
| New technology (P8) | 90 | 50 | 10 | Technology completion (G7) |
| Technology superiority (P9) | 10 | 50 | 90 | |
| Fund supply (P4) | 90 | 50 | 10 | Business application ability (G13) |
| Business progress (P15) | 10 | 50 | 90 | |
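The 90% criterion used in Section 4.1 can be illustrated by accumulating each factor's share of the total variance. The sketch below is not the authors' EFA procedure: it assumes that a variance (eigenvalue) has already been obtained for each candidate factor, and the eigenvalues shown are hypothetical values chosen to sum to 15, the number of standardized attributes.

```python
def factors_needed(eigenvalues, threshold=0.90):
    """Return (number of factors, cumulative share) once the share reaches threshold."""
    total = sum(eigenvalues)
    running = 0.0
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return count, running / total
    return len(ordered), running / total


# Hypothetical eigenvalues for 15 standardized attributes (not taken from the paper).
example_eigenvalues = [2.8, 1.5, 1.2, 1.1, 1.0, 1.0, 1.0, 0.95,
                       0.95, 0.9, 0.9, 0.85, 0.4, 0.3, 0.15]
n_factors, share = factors_needed(example_eigenvalues)
```

With these illustrative values the threshold is first reached at the twelfth factor, mirroring the 12-factor solution reported in Appendix B.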

4.2. Results of logistic regression analysis

A logistic regression model was used to predict non-default using the 12 factors obtained from EFA, resulting in 270 ROC (receiver operating characteristic) curves. Among these, Fig. 3 shows the 27 ROC curves drawn from the first simulation.

Each point on the ROC curve represents a sensitivity and (1 - specificity) pair for a given threshold value. The closer the curve follows the left-hand border and the top border of the ROC space, the more accurate the test is (Moon and Sohn, 2008a,b; Brown and Mues, 2012). As shown in Fig. 3, the ROC curves vary by scenario. An ANOVA with the c-statistic as the dependent variable was conducted to find the optimal scenario. The c-statistic is the area under the ROC curve, and values closer to 1.0 indicate better discrimination by the model (Higgins et al., 1997).
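The c-statistic can be computed directly from predicted scores without drawing the ROC curve: it equals the share of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The sketch below uses this pairwise definition; the labels and scores are hypothetical and the function name is illustrative.

```python
def c_statistic(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0   # positive ranked above negative
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(positives) * len(negatives))


# Hypothetical example: 1 = non-default, 0 = default.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
auc = c_statistic(labels, scores)  # 8 of 9 pairs are concordant
```

The pairwise loop is quadratic in the sample size, which is acceptable for a sketch; a rank-based formula would be preferable for large samples.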

4.3. ANOVA

For the ANOVA tests of c-statistics, the weighting plans for attributes G1 (technology experience and knowledge management), G7 (technology completion), and G13 (business application ability) were used as three factors. The three levels of each factor corresponded to the three different weight ratios: (90,10), (50,50), and (10,90). The results of ANOVA are given in Table 1.
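The entries of Table 1 can be checked with simple arithmetic: each mean square is the sum of squares divided by its degrees of freedom, and each F value is that mean square divided by the error mean square. The error row is not reported in Table 1, so the sketch below backs out the implied error mean square from every row as a consistency check; the dictionary name and helper are illustrative.

```python
# (DF, sum of squares, F value) copied from Table 1.
TABLE_1 = {
    "G1": (2, 0.03170439, 102177.0),
    "G7": (2, 0.00027454, 884.79),
    "G13": (2, 0.00320823, 10339.5),
    "G1 x G7": (4, 0.00075426, 1215.42),
    "G1 x G13": (4, 0.00125317, 2019.37),
    "G7 x G13": (4, 0.00089375, 1440.19),
    "G1 x G7 x G13": (8, 0.00152385, 1227.77),
}


def implied_error_mean_square(df, sum_of_squares, f_value):
    """Mean square = SS / DF; F = MS / MS_error, so MS_error = MS / F."""
    return (sum_of_squares / df) / f_value


implied = {name: implied_error_mean_square(*row) for name, row in TABLE_1.items()}
```

Every row yields an error mean square of roughly 1.55e-7, so the reported sums of squares, mean squares, and F values are mutually consistent.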

These results show that all interaction effects, as well as the main factors, are significant. Therefore, we examined the three-way interaction effects among the weighting plans for the three attributes. Results of Tukey's test at a 5% significance level are given in Table 2.

Based on the results presented in Table 2, we observe that scenarios 3, 6, and 9 yield significantly better performances than the others. A summary is given in Table 3.

During the establishment of new credit-scoring models, new attributes should be generated following the weighting ratios given in Table 3. In the results of Tukey's tests, scenarios 3, 6, and 9 are grouped as the best. Table 3 shows that these three scenarios vary in terms of the weighting ratios for technology completion (G7). However, the choice of weighting ratios for G7 does not have a significant effect on classification performance. New attributes are first generated based on scenario 6 as given in Table 3, and a logistic regression model is fitted using these new attributes.

In Table 4, the following factors are found to be significant at the 5% level: Factor 3 (technology experience and knowledge management), Factor 6 (sales schedule), Factor 9 (return on investment), Factor 10 (market potential), and Factor 11 (business application ability). The result of EFA is shown in Appendix C. One interesting point is that Factor 10 is assigned a negative coefficient. This factor represents market potential, and SMEs in markets with high potential are often at a disadvantage due to intense competition with large companies in the same market. In addition, business application ability (Factor 11), sales schedule (Factor 6), and return on investment (Factor 9) represent business and profitability. Therefore, we can say that business and profitability have a significant effect on SMEs' default prediction. Technology experience and knowledge management (Factor 3) has the largest coefficient, so this factor can be regarded as the most influential one. Factor 3, Factor 10, and Factor 11 were considered significant variables in previous studies (Moon and Sohn, 2010; Sohn and Moon, 2010), while Factor 6 and Factor 9 were significant only in our model. Next, we conducted Hosmer and Lemeshow's goodness-of-fit test.
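To show how the estimates in Table 4 translate into a score, the sketch below evaluates the fitted logistic function for a vector of 12 factor scores. It assumes, following the caption of Table 4, that the model gives the probability of non-default, and that factor scores are standardized so that zero is the average firm; the example firms are hypothetical.

```python
import math

# Estimates copied from Table 4 (intercept, then Factor 1 to Factor 12).
INTERCEPT = 0.00721
COEFFICIENTS = [0.0592, 0.045, 0.6357, 0.0248, -0.0307, 0.1187,
                -0.0615, 0.0619, 0.0861, -0.1187, 0.2152, -0.0258]


def non_default_probability(factor_scores):
    """Logistic transform of the linear predictor built from Table 4."""
    if len(factor_scores) != len(COEFFICIENTS):
        raise ValueError("expected 12 factor scores")
    linear = INTERCEPT + sum(b * f for b, f in zip(COEFFICIENTS, factor_scores))
    return 1.0 / (1.0 + math.exp(-linear))


average_firm = [0.0] * 12
strong_factor_3 = [0.0, 0.0, 1.0] + [0.0] * 9   # one unit above average on Factor 3
strong_factor_10 = [0.0] * 9 + [1.0, 0.0, 0.0]  # one unit above average on Factor 10

p_average = non_default_probability(average_firm)
p_factor_3 = non_default_probability(strong_factor_3)
p_factor_10 = non_default_probability(strong_factor_10)
```

A higher Factor 3 score raises the predicted probability of non-default, while a higher Factor 10 score lowers it, in line with the signs discussed above.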


Appendix B1
Result of exploratory factor analysis.

Factor Scenario

Scenario 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Factor 1 G8 G8 G8 G8 G8 G8 G8 G8 G8 G8 G8 G8 G8 G8
         G6 G6 G6 G6 G6 G6 G6 G6 G6 G6 G6 G6 G6 G6
         G7 G7 G7 G7 G7 G7 G7 G7
Factor 2 G15 G3 G3 G3 G3 G3 G3 G3 G3 G3 G3 G3 G3 G3
         G13 G4 G4 G4 G4 G4 G4 G4 G4 G4 G4 G4 G4 G4
Factor 3 G3 G2 G2 G1 G1 G1 G1 G1 G1 G15 G2 G14 G1 G1
         G4 G13 G13 G13 G13 G13
Factor 4 G1 G1 G1 G5 G5 G5 G5 G2 G2 G1 G15 G1 G2 G2
Factor 5 G2 G15 G15 G2 G2 G2 G2 G5 G5 G2 G1 G5 G5 G5
Factor 6 G5 G14 G14 G14 G14 G14 G14 G14 G14 G5 G14 G2 G14 G14
Factor 7 G12 G5 G5 G12 G12 G12 G12 G12 G12 G12 G9 G10 G12 G12
Factor 8 G9 G9 G9 G9 G9 G9 G9 G9 G15 G9 G5 G7 G9 G9
Factor 9 G14 G12 G12 G10 G15 G15 G10 G15 G9 G14 G12 G9 G15 G15
Factor 10 G10 G10 G10 G15 G10 G10 G15 G10 G10 G10 G7 G12 G10 G10
Factor 11 G7 G7 G7 G13 G13 G13 G13 G13 G13 G7 G10 G13 G13 G13
Factor 12 G11 G11 G11 G11 G11 G11 G11 G11 G11 G11 G11 G11 G11 G11

Average cumulative 0.91428 0.91477 0.91185 0.9373 0.9373 0.93729 0.94662 0.94666 0.94665 0.91355 0.91403 0.91126 0.93657 0.93658

Weight
G1 R1 R1 R1 R1 R1 R1 R1 R1 R1 R2 R2 R2 R2 R2
G7 R1 R1 R1 R2 R2 R2 R3 R3 R3 R1 R1 R1 R2 R2
G13 R1 R2 R3 R1 R2 R3 R1 R2 R3 R1 R2 R3 R1 R2

Scenario 15 16 17 18 19 20 21 22 23 24 25 26 27

Factor 1 G8 G8 G8 G8 G8 G8 G8 G8 G8 G8 G8 G8 G8
         G6 G6 G6 G6 G6 G6 G6 G6 G6 G6 G6 G6 G6
         G7 G7 G7 G7 G7 G7 G7 G7 G7
Factor 2 G3 G3 G3 G3 G3 G3 G3 G3 G3 G3 G3 G3 G15
         G4 G4 G4 G4 G4 G4 G4 G4 G4 G4 G4 G4 G13
Factor 3 G1 G1 G1 G1 G15 G2 G14 G1 G1 G1 G1 G1 G3
         G13 G13 G15 G4
Factor 4 G2 G2 G2 G2 G1 G15 G1 G2 G2 G2 G2 G2 G1
Factor 5 G5 G5 G5 G5 G2 G1 G5 G5 G5 G5 G5 G5 G2
Factor 6 G14 G14 G14 G14 G5 G14 G2 G14 G14 G14 G14 G14 G5
Factor 7 G12 G12 G12 G12 G12 G9 G7 G12 G12 G12 G9 G15 G12
Factor 8 G15 G9 G15 G15 G9 G5 G10 G9 G15 G15 G15 G9 G9
Factor 9 G9 G15 G9 G9 G14 G12 G9 G15 G9 G9 G12 G12 G14
Factor 10 G10 G10 G10 G10 G10 G7 G12 G10 G10 G10 G10 G10 G10
Factor 11 G13 G13 G13 G13 G7 G10 G13 G13 G13 G13 G13 G13 G7
Factor 12 G11 G11 G11 G11 G11 G11 G11 G11 G11 G11 G11 G11 G11

Average cumulative 0.93656 0.94592 0.94595 0.94593 0.91224 0.91266 0.91016 0.93529 0.93532 0.9353 0.94462 0.9447 0.91428

Weight
G1 R2 R2 R2 R2 R3 R3 R3 R3 R3 R3 R3 R3 R3
G7 R2 R3 R3 R3 R1 R1 R1 R2 R2 R2 R3 R3 R3
G13 R3 R1 R2 R3 R1 R2 R3 R1 R2 R3 R1 R2 R3

Proposal ratio 1 (R1): 90% reflection of the first attribute, 10% reflection of the second attribute.
Proposal ratio 2 (R2): 50% reflection of the first attribute, 50% reflection of the second attribute.
Proposal ratio 3 (R3): 10% reflection of the first attribute, 90% reflection of the second attribute.

Y.H. Ju, S.Y. Sohn / European Journal of Operational Research 234 (2014) 119–126


Appendix C1
Result of exploratory factor analysis of scenario 6.

Rotated factor pattern

      Factor 1   Factor 2   Factor 3   Factor 4   Factor 5   Factor 6   Factor 7   Factor 8   Factor 9   Factor 10  Factor 11  Factor 12
G8    0.9280     0.0712     0.0452     0.0354     0.0129     -0.0102    0.0103     0.0508     0.0168     0.0376     0.0279     0.0662
G6    0.9254     0.0889     0.0365     0.0330     0.0136     0.0048     0.0191     0.0261     0.0378     0.0264     0.0289     0.0802
G7    0.8462     0.0862     0.0418     0.1460     0.0101     0.1070     0.1077     0.0974     0.0648     0.0709     0.0289     0.1541
G3    0.0811     0.8833     -0.1054    -0.0111    0.1913     0.0479     0.0485     0.0189     0.0952     0.0767     0.0939     0.0421
G4    0.2031     0.7088     0.3815     0.1998     -0.0547    0.0222     0.1136     0.0797     -0.0278    0.0704     0.0966     0.0130
G1    0.0710     0.0581     0.9686     0.0383     0.0060     -0.0004    -0.0251    0.0112     0.0232     0.0109     0.0170     0.0012
G5    0.1536     0.0948     0.0532     0.9721     0.0384     -0.0055    0.0004     0.0370     0.0426     -0.0073    0.0101     0.0515
G2    0.0229     0.1341     0.0021     0.0385     0.9650     0.0553     0.0549     0.0386     0.0611     0.0345     0.1539     0.0205
G14   0.0657     0.0536     0.0014     -0.0052    0.0549     0.9752     0.0739     0.0386     0.1420     0.0355     0.0614     0.0691
G12   0.0948     0.1058     -0.0168    0.0016     0.0542     0.0731     0.9780     0.0117     0.0610     0.0522     0.0516     0.0400
G9    0.1281     0.0602     0.0184     0.0378     0.0379     0.0383     0.0118     0.9791     0.0680     0.0061     0.0480     0.0892
G15   0.0875     0.0681     0.0225     0.0440     0.0628     0.1465     0.0633     0.0709     0.9641     0.0728     0.0628     0.0919
G10   0.0985     0.1062     0.0174     -0.0062    0.0339     0.0349     0.0517     0.0061     0.0692     0.9835     0.0223     0.0258
G13   0.0626     0.1425     0.0270     0.0114     0.1568     0.0630     0.0534     0.0497     0.0626     0.0233     0.9671     0.0278
G11   0.2565     0.0455     0.0020     0.0559     0.0222     0.0745     0.0433     0.0974     0.0963     0.0281     0.0291     0.9483


The resulting p-value is 0.2824 (χ² value = 9.7588, DF = 8), which supports goodness-of-fit.
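
The reported p-value can be checked directly from the χ² statistic and its degrees of freedom. The sketch below is a minimal check in Python, not the authors' code. It uses the closed-form upper-tail probability of the chi-square distribution, which is valid only for even degrees of freedom, and the function name `chi2_sf_even_df` is introduced here for illustration.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!,
    which holds only when df is an even positive integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires an even, positive df")
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))


# Statistic and degrees of freedom as reported in the text
p_value = chi2_sf_even_df(9.7588, 8)
print(f"{p_value:.4f}")  # 0.2824
```

Because the p-value is well above the usual 0.05 level, the hypothesis of adequate fit is not rejected, which is what the text means by supporting goodness-of-fit.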

5. Conclusions

SMEs are important cornerstones of industrial dynamics. In Korea, the government has encouraged the creation of new businesses and has chosen which SMEs to support financially based on their expertise in and use of technology. Utilizing historical data, many studies have been conducted to develop technology credit-scoring models that can reduce the technology credit risk incurred by the government.

In this paper, we showed how the proposed approach can be applied to updating a credit-scoring model with new attributes. Through EFA, we eliminated multicollinearity among the new attributes. The resulting factors were used for logistic regression under each scenario. Next, we identified the best model using ANOVA. The proposed process was designed for situations in which new models cannot be obtained due to a lack of experience with new attributes. Until sufficient historical data are accumulated for the new attributes, guarantee institutions can use our approach for their credit scoring business.
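
To make the ANOVA step concrete, the sketch below computes a one-way ANOVA F statistic in plain Python. It is an illustration of the mechanics only and not the authors' analysis: the response used here is the "Average cumulative" row of the scenario tables (scenarios 1 to 27), grouped by the ratio assigned to G7. That choice of response and grouping factor is an assumption made for this example.

```python
# "Average cumulative" values for scenarios 1..27, copied from the scenario tables
AVG_CUMULATIVE = [
    0.91428, 0.91477, 0.91185, 0.9373, 0.9373, 0.93729, 0.94662, 0.94666, 0.94665,
    0.91355, 0.91403, 0.91126, 0.93657, 0.93658, 0.93656, 0.94592, 0.94595, 0.94593,
    0.91224, 0.91266, 0.91016, 0.93529, 0.93532, 0.9353, 0.94462, 0.9447, 0.91428,
]
LEVELS = ("R1", "R2", "R3")


def group_by_g7(values):
    """Group scenario values by the G7 ratio, which changes every 3 scenarios and cycles every 9."""
    groups = {level: [] for level in LEVELS}
    for index, value in enumerate(values):
        groups[LEVELS[(index // 3) % 3]].append(value)
    return groups


def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA."""
    samples = list(groups.values())
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    ss_between = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)
    ss_within = sum((x - sum(s) / len(s)) ** 2 for s in samples for x in s)
    df_between = len(samples) - 1
    df_within = n_total - len(samples)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


groups = group_by_g7(AVG_CUMULATIVE)
f_stat, df_b, df_w = one_way_anova_f(groups)
for level, vals in groups.items():
    print(level, round(sum(vals) / len(vals), 5))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F statistic relative to the F(2, 24) reference distribution would indicate that the chosen ratio for G7 shifts the response more than the variation within each ratio level; the same computation applies to whichever performance measure is used to rank the scenarios.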

Our proposed process can also be applied to other scoring systems when a scoring system needs to be improved or modified. Our research can be extended in several ways. Further study is needed to develop approaches that can accommodate additional attributes that are entirely new. In addition, this setting with new attributes can be extended to life data analysis based on the time until loan default of SMEs. These topics are left for further research.

Acknowledgement

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (2013R1A2A1A09004699).

Appendix A

See Appendix A1.

Appendix B

See Appendix B1.

Appendix C

See Appendix C1.

References

Amini, M., Wakolbinger, T., Racer, M., Nejad, M.G., 2012. Alternative supply chain production-sales policies for new product diffusion: an agent-based modeling and simulation approach. European Journal of Operational Research 216 (2), 301–311.

Blankson, C., Cheng, J.M.S., Spears, N., 2007. Determinants of banks selection in USA, Taiwan and Ghana. The International Journal of Bank Marketing 25 (7), 469–489.

Brown, I., Mues, C., 2012. An experimental comparison of classification algorithms for imbalanced credit scoring data sets. Expert Systems with Applications 39 (3), 3446–3453.

Ebrahim, N.A., Rashid, S.H.A., Ahmed, S., Taha, Z., 2011. The effectiveness of virtual R&D teams in SMEs: Experiences of Malaysian SMEs. Industrial Engineering and Management Systems 10 (2), 371–384.

Flint, T.A., 1997. Predicting student loan defaults. The Journal of Higher Education 68 (3), 322–354.

Ghosh, A., Bandyopadhyay, G., Choudhuri, K., 2011. Forecasting BSE sensex under optimal conditions: an investigation post factor analysis. Journal of Business Studies Quarterly 3 (2), 57–73.

Gopinath, C., 1995. Bank strategies toward firms in decline. Journal of Business Venturing 10 (1), 75–92.

Higgins, T.L., Estafanous, F.G., Loop, F.D., 1997. ICU admission score for predicting morbidity and mortality risk after coronary artery bypass grafting. The Annals of Thoracic Surgery 64 (4), 1050–1058.

Jung, J.W., 2000. Application of statistical models for default probability of loans in mortgage companies. The Korean Communications in Statistics 7 (2), 605–616.

Kang, J.W., Heshimati, A., 2008. Effect of credit guarantee policy on survival and performance of SMEs in Republic of Korea. Small Business Economics 31 (4), 445–462.

Kim, Y., Sohn, S.Y., 2007. Technology scoring model considering rejected applicants and effect of reject inference. Journal of the Operational Research Society 58 (10), 1341–1347.

Kim, H.S., Sohn, S.Y., 2010. Support vector machines for default prediction of SMEs based on technology credit. European Journal of Operational Research 201 (3), 838–846.

Kim, S., Seo, J., Sohn, P., 2011. SMEs’ capital structure behavior on cash holdings to Korean financial crisis: the evidence from emerging market. African Journal of Business Management 5 (34), 13095–13115.

Kočenda, E., Vojtek, M., 2009. Default Predictors and Credit Scoring Models for Retail Banking. CESifo Working Paper No. 2862.

Lee, Y., Rhee, C., Sung, T., 2006. Fiscal policy in Korea: before and after the financial crisis. International Tax and Public Finance 13 (4), 509–531.

Moon, T.H., Sohn, S.Y., 2008a. Case based reasoning for predicting multi-period financial performances of technology-based SMEs. Applied Artificial Intelligence 22 (7), 1–14.

Moon, T.H., Sohn, S.Y., 2008b. Technology scoring model for reflecting evaluator’s perception within confidence limits. European Journal of Operational Research 184 (3), 981–989.

Moon, T.H., Sohn, S.Y., 2010. Technology credit scoring model considering both SME characteristics and economic conditions: the Korean case. Journal of the Operational Research Society 61 (4), 666–675.

Moskovitch, I., Kim, D., 2008. Building an entrepreneurial economy: the case of Korea 1998–2005. Global Economic Review 37 (1), 63–73.


Navajas, A.R., 2001. Credit Guarantee Schemes: Conceptual Frame. GTZ/FONDESIF: Financial System Development Project. <http://www2.gtz.de/dokumente/bib/03-0076.pdf>.

Oh, I., Lee, J.D., Heshmati, A., Choi, G.G., 2009. Evaluation of credit guarantee policy using propensity score matching. Small Business Economics 33 (3), 335–351.

Paleologo, G., Elisseeff, A., Antonini, G., 2010. Subagging for credit scoring models. European Journal of Operational Research 201 (2), 490–499.

Shim, L., 2006. Corporate credit guarantees in Asia. BIS Quarterly Review, 85–98.

Sinha, A.P., Zhao, H., 2008. Incorporating domain knowledge into data mining classifiers: an application in indirect lending. Decision Support Systems 46 (1), 287–299.

Sohn, S.Y., Kim, H.S., 2007. Random effects logistic regression model for default prediction of technology credit guarantee fund. European Journal of Operational Research 183 (1), 472–478.

Sohn, S.Y., Moon, T.H., 2010. Technology credit scoring model considering both SME characteristics and economic conditions: the Korean case. Journal of the Operational Research Society 61, 666–675.

Sohn, S.Y., Moon, T.H., Kim, S.H., 2005. Improved technology scoring model for credit guarantee fund. Expert Systems with Applications 28 (2), 327–331.

Sohn, S.Y., Kim, H.S., Moon, T.H., 2007. Predicting the financial performance scorecard of technology fund for SME using structural equation model. Expert Systems with Applications 32 (3), 890–898.

Sohn, S.Y., Doo, M.K., Ju, Y.H., 2012. Pattern recognition for evaluator errors in a credit scoring model for technology-based SMEs. Journal of the Operational Research Society 63, 1051–1064.

Stiglitz, J., Weiss, A., 1981. Credit rationing in markets with imperfect information. American Economic Review 71 (3), 393–410.

West, D., 2000. Neural network credit scoring models. Computers and Operations Research 28 (11/12), 1131–1152.