
RamG Data Analytics and Insights

View by Ram

CHAID Decision Tree: Reverse Mortgage Loan Termination Example

Business Context

A Reverse Mortgage Loan (RML) enables senior citizens to receive periodic payments from a lender against the mortgage of their house, supplementing their income while they remain the owner and occupant of the house. Interest on the payments received accumulates. One type of reverse mortgage is the Home Equity Conversion Mortgage (HECM), which is insured by the Federal Housing Administration (FHA) and constitutes over 90% of all reverse mortgage loans originated in the U.S. market [2].

Understanding termination outcomes of HECM loans is essential for the FHA insurance program and the long- [2]

The data was downloaded from the HUD.GOV website [3]. All loans originated in 2003 and 2004 are considered in the example below. If the termination date is populated, the HECM loan is considered closed. The Age and Gender of borrowers are used in the example below to illustrate the decision tree building process.

CHAID Algorithm [1] Process Steps

Step 1: Find the best split for each predictor (independent) variable by merging categories of the predictor variable

Step 2: Compare the predictor variables, select the best variable for the node, and split the node into two child nodes

Step 3: Repeat Step 1 and Step 2 for each of the new nodes until the stopping criteria are satisfied

Step 1: Best Split for a Predictor Variable

In this example, the Age of the borrower is considered as one of the predictor (independent) variables. Age is an ordinal variable, so only contiguous (adjacent) categories may be merged. Ordinal variables without a missing category are referred to as Monotonic Predictors. An ordinal variable with a missing category is considered a Floating Predictor. In a floating predictor, the floating (missing) category can be grouped with any other category.

Since the Age of the borrower does not have a missing category, it is treated as a Monotonic Predictor, and each category is compared with, and merged into, only its adjacent category.
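The merging rules for the predictor types can be sketched in code. The following is a minimal illustration, not the author's implementation: the function name `candidate_merge_pairs` and its parameters are hypothetical, and the "free" case covers nominal predictors such as Gender, which this document treats later.

```python
from itertools import combinations


def candidate_merge_pairs(categories, predictor_type, floating_category=None):
    """Return the category pairs that may be merged for a predictor type.

    categories: labels in their natural order (the order only matters for
    ordinal predictors). The name and signature are illustrative assumptions.
    """
    if predictor_type == "free":
        # Nominal predictor: any category may be merged with any other.
        return list(combinations(categories, 2))

    # Ordinal predictor: only neighbouring categories may be merged.
    ordered = [c for c in categories if c != floating_category]
    pairs = list(zip(ordered, ordered[1:]))

    if predictor_type == "floating":
        # The missing ("floating") category may join any other category.
        pairs += [(floating_category, c) for c in ordered]
    elif predictor_type != "monotonic":
        raise ValueError(f"unknown predictor type: {predictor_type}")
    return pairs


age_groups = ["Low-65", "<=70", "<=75", "<=80", "<=85", "<=90", "<=95", "95-High"]
genders = ["Couple", "Female", "Male", "Not Reported"]

age_pairs = candidate_merge_pairs(age_groups, "monotonic")
gender_pairs = candidate_merge_pairs(genders, "free")
print(len(age_pairs), len(gender_pairs))
```

With eight ordered age groups there are seven adjacent pairs to test, while four nominal gender categories give six possible pairs.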


1. Calculate the Chi-Square statistic between each pair of adjacent categories of the predictor (independent) variable, Age of the borrower. Each distinct age value would normally be treated as a separate category, but for simplicity the category groups below are used.

Borrower Age
Loan Terminate   Low-65    <=70    <=75    <=80    <=85    <=90    <=95  95-High
No                 5100    7709    7922    5676    2306     531      75       19
Yes                3512    5741    7260    7578    4679    1918     776      200
Total              8612   13450   15182   13254    6985    2449     851      219

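2. Merge the categories that are least significantly different: the pair of adjacent categories with the smallest Chi-Square statistic is the candidate for merging. In this example the <=95 and 95-High groups are merged into 90-High. The table after merging is shown below.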

Borrower Age
Loan Terminate   Low-65    <=70    <=75    <=80    <=85    <=90  90-High
No                 5100    7709    7922    5676    2306     531       94
Yes                3512    5741    7260    7578    4679    1918      976
Total              8612   13450   15182   13254    6985    2449     1070

The Chi-Square statistic between adjacent categories is then calculated again to find the next least significantly different pair of categories.

3. Continue merging categories until two categories are left. The process also involves splitting the categories of a group that has more than two categories.

4. Final Split for the variable, age of borrower

Borrower Age
Loan Terminate    <=69     >69
No               11187   18151
Yes               8012   23652
Total            19199   41803

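Chi-Square statistics between adjacent categories of the original eight-group Borrower Age table (Low-65 vs <=70 through <=95 vs 95-High): 7.8, 75.88, 248.2, 184.4, 110., 69.77, 0.004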


                           <=69     >69
Expected Count       No    9233   20105
                     Yes   9966   21698
Chi Square Detailed  No     413     190
                     Yes    383     176
Chi Square Statistics: 1161.95
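The Age calculations above can be reproduced with a short script. This is a sketch that uses the plain Pearson Chi-Square statistic on the counts from the Borrower Age tables; the helper name `chi_square` is an assumption, and the script covers only the first merge and the final split, not the full merge-and-split procedure of the original paper.

```python
def chi_square(no_counts, yes_counts):
    """Pearson Chi-Square statistic for a 2 x k table of No/Yes counts."""
    total_no, total_yes = sum(no_counts), sum(yes_counts)
    grand_total = total_no + total_yes
    statistic = 0.0
    for no, yes in zip(no_counts, yes_counts):
        column_total = no + yes
        expected_no = total_no * column_total / grand_total
        expected_yes = total_yes * column_total / grand_total
        statistic += (no - expected_no) ** 2 / expected_no
        statistic += (yes - expected_yes) ** 2 / expected_yes
    return statistic


# Counts copied from the eight-group Borrower Age table.
labels = ["Low-65", "<=70", "<=75", "<=80", "<=85", "<=90", "<=95", "95-High"]
no = [5100, 7709, 7922, 5676, 2306, 531, 75, 19]
yes = [3512, 5741, 7260, 7578, 4679, 1918, 776, 200]

# Statistic for each pair of adjacent age groups (Age is a monotonic predictor).
adjacent = [chi_square(no[i:i + 2], yes[i:i + 2]) for i in range(len(no) - 1)]
for i, value in enumerate(adjacent):
    print(f"{labels[i]} vs {labels[i + 1]}: {value:.3f}")

# The pair with the smallest statistic is the first candidate for merging.
i = adjacent.index(min(adjacent))
merged_no = no[:i] + [no[i] + no[i + 1]] + no[i + 2:]
merged_yes = yes[:i] + [yes[i] + yes[i + 1]] + yes[i + 2:]

# Statistic for the final two-group split (<=69 vs >69).
final_split = chi_square([11187, 18151], [8012, 23652])
print(f"Final split: {final_split:.2f}")
```

The smallest adjacent statistic belongs to the last pair (<=95 and 95-High), which is why those two groups are merged first, giving the 94 and 976 counts of the 90-High column.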

Follow similar steps for the other predictor variables. Consider the Gender of the borrower. Gender is a nominal variable, so any category can be merged with any other category. In the original paper, nominal variables are referred to as Free Predictors.

Gender
Loan Terminate   Couple   Female    Male   Not Reported   Total
No                12397    13349    3543             49   29338
Yes                9895    16003    5645            121   31664
Total             22292    29352    9188            170   61002

First Merge Iteration

Gender
Loan Terminate   Couple   Female    Male   Not Reported   Total
No                12397    13349    3543             49   29338
Yes                9895    16003    5645            121   31664
Total             22292    29352    9188            170   61002

Couple vs Female
                          Couple      Female
Expected Count       No    11113       14633
                     Yes   11179       14719
Chi Square Detailed  No      148         113
                     Yes     147         112
Chi Square Statistics: 520

Couple vs Male
                          Couple        Male
Expected Count       No    11288    4652.374
                     Yes   11004    4535.626
Chi Square Detailed  No      109         265
                     Yes     112         271
Chi Square Statistics: 757

Couple vs Not Reported
                          Couple   Not Reported
Expected Count       No    12352             94
                     Yes    9940             76
Chi Square Detailed  No        0             22
                     Yes       0             27
Chi Square Statistics: 49

Female vs Male
                          Female        Male
Expected Count       No    12865        4027
                     Yes   16487        5161
Chi Square Detailed  No       18          58
                     Yes      14          45
Chi Square Statistics: 136

Female vs Not Reported
                          Female   Not Reported
Expected Count       No    13321             77
                     Yes   16031             93
Chi Square Detailed  No    0.059             10
                     Yes   0.049              9
Chi Square Statistics: 19

Male vs Not Reported
                            Male   Not Reported
Expected Count       No   3526.7           65.3
                     Yes  5661.3          104.7
Chi Square Detailed  No      0.1            4.0
                     Yes     0.0            2.5
Chi Square Statistics: 6.7
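The six pairwise comparisons of the first merge iteration can be checked with the 2 x 2 shortcut form of the Pearson Chi-Square statistic. This is a minimal sketch; the helper name `chi_square_2x2` is hypothetical and the counts are copied from the Gender table.

```python
from itertools import combinations


def chi_square_2x2(no_a, yes_a, no_b, yes_b):
    """Pearson Chi-Square statistic for two categories against No/Yes."""
    n = no_a + yes_a + no_b + yes_b
    cross = no_a * yes_b - no_b * yes_a
    row_product = (no_a + no_b) * (yes_a + yes_b)
    column_product = (no_a + yes_a) * (no_b + yes_b)
    return n * cross ** 2 / (row_product * column_product)


# (No, Yes) counts copied from the Gender table.
gender = {
    "Couple": (12397, 9895),
    "Female": (13349, 16003),
    "Male": (3543, 5645),
    "Not Reported": (49, 121),
}

# Gender is a free predictor, so every pair of categories is compared.
pair_stats = {
    (a, b): chi_square_2x2(*gender[a], *gender[b])
    for a, b in combinations(gender, 2)
}
for pair, value in sorted(pair_stats.items(), key=lambda item: item[1]):
    print(pair, round(value, 1))

# The least different pair is the candidate for merging.
least_different = min(pair_stats, key=pair_stats.get)
```

Male and Not Reported have the smallest statistic, so they are the pair merged before the second iteration.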

Second Merge Iteration

Gender
Loan Terminate   Couple   Female   Male & Not Reported   Total
No                12397    13349                  3592   29338
Yes                9895    16003                  5766   31664
Total             22292    29352                  9358   61002

Couple vs Female
                          Couple   Female
Expected Count       No    11113    14633
                     Yes   11179    14719
Chi Square Detailed  No      148      113
                     Yes     147      112
Chi Square Statistics: 520

Couple vs Male & Not Reported
                          Couple   Male & Not Reported
Expected Count       No    11262                  4727
                     Yes   11030                  4631
Chi Square Detailed  No      114                   273
                     Yes     117                   278
Chi Square Statistics: 783

Female vs Male & Not Reported
                          Female   Male & Not Reported
Expected Count       No    12846                  4095
                     Yes   16506                  5263
Chi Square Detailed  No       20                    62
                     Yes      15                    48
Chi Square Statistics: 145

Final Split for the predictor variable, Gender of borrower

Gender
Loan Terminate   Couple   Male, Female & Not Reported   Total
No                12397                         16941   29338
Yes                9895                         21769   31664
Total             22292                         38710   61002

                          Couple   Male, Female & Not Reported
Expected Count       No    10721                         18617
                     Yes   11571                         20093
Chi Square Detailed  No      262                           151
                     Yes     243                           140
Chi Square Statistics: 795
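The Gender merge iterations can be expressed as a loop that keeps merging the least different pair until two groups remain. This sketch follows the simplified procedure described in this document and leaves out the significance thresholds and the re-splitting step of the original paper; the names `chi_square_2x2` and `merge_until_two` are hypothetical.

```python
from itertools import combinations


def chi_square_2x2(a, b):
    """Pearson Chi-Square statistic for two (No, Yes) count pairs."""
    (no_a, yes_a), (no_b, yes_b) = a, b
    n = no_a + yes_a + no_b + yes_b
    cross = no_a * yes_b - no_b * yes_a
    row_product = (no_a + no_b) * (yes_a + yes_b)
    column_product = (no_a + yes_a) * (no_b + yes_b)
    return n * cross ** 2 / (row_product * column_product)


def merge_until_two(groups):
    """Repeatedly merge the least different pair until two groups remain.

    groups maps a tuple of category names to its (No, Yes) counts.
    """
    groups = dict(groups)
    while len(groups) > 2:
        a, b = min(
            combinations(groups, 2),
            key=lambda pair: chi_square_2x2(groups[pair[0]], groups[pair[1]]),
        )
        no_a, yes_a = groups.pop(a)
        no_b, yes_b = groups.pop(b)
        groups[a + b] = (no_a + no_b, yes_a + yes_b)
    return groups


gender = {
    ("Couple",): (12397, 9895),
    ("Female",): (13349, 16003),
    ("Male",): (3543, 5645),
    ("Not Reported",): (49, 121),
}
final_groups = merge_until_two(gender)
statistic = chi_square_2x2(*final_groups.values())
print(final_groups, round(statistic, 1))
```

The loop first merges Male with Not Reported, then merges Female into that group, leaving Couple against all other categories, as in the final split table.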

Step 2: Selecting Best Predictor Variable for Node Split

The most discriminating variable, based on the Chi-Square statistic, is selected to split the parent node into child nodes. In the above example, if Age and Gender of borrowers are the only two predictor variables, then based on the Chi-Square statistics (Age Chi-Square 1162 and Gender Chi-Square 795), Age is selected as the variable to split the parent node.

Main Data
  Yes 31664 (52%)
  No  29338 (48%)

Split on Age of Borrower
  Age<=69: Yes 8012 (42%), No 11187 (58%)
  Age>69:  Yes 23652 (57%), No 18151 (43%)
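The selection step and the node percentages can be reproduced as follows. This is a sketch that compares the raw Chi-Square statistics as this document does; the Age value comes from the final Age split and the Gender value from the final Gender split table, and the names `best_split_statistic` and `node_summary` are hypothetical.

```python
# Chi-Square statistic of the best split found for each predictor.
best_split_statistic = {"Age": 1161.95, "Gender": 795}

# The predictor whose best split has the largest statistic splits the node.
split_variable = max(best_split_statistic, key=best_split_statistic.get)


def node_summary(yes, no):
    """Return counts and rounded percentage shares for one tree node."""
    total = yes + no
    return {
        "Yes": yes,
        "No": no,
        "Yes %": round(100 * yes / total),
        "No %": round(100 * no / total),
    }


root = node_summary(yes=31664, no=29338)
children = {
    "Age<=69": node_summary(yes=8012, no=11187),
    "Age>69": node_summary(yes=23652, no=18151),
}
print(split_variable, root, children)
```

The termination rate rises from 42% for borrowers aged 69 or younger to 57% for older borrowers, against 52% in the parent node.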


References

1. G. V. Kass, An exploratory technique for investigating large quantities of categorical data, Applied Statistics

2. Tonja Bowen Bishop, Hui Shan, Reverse Mortgages: A Closer Look at HECM Loans

3. http://portal.hud.gov