RamG Data Analytics and Insights
View by Ram
CHAID Decision Tree: Reverse Mortgage Loan Termination Example
Business Context
A Reverse Mortgage Loan (RML) enables senior citizens to receive periodic payments from a lender against the mortgage of their house, supplementing their income while they remain the owner and continue to occupy the house. Interest accumulates on the payments received. One type of reverse mortgage is the Home Equity Conversion Mortgage (HECM), which is insured by the Federal Housing Administration (FHA) and constitutes over 90% of all reverse mortgage loans originated in the U.S. market [2].
Understanding the termination outcomes of HECM loans is essential for the FHA insurance program and the long-… [2]
The data was downloaded from the HUD.GOV website [3]. All loans originated in 2003 and 2004 are considered in the example below. If the termination date is populated, the HECM loan is considered closed (terminated). The age and gender of borrowers are used to illustrate the decision tree building process.
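A minimal sketch of this data preparation in Python, assuming each loan record is a dict. The field names `origination_year`, `termination_date`, `age` and `gender` are hypothetical placeholders, not the actual column names of the HUD file, and the sample records are invented for illustration.

```python
from typing import Iterable, List, Optional


def is_terminated(termination_date: Optional[str]) -> str:
    """Target label: 'Yes' if a termination date is populated, otherwise 'No'."""
    if termination_date is not None and termination_date.strip():
        return "Yes"
    return "No"


def prepare(loans: Iterable[dict], years=(2003, 2004)) -> List[dict]:
    """Keep loans originated in the study years and attach the target label.

    The keys 'origination_year' and 'termination_date' are hypothetical
    placeholders, not the actual HUD column names.
    """
    rows = []
    for loan in loans:
        if loan["origination_year"] in years:
            label = is_terminated(loan.get("termination_date"))
            rows.append({**loan, "terminated": label})
    return rows


# Hypothetical records, for illustration only.
sample = [
    {"origination_year": 2003, "termination_date": "2009-05-31", "age": 72, "gender": "Female"},
    {"origination_year": 2004, "termination_date": "", "age": 66, "gender": "Couple"},
    {"origination_year": 2005, "termination_date": None, "age": 80, "gender": "Male"},
]
prepared = prepare(sample)
print([(r["age"], r["terminated"]) for r in prepared])  # [(72, 'Yes'), (66, 'No')]
```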
CHAID Algorithm [1] Process Steps
Step 1: Find the best split for each predictor (independent) variable by merging categories of that predictor.
Step 2: Compare the predictor variables, select the best one for the node, and split the node into two child nodes.
Step 3: Repeat Steps 1 and 2 for each new node until the stopping criteria are satisfied.
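The three steps can be sketched as a recursive loop. This is a minimal illustration under stated assumptions, not Kass's full algorithm: `best_split_for` is a hypothetical callback standing in for Step 1 (it returns the best merged split of one predictor together with its chi-square statistic), binary splits are assumed as in this example, and the stopping criteria (`max_depth`, `min_size`, and a minimum statistic of 3.84, roughly the 5% critical value of chi-square with one degree of freedom) are choices made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Sequence, Tuple

# A split is (chi-square statistic, {child label: rows in that child}).
Split = Tuple[float, Dict[str, list]]


@dataclass
class Node:
    rows: list
    depth: int
    predictor: Optional[str] = None   # set only when the node is split
    statistic: Optional[float] = None
    children: Dict[str, "Node"] = field(default_factory=dict)


def grow(rows: list, predictors: Sequence[str],
         best_split_for: Callable[[list, str], Optional[Split]],
         depth: int = 0, max_depth: int = 3, min_size: int = 100,
         min_stat: float = 3.84) -> Node:
    node = Node(rows, depth)
    # Assumed stopping criteria: depth limit and minimum node size.
    if depth >= max_depth or len(rows) < min_size:
        return node
    # Step 1: best merged split for every predictor.
    candidates = []
    for predictor in predictors:
        split = best_split_for(rows, predictor)
        if split is not None:
            candidates.append((split[0], predictor, split[1]))
    if not candidates:
        return node
    # Step 2: keep the predictor with the largest chi-square statistic.
    stat, predictor, groups = max(candidates, key=lambda c: c[0])
    if stat < min_stat:  # best split is not significant: leave a leaf
        return node
    node.predictor, node.statistic = predictor, stat
    # Step 3: repeat Steps 1 and 2 on each child node.
    for label, subset in groups.items():
        node.children[label] = grow(subset, predictors, best_split_for,
                                    depth + 1, max_depth, min_size, min_stat)
    return node


def halve(rows, predictor):
    """Toy stand-in for Step 1 that always splits the rows in half."""
    mid = len(rows) // 2
    return 10.0, {"low": rows[:mid], "high": rows[mid:]}


tree = grow(list(range(400)), ["age"], halve, max_depth=2, min_size=50)
print(tree.predictor, sorted(tree.children))  # age ['high', 'low']
```

Passing Step 1 in as a function keeps the tree-growing loop independent of how categories are merged, so monotonic and free predictors can plug in different merge rules.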
Step 1: Best Split for a Predictor Variable
In this example, borrower age is one of the predictor (independent) variables. Age is an ordinal variable, so only adjacent (contiguous) categories may be merged. Ordinal variables without a missing category are referred to as monotonic predictors. An ordinal variable with a missing category is treated as a floating predictor: the missing (floating) category can be grouped with any other category, while the remaining categories can only be merged with their neighbors.
Since borrower age has no missing category, it is treated as a monotonic predictor: each category is compared with, and can be merged only with, its adjacent category.
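Which pairs of categories are eligible for merging depends only on the predictor type. A small sketch of the three rules described above; the function name `mergeable_pairs` and the string labels for the predictor types are illustrative choices, not part of the original algorithm description.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def mergeable_pairs(categories: Sequence[str], kind: str,
                    missing: Optional[str] = None) -> List[Tuple[str, str]]:
    """Category pairs that may be merged, by predictor type.

    kind = "monotonic": ordinal, no missing category -> adjacent pairs only
    kind = "floating":  ordinal plus a missing category -> adjacent pairs,
                        plus the missing category paired with any category
    kind = "free":      nominal -> any pair
    For ordinal kinds, `categories` must be listed in their natural order.
    """
    if kind == "free":
        return list(combinations(categories, 2))
    ordered = [c for c in categories if c != missing]
    pairs = list(zip(ordered, ordered[1:]))  # neighbours only
    if kind == "floating":
        if missing is None or missing not in categories:
            raise ValueError("a floating predictor needs its missing category")
        pairs += [(c, missing) for c in ordered]
    elif kind != "monotonic":
        raise ValueError(f"unknown predictor kind: {kind}")
    return pairs


ages = ["Low-65", "<=70", "<=75", "<=80"]
print(mergeable_pairs(ages, "monotonic"))
# [('Low-65', '<=70'), ('<=70', '<=75'), ('<=75', '<=80')]
print(len(mergeable_pairs(["Couple", "Female", "Male", "Not Reported"], "free")))  # 6
```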
1. Calculate the chi-square statistic between each pair of adjacent categories of the predictor variable, borrower age. Each age value could be treated as a separate category, but for simplicity the age groups below are used.
Borrower Age
Loan Terminated   Low-65    <=70    <=75    <=80    <=85    <=90    <=95 95-High
No                  5100    7709    7922    5676    2306     531      75      19
Yes                 3512    5741    7260    7578    4679    1918     776     200
Total               8612   13450   15182   13254    6985    2449     851     219
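The adjacent-pair statistics for this age table can be computed with a short Python sketch. It uses the Pearson chi-square without continuity correction and converts each statistic to a p-value with `math.erfc`, which is exact for one degree of freedom (a 2 x 2 table). The first and last pairs come out at about 7.8 and 0.004.

```python
import math

# Borrower-age table from step 1 (columns = age groups).
labels = ["Low-65", "<=70", "<=75", "<=80", "<=85", "<=90", "<=95", "95-High"]
no = [5100, 7709, 7922, 5676, 2306, 531, 75, 19]
yes = [3512, 5741, 7260, 7578, 4679, 1918, 776, 200]


def chi_square(no_row, yes_row):
    """Pearson chi-square (no continuity correction) for a 2 x k table."""
    n = sum(no_row) + sum(yes_row)
    stat = 0.0
    for observed_row in (no_row, yes_row):
        row_total = sum(observed_row)
        for j, observed in enumerate(observed_row):
            expected = row_total * (no_row[j] + yes_row[j]) / n
            stat += (observed - expected) ** 2 / expected
    return stat


pair_stats = []
for i in range(len(labels) - 1):
    stat = chi_square(no[i:i + 2], yes[i:i + 2])
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail, chi-square with 1 df
    pair_stats.append((labels[i], labels[i + 1], stat, p_value))

for left, right, stat, p in pair_stats:
    print(f"{left:>7} vs {right:<7} chi2 = {stat:8.3f}  p = {p:.3g}")

least = min(pair_stats, key=lambda t: t[2])
print("least significantly different pair:", least[0], "and", least[1])
```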
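2. The pair of adjacent categories that is least significantly different (lowest chi-square statistic) is the candidate for merging. Here, <=95 and 95-High (chi-square 0.004) are merged into a single 90-High group. The table after merging is: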
Borrower Age
Loan Terminated   Low-65    <=70    <=75    <=80    <=85    <=90 90-High
No                  5100    7709    7922    5676    2306     531      94
Yes                 3512    5741    7260    7578    4679    1918     976
Total               8612   13450   15182   13254    6985    2449    1070
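Merging two adjacent categories simply adds their counts. A minimal sketch, assuming the table is stored as parallel lists of labels and No/Yes counts (a representation chosen for illustration); `merge_adjacent` is a hypothetical helper name.

```python
labels = ["Low-65", "<=70", "<=75", "<=80", "<=85", "<=90", "<=95", "95-High"]
no = [5100, 7709, 7922, 5676, 2306, 531, 75, 19]
yes = [3512, 5741, 7260, 7578, 4679, 1918, 776, 200]


def merge_adjacent(labels, no, yes, i, new_label):
    """Merge column i with column i + 1, summing the No and Yes counts."""
    return (labels[:i] + [new_label] + labels[i + 2:],
            no[:i] + [no[i] + no[i + 1]] + no[i + 2:],
            yes[:i] + [yes[i] + yes[i + 1]] + yes[i + 2:])


# <=95 (index 6) and 95-High (index 7) form the least significantly different pair.
labels2, no2, yes2 = merge_adjacent(labels, no, yes, 6, "90-High")
print(labels2[-1], no2[-1], yes2[-1], no2[-1] + yes2[-1])  # 90-High 94 976 1070
```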
The chi-square statistics between adjacent categories are then recalculated to find the next least significantly different pair.
3. Continue merging categories until two categories are left. The process can also split apart again a merged group that contains more than two of the original categories.
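Steps 1 to 3 can be combined into a loop that repeatedly merges the least significantly different adjacent pair until two groups remain, as described above. This is a simplified sketch: it omits the re-split check for merged groups and any significance threshold. Run on the grouped table, it ends with Low-65 to <=75 versus the rest, so it does not reproduce the <=69 / >69 final split in step 4; 69 is not a boundary of these age groups.

```python
labels = ["Low-65", "<=70", "<=75", "<=80", "<=85", "<=90", "<=95", "95-High"]
no = [5100, 7709, 7922, 5676, 2306, 531, 75, 19]
yes = [3512, 5741, 7260, 7578, 4679, 1918, 776, 200]


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]] (no correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def merge_monotonic(labels, no, yes, target_groups=2):
    """Merge the least significantly different adjacent pair until
    `target_groups` groups remain. Each group lists its original labels."""
    groups = [[label] for label in labels]
    no, yes = list(no), list(yes)
    while len(groups) > target_groups:
        # Rows are No/Yes, columns are two neighbouring groups.
        stats = [chi_square_2x2(no[i], no[i + 1], yes[i], yes[i + 1])
                 for i in range(len(groups) - 1)]
        i = stats.index(min(stats))
        groups[i:i + 2] = [groups[i] + groups[i + 1]]
        no[i:i + 2] = [no[i] + no[i + 1]]
        yes[i:i + 2] = [yes[i] + yes[i + 1]]
        print("merged:", groups[i])
    return groups, no, yes


groups, no_final, yes_final = merge_monotonic(labels, no, yes)
print(groups)
```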
4. Final split for the predictor variable, borrower age:
Borrower Age
Loan Terminated     <=69     >69
No                 11187   18151
Yes                 8012   23652
Total              19199   41803
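Chi-square statistics between adjacent categories of the original eight-category age table (step 1):
Low-65 vs <=70: 7.8
<=70 vs <=75: 75.88
<=75 vs <=80: 248.2
<=80 vs <=85: 184.4
<=85 vs <=90: 110
<=90 vs <=95: 69.77
<=95 vs 95-High: 0.004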
Expected counts and chi-square contributions, (observed - expected)^2 / expected, for the final split:
Loan Terminated   Expected (<=69, >69)   Contribution (<=69, >69)
No                9233, 20105            413, 190
Yes               9966, 21698            383, 176
Chi-square statistic: 1161.95
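The expected counts and cell contributions follow from expected = row total x column total / grand total. A short sketch that reproduces them for any r x c table given as rows of observed counts; `expected_and_contributions` is a hypothetical helper name.

```python
def expected_and_contributions(table):
    """Expected counts and per-cell chi-square contributions for an r x c table
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    contributions = [[(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
                     for obs_row, exp_row in zip(table, expected)]
    return expected, contributions


# Final age split: rows No / Yes, columns <=69 / >69.
final_age = [[11187, 18151],
             [8012, 23652]]
expected, contributions = expected_and_contributions(final_age)
statistic = sum(sum(row) for row in contributions)
print([[round(e) for e in row] for row in expected])       # [[9233, 20105], [9966, 21698]]
print([[round(c) for c in row] for row in contributions])  # [[413, 190], [383, 176]]
print(round(statistic, 2))
```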
Follow similar steps for the other predictor variables. Consider the gender of the borrower. Gender is a nominal variable, so any category can be merged with any other category. In the original paper, nominal variables are referred to as free predictors.
Gender
Loan Terminated   Couple   Female    Male   Not Reported   Total
No                 12397    13349    3543             49   29338
Yes                 9895    16003    5645            121   31664
Total              22292    29352    9188            170   61002
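First Merge Iteration
Because gender is a free predictor, the chi-square statistic is calculated for every pair of categories (six pairs). For each pair, the expected counts, the chi-square contribution of each cell, and the chi-square statistic are shown.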
Couple vs Female
Loan Terminated   Expected (Couple, Female)   Contribution (Couple, Female)
No                11113, 14633                148, 113
Yes               11179, 14719                147, 112
Chi-square statistic: 520
Couple vs Male
Loan Terminated   Expected (Couple, Male)   Contribution (Couple, Male)
No                11288, 4652.374           109, 265
Yes               11004, 4535.626           112, 271
Chi-square statistic: 757
Couple vs Not Reported
Loan Terminated   Expected (Couple, Not Reported)   Contribution (Couple, Not Reported)
No                12352, 94                         0, 22
Yes               9940, 76                          0, 27
Chi-square statistic: 49
Female vs Male
Loan Terminated   Expected (Female, Male)   Contribution (Female, Male)
No                12865, 4027               18, 58
Yes               16487, 5161               14, 45
Chi-square statistic: 136
Female vs Not Reported
Loan Terminated   Expected (Female, Not Reported)   Contribution (Female, Not Reported)
No                13321, 77                         0.059, 10
Yes               16031, 93                         0.049, 9
Chi-square statistic: 19
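Male vs Not Reported
Loan Terminated   Expected (Male, Not Reported)   Contribution (Male, Not Reported)
No                3526.7, 65.3                    0.1, 4.0
Yes               5661.3, 104.7                   0.0, 2.5
Chi-square statistic: 6.7

Male vs Not Reported has the lowest chi-square statistic (6.7), so these two categories are the least significantly different and are merged.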
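Because gender is a free predictor, all six pairs are compared rather than only adjacent ones. A Python sketch of this first iteration, using the closed-form 2 x 2 statistic N(ad - bc)^2 / ((a + b)(c + d)(a + c)(b + d)), which equals the sum of the four cell contributions; the dict layout of the counts is an illustrative choice.

```python
from itertools import combinations

# (No, Yes) counts per gender category, from the gender table.
gender = {
    "Couple": (12397, 9895),
    "Female": (13349, 16003),
    "Male": (3543, 5645),
    "Not Reported": (49, 121),
}


def chi_square_2x2(col1, col2):
    """Pearson chi-square for two categories given as (No, Yes) counts."""
    (a, c), (b, d) = col1, col2  # a, b: No counts; c, d: Yes counts
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


pair_stats = {(x, y): chi_square_2x2(gender[x], gender[y])
              for x, y in combinations(gender, 2)}
for (x, y), stat in sorted(pair_stats.items(), key=lambda item: item[1]):
    print(f"{x} vs {y}: {stat:.1f}")

least = min(pair_stats, key=pair_stats.get)
print("merge:", least)  # merge: ('Male', 'Not Reported')
```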
Second Merge Iteration
Gender
Loan Terminated   Couple   Female   Male & Not Reported   Total
No                 12397    13349                  3592   29338
Yes                 9895    16003                  5766   31664
Total              22292    29352                  9358   61002
Couple vs Female
Loan Terminated   Expected (Couple, Female)   Contribution (Couple, Female)
No                11113, 14633                148, 113
Yes               11179, 14719                147, 112
Chi-square statistic: 520
Couple vs Male & Not Reported
Loan Terminated   Expected (Couple, Male & Not Reported)   Contribution (Couple, Male & Not Reported)
No                11262, 4727                              114, 273
Yes               11030, 4631                              117, 278
Chi-square statistic: 783
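Female vs Male & Not Reported
Loan Terminated   Expected (Female, Male & Not Reported)   Contribution (Female, Male & Not Reported)
No                12846, 4095                              20, 62
Yes               16506, 5263                              15, 48
Chi-square statistic: 145

Female vs Male & Not Reported now has the lowest chi-square statistic (145), so these groups are merged, leaving two categories: Couple versus Male, Female & Not Reported.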
Final split for the predictor variable, borrower gender:
Gender
Loan Terminated   Couple   Male, Female & Not Reported   Total
No                 12397                         16941   29338
Yes                 9895                         21769   31664
Total              22292                         38710   61002

Loan Terminated   Expected (Couple, Others)   Contribution (Couple, Others)
No                10721, 18617                262, 151
Yes               11571, 20093                243, 140
Chi-square statistic: 795
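The two gender iterations can be run as one loop for a free predictor: repeatedly merge the pair of groups with the lowest chi-square statistic until two groups remain. A minimal sketch under the same simplifications as the text (no significance threshold, no re-split check); group keys are tuples of the original category names, a representation chosen for illustration.

```python
from itertools import combinations

counts = {"Couple": (12397, 9895), "Female": (13349, 16003),
          "Male": (3543, 5645), "Not Reported": (49, 121)}  # (No, Yes)


def chi_square_2x2(col1, col2):
    """Pearson chi-square for two groups given as (No, Yes) counts."""
    (a, c), (b, d) = col1, col2
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def merge_free(counts, target_groups=2):
    """Free predictor: any two groups may merge. Keys are tuples of members."""
    groups = {(name,): nc for name, nc in counts.items()}
    while len(groups) > target_groups:
        x, y = min(combinations(groups, 2),
                   key=lambda pair: chi_square_2x2(groups[pair[0]], groups[pair[1]]))
        (nx, yx), (ny, yy) = groups.pop(x), groups.pop(y)
        groups[x + y] = (nx + ny, yx + yy)  # add No and Yes counts
    return groups


final = merge_free(counts)
print(final)
# {('Couple',): (12397, 9895), ('Female', 'Male', 'Not Reported'): (16941, 21769)}
stat = chi_square_2x2(*final.values())
print(round(stat))  # 795
```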
Step 2: Selecting the Best Predictor Variable for the Node Split
The most discriminating variable, based on the chi-square statistic, is selected to split the parent node into child nodes. In the example above, if borrower age and gender are the only two predictor variables, age is selected to split the parent node because its chi-square statistic is larger (age: 1162; gender: 795).
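Selecting the split variable reduces to taking the predictor whose final split has the largest statistic. A minimal sketch using the values from the text. Kass's original CHAID [1] compares p-values adjusted with a Bonferroni multiplier for the number of possible merges; this sketch leaves that adjustment out, and because both final splits here have one degree of freedom, ranking by statistic and by unadjusted p-value agree.

```python
import math

# Chi-square statistic of each predictor's final binary split (from the text).
final_split_stats = {"Age": 1161.95, "Gender": 795.0}


def p_value_1df(stat):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


best = max(final_split_stats, key=final_split_stats.get)
for name, stat in final_split_stats.items():
    print(f"{name}: chi2 = {stat}, p = {p_value_1df(stat):.2e}")
print("split on:", best)  # split on: Age
```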
Main Data (all loans; Yes = loan terminated)
Yes   31664   52%
No    29338   48%

Split on Age of Borrower
Age <= 69                Age > 69
Yes    8012   42%        Yes   23652   57%
No    11187   58%        No    18151   43%
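The resulting one-level tree can be read as a simple rule. A hypothetical sketch that summarizes each node and classifies a borrower by the majority class of the matching child node; `node_summary` and `predict` are illustrative names, and the counts are those of the split above.

```python
def node_summary(yes, no):
    """Counts and rounded percentages for a node (Yes = loan terminated)."""
    total = yes + no
    return {"n": total, "Yes %": round(100 * yes / total), "No %": round(100 * no / total)}


root = node_summary(yes=31664, no=29338)
children = {
    "Age <= 69": {"yes": 8012, "no": 11187},
    "Age > 69": {"yes": 23652, "no": 18151},
}


def predict(age):
    """Majority class of the child node that an age falls into."""
    node = children["Age <= 69"] if age <= 69 else children["Age > 69"]
    return "Yes" if node["yes"] > node["no"] else "No"


print(root)  # {'n': 61002, 'Yes %': 52, 'No %': 48}
for name, c in children.items():
    print(name, node_summary(c["yes"], c["no"]))
print(predict(65), predict(75))  # No Yes
```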
References
1. G. V. Kass, "An exploratory technique for investigating large quantities of categorical data," Applied Statistics.
2. Tonja Bowen Bishop, Hui Shan, "Reverse Mortgages: A Closer Look at HECM Loans."
3. http://portal.hud.gov