Data Mining Techniques & Its Applications in Insurance
Society of Actuaries, San Francisco Spring Meeting
June 24-26, 2002
Lijia Guo, PhD, ASA, MAAA
University of Central Florida
Session 11L
SOA San Francisco Spring Meeting, June 24-26, 2002
Slide 2
Learning Objectives
§ Understanding the data mining process
§ Gaining insight into the actuarial applications of data mining techniques
§ Exploring how data mining techniques could be applied in your own practice
Slide 3
Agenda
§ Introduction
§ Data Mining Methods
§ Actuarial Applications
§ Conclusions & Questions
Slide 4
Introduction
§ Changes in information technology
§ Availability of large quantities of insurance data
§ Mind your business by mining your data
Slide 5
What is Data Mining?
§ An information discovery process
  • Prediction: finding unknown values, relationships, or patterns in a large known database
  • Description: interpreting a large database
§ Making crucial business decisions: turning the newfound knowledge into actionable results
Slide 6
Why Use Data Mining?
§ Product development
§ Marketing
§ Analysis of claims distributions
§ Healthcare
§ Asset-liability management (ALM)
§ Fraud detection
§ Solvency analysis
Slide 7
Data Mining Methods
§ Classification
§ Regression
§ Clustering
§ Summarization
§ Dependency modeling
§ Deviation detection
Slide 8
Data Mining Algorithms
§ Decision trees (Breiman et al., 1984)
§ Logistic regression (Hosmer & Lemeshow, 1989)
§ Neural networks (Bishop, 1995; Ripley, 1996)
§ Fuzzy logic
§ Genetic algorithms (Goldberg, 1989)
§ Bayesian analysis (Cheeseman et al., 1988)
§ Hybrid algorithms
Slide 9
Data Mining Algorithms -- Decision Trees
§ What are decision trees?
§ How decision trees work
  • Choosing variables
  • Grouping
  • Creating the leaf nodes of the tree
§ Strengths and weaknesses
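As a rough illustration of the "choosing variables" and "grouping" steps, the sketch below scores each candidate variable by how much splitting on it reduces the Gini impurity of a 0/1 outcome (for example, died or survived). Gini impurity is one common splitting criterion (it is the CART default); the slides do not say which criterion the study used, and the records and field names here are hypothetical.

```python
from collections import Counter, defaultdict

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gain(records, variable, outcome="died"):
    """Impurity reduction from grouping the records by every level of `variable`."""
    parent = [r[outcome] for r in records]
    groups = defaultdict(list)
    for r in records:
        groups[r[variable]].append(r[outcome])
    n = len(records)
    weighted_children = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(parent) - weighted_children

def best_split(records, variables, outcome="died"):
    """Choose the variable whose split reduces impurity the most."""
    gains = {v: split_gain(records, v, outcome) for v in variables}
    return max(gains, key=gains.get), gains

# Hypothetical toy records: two categorical risk factors and a 0/1 outcome.
records = [
    {"status": "retiree", "gender": "M", "died": 1},
    {"status": "retiree", "gender": "F", "died": 1},
    {"status": "retiree", "gender": "M", "died": 1},
    {"status": "employee", "gender": "F", "died": 0},
    {"status": "employee", "gender": "M", "died": 0},
    {"status": "employee", "gender": "F", "died": 0},
]
best, gains = best_split(records, ["status", "gender"])
print(best, gains)  # "status" wins: it separates the outcomes perfectly
```

A full tree applies the same search recursively inside each group and stops (creating a leaf node) when no split improves impurity enough.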
Slide 10
Data Mining Algorithms -- Neural Networks
§ What are neural networks?
§ How neural networks work
  • Processing elements
  • Training
  • Predicting
§ Strengths and weaknesses
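The three steps above can be seen in the smallest possible network: a single processing element (a perceptron) that forms a weighted sum of its inputs, is trained with the perceptron learning rule, and is then used to predict. Practical networks stack many such elements with smooth activations and train them by backpropagation; this sketch only shows the basic idea, and the two 0/1 "risk flags" and their target (logical AND) are hypothetical.

```python
def step(z):
    """Threshold activation of a single processing element."""
    return 1 if z > 0 else 0

def predict(weights, bias, x):
    """Processing element: weighted sum of the inputs passed through the activation."""
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_perceptron(samples, labels, lr=1.0, max_epochs=100):
    """Training: nudge the weights toward every misclassified example until none remain."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)
            if error != 0:
                errors += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if errors == 0:  # every training example is now classified correctly
            break
    return weights, bias

# Hypothetical inputs: two 0/1 flags; target is 1 only when both flags are set.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, labels)
print([predict(weights, bias, x) for x in samples])
```

Because this target is linearly separable, the learning rule is guaranteed to converge; a single element cannot learn patterns that are not linearly separable, which is one reason real networks use hidden layers.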
Slide 11
Data Mining Algorithms -- Hybrid Algorithms
§ Problems with standard algorithms
§ Advanced algorithms
§ Discovery-driven approaches
§ Mixture of algorithms
Slide 12
Data Mining: Knowledge Discovery Process
§ Data acquisition
§ Data integration
§ Data exploration
§ Model building
§ Understanding your model
§ Post-mining analysis
Slide 13
Data Mining Process: Data Acquisition
§ Data acquisition
  • Getting your data
  • Data qualification issues
  • Data quality issues
  • Data derivation
§ Defining a study
  • Basic risk characteristics
Slide 14
Data Mining Process: Data Acquisition -- Case Study
§ SOA database for the RP-2000 Mortality Tables
  • 10,957,103 exposed life-years
§ Subset of the database covering all lives above age 70 (3,769,956 exposed life-years, 217,490 deaths)
§ Risk groups
  • Age, gender, participation status, union, pay type, collar type, annuity amount, etc.
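A basic data-derivation step is turning exposure and death counts into crude mortality rates per risk group. The sketch below assumes per-record rows carrying an "exposure" and a "deaths" field plus whatever risk-group keys are passed in (all field names are hypothetical); the usage line applies it to the over-70 totals quoted above.

```python
from collections import defaultdict

def crude_rates(rows, keys):
    """Aggregate exposure and deaths by risk group; crude rate = deaths / exposure."""
    exposure = defaultdict(float)
    deaths = defaultdict(float)
    for row in rows:
        group = tuple(row[k] for k in keys)  # e.g. ("F", "retiree") for keys ["gender", "status"]
        exposure[group] += row["exposure"]
        deaths[group] += row["deaths"]
    return {g: deaths[g] / exposure[g] for g in exposure if exposure[g] > 0}

# The whole over-70 subset treated as a single group, using the totals from the slide.
overall = crude_rates([{"subset": "age 70+", "exposure": 3_769_956, "deaths": 217_490}], ["subset"])
print(overall)  # about 0.0577 deaths per exposed life-year
```

In a real study the same call would receive one row per record (or per cell) and keys such as gender and participation status, giving one crude rate per risk group.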
Slide 15
Data Mining Process: Data Acquisition -- Case Study
§ Existing studies on advanced-age mortality
  • Smooth extension of the patterns
  • Families of curves, e.g. the Gompertz law
  • All these approaches aim to explain the age pattern of mortality
§ The mortality distribution varies among seniors with different backgrounds
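For reference, the Gompertz law named above models the force of mortality at age x as growing exponentially with age (B and c are the standard parameter names, not values from this study), so the log force of mortality is linear in age:

```latex
\mu_x = B\,c^{x}, \qquad B > 0,\ c > 1
\qquad\Longrightarrow\qquad
\ln \mu_x = \ln B + x \ln c
```

A straight line in log mortality plotted against age is therefore the usual quick check for Gompertz behaviour.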
Slide 16
Data Mining Process: Data Integration
§ To identify the factors that influence mortality
§ To study the interaction of the risk factors
§ To gain perspective on the relative importance of these factors
Slide 17
Slide 18
Data Mining Process: Data Integration -- Case Study
§ A main effect exists for all six variables considered
§ The risk factors differ in the degree of their effects
  • the interaction of these factors
  • the importance of these factors
Slide 19
Data Mining Process: Data exploration
§ Decision tree algorithm
  • Analyze the influence and importance of the mortality risk factors
  • Observations are grouped into several segments
§ Algorithm: SAS/Enterprise Miner Version 4.2 (2001)
§ Further study of the interaction and importance of the risk factors
Slide 20
Data Mining Process: Data Integration -- Case Study
§ Variable importance measure

  Variable               Importance
  Participation status   1.00
  Gender                 0.75
  Annuity size           0.43
  Pay type               0.21
  Union                  0.18
  Collar                 0.00
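The importance scores above are relative: the top variable is scaled to 1.00 and a variable the tree never uses scores 0.00. A minimal sketch of one common way to build such a measure, assuming each variable is credited with the total impurity reduction of the splits that use it and the totals are divided by the largest one. SAS Enterprise Miner's exact formula may differ, and the split list below is hypothetical, not the study's tree.

```python
def relative_importance(splits, variables):
    """splits: (variable, impurity_reduction) pairs, one per split in a fitted tree.
    Sum the reductions per variable, then scale so the largest total is 1.0.
    Variables that never appear in a split keep an importance of 0.0."""
    totals = {v: 0.0 for v in variables}
    for variable, reduction in splits:
        totals[variable] += reduction
    top = max(totals.values())
    if top == 0:
        return {v: 0.0 for v in totals}
    return {v: round(t / top, 2) for v, t in totals.items()}

# Hypothetical splits from a fitted tree.
splits = [("participation", 0.040), ("gender", 0.030), ("participation", 0.010), ("pay_type", 0.005)]
importance = relative_importance(splits, ["participation", "gender", "pay_type", "collar"])
print(importance)
```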
Slide 21
Data Mining Process: Data exploration
Slide 22
Data Mining Process: Data exploration
Slide 23
Data Mining Process: Model Building
§ Six risk groups:
  • Employees
  • Beneficiaries
  • Combined
  • Disabled
  • Male retirees
  • Female retirees
§ Logistic regression method
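The models that follow are quadratic in age on the logit scale and are reported with an R-square, which suggests a least-squares regression of log(p/(1 - p)) on age. A minimal sketch under that assumption, run once per risk group: ordinary least squares on empirical logits of crude rates. Note this is not maximum-likelihood logistic regression on individual deaths; the study's exact fitting procedure is not stated, and the rates below are synthetic. Age is centered inside the fit for numerical stability, so the coefficients refer to t = age - mean age.

```python
import math

def solve(a, b):
    """Solve the square system a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))) / m[r][r]
    return beta

def fit_logit_quadratic(ages, rates):
    """OLS fit of log(p/(1-p)) = b0 + b1*t + b2*t^2, t = age - mean age.
    Returns (center, [b0, b1, b2], r_square)."""
    center = sum(ages) / len(ages)
    X = [[1.0, x - center, (x - center) ** 2] for x in ages]
    y = [math.log(p / (1 - p)) for p in rates]
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return center, beta, 1 - ss_res / ss_tot

def predict_rate(center, beta, age):
    """Invert the logit to get the fitted mortality rate at a given age."""
    t = age - center
    return 1 / (1 + math.exp(-(beta[0] + beta[1] * t + beta[2] * t * t)))

# Synthetic illustration only: rates generated from a known curve, so the fit should recover it.
ages = list(range(70, 96))
true = [-3.0, 0.09, -0.001]
rates = [1 / (1 + math.exp(-(true[0] + true[1] * (x - 82.5) + true[2] * (x - 82.5) ** 2))) for x in ages]
center, beta, r2 = fit_logit_quadratic(ages, rates)
print(center, beta, r2)
```

Expanding the centered polynomial gives the same model written in raw age x, which is the form used on the slides.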
Slide 24
Data Mining Process: Model Building -- Case Study: Female “Retiree”
Slide 25
Data Mining Process: Model Building -- Case Study: Female “Retiree” Group
§ “Collar” and “Pay Type” are the two important variables
§ An interaction between “Collar” and “Pay Type” exists
§ Neither “annuity size” nor “union” is picked up by the tree algorithm
Slide 26
Data Mining Process: Model Building -- Case Study: Female “Retiree” Group
§ The R-square for the regression is 0.95

$$\log\frac{p}{1-p} = -17.97 + 0.26\,x - 0.00087\,x^{2} - C + PT - 0.046\,C \cdot PT$$

Where p is the mortality rate, x is the age, C is the collar-type adjustment (white collar, blue collar, or mixed collar) and PT is the pay-type adjustment (hourly, salaried, or combined pay type).
Slide 27
Data Mining Process: Model Building -- Case Study: Female “Retiree” Group
Slide 28
Data Mining Process: Model Building -- Case Study: Male “Retiree” Group
§ The R-square for the regression is 0.92

$$\log\frac{p}{1-p} = -14.57 + 0.20\,x - 0.00055\,x^{2} - S - U + U \cdot S$$

Where p is the mortality rate, x is the age, S is the annuity-size adjustment (small, median, or large annuity) and U is the union adjustment (union member, non-union member, or combined).
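To see what the age terms of the two fitted logit models imply, the sketch below inverts the logit for the female retiree age terms (-17.97 + 0.26x - 0.00087x²) and the male retiree age terms (-14.57 + 0.20x - 0.00055x²). It assumes the categorical adjustments (C and PT for females, S and U for males) contribute nothing unless passed in as a single combined `adjustment`; that is a simplification for illustration, not the full fitted models.

```python
import math

# Age terms (b0, b1, b2) of the two fitted logit models, as read from the slides.
AGE_TERMS = {
    "female_retiree": (-17.97, 0.26, -0.00087),
    "male_retiree": (-14.57, 0.20, -0.00055),
}

def age_only_rate(group, age, adjustment=0.0):
    """Invert the logit: p = 1 / (1 + exp(-eta)), eta = b0 + b1*x + b2*x^2 + adjustment.
    `adjustment` is a hypothetical stand-in for the omitted categorical terms."""
    b0, b1, b2 = AGE_TERMS[group]
    eta = b0 + b1 * age + b2 * age * age + adjustment
    return 1.0 / (1.0 + math.exp(-eta))

for age in (70, 75, 80, 85):
    print(age, round(age_only_rate("female_retiree", age), 4), round(age_only_rate("male_retiree", age), 4))
```

With these coefficients both curves rise steadily over ages 70 to 95, and the male rates exceed the female rates at every age shown.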
Slide 29
Data Mining Process: SEMMA
§ Sample, Explore, Modify, Model, Assess
Slide 30
Data Mining Process: Model Building -- Case Study: Male “Retiree”
Slide 31
Data Mining Process: Post-mining Analysis -- Case Study
Slide 32
Data Mining Process: Understanding your model – Case Study
§ The male and female retiree mortality models depend on different variables
§ The mortality of beneficiaries is determined by gender, annuity size, pay type, and their interactions
§ The gender factor will play a much-reduced role in the beneficiaries’ mortality model
Slide 33
Data Mining Process: Post-mining Analysis -- Case Study
§ Results on the mortality distribution above age 95 are limited
§ As female demographics have changed over the past three decades, variables such as annuity size and union will play a more important role in determining female mortality
§ Other risk factors: education, lifestyle, smoking status, etc.
Slide 34
Data Mining Process: Summary-- Case Study
§ Mortality is non-Gompertz (linear growth) between ages 70 and 85
§ The selection of risk factors may influence the quality of the mortality model
§ Mortality models vary with the most important risk factor among all the variables (participation status, in this study)
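A finding such as "non-Gompertz (linear growth) between ages 70 and 85" can be checked by fitting both shapes to the crude rates over that age range and comparing residuals. A minimal sketch, assuming unweighted least squares, a straight line for the linear shape, and a straight line in log rate for the Gompertz shape (applying the law to rates rather than the force of mortality, as an approximation); residuals are compared on the rate scale. The rates below are synthetic, not study data.

```python
import math

def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def compare_linear_vs_gompertz(ages, rates):
    """Residual sum of squares (on the rate scale) of a linear fit and a Gompertz-style fit."""
    a_lin, b_lin = fit_line(ages, rates)
    a_log, b_log = fit_line(ages, [math.log(q) for q in rates])  # ln q = ln B + x ln c
    sse_lin = sum((q - (a_lin + b_lin * x)) ** 2 for x, q in zip(ages, rates))
    sse_gom = sum((q - math.exp(a_log + b_log * x)) ** 2 for x, q in zip(ages, rates))
    return sse_lin, sse_gom

# Synthetic rates that grow linearly with age (illustration only).
ages = list(range(70, 86))
rates = [0.02 + 0.004 * (x - 70) for x in ages]
sse_lin, sse_gom = compare_linear_vs_gompertz(ages, rates)
print(sse_lin < sse_gom)
```

On real data the comparison would normally weight each age by its exposure, since rates at ages with little exposure are much noisier.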
Slide 35
Data Mining Process: Case Study in Claim Analysis
§ Basic risk characteristics
§ Top-down identification
§ Underlying statistical properties
§ Domain-specific constraints
Slide 36
Data Mining Process: Case Study in ALM
§ Decision trees and DNF (disjunctive normal form) learning
§ Generative stochastic modeling
  • Probabilistic networks
  • Probabilistic rules
§ Hidden Markov models
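Of the techniques listed for ALM, the hidden Markov model is perhaps the least familiar to actuaries. The sketch below shows its core computation, the forward algorithm, which gives the probability of an observed sequence by summing over all hidden state paths in linear time. The two hidden "regimes", the observation labels, and every probability are hypothetical placeholders, not values from the ALM case study; a brute-force enumeration is included only to cross-check the result.

```python
from itertools import product

def forward(obs, states, start_p, trans_p, emit_p):
    """Forward algorithm: P(observation sequence), summed over all hidden state paths."""
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {
            s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][o]
            for s in states
        }
    return sum(alpha.values())

def brute_force(obs, states, start_p, trans_p, emit_p):
    """Same probability by enumerating every hidden path (exponential cost; for checking only)."""
    total = 0.0
    for path in product(states, repeat=len(obs)):
        p = start_p[path[0]] * emit_p[path[0]][obs[0]]
        for prev, cur, o in zip(path, path[1:], obs[1:]):
            p *= trans_p[prev][cur] * emit_p[cur][o]
        total += p
    return total

# Hypothetical two-regime model of yearly interest-rate moves.
states = ("calm", "volatile")
start_p = {"calm": 0.8, "volatile": 0.2}
trans_p = {"calm": {"calm": 0.9, "volatile": 0.1}, "volatile": {"calm": 0.3, "volatile": 0.7}}
emit_p = {"calm": {"small": 0.85, "large": 0.15}, "volatile": {"small": 0.4, "large": 0.6}}
obs = ["small", "large", "large"]
print(forward(obs, states, start_p, trans_p, emit_p), brute_force(obs, states, start_p, trans_p, emit_p))
```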
Slide 37
Data Mining Process: Applications in Healthcare
§ More productive managed care programs
§ Pricing
§ Individual health insurance market
§ Recovery and prevention of fraudulent claims
§ Prescription drug cost management
Slide 38
Quiz on Data Mining
§ What is data mining?
§ What can data mining do?
§ What are the data mining techniques?
§ What are the applications of data mining?
§ How can you put data mining into practice?
Slide 39
Summary
§ Overview of data mining techniques
§ Their application to actuarial practice
§ Future developments
§ Potential contribution to your area