San Francisco - Data Mining Techniques in Actuarial Modeling

Data Mining Techniques & Its Applications in Insurance

Society of Actuaries
San Francisco Spring Meeting

June 24 - 26, 2002

Lijia Guo, PhD, ASA, MAAA
University of Central Florida

Session 11L

SOA San Francisco Spring Meeting, June 24-26, 2002

Slide 2

Learning Objectives

§ Understanding a data mining process
§ Gaining insight into the actuarial applications of data mining techniques
§ Exploring the prospects for applying data mining techniques in your own practice

Slide 3

Agenda

§ Introduction
§ Data Mining Methods
§ Actuarial Applications
§ Conclusions & Questions

Slide 4

Introduction

§ Changes in information technology
§ Availability of large quantities of insurance data
§ Mind your business by mining your data

Slide 5

What is Data Mining?

§ An information discovery process
  • Prediction -- finding unknown values/relationships/patterns from a known large database
  • Description -- interpretation of a large database
§ Making crucial business decisions - turning the newfound knowledge into actionable results

Slide 6

Why Use Data Mining?

§ Product development
§ Marketing
§ Analysis of claims distribution
§ Healthcare
§ ALM
§ Fraud detection
§ Solvency analysis

Slide 7

Data Mining Methods

§ Classification
§ Regression
§ Clustering
§ Summarization
§ Dependency modeling
§ Deviation detection

Slide 8

Data Mining Algorithms

§ Decision trees (Breiman et al., 1984)
§ Logistic regression (Hosmer & Lemeshow, 1989)
§ Neural networks (Bishop, 1995; Ripley, 1996)
§ Fuzzy logic
§ Genetic algorithms (Goldberg, 1989)
§ Bayesian analysis (Cheeseman et al., 1988)
§ Hybrid algorithms

Slide 9

Data Mining Algorithms-- Decision Trees

§ What are decision trees
§ How decision trees work
  • Choosing variables
  • Grouping
  • Creating the leaf nodes of the tree
§ Strengths and weaknesses
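
The three steps listed above (choosing variables, grouping, creating leaf nodes) can be illustrated with a minimal sketch. It assumes Gini impurity as the splitting criterion and a one-value-versus-rest grouping; the slides do not say which criterion was used, and the toy records below are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (weighted impurity, variable index, value) of the best split."""
    best = None
    for j in range(len(rows[0])):                # choosing variables
        for v in sorted({r[j] for r in rows}):   # grouping: value v vs. the rest
            left = [y for r, y in zip(rows, labels) if r[j] == v]
            right = [y for r, y in zip(rows, labels) if r[j] != v]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, j, v)
    return best

# Hypothetical toy records: (gender, collar) -> death observed (1) or not (0)
rows = [("F", "white"), ("F", "blue"), ("M", "white"),
        ("M", "blue"), ("M", "blue"), ("F", "white")]
labels = [0, 0, 1, 1, 1, 0]

split = best_split(rows, labels)
# A group whose impurity is 0.0 needs no further splitting and becomes a leaf node.
print(split)
```

Applying `best_split` recursively to each resulting group, and stopping when a group is pure or too small, yields the full tree.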

Slide 10

Data Mining Algorithms-- Neural Networks

§ What are neural networks
§ How neural networks work
  • Processing elements
  • Training
  • Predicting
§ Strengths and weaknesses
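
As a rough illustration of the processing element / training / predicting cycle named above, the sketch below trains a single sigmoid unit by gradient descent. The learning rate, epoch count, and the toy data (logical OR) are assumptions chosen for illustration, not details from the talk.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, targets, epochs=2000, lr=0.5):
    """Fit one processing element (weights and bias) by gradient descent on log-loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = y - t                      # gradient of log-loss w.r.t. the unit's input
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Output of the trained unit for one input vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical toy data: learn logical OR
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 1]

w, b = train(samples, targets)
preds = [round(predict(w, b, x)) for x in samples]
print(preds)
```

A practical network stacks many such units in layers; a single unit is shown only because it keeps the training loop visible.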

Slide 11

Data Mining Algorithms-- Hybrid Algorithms

§ Problems with standard algorithms
§ Advanced algorithms
§ Discovery-driven approaches
§ Mixture of algorithms

Slide 12

Data Mining: Knowledge Discovery Process

§ Data acquisition
§ Data integration
§ Data exploration
§ Model building
§ Understanding your model
§ Post-mining analysis

Slide 13

Data Mining Process: Data Acquisition

§ Data acquisition
  • Getting your data
  • Data qualification issues
  • Data quality issues
  • Data derivation
§ Defining a study
  • Basic risk characteristics

Slide 14

Data Mining Process: Data Acquisition -- Case Study

§ SOA database for RP-2000 Mortality Tables
  • 10,957,103 exposed life-years
§ Subset of the database that includes all the lives above age 70 (3,769,956 exposures, 217,490 deaths)
§ Risk groups
  • Age, gender, participation status, union, pay type, collar type, annuity amount, etc.
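
The counts above are enough for a quick sanity check of the subset. The sketch below computes the subset's share of total exposure and its crude mortality rate (deaths divided by exposure); it assumes the 3,769,956 figure is measured in exposed life-years like the database total, and the variable names are illustrative.

```python
total_exposure = 10_957_103    # exposed life-years in the full database
exposure_70_plus = 3_769_956   # exposure for lives above age 70 (assumed life-years)
deaths_70_plus = 217_490       # deaths observed in that subset

share = exposure_70_plus / total_exposure
crude_rate = deaths_70_plus / exposure_70_plus

print(f"share of exposure above age 70: {share:.1%}")
print(f"crude mortality rate above age 70: {crude_rate:.2%}")
```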

Slide 15

Data Mining Process: Data Acquisition -- Case Study

§ Existing studies on advanced-age mortality
  • Smooth extension of the patterns
  • Families of curves - Gompertz law, etc.
  • All these approaches aim at explaining the age pattern of mortality.
§ Mortality distribution varies among seniors with different backgrounds
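
For reference, the Gompertz law mentioned above is usually written as an exponentially increasing force of mortality. The symbols B and c below are the conventional parameter names, not notation taken from these slides.

```latex
\mu_x = B\,c^{x}, \qquad B > 0,\; c > 1
\quad\Longrightarrow\quad
\ln \mu_x = \ln B + x \ln c
```

Under this law the log of the force of mortality is linear in age x, which is the age pattern the curve-fitting approaches try to extend to advanced ages.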

Slide 16

Data Mining Process: Data Integration

§ To identify the factors that influence mortality
§ To study the interaction of the risk factors
§ To gain perspective on the importance of these factors

Slide 17

Slide 18

Data Mining Process: Data Integration-- Case Study

§ Main effects exist for all six variables considered
§ The degrees of the effects of the risk factors differ
  • The interaction of these factors
  • The importance of the factors

Slide 19

Data Mining Process: Data exploration

§ Decision tree algorithm
  • Analyze the influences and the importance of the mortality risk factors
  • Observations are grouped into several segments
§ Algorithm - SAS/Enterprise Miner Version 4.2 (2001)
§ Further study the interaction and the importance of the risk factors

Slide 20

Data Mining Process: Data Integration-- Case Study

§ Variable Importance Measure

Variable               Importance
Participation Status   1.00
Gender                 0.75
Annuity size           0.43
Pay Type               0.21
Union                  0.18
Collar                 0.00

Slide 21

Data Mining Process: Data exploration

Slide 22

Data Mining Process: Data exploration

Slide 23

Data Mining Process: Model building

§ Six risk groups:
  • Employees
  • Beneficiaries
  • Combined
  • Disabled
  • Male Retirees
  • Female Retirees
§ Logistic regression method

Slide 24

Data Mining Process: Model Building --Case Study: Female “Retiree”

Slide 25

Data Mining Process: Model Building-- Case Study: Female “Retiree” Group

§ “Collar” and “Pay Type” are two important variables
§ An interaction between “Collar” and “Pay Type” exists
§ Neither “annuity size” nor “union” is picked up by the tree algorithm

Slide 26

Data Mining Process: Model Building-- Case Study: Female “Retiree” Group

§ R-square for the regression is 0.95

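\[
\log\frac{p}{1-p} = -17.97 + 0.26\,x - 0.00087\,x^{2} - C + PT - 0.046\,C \cdot PT
\]

\[
C = \begin{cases} 0 & \text{white collar} \\ 0 & \text{blue collar} \\ 0.0047 & \text{mixed collar} \end{cases}
\qquad
PT = \begin{cases} 0.033 & \text{combined pay type} \\ 0.051 & \text{hourly pay type} \\ 0 & \text{salaried pay type} \end{cases}
\]

where p is the mortality rate and x is the age.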
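
To show how a fitted logit model of this kind turns into a mortality rate, here is a minimal sketch. It assumes the age coefficients read as intercept -17.97, age 0.26, and age-squared -0.00087, and it lumps the collar and pay-type terms into a single `other_terms` argument that defaults to zero; the function and parameter names are illustrative, not the author's.

```python
import math

def mortality_rate(age, intercept=-17.97, b_age=0.26, b_age2=-0.00087, other_terms=0.0):
    """Invert the logit: p = 1 / (1 + exp(-logit))."""
    # Coefficients are an assumed reading of the slide; other_terms stands in
    # for the collar and pay-type effects, which are left at zero here.
    logit = intercept + b_age * age + b_age2 * age ** 2 + other_terms
    return 1.0 / (1.0 + math.exp(-logit))

rates = {age: mortality_rate(age) for age in (70, 80, 90)}
for age, p in rates.items():
    print(f"age {age}: p = {p:.4f}")
```

Because the age-squared coefficient is negative, the log-odds grow more slowly than linearly with age over this range.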

Slide 27

Data Mining Process: Model Building -- Case Study: Female “Retiree” Group

Slide 28

Data Mining Process: Model Building-- Case Study: Male “Retiree” Group

§ R-square for the regression is 0.92

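\[
\log\frac{p}{1-p} = -14.57 + 0.20\,x - 0.00055\,x^{2} - S - U + U \cdot S
\]

\[
S = \begin{cases} 0.044 & \text{large annuity} \\ 0.060 & \text{medium annuity} \\ 0.0074 & \text{small annuity} \end{cases}
\qquad
U = \begin{cases} 0 & \text{union member} \\ 0.14 & \text{non-union member} \\ -0.040 & \text{combined} \end{cases}
\]

where p is the mortality rate and x is the age.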

Slide 29

Data Mining Process: SEMMA

Slide 30

Data Mining Process: Model Building -- Case Study: Male “Retiree”

Slide 31

Data Mining Process: Post-mining Analysis -- Case Study

Slide 32

Data Mining Process: Understanding your model – Case Study

§ The male retirees mortality model and the female retirees mortality model depend on different variables
§ Mortality of the beneficiaries is determined by gender, annuity size, the pay type, and their interactions
§ The gender factors will play a much-reduced role in determining the beneficiaries’ mortality model

Slide 33

Data Mining Process: Post-mining Analysis -- Case Study

§ Limited results on the mortality distribution for the ages above 95
§ As female demographics have changed over the past three decades, variables such as annuity size and union will play a more important role in determining female mortality
§ Other risk factors such as education, lifestyle, smoking/non-smoking, etc.

Slide 34

Data Mining Process: Summary-- Case Study

§ Non-Gompertz (linear growth) between age 70 and 85
§ Selection of the risk factors may influence the quality of the mortality model
§ Mortality models vary with the most important risk factor (the participation status, in this study) among all the other variables

Slide 35

Data Mining Process: -- Case Study in Claim Analysis

§ Basic risk characteristics
§ Top-down identification
§ Underlying statistical properties
§ Domain-specific constraints

Slide 36

Data Mining Process: -- Case Study in ALM

§ Decision tree and DNF learning

§ Generative stochastic modeling
  • Probabilistic networks
  • Probabilistic rules

§ Hidden Markov model

Slide 37

Data Mining Process: -- Applications in Healthcare

§ More productive managed care programs
§ Pricing
§ Individual health insurance market
§ Recovery & prevention of fraudulent claims
§ Prescription drug cost management

Slide 38

Quiz on Data mining

§ What is data mining?
§ What can data mining do?
§ What are data mining techniques?
§ What are the applications of data mining?
§ How can you practice data mining?

Slide 39

Summary

§ Overview of data mining techniques
§ Their application to actuarial practice
§ Future developments
§ Potential contribution to your area