Special Challenges With Large Data Mining Projects CAS PREDICTIVE MODELING SEMINAR Beth Fitzgerald ISO October 2006


Special Challenges With Large Data Mining Projects

CAS PREDICTIVE MODELING SEMINAR

Beth Fitzgerald, ISO

October 2006


Agenda

• Project Overview
• Prior to Modeling
• Modeling
• Business Issues


Development of a Model - Project Overview

• Data
• Statistical Tools
• Computer Capacity
• Team Skills
  – Data management
  – Analytical/statistical
  – Technology
  – Business Knowledge


Prior to Modeling

• Formulate the Problem
• Evaluate Possible Data Sources
• Prepare the Data
• Develop Understanding of Modeling Procedures and Diagnostics
• Explore the Data with Simple Modeling Techniques


What percent of a model building project is the data preparation and data management?

• 25%
• 50%
• 75%
• 85%


Formulate the Problem

• What problem are you trying to solve?
• What results do you expect to see?
• How will you know if the results are reasonable?


Prepare the Data

• Do quality checks at the level of detail needed for the project
• Understand how to prepare individual variables for use in models
• Need to be practical about the number of classification categories models can handle
• Need to decide on truncation and bucketing of continuous variables
• Create new variables
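
As an illustration of truncation and bucketing of a continuous variable, here is a minimal Python sketch. The variable (building age), the cap of 100 years, and the cut points are hypothetical choices made for the example, not values from the presentation.

```python
from bisect import bisect_right

def truncate(value, lower, upper):
    """Cap a continuous value to the closed range [lower, upper]."""
    return max(lower, min(value, upper))

def bucket(value, edges):
    """Return the index of the bucket that value falls into.

    edges are ascending cut points. Bucket 0 holds everything below
    edges[0]; a value equal to an edge goes into the higher bucket.
    """
    return bisect_right(edges, value)

# Hypothetical building ages in years, including an implausible outlier.
building_ages = [3, 12, 47, 250, 88]
age_edges = [10, 25, 50]  # assumed cut points, for illustration only

# Truncate first so the outlier cannot distort the fit, then bucket.
capped = [truncate(age, 0, 100) for age in building_ages]
buckets = [bucket(age, age_edges) for age in capped]
```

Truncating before bucketing keeps extreme values from dominating a fit, and a small number of buckets keeps the count of classification categories manageable.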


Develop Understanding of Modeling Procedures and Diagnostics

• Basic modeling training – GLM, Data Mining
• What software is available?
• What software/models work for my data investigation, modeling problem, etc.
• What computer capacity do I need?
• Learn how to use software
• Learn how to interpret the diagnostics


Development of a Model

• Analyze historical policy and loss data
  – Policy level detail
  – Location level detail
• Link policy and loss data with external and/or internal data:
  – Specific business risk data – operational, financial
  – Specific location data – demographic, weather
  – Other data – building, vehicle, agency
• Need link between policy detail and other data
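
The linking step can be sketched as a left join on a shared key. In this Python sketch the field names (`policy_id`, `zip`, `population_density`) and all values are hypothetical; the point is that every policy is kept and unmatched keys are reported rather than silently dropped.

```python
# Hypothetical policy records, each carrying a ZIP code as the link key.
policies = [
    {"policy_id": "P1", "zip": "07310", "premium": 1200.0},
    {"policy_id": "P2", "zip": "10001", "premium": 800.0},
    {"policy_id": "P3", "zip": "99999", "premium": 500.0},
]

# Hypothetical external location data keyed by ZIP code.
location_data = {
    "07310": {"population_density": 17000},
    "10001": {"population_density": 27000},
}

def link_policies(policies, location_data, key="zip"):
    """Left-join external attributes onto policies and list unmatched ones."""
    linked, unmatched = [], []
    for policy in policies:
        extra = location_data.get(policy[key])
        if extra is None:
            # Keep the policy, but record that no external data was found.
            unmatched.append(policy["policy_id"])
            extra = {}
        linked.append({**policy, **extra})
    return linked, unmatched

linked, unmatched = link_policies(policies, location_data)
```

Reviewing the unmatched list is one way to check whether the link between policy detail and the other data is good enough before modeling.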


Explore the Data with Simple Modeling Techniques

• Start with sample of data
• Try different classical analysis on sample such as:
  – regression
  – linear models
  – correlation matrices
• Make use of graphical options to explore data
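
A small Python sketch of the first two ideas: draw a sample, then compute a correlation matrix on it. The data are synthetic and the column names (`building_age`, `noise`, `loss`) are invented for the example; Pearson correlation is computed by hand so that no external library is assumed.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Return a nested dict of pairwise correlations between named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Synthetic "full" data set: loss depends on building_age but not on noise.
rng = random.Random(0)
rows = []
for _ in range(10_000):
    age = rng.uniform(0, 100)
    rows.append({
        "building_age": age,
        "noise": rng.uniform(0, 1),
        "loss": 50 + 2 * age + rng.gauss(0, 10),
    })

# Explore a random sample rather than the full data set.
sample = rng.sample(rows, 500)
columns = {name: [r[name] for r in sample]
           for name in ("building_age", "noise", "loss")}
corr = correlation_matrix(columns)
```

On real data the same matrix gives a quick first look at which candidate predictors move with the target and which move with each other.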


Data Management Issues

• Matching additional internal policy information to premium/loss data
  – Different points in time
  – Tracking & balancing audited exposures
• Different summarization keys – handling of mid-term endorsements
• Address scrubbing
• Matching to external data for correct point in time
• Significance of missing values within variable
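
Matching external data for the correct point in time can be sketched as an "as of" lookup: for each policy, take the latest external value that was effective on or before the policy date. The score history below is hypothetical, and the function name `value_as_of` is invented for the example.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical history of an external score for one business.
# Each entry is (effective_date, value), sorted by effective date.
score_history = [
    (date(2004, 1, 1), 610),
    (date(2005, 1, 1), 640),
    (date(2006, 1, 1), 700),
]

def value_as_of(history, as_of):
    """Return the latest value effective on or before as_of, else None."""
    dates = [effective for effective, _ in history]
    idx = bisect_right(dates, as_of)
    if idx == 0:
        # Nothing was known yet at this date: treat as missing, do not guess.
        return None
    return history[idx - 1][1]

# A policy effective mid-2005 should see the 2005 value, not the 2006 one.
score_for_policy = value_as_of(score_history, date(2005, 6, 15))
```

Returning `None` when no value was yet effective keeps later information from leaking into earlier policy periods, and it makes the missing value visible so its significance can be assessed.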


Modeling Activities

• Selection of Predictors – variable elimination, variable transformation
• Start with classical models prior to evaluating more complex models
• Methodology Understanding and Evaluation
• Evaluation of Model Performance


Data Mining Techniques

Balance good fit with explanatory power

• Generalized Linear Models
• Classification Trees
• Regression Trees
• Multivariate Adaptive Regression Splines
• Neural Networks


Data Mining Process

[Diagram: process stages, in the order extracted from the slide: Business Knowledge, Data Linking, Data Cleansing, Analyze Variables, Determine Predictive Variables, Evaluation, Data Gathering, Data Mining]


Model Performance

• Lift Curve Analysis
  – Score all risks in sample
  – Rank risks by score from Bad to Good
  – Compare loss ratio of risks in each decile to loss ratio for all risks
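
The three steps above can be sketched in Python as follows. The sketch assumes that a higher score means a worse risk and that deciles hold equal numbers of risks (not equal premium); neither convention is stated on the slide. The data are synthetic.

```python
def decile_lift(risks, n_bins=10):
    """Loss ratio relativity per bin, ordered from worst to best risks.

    risks is a list of (score, premium, loss) tuples. Assumes at least
    n_bins risks and a positive premium total in every bin.
    """
    # Rank from bad to good: highest score first (an assumed convention).
    ranked = sorted(risks, key=lambda r: r[0], reverse=True)
    overall_lr = sum(r[2] for r in ranked) / sum(r[1] for r in ranked)

    n = len(ranked)
    relativities = []
    for b in range(n_bins):
        # Equal-count bins; integer arithmetic spreads any remainder.
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        bin_lr = sum(r[2] for r in chunk) / sum(r[1] for r in chunk)
        relativities.append(bin_lr / overall_lr)
    return relativities

# Synthetic sample: 20 risks, premium 100 each, loss equal to the score,
# so the score is a perfect predictor by construction.
sample_risks = [(score, 100.0, float(score)) for score in range(1, 21)]
lift = decile_lift(sample_risks)
```

A relativity above 1 in the worst deciles and below 1 in the best deciles indicates that the score separates risks; a flat line near 1 indicates no lift.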


Sample Lift Curve Analysis
Relative Loss Ratio Lift

[Figure: LR Relativity by Decile, labeled "Optimal Model". x-axis: Decile of Worst to Best Risk (1 to 10); y-axis: Loss Ratio Relativity (0.7 to 1.3)]


Business Issues

• Model uses information from a third-party vendor
• Model needs to be accessible electronically
• Technology Issues
• Implementation Decisions


Technology Issues

• Develop/Modify Systems
• Integrate into underwriting/rating workflow
  – Decision process
  – Agency system
• Decide on technology
  – Web-based interface
  – API, FTP, MQ, TCP/IP, HTTPS webservices


Implementation of Model

Solution focus/usage:
• Suitability of risk for underwriting decision
• Source for additional pricing factors
• Consistency in underwriting/pricing decisions
• Compliance with regulations based on implementation decision
• Consider model alone or model with other information available from application


Implementation of Model

Workflows:
• Underwriting
  – New Business
  – Renewal business
• Rating
  – Pricing
  – Coverage Adjustment


Business Implementation of Model

• Strategic Plan - need management involvement
• Prepare Announcement/Training Material for Internal & External Customers
• Coordinate Implementation
• Monitor Feedback/Adjust Implementation


Future Plans

• Determine Process for Updates to Model
  – Use of Updated Data
  – Use of New Data Variables
  – Use of New Techniques
