Free and Cheap Sources of External Data CAS 2007 Predictive Modeling Seminar Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc. [email protected] www.data-mines.com

Free and Cheap Sources of External Data

CAS 2007 Predictive Modeling Seminar

Louise Francis, FCAS, MAAA
Francis Analytics and Actuarial Data Mining, Inc.

[email protected]

www.data-mines.com

Objectives

• Information sharing

• Introduce some useful sources of data to augment company internal databases

• Show examples of applications using external data

Why Augment Data?

• For small companies and new lines of business, internal data may not be sufficient

• Add variables (e.g., demographic and economic) that are not in the internal data

Some Kinds of External Data

• Demographic

• Geographic

• Economic
  – Unemployment rate, avg wage, etc.
  – Financial market

• Insurance data

• Occupational

• Weather

Zip Code Level Data

• The Census Bureau web site, www.census.gov, has a wealth of information

• May require some processing effort to put into a useful format for analysis

• For a small fee there are vendors who pre-process some of the useful data

• One of them is zip-codes.com

Zip-codes.com

Some Useful Variables

• Average Income

• Population

• Average house value

• # people per house

• Latitude, longitude
  – Use to compute distances

• City, county

Distance formula
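
The formula shown on this slide did not survive extraction. As a stand-in, one common way to turn two latitude/longitude pairs into a distance is the haversine great-circle formula. The sketch below assumes that choice; the function name and the Earth radius of 3,958.8 miles are illustrative and are not taken from the slides.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius, not from the slides


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```

With ZIP-level latitude and longitude attached to both the claimant and the provider, a call such as haversine_miles(claimant_lat, claimant_lon, provider_lat, provider_lon) would give a distance variable of the kind listed later among the predictors.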

The Data

ZipCode  PrimaryRecord  ZipCodePopulation  HouseholdsPerZipcode  WhitePopulation
90071    P                              6                0.0000             3.00
90010    P                          1,943              996.0000           780.00
90014    P                          3,518            2,587.0000           868.00
91608    P                              0                0.0000             0.00
90015    P                         15,134            5,339.0000         4,485.00
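
A minimal sketch of how a ZIP-level extract like this might be attached to internal records. The column names and sample rows mirror the slide; the policy records, their field names, and the helper functions are hypothetical.

```python
import csv
import io

# Sample rows copied from the slide; a real extract would be read from a file.
ZIP_EXTRACT = """ZipCode,PrimaryRecord,ZipCodePopulation,HouseholdsPerZipcode,WhitePopulation
90071,P,6,0.0000,3.00
90010,P,"1,943",996.0000,780.00
90014,P,"3,518","2,587.0000",868.00
91608,P,0,0.0000,0.00
90015,P,"15,134","5,339.0000","4,485.00"
"""


def to_number(text):
    """Strip thousands separators before converting to float."""
    return float(text.replace(",", ""))


def load_zip_table(handle):
    """Return {zip code: demographic fields}, keeping the ZIP code as a string."""
    table = {}
    for row in csv.DictReader(handle):
        table[row["ZipCode"]] = {
            "population": to_number(row["ZipCodePopulation"]),
            "households": to_number(row["HouseholdsPerZipcode"]),
        }
    return table


def augment(records, zip_table):
    """Left join: every internal record is kept; unmatched ZIPs get None fields."""
    missing = {"population": None, "households": None}
    return [{**rec, **zip_table.get(rec["zip"], missing)} for rec in records]


zip_table = load_zip_table(io.StringIO(ZIP_EXTRACT))
# Hypothetical internal policy records.
policies = [{"policy_id": "A1", "zip": "90015"}, {"policy_id": "B2", "zip": "99999"}]
augmented = augment(policies, zip_table)
```

Keeping ZIP codes as strings rather than integers preserves leading zeros, which matters for states such as Massachusetts.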

California Auto Data by ZIP

• BI Exposures

• BI Losses

• BI Claims

• PD Exposures

• PD Losses

• PD Claims

CAARP Data

• CAARP data

• California Auto Assigned Risk Plan

• Collected by state

• Aggregated data

• Request from Statistical Analysis Division of department

California Proposed Changes to Territory Rating

Effect of Change by County

Effect of Change by Pure Premium Group

Mean Pct Change by Percentile Group of PPBI

Percentile Group of PPBI  Instruction Set 1 Pct Change  Instruction Set 2 Pct Change  Instruction Set 3 Pct Change
1                                               7.889%                        9.516%                       18.600%
2                                               5.143%                        6.416%                       11.250%
3                                               2.381%                        3.050%                        5.507%
4                                              -0.124%                       -0.225%                       -0.125%
5                                              -4.240%                       -5.448%                       -8.777%
Total                                           2.136%                        2.575%                        5.110%
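
The table reports the mean percent change within five percentile groups of BI pure premium. A minimal sketch of that kind of summary follows, assuming equal-count groups formed by rank and an unweighted mean within each group. The ZIP-level values are made up for illustration and are not the CAARP data.

```python
from statistics import mean


def percentile_groups(values, n_groups=5):
    """Assign each value a group 1..n_groups by rank, with near-equal group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, idx in enumerate(order):
        groups[idx] = rank * n_groups // len(values) + 1
    return groups


def mean_by_group(groups, outcome):
    """Unweighted mean of the outcome within each group, keyed by group number."""
    buckets = {}
    for g, y in zip(groups, outcome):
        buckets.setdefault(g, []).append(y)
    return {g: mean(ys) for g, ys in sorted(buckets.items())}


# Hypothetical ZIP-level data: BI pure premium and percent change in rate.
pure_premium = [120, 340, 95, 410, 220, 180, 510, 60, 275, 390]
pct_change = [8.0, -1.0, 9.5, -4.0, 2.0, 5.0, -6.0, 10.0, 1.0, -3.0]

groups = percentile_groups(pure_premium)
summary = mean_by_group(groups, pct_change)
```

The same two functions apply unchanged when the grouping variable is an external one, such as average house value or income per household by ZIP.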

Effect of Change by Average House Value

Mean Pct Change by Percentile Group of AverageHouseValue

Percentile Group of AverageHouseValue  Instruction Set 1 Pct Change  Instruction Set 2 Pct Change  Instruction Set 3 Pct Change
1                                                            3.308%                        4.101%                        8.126%
2                                                            2.117%                        2.986%                        5.336%
3                                                            2.393%                        3.121%                        5.478%
4                                                            2.936%                        3.603%                        6.100%
5                                                            2.369%                        2.945%                        4.598%
Total                                                        2.739%                        3.498%                        6.411%

Effect of Change by Average Income

Mean Pct Change by Percentile Group of IncomePerHousehold

Percentile Group of IncomePerHousehold  Instruction Set 1 Pct Change  Instruction Set 2 Pct Change  Instruction Set 3 Pct Change
1                                                             3.450%                        4.203%                        8.755%
2                                                             3.001%                        4.046%                        7.119%
3                                                             2.298%                        2.973%                        5.276%
4                                                             1.615%                        2.241%                        3.384%
5                                                             2.518%                        3.080%                        4.278%
Total                                                         2.739%                        3.498%                        6.411%

The Data used for Fraud Model

Described in “Distinguishing the Forest From the Trees”, Derrig and Francis, 2005 CAS Winter Forum

The Fraud Surrogates used as Dependent Variables

• Independent Medical Exam (IME) requested

• Special Investigation Unit (SIU) referral
  – (IME successful)
  – (SIU successful)

• Data: Detailed Auto Injury Claim Database for Massachusetts

• Accident Years (1995-1997)

Predictor Variables

• Claim file variables
  – Provider bill, Provider type
  – Injury

• Derived from claim file variables
  – Attorneys per zip code
  – Docs per zip code

• Using external data
  – Average household income
  – Households per zip
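
A derived variable such as attorneys per zip code could be built by counting the distinct attorneys appearing on claims in each ZIP. The slides do not give the exact definition used, so the sketch below is one plausible reading; the claim records and field names are hypothetical.

```python
from collections import defaultdict


def distinct_count_per_zip(claims, field):
    """Count distinct non-empty values of `field` among the claims in each ZIP."""
    seen = defaultdict(set)
    for claim in claims:
        value = claim.get(field)
        if value:
            seen[claim["zip"]].add(value)
    return {z: len(v) for z, v in seen.items()}


# Hypothetical claim records; field names are illustrative.
claims = [
    {"claim_id": 1, "zip": "02118", "attorney": "ATT-07"},
    {"claim_id": 2, "zip": "02118", "attorney": "ATT-07"},
    {"claim_id": 3, "zip": "02118", "attorney": "ATT-12"},
    {"claim_id": 4, "zip": "01604", "attorney": None},
    {"claim_id": 5, "zip": "01604", "attorney": "ATT-03"},
]

attorneys_per_zip = distinct_count_per_zip(claims, "attorney")
# Attach the derived variable back onto each claim as a predictor.
for claim in claims:
    claim["attorneys_per_zip"] = attorneys_per_zip.get(claim["zip"], 0)
```

The same helper with a provider field would give a docs-per-zip count.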

Neural Network Ranking of Variables

Variable              Rank  Sensitivity Statistic  Importance
Health Insurance         1                1.01335        100%
Provider 2 Bill          2                1.00987         74%
Provider 1 Bill          3                1.00681         51%
Territory                4                1.00652         49%
Attorneys/Zip            5                1.00507         38%
Injury Type              6                1.00396         30%
Report Lag               7                1.00303         23%
Provider 2 Type          8                1.00272         20%
Provider 1 Type          9                1.00210         16%
Treatment Lag           10                1.00198         15%
Households/Zip          11                1.00156         12%
Attorney                12                1.00051          4%
Emergency Treatment     13                1.00034          3%
Claimants per City      14                1.00025          2%
Providers/Zip           15                1.00024          2%
Age                     16                1.00018          1%
Providers per City      17                1.00016          1%
Distance                18                1.00010          1%
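
The Importance column appears consistent with rescaling each sensitivity statistic's excess over 1 by the largest excess. That relationship is inferred from the numbers on the slide and is not stated there; the short check below uses the first five rows.

```python
# Sensitivity statistics copied from the slide (first five rows).
sensitivity = {
    "Health Insurance": 1.01335,
    "Provider 2 Bill": 1.00987,
    "Provider 1 Bill": 1.00681,
    "Territory": 1.00652,
    "Attorneys/Zip": 1.00507,
}


def relative_importance(stats):
    """Scale (statistic - 1) so that the largest value maps to 100 percent."""
    top = max(stats.values()) - 1.0
    return {name: round(100 * (s - 1.0) / top) for name, s in stats.items()}


importance = relative_importance(sensitivity)
```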

Variable Importance for IME Requested for 3 Methods

Rank  Treenet            MARS                 S-Plus Neural
1     Provider 2 Bill    Health Insurance     Health Insurance
2     Attorneys/Zip      Provider 2 Bill      Provider 2 Bill
3     Territory          Injury Type          Provider 1 Bill
4     Health Insurance   Report Lag           Territory
5     Injury Type        Provider 1 Bill      Attorneys/Zip
6     Provider 1 Bill    Treatment Lag        Injury Type
7     Provider 1 Type    Providers per City   Report Lag
8     Report Lag         Avg Household Price  Provider 2 Type
9     Attorney           Territory            Provider 1 Type
10    Age                Attorney             Treatment Lag
11    Provider 2 Type    Providers/Zip        Households/Zip

Variable Importance (IME) Based on Average of Methods

Important Variable Summarizations for IME Tree Models, Other Models and Total

Variable                     Variable type  Total Score  Total Score Rank  Tree Score Rank  Other Score Rank
Health Insurance             F                    16529                 1                2                 1
Provider 2 Bill              F                    12514                 2                1                 3
Injury Type                  F                    10311                 3                3                 2
Territory                    F                     5180                 4                4                 7
Provider 2 Type              F                     4911                 5                6                 4
Provider 1 Bill              F                     4711                 6                5                 5
Attorneys Per Zip            DV                    2731                 7                7                14
Report Lag                   DV                    2650                 8               10                 8
Treatment Lag                DV                    2638                 9               13                 6
Claimant per City            DV                    2383                10               12                 9
Provider 1 Type              F                     1794                11                9                13
Providers per City           DV                    1708                12               11                11
Attorney                     F                     1642                13                8                16
Distance MP1 Zip to Clt Zip  DV                    1134                14               18                10
AGE                          F                     1048                15               17                12
Avg. Household Price/Zip     DM                     907                16               16                15
Emergency Treatment          F                      660                17               14                18
Income Household/Zip         DM                     329                18               15                20
Providers/Zip                DV                     288                19               20                17
Household/Zip                DM                     242                20               19                19
Policy Type                  F                        4                21               21                21

Trends Using External Information

• People still rely on Masterson’s indices and other indices based on the CPI

• Shortcomings
  – Hedonic adjustment
  – Substitution
  – Imputed rental cost
  – Geometric chaining
  – See www.shadowstats.com or Getting Prices Right by Economic Policy Institute and Dean Baker

• Insurance inflation has typically been much higher than these indications

• Many need reliable trend indications on smaller segments of their data

• Trend is another weak link in the modeling process

Questions?