Free and Cheap Sources of External Data
CAS 2007 Predictive Modeling Seminar
Louise Francis, FCAS, MAAA
Francis Analytics and Actuarial Data Mining, Inc.
Louise_francis@msn.com
www.data-mines.com
Objectives
• Information sharing
• Introduce some useful sources of data to augment company internal databases
• Show examples of applications using external data
Why Augment Data?
• For small companies or new lines of business, internal data may not be sufficient
• Add variables (e.g., demographic and economic) that are not in the internal data
Some Kinds of External Data
• Demographic
• Geographic
• Economic
– Unemployment rate, avg wage, etc.
– Financial market
• Insurance data
• Occupational
• Weather
Zip Code Level Data
• Census bureau web site, www.census.gov has a wealth of information
• May require some processing effort to put into useful format for analysis
• For a small fee there are vendors who pre-process some of the useful data
• One of them is zip-codes.com
Zip-codes.com
Some Useful Variables
• Average Income
• Population
• Average house value
• # people per house
• Latitude, longitude
– Use to compute distances
• City, county
Distance formula
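One common way to compute a distance from two latitude/longitude pairs is the haversine great-circle formula. The sketch below assumes that approach, which may differ from the formula shown in the presentation; the function name and the Earth-radius constant are illustrative choices.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius (assumption)

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))

# Hypothetical centroids one degree of latitude apart (roughly 69 miles).
d = haversine_miles(34.0, -118.0, 35.0, -118.0)
```

With ZIP-level centroids, the same function could give, for example, the distance from a claimant's ZIP to a medical provider's ZIP.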
The Data
ZipCode  PrimaryRecord  ZipCodePopulation  HouseholdsPerZipcode  WhitePopulation
90071    P                              6                0.0000             3.00
90010    P                          1,943              996.0000           780.00
90014    P                          3,518            2,587.0000           868.00
91608    P                              0                0.0000             0.00
90015    P                         15,134            5,339.0000         4,485.00
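A minimal sketch of how ZIP-level rows like these might be attached to internal records. The population and household figures come from the table above; the policy records, field names, and the `augment` helper are hypothetical and only illustrate a left join on ZIP code.

```python
# External ZIP-level rows from the table above (population and households only).
zip_data = {
    "90071": {"population": 6, "households": 0.0},
    "90010": {"population": 1943, "households": 996.0},
    "90014": {"population": 3518, "households": 2587.0},
    "91608": {"population": 0, "households": 0.0},
    "90015": {"population": 15134, "households": 5339.0},
}

# Hypothetical internal policy records; field names are illustrative only.
policies = [
    {"policy_id": "A1", "zip": "90010", "premium": 1200.0},
    {"policy_id": "A2", "zip": "90015", "premium": 950.0},
    {"policy_id": "A3", "zip": "99999", "premium": 1100.0},  # not in external file
]

def augment(records, external, key="zip"):
    """Left-join external ZIP attributes onto records; unmatched ZIPs get None."""
    fields = sorted({f for row in external.values() for f in row})
    out = []
    for rec in records:
        extra = external.get(rec[key], {})
        merged = dict(rec)
        for f in fields:
            merged[f] = extra.get(f)
        out.append(merged)
    return out

augmented = augment(policies, zip_data)
```

ZIP codes are kept as strings so that leading zeros survive, and unmatched records are retained with missing values rather than dropped, so the analyst can decide how to treat them.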
California Auto Data by ZIP
• BI Exposures
• BI Losses
• BI Claims
• PD Exposures
• PD Losses
• PD Claims
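These six fields support the usual ratios by ZIP code and coverage. The slides do not say which ratios were computed, although the later tables group ZIP codes by "PPBI", presumably BI pure premium. The sketch below is an illustration under that assumption; the function name and the input figures are hypothetical.

```python
def zip_ratios(exposures, losses, claims):
    """Return (frequency, severity, pure_premium) for one ZIP and one coverage.

    A ratio whose denominator is zero is returned as None instead of raising.
    """
    frequency = claims / exposures if exposures else None
    severity = losses / claims if claims else None
    pure_premium = losses / exposures if exposures else None
    return frequency, severity, pure_premium

# Hypothetical BI figures for a single ZIP code (not from the California data).
freq, sev, pp = zip_ratios(exposures=2000.0, losses=500000.0, claims=50)
```

Pure premium equals frequency times severity, which gives a quick consistency check on the three values.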
CAARP Data
• CAARP: California Auto Assigned Risk Plan
• Collected by the state
• Aggregated data
• Request from the Statistical Analysis Division of the department
California Proposed Changes to Territory Rating
Effect of Change by County
Effect of Change by Pure Premium Group
Mean percent change by percentile group of PPBI

Percentile Group   Instruction Set 1   Instruction Set 2   Instruction Set 3
of PPBI            Pct Change          Pct Change          Pct Change
1                   7.889%              9.516%             18.600%
2                   5.143%              6.416%             11.250%
3                   2.381%              3.050%              5.507%
4                  -0.124%             -0.225%             -0.125%
5                  -4.240%             -5.448%             -8.777%
Total               2.136%              2.575%              5.110%
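A summary of this kind can be sketched as: rank the ZIP codes on a variable, cut the ranks into five equal-sized groups, and average the percent change within each group. The code below is a minimal illustration with hypothetical values. It uses unweighted means and a simple rank-based grouping, which may differ from the weighting and tie handling behind the figures in the presentation.

```python
from statistics import mean

def percentile_groups(values, n_groups=5):
    """Assign each value a group 1..n_groups by rank (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(values) + 1
    return groups

def mean_by_group(groups, outcome):
    """Unweighted mean of `outcome` within each group, as {group: mean}."""
    buckets = {}
    for g, y in zip(groups, outcome):
        buckets.setdefault(g, []).append(y)
    return {g: mean(ys) for g, ys in sorted(buckets.items())}

# Hypothetical ZIP-level values: BI pure premium and percent change in premium.
ppbi = [120.0, 340.0, 95.0, 410.0, 230.0, 180.0, 510.0, 150.0, 275.0, 305.0]
pct_change = [0.08, -0.01, 0.09, -0.04, 0.02, 0.05, -0.05, 0.06, 0.01, 0.00]

groups = percentile_groups(ppbi)
summary = mean_by_group(groups, pct_change)
```

Swapping `ppbi` for an external variable such as average house value or income per household gives the corresponding summaries by those variables.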
Effect of Change by Average House Value
Mean percent change by percentile group of AverageHouseValue

Percentile Group       Instruction Set 1   Instruction Set 2   Instruction Set 3
of AverageHouseValue   Pct Change          Pct Change          Pct Change
1                       3.308%              4.101%              8.126%
2                       2.117%              2.986%              5.336%
3                       2.393%              3.121%              5.478%
4                       2.936%              3.603%              6.100%
5                       2.369%              2.945%              4.598%
Total                   2.739%              3.498%              6.411%
Effect of Change by Average Income
Mean percent change by percentile group of IncomePerHousehold

Percentile Group        Instruction Set 1   Instruction Set 2   Instruction Set 3
of IncomePerHousehold   Pct Change          Pct Change          Pct Change
1                        3.450%              4.203%              8.755%
2                        3.001%              4.046%              7.119%
3                        2.298%              2.973%              5.276%
4                        1.615%              2.241%              3.384%
5                        2.518%              3.080%              4.278%
Total                    2.739%              3.498%              6.411%
The Data used for Fraud Model
Described in “Distinguishing the Forest From the Trees”, Derrig and Francis, 2005 CAS Winter Forum
The Fraud Surrogates used as Dependent Variables
• Independent Medical Exam (IME) requested
• Special Investigation Unit (SIU) referral
– (IME successful)
– (SIU successful)
• Data: Detailed Auto Injury Claim Database for Massachusetts
• Accident Years (1995-1997)
Predictor Variables
• Claim file variables
– Provider bill, Provider type
– Injury
• Derived from claim file variables
– Attorneys per zip code
– Docs per zip code
• Using external data
– Average household income
– Households per zip
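A derived variable such as attorneys per zip code can be built by aggregating the claim file itself. The slides do not define the variable precisely, so the sketch below assumes it means the number of distinct attorneys appearing on claims in each ZIP code; the records, field names, and helper function are hypothetical.

```python
from collections import defaultdict

# Hypothetical claim records; field names and values are illustrative only.
claims = [
    {"claim_id": 1, "zip": "02118", "attorney": "Firm A"},
    {"claim_id": 2, "zip": "02118", "attorney": "Firm B"},
    {"claim_id": 3, "zip": "02118", "attorney": "Firm A"},
    {"claim_id": 4, "zip": "01605", "attorney": None},  # unrepresented claimant
    {"claim_id": 5, "zip": "01605", "attorney": "Firm C"},
    {"claim_id": 6, "zip": "02740", "attorney": None},
]

def distinct_per_zip(records, field, zip_field="zip"):
    """Count distinct non-empty values of `field` within each ZIP code."""
    seen = defaultdict(set)
    for rec in records:
        value = rec.get(field)
        if value:  # skip claims with nothing recorded in this field
            seen[rec[zip_field]].add(value)
    return {z: len(names) for z, names in seen.items()}

attorneys_per_zip = distinct_per_zip(claims, "attorney")

# Attach the derived variable to every claim; ZIPs with no attorney on file get 0.
for rec in claims:
    rec["attorneys_per_zip"] = attorneys_per_zip.get(rec["zip"], 0)
```

The same helper could be pointed at a provider field to build a docs-per-zip variable.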
Neural Network Ranking of Variables
Variable              Rank   Sensitivity Statistic   Importance
Health Insurance         1   1.01335                 100%
Provider 2 Bill          2   1.00987                  74%
Provider 1 Bill          3   1.00681                  51%
Territory                4   1.00652                  49%
Attorneys/Zip            5   1.00507                  38%
Injury Type              6   1.00396                  30%
Report Lag               7   1.00303                  23%
Provider 2 Type          8   1.00272                  20%
Provider 1 Type          9   1.00210                  16%
Treatment Lag           10   1.00198                  15%
Households/Zip          11   1.00156                  12%
Attorney                12   1.00051                   4%
Emergency Treatment     13   1.00034                   3%
Claimants per City      14   1.00025                   2%
Providers/Zip           15   1.00024                   2%
Age                     16   1.00018                   1%
Providers per City      17   1.00016                   1%
Distance                18   1.00010                   1%
Variable Importance for IME Requested for 3 Methods
Rank   Treenet            MARS                  S-Plus Neural
1      Provider 2 Bill    Health Insurance      Health Insurance
2      Attorneys/Zip      Provider 2 Bill       Provider 2 Bill
3      Territory          Injury Type           Provider 1 Bill
4      Health Insurance   Report Lag            Territory
5      Injury Type        Provider 1 Bill       Attorneys/Zip
6      Provider 1 Bill    Treatment Lag         Injury Type
7      Provider 1 Type    Providers per City    Report Lag
8      Report Lag         Avg Household Price   Provider 2 Type
9      Attorney           Territory             Provider 1 Type
10     Age                Attorney              Treatment Lag
11     Provider 2 Type    Providers/Zip         Households/Zip
Variable Importance (IME) Based on Average of Methods
Important Variable Summarizations for IME Tree Models, Other Models and Total
Variable                      Variable Type   Total Score   Total Score Rank   Tree Score Rank   Other Score Rank
Health Insurance              F                     16529                  1                 2                  1
Provider 2 Bill               F                     12514                  2                 1                  3
Injury Type                   F                     10311                  3                 3                  2
Territory                     F                      5180                  4                 4                  7
Provider 2 Type               F                      4911                  5                 6                  4
Provider 1 Bill               F                      4711                  6                 5                  5
Attorneys Per Zip             DV                     2731                  7                 7                 14
Report Lag                    DV                     2650                  8                10                  8
Treatment Lag                 DV                     2638                  9                13                  6
Claimant per City             DV                     2383                 10                12                  9
Provider 1 Type               F                      1794                 11                 9                 13
Providers per City            DV                     1708                 12                11                 11
Attorney                      F                      1642                 13                 8                 16
Distance MP1 Zip to Clt Zip   DV                     1134                 14                18                 10
AGE                           F                      1048                 15                17                 12
Avg. Household Price/Zip      DM                      907                 16                16                 15
Emergency Treatment           F                       660                 17                14                 18
Income Household/Zip          DM                      329                 18                15                 20
Providers/Zip                 DV                      288                 19                20                 17
Household/Zip                 DM                      242                 20                19                 19
Policy Type                   F                         4                 21                21                 21
Trends Using External Information
• People still rely on Masterson’s indices and other indices based on the CPI
• Shortcomings
– Hedonic adjustment
– Substitution
– Imputed rental cost
– Geometric chaining
– See www.shadowstats.com or Getting Prices Right by Economic Policy Institute and Dean Baker
• Insurance inflation has typically been much higher than these indications
• Many need reliable trend indications on smaller segments of their data
• Trend is another weak link in the modeling process
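One common way to get a trend indication from a segment's own data, rather than from a CPI-based index, is a log-linear least-squares fit of average severity against year. The sketch below illustrates that standard technique; it is not a method described in the presentation, and the function name and severity values are hypothetical.

```python
import math

def fit_exponential_trend(years, values):
    """Least-squares fit of ln(value) = a + b * year; returns annual trend exp(b) - 1."""
    n = len(years)
    logs = [math.log(v) for v in values]
    x_bar = sum(years) / n
    y_bar = sum(logs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, logs))
    sxx = sum((x - x_bar) ** 2 for x in years)
    slope = sxy / sxx
    return math.exp(slope) - 1.0

# Hypothetical average severities constructed to grow at exactly 6% per year.
years = [2001, 2002, 2003, 2004, 2005]
severities = [1000.0 * 1.06 ** (y - 2001) for y in years]
trend = fit_exponential_trend(years, severities)
```

On small segments the fitted trend can be very noisy, which is one reason the fit is often compared against an external index rather than used alone.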
Questions?