March 11-12, 1999, Nashville, Tennessee
Mark Scully, Tillinghast-Towers Perrin
The Use of Multivariate Analysis Techniques to Design a Class Plan
1999 CAS Seminar on Ratemaking
2
Overview of Presentation
Background
Multivariate analysis techniques:
  Generalized Linear Models (GLMs)
  Classification and Regression Trees (CART, CHAID)
Implementation:
  Pricing
  Marketing
  Agents’ compensation
  Results monitoring
3
Several Factors are Converging toward Better Analysis of Customer and Prospect Attributes
Greater emphasis on pricing vs. underwriting
Increased familiarity with techniques
Faster computers
Influence of direct writers, non-standard companies, and banks
Use of multiple distribution channels
Increased competition
4
Why Multivariate Statistical Techniques?
Most rating variables are correlated.
Different variables may be showing the same underlying effect.
Repeated use of univariate techniques leads to double-counting of same effects.
Can capture interactions.
Provides standard errors as well as point estimates.
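The double-counting point can be illustrated with a small sketch. All numbers below are invented: a hypothetical 2x2 book in which variables A and B each carry a true relativity of 2.0 and exposure is concentrated on the diagonal, so the two variables are correlated. The one-way (univariate) relativities then each absorb part of the other variable's effect.

```python
# Hypothetical 2x2 book of business; every number here is invented for illustration.
BASE_FREQUENCY = 0.10
TRUE_REL_A = {"a0": 1.0, "a1": 2.0}
TRUE_REL_B = {"b0": 1.0, "b1": 2.0}

# Exposure sits mostly on the diagonal, so variables A and B are correlated.
EXPOSURE = {
    ("a0", "b0"): 900.0, ("a0", "b1"): 100.0,
    ("a1", "b0"): 100.0, ("a1", "b1"): 900.0,
}

# Expected claim counts under a purely multiplicative true model.
CLAIMS = {
    (a, b): EXPOSURE[(a, b)] * BASE_FREQUENCY * TRUE_REL_A[a] * TRUE_REL_B[b]
    for (a, b) in EXPOSURE
}


def one_way_frequency(axis, level):
    """Claim frequency for one level of one variable, ignoring the other variable."""
    cells = [cell for cell in EXPOSURE if cell[axis] == level]
    return sum(CLAIMS[c] for c in cells) / sum(EXPOSURE[c] for c in cells)


uni_rel_a = one_way_frequency(0, "a1") / one_way_frequency(0, "a0")
uni_rel_b = one_way_frequency(1, "b1") / one_way_frequency(1, "b0")

# Applying both one-way relativities together overstates the worst cell.
implied_combined = uni_rel_a * uni_rel_b
true_combined = TRUE_REL_A["a1"] * TRUE_REL_B["b1"]

print(f"one-way relativity A: {uni_rel_a:.3f} (true 2.000)")
print(f"one-way relativity B: {uni_rel_b:.3f} (true 2.000)")
print(f"implied combined: {implied_combined:.2f}, true combined: {true_combined:.2f}")
```

In this constructed case each one-way relativity comes out well above 2.0, so multiplying them counts the shared effect twice.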
5
Different Rating Variables may be Manifestations of the Same Underlying Effect
[Diagram: the underlying effect, Driving Intensity, shows up in the rating variables Annual Mileage, Vehicle Make/Model, and Driver Age.]
6
Interactions Arise when the Combined Effect of two Variables Differs from the Sum of their Single Effects
[Figure: relativity (0.6 to 2.0) by age of driver, plotted separately for female and male drivers.]
The differential between female and male differs by age.
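A minimal sketch of how such an interaction might be detected from a two-way table of relativities. The age bands and values are hypothetical, not read from the chart. On a multiplicative scale, no interaction means the male-to-female ratio is the same at every age (equivalently, the effects add on the log scale).

```python
# Hypothetical relativities by age band and sex; values are invented for illustration.
RELATIVITY = {
    ("17-20", "F"): 1.4, ("17-20", "M"): 2.0,
    ("30-39", "F"): 1.0, ("30-39", "M"): 1.0,
    ("60-69", "F"): 0.9, ("60-69", "M"): 0.8,
}
AGE_BANDS = ("17-20", "30-39", "60-69")

# Without an interaction this ratio would be constant across age bands.
male_to_female = {
    age: RELATIVITY[(age, "M")] / RELATIVITY[(age, "F")] for age in AGE_BANDS
}

spread = max(male_to_female.values()) - min(male_to_female.values())
has_interaction = spread > 1e-9

for age in AGE_BANDS:
    print(f"{age}: male/female = {male_to_female[age]:.3f}")
print("interaction present:", has_interaction)
```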
7
Confidence Intervals Indicate the Degree of Certainty Inherent in Relativity Estimates
[Figure: German Bonus-Malus, frequency model. Relativity (0 to 3) by Bonus-Malus class.]
8
Confidence Intervals Indicate the Degree of Certainty Inherent in Relativity Estimates
[Figure: Territorial relativities, frequency model. Relativity (0 to 2) by territory, 1 through 10.]
9
Confidence Intervals Indicate the Degree of Certainty Inherent in Relativity Estimates
[Figure: Territorial relativities, severity model. Relativity (0 to 1.4) by territory, 1 through 10.]
10
Statistical Rating Techniques Indicate the Relative Explanatory Power of each Variable...
…and the extent to which variables are correlated.
[Diagram: Variable A and Variable B]
11
What statistical techniques do we commonly use?
Generalized Linear Models (GLMs)
Classification and regression trees:
  CHAID
  CART
12
What are GLMs?
Statistical procedure for measuring the effect of one or more independent variables upon a dependent variable
For ratemaking, the dependent variables are typically frequency and severity
GLMs allow extreme flexibility in model structure and design:
  multiplicative or additive plans (or others)
  different error distributions
  variable interactions
Explicitly produce relativity estimates (and more)
13
Basic Theory of GLMs (I)
Let $Y_i$, $i = 1, 2, \ldots, n$, be observations from a random variable. We model them as follows:
$$Y_i = h^{-1}\left(x_i^T \beta + \xi_i\right) + e_i$$
Where:
$h$ = the link function
$x_i$ = a vector of variables associated with the i-th observation
$\xi_i$ = a scalar parameter (the offset)
$\beta$ = the parameter vector
$e_i$ = an error term (with mean equal to 0)
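As a rough illustration of the model form on this slide, the sketch below evaluates the fitted mean: the linear predictor plus offset, passed through the inverse link. The function names, the choice of a log link, and all coefficient values are assumptions made for the example.

```python
import math


def inverse_log_link(eta):
    """Inverse of the log link: maps the linear predictor back to the mean."""
    return math.exp(eta)


def glm_mean(x, beta, offset, inverse_link):
    """Mean of Y_i: inverse link applied to (x_i' beta + offset)."""
    eta = sum(x_j * b_j for x_j, b_j in zip(x, beta)) + offset
    return inverse_link(eta)


# Hypothetical coefficients: intercept, "young driver" indicator, "rural" indicator.
beta = [math.log(0.10), math.log(1.5), math.log(0.8)]
x = [1.0, 1.0, 0.0]        # young driver, not rural
offset = math.log(2.0)     # two exposure years entering as a log offset

expected_claims = glm_mean(x, beta, offset, inverse_log_link)
print(f"expected claim count: {expected_claims:.4f}")
```

With a log link the result factors as 0.10 x 1.5 x 2.0, which is how the offset carries exposure into a frequency model.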
14
Basic Theory of GLMs (II)
Typically, the random term $e_i$ is chosen from the exponential family with density in the following general form:
$$f(y) = \exp\left\{\frac{y\theta - b(\theta)}{\phi / w} + c(y, \phi)\right\}$$
Where $\theta$ and $\phi$ are parameters and $w$ is the weight of each observation.
If we denote the mean of this distribution as $\mu$, then its variance may be expressed as $\phi V(\mu)/w$, where $V(\cdot)$ is referred to as the variance function.
15
Basic Theory of GLMs (III)
Distribution        Variance Function    Canonical Link
Normal              $1$                  Identity
Poisson             $\mu$                Log
Binomial            $\mu(1-\mu)$         Logit = $\log(\mu/(1-\mu))$
Gamma               $\mu^2$              Reciprocal = $\mu^{-1}$
Inverse Gaussian    $\mu^3$              $\mu^{-2}$
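The variance functions and canonical links tabulated on this slide can be written out as a small lookup, shown here as a sketch. The dictionary layout and function names are illustrative choices; the variance helper applies the scaling by the dispersion parameter and the observation weight.

```python
import math

# Variance function V(mu) and canonical link for each distribution on the slide.
FAMILIES = {
    "normal": {"variance": lambda mu: 1.0, "link": lambda mu: mu},
    "poisson": {"variance": lambda mu: mu, "link": math.log},
    "binomial": {
        "variance": lambda mu: mu * (1.0 - mu),
        "link": lambda mu: math.log(mu / (1.0 - mu)),
    },
    "gamma": {"variance": lambda mu: mu ** 2, "link": lambda mu: 1.0 / mu},
    "inverse_gaussian": {"variance": lambda mu: mu ** 3, "link": lambda mu: mu ** -2},
}


def response_variance(family, mu, phi=1.0, weight=1.0):
    """Variance of an observation: phi * V(mu) / w."""
    return phi * FAMILIES[family]["variance"](mu) / weight


# Hypothetical severity cell: mean 2.0, dispersion 0.5, weight of 2 claims.
gamma_variance = response_variance("gamma", mu=2.0, phi=0.5, weight=2.0)
print(f"gamma variance: {gamma_variance:.3f}")
print(f"binomial logit at mu=0.5: {FAMILIES['binomial']['link'](0.5):.3f}")
```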
16
Literature on GLMs
Generalized Linear Models, Second Edition, P. McCullagh and J.A. Nelder, Chapman & Hall 1989 (ISBN 0 412 31760 5)
“Statistical Motor Rating: Making Effective Use of Your Data”, M.J. Brockman and T.S. Wright, JIA 119, III, 457-543 (April 1992).
“Technical Aspects of Domestic Lines Pricing”, Greg Taylor, University of Melbourne Research Paper 45 (ISBN 0 7325 1474 6)
17
GLMs: Some Practical Considerations (I)
A log link function produces multiplicative relativities.
Separate models for frequency and severity:
  Better understanding of data
  Appropriate distributions exist
Typical error distributions for frequency:
  Poisson/Quasi-Poisson
  Negative binomial
Typical distributions for severity:
  Normal
  Gamma
  Inverse Gaussian
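To make the multiplicative frequency model concrete, here is a sketch that fits two categorical factors to hypothetical cell data. It does not use GLM software. Instead it uses a swapped-in technique, an iterative marginal-totals scheme in the style of minimum-bias methods, whose fixed point satisfies the same balance equations as a Poisson log-link maximum-likelihood fit with categorical factors. The data and names are invented.

```python
# Hypothetical cell data by levels of factors A and B (invented for illustration).
EXPOSURE = {
    ("a0", "b0"): 900.0, ("a0", "b1"): 100.0,
    ("a1", "b0"): 100.0, ("a1", "b1"): 900.0,
}
CLAIMS = {
    ("a0", "b0"): 90.0, ("a0", "b1"): 20.0,
    ("a1", "b0"): 20.0, ("a1", "b1"): 360.0,
}
LEVELS_A = ("a0", "a1")
LEVELS_B = ("b0", "b1")


def fit_multiplicative(exposure, claims, levels_a, levels_b, iterations=200):
    """Alternate between factors so fitted claims match observed claims per level."""
    rel_a = {a: 1.0 for a in levels_a}   # absorbs the base frequency
    rel_b = {b: 1.0 for b in levels_b}
    for _ in range(iterations):
        for a in levels_a:
            observed = sum(claims[(a, b)] for b in levels_b)
            fitted_per_unit = sum(exposure[(a, b)] * rel_b[b] for b in levels_b)
            rel_a[a] = observed / fitted_per_unit
        for b in levels_b:
            observed = sum(claims[(a, b)] for a in levels_a)
            fitted_per_unit = sum(exposure[(a, b)] * rel_a[a] for a in levels_a)
            rel_b[b] = observed / fitted_per_unit
    return rel_a, rel_b


rel_a, rel_b = fit_multiplicative(EXPOSURE, CLAIMS, LEVELS_A, LEVELS_B)

# Express each factor relative to its base level.
fitted_rel_a = rel_a["a1"] / rel_a["a0"]
fitted_rel_b = rel_b["b1"] / rel_b["b0"]
print(f"fitted relativity A: {fitted_rel_a:.4f}")
print(f"fitted relativity B: {fitted_rel_b:.4f}")
```

The claim counts were built from relativities of 2.0 for each factor, and the joint fit recovers them even though the exposure is heavily correlated across the two factors.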
18
GLMs: Some Practical Considerations (II)
Variables may be modeled as continuous covariates or categorical factors
An array of statistical and practical tests exists for model testing:
  Variable significance tests
  Quantile plots
  Residual plots
  Comparison of actual data to model
19
Comparison of Actual to Model Helps to Identify Areas Currently Under- or Overpriced
[Figure: indicated and current pure premium, with the % difference between them, by age of driver (bands 16-20 through 81-99).]
Loss segments: How much do we write? Are we growing here? How many $ are involved? Are there other reasons to stay here?
Profit segments: How much do we write? Are we losing business? How many $ are involved? How do we get more?
20
The Significance of these Profit/Loss Areas Depends also on their Volume of Business
[Figure: gain/(loss) in millions of $ by age of driver (bands 16-20 through 81-99).]
Note: Gain/(Loss) = (Current PP - Indicated PP) x Exposures
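The formula in the note can be applied segment by segment, as in this sketch. It reads PP as pure premium, which is an assumption, and the segment figures are invented rather than taken from the chart.

```python
def gain_loss(current_pp, indicated_pp, exposures):
    """Gain/(Loss) = (Current PP - Indicated PP) x Exposures."""
    return (current_pp - indicated_pp) * exposures


# Hypothetical segments: (current PP, indicated PP, exposures).
SEGMENTS = {
    "16-20": (1200.0, 1500.0, 10_000.0),
    "36-40": (600.0, 500.0, 80_000.0),
}

results = {
    band: gain_loss(current, indicated, exposures)
    for band, (current, indicated, exposures) in SEGMENTS.items()
}

for band, amount in results.items():
    print(f"{band}: {amount / 1e6:+.1f} million $")
```

In this made-up example the small underpriced segment loses 3 million while the large, mildly overpriced one gains 8 million, which is why volume matters as much as the percentage difference.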
21
What are classification and regression trees?
Procedures for successively subdividing data into homogeneous groups
Like GLMs, they use a dependent variable and one or more independent ones
Result is not necessarily symmetric
Implicitly capture the natural interactions between factors
Can produce a simpler rating plan or form a single rating variable out of many
Produce homogeneous groups (i.e., a tree structure) but no rating plan or relativities
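A single step of the "successive subdivision" idea might look like the sketch below: it searches one numeric variable for the binary split that leaves the two groups most homogeneous, measured by the within-group sum of squared errors. The criterion, function names, and data are illustrative assumptions; a full tree procedure would repeat this over all variables and apply stopping rules.

```python
def sum_squared_error(values):
    """Within-group sum of squared deviations from the group mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_binary_split(xs, ys):
    """Return (threshold, total_sse) for the most homogeneous two-way split on x."""
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        total = sum_squared_error(left) + sum_squared_error(right)
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2.0
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical driver ages and observed claim frequencies.
ages = [18, 19, 22, 35, 40, 55]
frequencies = [0.30, 0.28, 0.25, 0.10, 0.11, 0.09]

threshold, total_sse = best_binary_split(ages, frequencies)
print(f"best split: age < {threshold}")
```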
22
Classification and Regression Trees produce an asymmetrical grouping of the data
[Figure: example tree for a German motor portfolio (Bestand). Splits are made on bonus-malus class (SF), sex (male/female), vehicle age (under or over 2), vehicle type class (Typ), R & A, garage (none), and civil-servant status (Beamten), ending in 13 numbered groups of unequal depth.]
23
Some differences between CHAID and CART
Dependent variable for CHAID must be categorical; for CART it can be metric
Different splitting algorithm (e.g., CHAID uses a Chi-squared test using contingency tables)
CHAID splits into multiple groups, CART makes binary splits
Different stopping criteria
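The chi-squared test on a contingency table mentioned above can be sketched as follows. The tables are hypothetical counts of policies without and with a claim, by category of a candidate predictor. This computes only the Pearson statistic; a CHAID implementation also works with p-values and category merging, which this sketch omits.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Rows: categories of a candidate predictor. Columns: no claim, claim. Counts are invented.
candidate_1 = [[90, 10], [80, 20]]
candidate_2 = [[85, 15], [85, 15]]

stat_1 = chi_squared_statistic(candidate_1)
stat_2 = chi_squared_statistic(candidate_2)
print(f"candidate 1: {stat_1:.3f}, candidate 2: {stat_2:.3f}")
```

Here the first candidate separates claim rates and the second does not, so a chi-squared-based splitter would prefer the first.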
24
GLMs May Be Used to Produce a Rating Plan with Variables Generated by CART or CHAID
[Diagram: Potential Rating Variables -> CART/CHAID Analysis -> CART/CHAID Variables -> GLM Analysis]
25
Results from the Rating Analysis Can be Used Beyond the Production of a Rating Plan
[Diagram: boxes labeled Rating Analysis, Actuarially Optimal Model, Recommended, and Rating Plan Actually Implemented. Constraints shown: regulatory, agents, stability, competition, etc. Other uses shown: marketing, UW guidelines, agents’ compensation, monitoring.]