1
GLM I: Introduction to Generalized Linear Models
By Curtis Gary Dean, Vice President and Actuary
2
Classical Multiple Linear Regression
$Y_i = a_0 + a_1 X_{i1} + a_2 X_{i2} + \dots + a_m X_{im} + e_i$
$Y_i$ are the response variables and $X_{ij}$ are predictors
$e_i$ is a random error term: Normally distributed
$E[e_i] = 0$ and $\mathrm{Var}(e_i) = \sigma^2$ is constant
3
Classical Multiple Linear Regression
$\mu_i = E[Y_i] = a_0 + a_1 X_{i1} + \dots + a_m X_{im}$
$Y_i$ is a Normally distributed random variable with variance $\sigma^2$
Want to estimate $\mu_i = E[Y_i]$ for each $i$
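As an illustration of estimating $\mu_i$ in the classical model, here is a minimal sketch of ordinary least squares for the one-predictor case ($m = 1$). The data values and the function name `fit_simple_ols` are made up for this example; they do not come from the slides.

```python
# Hypothetical data: y is exactly 1 + 2*x, so the fit should recover a0 = 1, a1 = 2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_simple_ols(xs, ys):
    """Return (a0, a1) minimizing sum((y - a0 - a1*x)^2) for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    return a0, a1


a0, a1 = fit_simple_ols(xs, ys)

# Estimated means mu_i = E[Y_i] = a0 + a1 * X_i1 for each observation i.
fitted_means = [a0 + a1 * x for x in xs]
```

With several predictors the same idea applies, but the coefficients come from solving a system of linear equations rather than from these two closed-form expressions.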
4
Response $Y_i$ has Normal Distribution
[Figure: Normal density curve for the response $Y_i$; horizontal axis labeled $\mu_i$]
5
Problems with Traditional Model
Number of claims is discrete
Claim sizes are skewed to the right
Probability of an event is in [0,1]
Variance is not constant across data points $i$
Nonlinear relationship between X's and Y's
6
Generalized Linear Models - GLMs
Fewer restrictions
Y can model number of claims, probability of renewing, loss severity, loss ratio, etc.
Large and small policies can be put into one model
Y can be a nonlinear function of the X's
Classical linear regression model is a special case
7
Generalized Linear Models - GLMs
$g(\mu_i) = a_0 + a_1 X_{i1} + \dots + a_m X_{im}$
$g(\,)$ is the link function
$E[Y_i] = \mu_i = g^{-1}(a_0 + a_1 X_{i1} + \dots + a_m X_{im})$
$Y_i$ can be Normal, Poisson, Gamma, Binomial, Compound Poisson, …
Variance can be modeled
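The relationship between the linear predictor, the link $g$, and its inverse can be sketched in a few lines. This example assumes a log link purely for illustration; the coefficient and predictor values are hypothetical.

```python
import math


def linear_predictor(coeffs, x_row):
    """Return a0 + a1*x1 + ... + am*xm, where coeffs[0] is the intercept a0."""
    return coeffs[0] + sum(a * x for a, x in zip(coeffs[1:], x_row))


def log_link(mu):
    """g(mu) = ln(mu), defined for mu > 0."""
    return math.log(mu)


def inverse_log_link(eta):
    """g^{-1}(eta) = exp(eta), which is always positive."""
    return math.exp(eta)


coeffs = [0.5, 0.2, -0.1]   # hypothetical a0, a1, a2
x_row = [1.0, 3.0]          # hypothetical predictors X_i1, X_i2

eta = linear_predictor(coeffs, x_row)   # 0.5 + 0.2*1.0 - 0.1*3.0
mu = inverse_log_link(eta)              # E[Y_i] under the assumed log link
```

The linear predictor can take any real value, while the inverse link maps it back to the range in which the mean must lie.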
8
Exponential Family of Distributions
$$f(y; \theta, \Phi) = \exp\left[\frac{y\theta - b(\theta)}{a(\Phi)} + c(y, \Phi)\right]$$
$$E[Y] = b'(\theta)$$
$$\mathrm{Var}[Y] = b''(\theta)\, a(\Phi)$$
$a(\Phi) = \Phi / w$ is a common assumption
9
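Normal Distribution in Exponential Family
$$f(y; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right]$$
$$= \exp\left[\ln\frac{1}{\sqrt{2\pi\sigma^2}} - \frac{y^2 - 2y\mu + \mu^2}{2\sigma^2}\right]$$
$$= \exp\left[\frac{y\mu - \mu^2/2}{\sigma^2} - \frac{y^2}{2\sigma^2} - \frac{1}{2}\ln(2\pi\sigma^2)\right]$$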
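A quick numerical check of this rewriting: matching the Normal density against the general form $\exp[(y\theta - b(\theta))/a(\Phi) + c(y,\Phi)]$ gives $\theta = \mu$, $b(\theta) = \theta^2/2$, $a(\Phi) = \sigma^2$, and $c = -y^2/(2\sigma^2) - \tfrac{1}{2}\ln(2\pi\sigma^2)$. The sketch below evaluates both forms at a few arbitrary points; the function names are illustrative only.

```python
import math


def normal_pdf(y, mu, sigma2):
    """Usual Normal density with mean mu and variance sigma2."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def normal_exp_family(y, mu, sigma2):
    """Same density written as exp[(y*theta - b(theta)) / a + c(y)]."""
    theta = mu                      # natural parameter
    b = theta ** 2 / 2              # b(theta)
    a = sigma2                      # a(Phi)
    c = -y ** 2 / (2 * sigma2) - 0.5 * math.log(2 * math.pi * sigma2)
    return math.exp((y * theta - b) / a + c)


# Arbitrary test points (y, mu, sigma2); the two forms should agree at each.
points = [(-1.0, 0.0, 1.0), (0.5, 2.0, 4.0), (3.0, -2.0, 0.25)]
max_gap = max(abs(normal_pdf(*p) - normal_exp_family(*p)) for p in points)
```

Here $b'(\theta) = \theta = \mu$ and $b''(\theta)\,a(\Phi) = \sigma^2$, which are the Normal mean and variance.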
10
Some Math Rules: Refresher
(1) $\exp(x) = e^x$
(2) $\ln[\exp(x)] = \exp[\ln(x)] = x$
(3) $\ln(xy) = \ln x + \ln y$
(4) $\ln(x^r) = r \ln x$
(5) $\ln(1/x) = \ln(x^{-1}) = -\ln x$
(6) $\ln(x/y) = \ln x - \ln y$
11
Poisson Distribution in Exponential Family
$$\Pr[Y = y] = \frac{\mu^y e^{-\mu}}{y!}$$
$$\Pr[Y = y] = \exp\left[\ln\left(\frac{\mu^y e^{-\mu}}{y!}\right)\right]$$
$$\Pr[Y = y] = \exp\left[\frac{y \ln \mu - \mu}{1} - \ln(y!)\right]$$
12
Poisson Distribution in Exponential Family
$\theta = \ln \mu$
$b(\theta) = e^{\theta}$ and $a(\Phi) = 1$
$$E[Y] = b'(\theta) = \frac{d}{d\theta} e^{\theta} = e^{\theta} = \mu$$
$$\mathrm{Var}[Y] = b''(\theta)\, a(\Phi) = \frac{d^2}{d\theta^2} e^{\theta} = e^{\theta} = \mu$$
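The Poisson results $E[Y] = b'(\theta)$ and $\mathrm{Var}[Y] = b''(\theta)\,a(\Phi)$ with $b(\theta) = e^{\theta}$, $\theta = \ln\mu$, and $a(\Phi) = 1$ can be checked numerically. The sketch below compares finite-difference derivatives of $b$ with moments summed directly from the probability function; the value $\mu = 0.75$ and the helper names are arbitrary choices for illustration.

```python
import math


def b(theta):
    """Cumulant function for the Poisson: b(theta) = exp(theta)."""
    return math.exp(theta)


def first_derivative(f, x, h=1e-5):
    """Central finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2


def poisson_pmf(y, mu):
    """Pr[Y = y] written in exponential form: exp[y ln(mu) - mu - ln(y!)]."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


mu = 0.75                 # arbitrary illustrative mean
theta = math.log(mu)      # natural parameter

mean_from_b = first_derivative(b, theta)          # b'(theta)
var_from_b = second_derivative(b, theta) * 1.0    # b''(theta) * a(Phi), a(Phi) = 1

# Moments computed directly from the probabilities (the tail beyond 60 is negligible).
mean_from_pmf = sum(y * poisson_pmf(y, mu) for y in range(60))
var_from_pmf = sum((y - mu) ** 2 * poisson_pmf(y, mu) for y in range(60))
```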
13
Compound Poisson Distribution
$Y = C_1 + C_2 + \dots + C_N$
$N$ is a Poisson random variable
$C_i$ are i.i.d. with Gamma distribution
This is an example of a Tweedie distribution
$Y$ is a member of the Exponential Family
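A small simulation can make the compound Poisson construction concrete. This is a sketch only: the frequency and severity parameters are hypothetical, and the Poisson draw uses Knuth's multiplication method because Python's standard library has no built-in Poisson sampler.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_compound_poisson(rng, lam, shape, scale):
    """Draw Y = C_1 + ... + C_N with N ~ Poisson(lam), C_j ~ Gamma(shape, scale)."""
    n_claims = sample_poisson(rng, lam)
    return sum(rng.gammavariate(shape, scale) for _ in range(n_claims))


rng = random.Random(12345)             # fixed seed so the run is repeatable
lam, shape, scale = 2.0, 2.0, 500.0    # hypothetical frequency and severity parameters
losses = [sample_compound_poisson(rng, lam, shape, scale) for _ in range(20000)]

sample_mean = sum(losses) / len(losses)
theoretical_mean = lam * shape * scale   # E[N] * E[C]
share_with_no_claims = sum(1 for y in losses if y == 0) / len(losses)
```

The simulated losses have a point mass at zero (policies with no claims) plus a continuous positive part, which is why this distribution is a natural candidate for modeling aggregate losses.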
14
Members of the Exponential Family
• Normal
• Poisson
• Binomial
• Gamma
• Inverse Gaussian
• Compound Poisson
15
Variance Structure
$E[Y_i] = \mu_i = b'(\theta_i) \rightarrow \theta_i = b'^{(-1)}(\mu_i)$
$\mathrm{Var}[Y_i] = a(\Phi_i)\, b''(\theta_i) = a(\Phi_i)\, V(\mu_i)$
Common form: $\mathrm{Var}[Y_i] = \Phi\, V(\mu_i) / w_i$
$\Phi$ is constant across the data, but weights $w_i$ are applied to individual data points
16
Variance Functions $V(\mu)$
Distribution        $V(\mu)$
Normal              $\mu^0 = 1$
Poisson             $\mu$
Binomial            $\mu(1-\mu)$
Tweedie             $\mu^p$, $1 < p < 2$
Gamma               $\mu^2$
Inverse Gaussian    $\mu^3$
Recall: $\mathrm{Var}[Y_i] = \Phi\, V(\mu_i) / w_i$
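The variance functions and the common form $\mathrm{Var}[Y_i] = \Phi\,V(\mu_i)/w_i$ translate directly into code. The sketch below is illustrative; the dictionary keys, function names, and numeric inputs are assumptions made for this example.

```python
# Variance functions V(mu) for the distributions listed on the slide.
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,                  # mu^0
    "poisson": lambda mu: mu,                  # mu^1
    "binomial": lambda mu: mu * (1.0 - mu),
    "gamma": lambda mu: mu ** 2,
    "inverse_gaussian": lambda mu: mu ** 3,
}


def tweedie_variance_function(mu, p):
    """V(mu) = mu^p; the compound Poisson case on the slide has 1 < p < 2."""
    return mu ** p


def variance(phi, v_of_mu, weight=1.0):
    """Var[Y_i] = phi * V(mu_i) / w_i."""
    return phi * v_of_mu / weight


# Hypothetical inputs: Gamma response with mean 3, dispersion 2, weight 4.
gamma_var = variance(phi=2.0, v_of_mu=VARIANCE_FUNCTIONS["gamma"](3.0), weight=4.0)
```

A larger weight $w_i$ (for example, more exposure behind a data point) lowers the variance attributed to that observation.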
17
Variance at Point and Fit
A Practitioner's Guide to Generalized Linear Models: A CAS Study Note
18
Why Exponential Family?
Distributions in Exponential Family can model a variety of problems
Standard algorithm for finding coefficients $a_0, a_1, \dots, a_m$
19
Modeling Number of Claims
Policy  Sex  Territory  Number of Claims in 5 Years
1       M    02         0
2       F    01         0
3       F    01         0
4       F    02         1
5       F    01         0
6       F    02         1
7       M    02         2
8       M    02         2
9       M    02         1
10      F    01         1
:       :    :          :
20
Assume a Multiplicative Model
$\mu_i$ = expected number of claims in five years
$\mu_i = B_{F,01} \times C_{\mathrm{Sex}(i)} \times C_{\mathrm{Terr}(i)}$
If $i$ is Female and Terr 01
→ $\mu_i = B_{F,01} \times 1.00 \times 1.00$
21
Multiplicative Model
$\mu_i = \exp(a_0 + a_S X_S(i) + a_T X_T(i))$
$\mu_i = \exp(a_0) \times \exp(a_S X_S(i)) \times \exp(a_T X_T(i))$
$i$ is Female → $X_S(i) = 0$; Male → $X_S(i) = 1$
$i$ is Terr 01 → $X_T(i) = 0$; Terr 02 → $X_T(i) = 1$
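The indicator coding above can be sketched as a small data-preparation step. The ten records are the policies listed on the "Modeling Number of Claims" slide (the full data set continues beyond them); the names `POLICIES` and `indicators` are made up for this example.

```python
# First ten policies from the claims table: (policy, sex, territory, claims in 5 years).
POLICIES = [
    (1, "M", "02", 0),
    (2, "F", "01", 0),
    (3, "F", "01", 0),
    (4, "F", "02", 1),
    (5, "F", "01", 0),
    (6, "F", "02", 1),
    (7, "M", "02", 2),
    (8, "M", "02", 2),
    (9, "M", "02", 1),
    (10, "F", "01", 1),
]


def indicators(sex, territory):
    """Return (X_S, X_T): Female -> 0, Male -> 1; Terr 01 -> 0, Terr 02 -> 1."""
    x_s = 1 if sex == "M" else 0
    x_t = 1 if territory == "02" else 0
    return x_s, x_t


# One row per policy: (1, X_S(i), X_T(i)); the leading 1 multiplies the intercept a0.
design_rows = [(1,) + indicators(sex, terr) for _, sex, terr, _ in POLICIES]
claim_counts = [claims for *_, claims in POLICIES]
```

Female in Territory 01 is the base class: both indicators are zero, so its mean is $\exp(a_0)$ alone.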
22
Natural Log Link Function
$\ln(\mu_i) = a_0 + a_S X_S(i) + a_T X_T(i)$
$\mu_i$ is in $(0, \infty)$
$\ln(\mu_i)$ is in $(-\infty, \infty)$
23
Poisson Distribution in Exponential Family
$$\Pr[Y = y] = \exp\left[\frac{y \ln \mu - \mu}{1} - \ln(y!)\right]$$
$\theta = \ln \mu$
$b(\theta) = e^{\theta}$
24
Natural Log is Canonical Link for Poisson
$\theta_i = \ln(\mu_i)$
$\theta_i = a_0 + a_S X_S(i) + a_T X_T(i)$
25
Estimating Coefficients $a_1, a_2, \dots, a_m$
Classical linear regression uses least squares
GLMs use Maximum Likelihood Method
26
Likelihood and Log Likelihood
$$L(y_1, \dots; \theta_1, \dots) = \prod_{i=1}^{n} f(y_i; \theta_i)$$
$$l(y_1, \dots; \theta_1, \dots) = \ln[L(y_1, \dots; \theta_1, \dots)]$$
$$l(y_1, \dots; \theta_1, \dots) = \sum_{i=1}^{n} \ln[f(y_i; \theta_i)]$$
27
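Find $a_0$, $a_S$, and $a_T$ for Poisson
$$l(y_1, \dots; \theta_1, \dots) = \sum_{i=1}^{n} \left[ y_i \theta_i - e^{\theta_i} - \ln(y_i!) \right]$$
Maximize:
$$\sum_{i=1}^{n} \left[ y_i \left(a_0 + a_S X_S(i) + a_T X_T(i)\right) - e^{a_0 + a_S X_S(i) + a_T X_T(i)} \right]$$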
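The Poisson log likelihood with $\theta_i = a_0 + a_S X_S(i) + a_T X_T(i)$ can be written as a short function. The sketch below evaluates it on the ten policies listed earlier in the deck (the full data set is not shown there), once with all coefficients at zero and once at the coefficient values reported in the worked solution; the names `ROWS` and `poisson_log_likelihood` are illustrative.

```python
import math

# (X_S, X_T, claims) for the ten listed policies; Male = 1, Terr 02 = 1.
ROWS = [
    (1, 1, 0), (0, 0, 0), (0, 0, 0), (0, 1, 1), (0, 0, 0),
    (0, 1, 1), (1, 1, 2), (1, 1, 2), (1, 1, 1), (0, 0, 1),
]


def poisson_log_likelihood(a0, a_s, a_t, rows):
    """Sum over policies of y*theta - exp(theta) - ln(y!), theta = a0 + a_s*X_S + a_t*X_T."""
    total = 0.0
    for x_s, x_t, y in rows:
        theta = a0 + a_s * x_s + a_t * x_t
        total += y * theta - math.exp(theta) - math.lgamma(y + 1)  # lgamma(y+1) = ln(y!)
    return total


ll_at_zero = poisson_log_likelihood(0.0, 0.0, 0.0, ROWS)
ll_at_reported = poisson_log_likelihood(-0.288, 0.262, 0.095, ROWS)
```

The $\ln(y_i!)$ term does not depend on the coefficients, which is why it can be dropped from the expression being maximized.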
28
Iterative Numerical Procedure to Find $a_i$'s
Use statistical package or actuarial software
Specify link function and distribution type
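To give a feel for what such software does internally, here is a minimal sketch of one common iterative scheme, Newton-Raphson (equivalent to Fisher scoring for a canonical link), applied to a Poisson model with log link. The slides do not say which algorithm a given package uses, so treat this as one possibility. It is run only on the ten policies listed in the deck, so its output will not match a fit to the full data set; all function names are made up for this example.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_poisson_log_link(design, counts, iterations=50, tol=1e-10):
    """Newton-Raphson for the Poisson log likelihood with mu_i = exp(x_i . a)."""
    p = len(design[0])
    coeffs = [0.0] * p
    for _ in range(iterations):
        means = [math.exp(sum(a * x for a, x in zip(coeffs, row))) for row in design]
        # Score vector: X^T (y - mu)
        score = [sum(row[j] * (y - mu) for row, y, mu in zip(design, counts, means))
                 for j in range(p)]
        # Information matrix: X^T diag(mu) X
        info = [[sum(mu * row[j] * row[k] for row, mu in zip(design, means))
                 for k in range(p)] for j in range(p)]
        step = solve_linear_system(info, score)
        coeffs = [a + d for a, d in zip(coeffs, step)]
        if max(abs(d) for d in step) < tol:
            break
    return coeffs


# Rows are (1, X_S, X_T) for the ten listed policies, with their claim counts.
DESIGN = [(1, 1, 1), (1, 0, 0), (1, 0, 0), (1, 0, 1), (1, 0, 0),
          (1, 0, 1), (1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 0, 0)]
COUNTS = [0, 0, 0, 1, 0, 1, 2, 2, 1, 1]

a0, a_s, a_t = fit_poisson_log_link(DESIGN, COUNTS)
```

Because these ten policies occupy only three Sex/Territory cells and the model has three coefficients, the fitted means here simply reproduce the observed average claim count in each cell.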
29
Solution to Our Example
$a_0 = -.288 \rightarrow \exp(-.288) = .75$
$a_S = .262 \rightarrow \exp(.262) = 1.3$
$a_T = .095 \rightarrow \exp(.095) = 1.1$
$\mu_i = \exp(a_0) \times \exp(a_S X_S(i)) \times \exp(a_T X_T(i))$
$\mu_i = .75 \times 1.3^{X_S(i)} \times 1.1^{X_T(i)}$
$i$ is Male, Terr 01 → $\mu_i = .75 \times 1.3^{1} \times 1.1^{0}$
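The reported coefficients can be turned into expected claim counts for every Sex/Territory combination. This short sketch uses only the three coefficient values given in the solution; the variable and function names are illustrative.

```python
import math

# Coefficients reported in the solution.
A0, A_S, A_T = -0.288, 0.262, 0.095


def expected_claims(x_s, x_t):
    """mu = exp(a0 + a_S*X_S + a_T*X_T) with Male = 1 and Terr 02 = 1."""
    return math.exp(A0 + A_S * x_s + A_T * x_t)


# Multiplicative factors: base rate, Male relativity, Terr 02 relativity.
base = math.exp(A0)
sex_factor = math.exp(A_S)
terr_factor = math.exp(A_T)

# Expected five-year claim counts for each combination.
cells = {
    ("F", "01"): expected_claims(0, 0),
    ("F", "02"): expected_claims(0, 1),
    ("M", "01"): expected_claims(1, 0),
    ("M", "02"): expected_claims(1, 1),
}
```

Multiplying the factors and exponentiating the sum of coefficients give the same number, which is the sense in which the log link produces a multiplicative rating structure.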
30
Other Common Link Functions
Identity link: canonical link for Normal
Logit link $\ln[\mu / (1 - \mu)]$: canonical link for Binomial; maps $(0,1)$ onto $(-\infty, \infty)$
Reciprocal $1/\mu$: canonical link for Gamma
But, you do not have to match link and distribution. Choose link and distribution that make sense!
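The logit and reciprocal links, together with their inverses, can be sketched as follows. The function names and sample values are chosen for illustration only.

```python
import math


def logit(mu):
    """Logit link: ln[mu / (1 - mu)] for mu in (0, 1)."""
    return math.log(mu / (1.0 - mu))


def inverse_logit(eta):
    """Inverse of the logit link; maps any real eta back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def reciprocal(mu):
    """Reciprocal link: 1 / mu for mu != 0."""
    return 1.0 / mu


def inverse_reciprocal(eta):
    """Inverse of the reciprocal link: mu = 1 / eta."""
    return 1.0 / eta


# Whatever the linear predictor, the logit's inverse yields a valid probability.
probabilities = [inverse_logit(eta) for eta in (-5.0, 0.0, 5.0)]
round_trip = inverse_logit(logit(0.2))
```

The reciprocal link offers no such guarantee of a positive mean, which is one practical reason a log link is often paired with the Gamma distribution instead of the canonical link.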
31
Measuring Goodness of Fit
Deviance is a measure of goodness of fit
Compare different models
Compare effect of dropping/adding $X_{ij}$'s
Also called log likelihood ratio statistic
32
More on Deviance
$$D = 2\,[\,l(y_1, \dots, y_n; a_1, \dots, a_{max}) - l(y_1, \dots, y_n; a_1, \dots, a_m)\,]$$
$a_1, \dots, a_{max}$: saturated model with as many parameters as can be estimated from data
$a_1, \dots, a_m$: parameters from your model
$$D = 2\,[\,l(y_1, \dots, y_n; y_1, \dots, y_n) - l(y_1, \dots, y_n; \hat{\mu}_1, \dots, \hat{\mu}_n)\,]$$
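For the Poisson case, the deviance can be computed directly as twice the gap between the saturated log likelihood (each fitted mean set equal to its own observation) and the model log likelihood. The sketch below uses a tiny made-up data set and hypothetical fitted means; the convention $0 \cdot \ln 0 = 0$ is applied for zero counts.

```python
import math


def poisson_log_likelihood(ys, mus):
    """Sum of y*ln(mu) - mu - ln(y!), treating 0*ln(0) as 0."""
    total = 0.0
    for y, mu in zip(ys, mus):
        y_log_mu = y * math.log(mu) if y > 0 else 0.0
        total += y_log_mu - mu - math.lgamma(y + 1)
    return total


def poisson_deviance(ys, fitted_means):
    """D = 2 * [l(y; y) - l(y; fitted means)]."""
    saturated = poisson_log_likelihood(ys, ys)
    model = poisson_log_likelihood(ys, fitted_means)
    return 2.0 * (saturated - model)


ys = [0, 1, 2]                  # made-up observed counts
fitted = [1.0, 1.0, 1.0]        # hypothetical fitted means from some model
deviance = poisson_deviance(ys, fitted)
deviance_of_saturated = poisson_deviance(ys, ys)
```

A model that reproduces every observation has zero deviance; when comparing nested models on the same data, the difference in deviance measures how much fit is lost by dropping the extra predictors.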
33
Generalized Linear Models - GLMs
Response Y can be discrete or continuous
Y can be number of claims, probability of renewing, loss severity, loss ratio, etc.
Large and small policies can be put into one model.
Y can be a nonlinear function of the X's
Measure goodness of fit using Deviance
34