Flexible Regression and Smoothing: Packages, Diagnostics and Algorithms
Bob Rigby Mikis Stasinopoulos
Graz University of Technology, Austria, November 2016
Bob Rigby, Mikis Stasinopoulos, "Flexible Regression and Smoothing", 2015
Outline
1 The R packages
2 The (gamlss) package and the gamlss() function
3 Diagnostics: Normalised quantile residuals; Worm plots
4 Algorithms
5 End
The R packages
gamlss the original package
gamlss.dist all gamlss.family distributions
gamlss.data different sets of data
gamlss.add for extra additive terms
gamlss.cens for censored (left, right or interval) response variables
gamlss.demo demos for distributions and smoothing
gamlss.nl non-linear term fitting
gamlss.tr generating truncated distributions
gamlss.mx finite mixtures distributions and random effects
gamlss.spatial for spatial models
gamlss.util for extra utilities
The gamlss() function
gamlss( formula = ~1, sigma.formula= ~1,
nu.formula = ~1, tau.formula = ~1,
family = NO(),
data = sys.parent(), weights = NULL,
contrasts = NULL, method = RS(), start.from = NULL,
mu.start = NULL, sigma.start = NULL,
nu.start = NULL, tau.start = NULL,
mu.fix = FALSE, sigma.fix = FALSE, nu.fix = FALSE,
tau.fix = FALSE, control = gamlss.control(...),
i.control = glim.control(...), ...)
Arguments of the gamlss() function
formula = y ~ x1 + x3
sigma.fo = ~ x1
nu.fo = ~ x2
tau.fo = ~ 1
data = abdom
family = LO
weights = freq
method = mixed(10,50)
control = gamlss.control(trace=FALSE)
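Assembled into a single call, the arguments above look roughly as follows. This is a sketch only: it assumes the gamlss package is loaded and that the data frame contains variables named y, x1, x2, x3 and freq (the data frame name mydata and its columns are illustrative, not from the slide):

```r
library(gamlss)

# Hypothetical model: logistic response, with different formulas for
# each distribution parameter, frequency weights, and the mixed algorithm
m1 <- gamlss(formula = y ~ x1 + x3,
             sigma.formula = ~ x1,
             nu.formula = ~ x2,
             tau.formula = ~ 1,
             family = LO,                  # logistic response distribution
             weights = freq,               # prior (frequency) weights
             data = mydata,                # hypothetical data frame
             method = mixed(10, 50),       # 10 RS then up to 50 CG iterations
             control = gamlss.control(trace = FALSE))
```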
Starting values
Starting values are generally not needed. When they are, use e.g.:
mu.start = 2 or mu.start = fitted(m1, "mu")
The same applies for the other parameters: sigma.start, nu.start, tau.start.
Starting from a previous model:
start.from = model1
The gamlss package
Available functions
Fitting or Updating a Model
Extracting Information from the Fitted Model
Selecting a Model
Plotting and Diagnostics
Centile Estimation
Fitting or Updating a Model
gamlss() for fitting and creating a gamlss object
refit() to refit a gamlss object (i.e. continue iterations)
update() to update a given gamlss model object
gamlssML() fitting a parametric distribution to a single (response) variable
histDist() to fit and plot a parametric distribution
fitDist() select a parametric distribution from an appropriate list
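A minimal sketch of the single-variable fitting functions, assuming the gamlss packages are installed and using the abdom data from gamlss.data (the choice of the BCT family here is illustrative):

```r
library(gamlss)
data(abdom)   # abdominal circumference data

# Fit a parametric distribution to the response alone, by maximum likelihood
m0 <- gamlssML(abdom$y, family = BCT)

# Same idea, but also plots a histogram of y with the fitted pdf superimposed
histDist(abdom$y, family = BCT)

# Let fitDist() try a list of positive-real distributions and pick the best by GAIC
f1 <- fitDist(abdom$y, type = "realplus")
f1$family
```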
Extracting Information from the Fitted Model
GAIC() generalised Akaike information criterion (or AIC)
coef() the linear coefficients
deviance() the global deviance, −2 log L
fitted() the fitted values for a distribution parameter
predict() to predict individual distribution parameter values from new data
predictAll() to predict all the distribution parameter values from new data
print() : to print a gamlss object
residuals() to extract the normalised (randomised) quantile residuals
summary() to summarise the fit in a gamlss object
vcov() to extract the variance-covariance matrix of the beta estimates.
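Typical use of the extractor functions, sketched for a hypothetical fitted gamlss object m1:

```r
# m1 is assumed to be a fitted gamlss object
GAIC(m1, k = 2)             # AIC; k = log(n) would give BIC
deviance(m1)                # global deviance, -2 log L
coef(m1, what = "mu")       # linear coefficients for mu
head(fitted(m1, "sigma"))   # fitted values for sigma
head(residuals(m1))         # normalised (randomised) quantile residuals
vcov(m1)                    # variance-covariance matrix of the beta estimates
```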
Selecting a Model
add1() drop1() to add or drop a single term
stepGAIC() to select explanatory terms in one parameter
stepGAICAll.A() to select explanatory terms in all the parameters (strategy A)
stepGAICAll.B() to select explanatory terms in all the parameters (strategy B)
gamlssCV() to evaluate a model using cross validation
add1TGD() drop1TGD() to add or drop a single term using a validation or test data set
stepTGD() to select variables using the global deviance evaluated on a new (test) data set, given a fitted gamlss model
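A sketch of GAIC-based model selection, assuming a fitted model m1 whose data contains candidate terms x1, x2, x3 (all names here are illustrative):

```r
# Stepwise selection of terms for mu, with penalty k = 2 (AIC)
m2 <- stepGAIC(m1, scope = ~ x1 + x2 + x3, k = 2)

# Strategy A: select terms for all the distribution parameters in turn
m3 <- stepGAICAll.A(m1, scope = ~ x1 + x2 + x3)
```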
Diagnostics
plot() a plot of four graphs for the normalised (randomised) quantile residuals
pdf.plot() for plotting the pdf functions for a given fitted gamlss object or a given gamlss.family distribution
Q.stats() for printing and plotting the Q statistics of Royston and Wright (2000)
rqres.plot() for plotting QQ-plots of different realisations of randomised residuals (for discrete distributions)
wp() worm plot of the residuals from a fitted gamlss object
dtop() detrended transformed Owen's plot of the residuals
Centile estimation
lms() a function that tries to automate the process of fitting growth curves
centiles() to plot centile curves against an x-variable.
centiles.com() to compare centile curves for more than one object.
centiles.split() as for centiles(), but splits the plot at specified values of x.
centiles.pred() to predict and plot centile curves for new x-values.
centiles.fan() fan plot of centile curves
fitted.plot() to plot fitted values for all the parameters against an x-variable
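Centile estimation in outline, using the abdom data (the exact lms() defaults and the chosen x values are illustrative):

```r
library(gamlss)
data(abdom)

# lms() automates much of the growth-curve fitting process
m <- lms(y, x, data = abdom)

# Centile curves against x, and predicted centiles at new x values
centiles(m, xvar = abdom$x)
centiles.pred(m, xname = "x", xvalues = c(20, 30, 40), cent = c(5, 50, 95))
```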
Other useful functions
prof.dev() the profile global deviance of one of the distribution parameters
prof.term() for plotting the profile global deviance of one of the model (beta) parameters
show.link() for showing available link functions
term.plot() for plotting additive (smoothing) terms
gen.likelihood() generates the likelihood from a GAMLSS fitted model [used in vcov()]
Normalised quantile residuals: continuous response
If yi is an observation from a continuous response variable, then

ûi = F(yi | θ̂i)

where F(· | θi) is the assumed cumulative distribution function for case i, and

r̂i = Φ−1(ûi)

where Φ−1 is the inverse cumulative distribution function of the standard normal. The r̂i then have an approximately standard normal distribution: they are the "z-scores".
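The two-step construction above can be reproduced with base R alone. A minimal sketch for a correctly specified normal model, where the true µ and σ stand in for the fitted θ̂:

```r
set.seed(1)
mu <- 5; sigma <- 2
y <- rnorm(100, mean = mu, sd = sigma)   # simulated continuous response

u <- pnorm(y, mean = mu, sd = sigma)     # u_i = F(y_i | theta_i)
r <- qnorm(u)                            # r_i = Phi^{-1}(u_i), the z-scores

# With the correct model, r is (approximately) standard normal
c(mean(r), sd(r))
```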
Normalised quantile residuals: continuous response
[Figure: construction of a normalised quantile residual for a continuous response: the pdf f(Y) at the observation y; the cdf F(Y), which maps y to u = F(y); and the standard normal cdf Φ(Z), which maps u back to the residual r.]
Normalised quantile residuals: discrete response

If yi is an observation from a discrete integer response, then ûi is a random value drawn from the uniform distribution on the interval

[u1, u2] = [F(yi − 1 | θ̂i), F(yi | θ̂i)]

and

r̂i = Φ−1(ûi)
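The randomisation step can also be sketched in base R, for a Poisson response with known mean (the true λ stands in for the fitted parameter):

```r
set.seed(1)
lambda <- 4
y <- rpois(200, lambda)                 # simulated discrete response

u1 <- ppois(y - 1, lambda)              # F(y_i - 1 | theta_i)
u2 <- ppois(y, lambda)                  # F(y_i | theta_i)
u  <- runif(200, min = u1, max = u2)    # u_i drawn uniformly on [u1, u2]
r  <- qnorm(u)                          # randomised quantile residuals

# r should again be approximately standard normal
c(mean(r), sd(r))
```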
Normalised quantile residuals: discrete response

[Figure: construction of a randomised quantile residual for a discrete response: the pdf f(Y) at the observation y; the cdf F(Y), with u drawn uniformly between u1 = F(y − 1) and u2 = F(y); and the standard normal cdf Φ(Z), which maps u to the residual r.]
Residual diagnostics using plot()

[Figure: the four-panel residual diagnostic plot produced by plot() for a fitted gamlss object.]
Worm plots: using wp() van Buuren and Fredriks [2001]
[Figure: a single worm plot of Deviation against Unit normal quantile for a fitted model.]
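Typical wp() calls, sketched for a hypothetical fitted gamlss object m1 with explanatory variable x:

```r
wp(m1)                           # single worm plot of all the residuals
wp(m1, xvar = x, n.inter = 4)    # one worm plot per interval of x
```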
Worm plots: different types
[Figure: worm plot shapes indicating model misfit, each panel plotting Deviation against Unit normal quantile: (a) residual mean too small, (b) residual mean too large, (c) residual variance too high, (d) residual variance too low, (e) residuals negatively skewed, (f) residuals positively skewed, (g) residuals leptokurtic, (h) residuals platykurtic.]
Multiple worm plots against different ranges of an x-variable
[Figure: a trellis of worm plots, one per interval of the conditioning variable xvar, each plotting Deviation against Unit normal quantile.]
Multiple worm plots against different ranges of two x-variables

[Figure: a grid of worm plots, one per combination of intervals of the two conditioning variables lboopen and lnosc, each plotting Deviation against Unit normal quantile.]
Worm plot conclusion
Worm plots are useful for:
checking the assumed distribution of the response variable,
checking whether the distribution is fitted correctly over all parts of the range of the explanatory variable(s).
Algorithms: GAMLSS model
The model
y ∼ D(µ, σ, ν, τ) independently, with
g1(µ) = X1β1 + s11(x11) + ... + s1J1(x1J1)
g2(σ) = X2β2 + s21(x21) + ... + s2J2(x2J2)
g3(ν) = X3β3 + s31(x31) + ... + s3J3(x3J3)
g4(τ) = X4β4 + s41(x41) + ... + s4J4(x4J4)
Algorithms: GAMLSS model with random effects
y ∼ D(µ, σ, ν, τ) independently, with
g1(µ) = X1β1 + Z11γ11 + ... + Z1J1γ1J1
g2(σ) = X2β2 + Z21γ21 + ... + Z2J2γ2J2
g3(ν) = X3β3 + Z31γ31 + ... + Z3J3γ3J3
g4(τ) = X4β4 + Z41γ41 + ... + Z4J4γ4J4

where γkj ∼ N(0, λkj−1 Gkj−1).   (1)
Algorithms: Estimation
We will need estimates for:
the 'betas', β = (β1, β2, β3, β4),
the 'gammas', γ = (γ11, ..., γ1J1, γ21, ..., γ4J4), and
the 'lambdas', λ = (λ11, ..., λ1J1, λ21, ..., λ4J4).

For parametric models: ML estimators for β.
For smoothing (random effect) models: MAP estimators for β and γ, for fixed λ.
Algorithms: posterior probability
f(β, γ | y, λ) ∝ f(y | β, γ) f(γ | λ)
             ∝ L(β, γ) ∏k ∏j f(γkj | λkj)

where L(β, γ) is the likelihood and the f(γkj | λkj) are the priors for γ.
Algorithms: penalised likelihood

Taking logs of the posterior above, maximising over (β, γ) for fixed λ is equivalent to maximising the penalised log-likelihood

ℓp = ℓ(β, γ) − (1/2) ∑k ∑j λkj γkjᵀ Gkj γkj

where ℓ(β, γ) = log L(β, γ).
Algorithms: Estimating β and γ for fixed λ
RS a generalisation of the MADAM algorithm, Rigby and Stasinopoulos (1996a)
CG a generalisation of Cole and Green (1992) algorithm
mixed a mixture of RS+CG (i.e. j iterations of RS, followed by k iterations of CG)
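Choosing the fitting algorithm is a single argument to gamlss(); a sketch using the abdom data (the BCT family here is illustrative):

```r
library(gamlss)
data(abdom)

mRS  <- gamlss(y ~ x, data = abdom, family = BCT, method = RS())           # the default
mCG  <- gamlss(y ~ x, data = abdom, family = BCT, method = CG())
mMix <- gamlss(y ~ x, data = abdom, family = BCT, method = mixed(10, 50))  # RS then CG
```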
Algorithms

[Figure: contour plots of the global deviance over (µ, σ), illustrating (a) the RS algorithm and (b) the CG algorithm.]
Algorithms: The RS algorithm
The RS algorithm consists of three nested components:
the outer iteration
the inner iteration (or local scoring or GLIM algorithm)
the modified backfitting algorithm
Algorithms: outer iteration
Algorithms: inner iteration
Algorithms: modified backfitting
Algorithms: properties
Advantages of algorithms
1 flexible modular fitting procedure
2 easy implementation of new distributions
3 easy implementation of new additive terms
4 simple starting values for (µ, σ, ν, τ) easily found
5 stable and reliable algorithms
6 fast fitting (for fixed hyperparameters)
Algorithms: Estimating λ
locally: when the method of estimation of each λkj is applied each time within the backfitting algorithm
globally: when the method is applied outside the RS or CG GAMLSSalgorithm.
Different methodologies for estimating the smoothing hyper-parameters:
Generalised cross validation (GCV),
Generalised Akaike information criterion (GAIC), and
Maximum likelihood based methods (ML/REML).
End
END
For more information see
www.gamlss.org