Flexible smoothing with P-splines: some applications

Maria Durban

Department of Statistics, Universidad Carlos III de Madrid, Spain

Joint work with Raymond Carroll, Iain Currie, Paul Eilers, Jareck Harezlak and Matt Wand

Department of Economics, Bielefeld University, June 2003

What is this talk about?

• Introduction

    - Smoothing
    - Why P-splines?
    - Mixed model representation of P-splines

• Applications

    - Additive models
    - Models with heteroscedastic errors
    - Smoothing and correlation
    - Generalised additive models

• P-splines for longitudinal data

Canadian Occupational Prestige Data (B. Blishen, 1971)

Data consist of prestige scores, average income (in $1000) and education (in years) for 102 occupations.

[Figure: scatter plots of prestige against income (left) and prestige against education (right).]

Smoothing

• Prestige score varies smoothly along the income range

• A suitable model for these data could be:

y = f(x) + ε

where x is the covariate (income) and f is a smooth function of x whose smoothness depends on a smoothing parameter λ.

• Smoothing methods fall into two groups:

    - Specified by the fitting procedure: kernels
    - Solution of a minimisation problem: splines

[Figure: prestige against income.]

P-splines

• Eilers and Marx, 1996.

• They are a generalisation of ordinary regression.

• Modify the log-likelihood by a penalty on the regression coefficients.

\[ y = f(x) + \varepsilon, \qquad f(x) \approx Ba, \qquad S = (y - Ba)'(y - Ba) + \lambda\, a'Pa \]

\[ \hat{a} = (B'B + \lambda P)^{-1} B'y \]
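The penalised least-squares solution above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the basis is built from equally spaced cubic B-splines, the penalty is taken as P = D'D with D a second-order difference matrix (one common choice, not stated on the slide), the data are simulated, and the names `bbase` and `pspline_fit` are hypothetical.

```python
import math
import numpy as np

def bbase(x, xl, xr, nseg=10, deg=3):
    """B-spline basis on [xl, xr] with equally spaced knots."""
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    # truncated power functions (x - t)_+^deg, one column per knot
    T = np.maximum(x[:, None] - knots[None, :], 0.0) ** deg
    # scaled differences of order deg + 1 turn them into B-splines
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0)
    D = D / (math.factorial(deg) * dx ** deg)
    return (-1) ** (deg + 1) * (T @ D.T)

def pspline_fit(x, y, lam, nseg=10, deg=3, pord=2):
    """Solve (B'B + lam * P) a = B'y with P = D'D (assumed difference penalty)."""
    B = bbase(x, x.min(), x.max(), nseg, deg)
    D = np.diff(np.eye(B.shape[1]), n=pord, axis=0)
    P = D.T @ D
    a = np.linalg.solve(B.T @ B + lam * P, B.T @ y)
    return B, a

# simulated data, for illustration only
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

B, a = pspline_fit(x, y, lam=1.0)
fitted = B @ a
```

With a second-order difference penalty, increasing `lam` pulls the fit towards a straight line, while `lam = 0` gives ordinary regression on the B-spline basis.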

P-splines also go by other names:

• Penalised splines

• pseudosplines

• low-rank smoothers

Basis for P-splines

B-splines, truncated polynomial basis, radial basis, etc.

B-splines

• B-spline: bell-shaped like Gauss curve

• Polynomial pieces smoothly joining at the knots

Truncated polynomial

For example, the truncated linear basis for knots \(\kappa_1, \ldots, \kappa_k\) is:

\[ 1,\; x,\; (x - \kappa_1)_+,\; \ldots,\; (x - \kappa_k)_+ \]
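As a minimal sketch, the truncated linear basis can be built directly from its definition. The function name `truncated_line_basis` and the knot locations are hypothetical choices for illustration.

```python
import numpy as np

def truncated_line_basis(x, knots):
    """Return X = [1, x] and Z with columns (x - kappa_k)_+."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    Z = np.maximum(x[:, None] - knots[None, :], 0.0)
    return X, Z

x = np.linspace(0.0, 40.0, 81)
knots = np.linspace(5.0, 35.0, 7)   # hypothetical knot locations
X, Z = truncated_line_basis(x, knots)
```

Splitting the basis into the polynomial part `X` and the truncated part `Z` is convenient later, because only the coefficients of `Z` are penalised.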

[Figure: three panels showing a B-spline basis, scaled B-splines and their sum, and a truncated lines basis.]

Why P-splines?

• The number of basis functions used to construct the function estimates does not grow with the sample size

• Quite insensitive to the choice of knots (Ruppert, 2000)

• Computationally simpler

• No need for backfitting in the case of additive models

• Easily extended to 2 or more dimensions and non-Gaussian errors

P-splines: mixed model approach

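\[ y = f(x) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 R) \]

We write f(x) = Ba. It can be shown that Ba may be written as

\[ Ba = \underbrace{X\beta}_{\text{fixed}} + \underbrace{Zu}_{\text{random}}, \qquad u \sim N(0, \sigma_u^2 I), \qquad \lambda = \sigma^2/\sigma_u^2 \]

\[ y = X\beta + Zu + \varepsilon, \qquad \operatorname{Cov}\begin{bmatrix} u \\ \varepsilon \end{bmatrix} = \begin{bmatrix} \sigma_u^2 I & 0 \\ 0 & \sigma^2 R \end{bmatrix}, \qquad \operatorname{Cov}[y] = V = \sigma^2 R + \sigma_u^2 ZZ' \]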
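One way to see that Ba can be rewritten as Xβ + Zu is to split the coefficient space using the eigen-decomposition of the penalty matrix. The sketch below assumes a B-spline basis with a second-order difference penalty P = D'D (an assumption, the slide does not fix the basis), simulated data, and a hypothetical helper `bbase`; it checks numerically that the penalised fit is unchanged by the reparametrisation.

```python
import math
import numpy as np

def bbase(x, xl, xr, nseg=10, deg=3):
    """B-spline basis on [xl, xr] with equally spaced knots."""
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    T = np.maximum(x[:, None] - knots[None, :], 0.0) ** deg
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0)
    D = D / (math.factorial(deg) * dx ** deg)
    return (-1) ** (deg + 1) * (T @ D.T)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 120)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

B = bbase(x, 0.0, 1.0, nseg=15)
D = np.diff(np.eye(B.shape[1]), n=2, axis=0)
P = D.T @ D

# eigh returns eigenvalues in ascending order; a second-order
# difference penalty has exactly two (numerically) zero eigenvalues
vals, vecs = np.linalg.eigh(P)
U_null, U_pen = vecs[:, :2], vecs[:, 2:]
X = B @ U_null                          # unpenalised (fixed) part
Z = (B @ U_pen) / np.sqrt(vals[2:])     # penalised (random) part

lam = 5.0
# original parametrisation: penalty lam * a'Pa
a = np.linalg.solve(B.T @ B + lam * P, B.T @ y)
# mixed model parametrisation: penalty lam * u'u, beta unpenalised
C = np.hstack([X, Z])
pen = np.diag([0.0, 0.0] + [1.0] * Z.shape[1])
theta = np.linalg.solve(C.T @ C + lam * pen, C.T @ y)
```

In this parametrisation the penalty is a ridge penalty on u alone, which is what allows u to be treated as a random effect with variance \(\sigma_u^2 = \sigma^2/\lambda\).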

Use REML for variance parameters

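\[ \ell(V) = -\tfrac{1}{2}\log|V| - \tfrac{1}{2}\log|X'V^{-1}X| - \tfrac{1}{2}\, y'\bigl(V^{-1} - V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}\bigr)y \]

Given \(R\), \(\sigma^2\) and \(\sigma_u^2\), \(\beta\) and \(u\) are solutions to:

\[ \begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + \lambda I \end{bmatrix} \begin{bmatrix} \beta \\ u \end{bmatrix} = \begin{bmatrix} X'R^{-1} \\ Z'R^{-1} \end{bmatrix} y \]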
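The two ingredients above, the REML criterion for the variance parameters and the linear system for β and u, can be sketched as follows. The assumptions are loud ones: R = I by default, a truncated linear basis with hypothetical knots, simulated data, and a crude grid search standing in for a proper REML optimiser. The function names are illustrative and do not come from any package.

```python
import numpy as np

def mixed_model_equations(X, Z, y, lam, Rinv=None):
    """Solve the mixed model equations for (beta, u) given lam = sigma2 / sigma2_u."""
    Rinv = np.eye(len(y)) if Rinv is None else Rinv
    C = np.hstack([X, Z])
    p, q = X.shape[1], Z.shape[1]
    G = np.zeros((p + q, p + q))
    G[p:, p:] = lam * np.eye(q)          # only u is penalised
    coef = np.linalg.solve(C.T @ Rinv @ C + G, C.T @ Rinv @ y)
    return coef[:p], coef[p:]

def reml_loglik(X, Z, y, sigma2, sigma2_u, R=None):
    """Restricted log-likelihood, up to an additive constant."""
    R = np.eye(len(y)) if R is None else R
    V = sigma2 * R + sigma2_u * (Z @ Z.T)
    Vinv = np.linalg.inv(V)
    XtVinvX = X.T @ Vinv @ X
    Pm = Vinv - Vinv @ X @ np.linalg.solve(XtVinvX, X.T @ Vinv)
    return -0.5 * (np.linalg.slogdet(V)[1]
                   + np.linalg.slogdet(XtVinvX)[1]
                   + y @ Pm @ y)

# simulated data and hypothetical knots, for illustration only
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 80)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
knots = np.linspace(0.1, 0.9, 9)
X = np.column_stack([np.ones_like(x), x])
Z = np.maximum(x[:, None] - knots[None, :], 0.0)

sigma2, sigma2_u = 0.04, 1.0
beta, u = mixed_model_equations(X, Z, y, lam=sigma2 / sigma2_u)

# crude grid search over the variance parameters (illustration only)
grid = [(s2, s2u) for s2 in (0.01, 0.04, 0.16) for s2u in (0.1, 1.0, 10.0)]
best = max(grid, key=lambda g: reml_loglik(X, Z, y, *g))
```

The solution of the linear system coincides with the generalised least-squares estimate of β and the best linear unbiased predictor of u computed from V, which is a useful numerical check on any implementation.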

Advantages

• Unified approach

• Automatic selection of smoothing parameter

• Likelihood ratio test for model selection

• Already implemented in standard software: S-PLUS, SAS, R.

APPLICATIONS

Additive models: Prestige data revisited

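\[ y = \underbrace{f(\text{income})}_{X_1\beta_1 + Z_1u_1} + \underbrace{f(\text{education})}_{X_2\beta_2 + Z_2u_2} + \varepsilon = X\beta + Zu + \varepsilon \]

\[ \operatorname{Cov}\begin{bmatrix} u \\ \varepsilon \end{bmatrix} = \begin{bmatrix} \sigma_{u_1}^2 I & 0 & 0 \\ 0 & \sigma_{u_2}^2 I & 0 \\ 0 & 0 & \sigma^2 I \end{bmatrix} \]

\[ \beta = (\beta_1', \beta_2')', \qquad u = (u_1', u_2')', \qquad X = [X_1 : X_2], \qquad Z = [Z_1 : Z_2] \]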
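A rough sketch of how the additive model is fitted in one step, without backfitting: the two bases are placed side by side and each random block gets its own smoothing parameter. Everything specific here is an assumption made for illustration: truncated linear bases with knots at quantiles, fixed values of the two smoothing parameters (in the mixed model approach they would come from the variance components), simulated covariates standing in for income and education, and a single shared intercept in X to keep it of full rank.

```python
import numpy as np

def trunc_lines(x, n_knots):
    """Truncated-line columns (x - kappa_k)_+ with knots at quantiles of x."""
    knots = np.quantile(x, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
    return np.maximum(x[:, None] - knots[None, :], 0.0)

def fit_additive(x1, x2, y, lam1, lam2, n_knots=8):
    Z1, Z2 = trunc_lines(x1, n_knots), trunc_lines(x2, n_knots)
    # X = [X1 : X2] with the duplicated intercept column dropped (assumption)
    X = np.column_stack([np.ones_like(x1), x1, x2])
    C = np.hstack([X, Z1, Z2])
    pen = np.concatenate([np.zeros(X.shape[1]),
                          np.full(Z1.shape[1], lam1),
                          np.full(Z2.shape[1], lam2)])
    coef = np.linalg.solve(C.T @ C + np.diag(pen), C.T @ y)
    return C @ coef, coef

# simulated stand-ins for the two covariates, not the prestige data
rng = np.random.default_rng(1)
n = 102
x1 = rng.uniform(0.0, 25.0, n)
x2 = rng.uniform(6.0, 16.0, n)
y = 20.0 + 15.0 * np.log1p(x1) + 3.0 * x2 + rng.normal(scale=5.0, size=n)

fitted, coef = fit_additive(x1, x2, y, lam1=1.0, lam2=1.0)
```

Because both smooth terms enter one linear system, adding a term only adds columns to Z and one more variance parameter.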

Partial residuals plot

[Figure: partial residuals against income (left) and against education (right).]

Is the model additive? Conditional plots

[Figure: conditional plots of prestige against education.]

Two-dimensional P-splines

Now y = f(income, education) + ε = Ba + ε, where

\[ B = B_1 \otimes B_2, \qquad P = \lambda_1 P_1 \otimes I_{n_2} + \lambda_2 I_{n_1} \otimes P_2 \]
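The tensor-product construction can be sketched as below. Two assumptions should be flagged: the observations are scattered (x1, x2) pairs, so the basis is formed row by row (each row of B is the Kronecker product of the matching rows of B1 and B2) rather than as a full Kronecker product of the marginal bases, and the marginal penalties are second-order difference penalties. The data are simulated and the helper names are hypothetical.

```python
import math
import numpy as np

def bbase(x, xl, xr, nseg=10, deg=3):
    """B-spline basis on [xl, xr] with equally spaced knots."""
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    T = np.maximum(x[:, None] - knots[None, :], 0.0) ** deg
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0)
    D = D / (math.factorial(deg) * dx ** deg)
    return (-1) ** (deg + 1) * (T @ D.T)

def row_tensor(B1, B2):
    """Row-wise Kronecker product: row i is kron(B1[i], B2[i])."""
    return (B1[:, :, None] * B2[:, None, :]).reshape(B1.shape[0], -1)

def diff_penalty(c, pord=2):
    D = np.diff(np.eye(c), n=pord, axis=0)
    return D.T @ D

# simulated scattered data, for illustration only
rng = np.random.default_rng(5)
n = 200
x1 = rng.uniform(0.0, 1.0, n)
x2 = rng.uniform(0.0, 1.0, n)
y = np.sin(2 * np.pi * x1) * x2 + rng.normal(scale=0.1, size=n)

B1 = bbase(x1, 0.0, 1.0, nseg=6)
B2 = bbase(x2, 0.0, 1.0, nseg=6)
c1, c2 = B1.shape[1], B2.shape[1]
B = row_tensor(B1, B2)

lam1, lam2 = 1.0, 1.0
P = (lam1 * np.kron(diff_penalty(c1), np.eye(c2))
     + lam2 * np.kron(np.eye(c1), diff_penalty(c2)))
a = np.linalg.solve(B.T @ B + P, B.T @ y)
surface = B @ a
```

The two smoothing parameters act separately on the two directions of the coefficient array, so the surface can be smoother in one covariate than in the other.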

[Figure: fitted prestige surface over education and income.]

Smoothing and correlation (Currie and Durban, 2002)

AIC and GCV lead to underestimation of the smoothing parameter in the presence of positive serial correlation. The general approach to modelling with P-splines takes care of this problem.

Wood profile data

320 measurements of the profile of a block of wood subject to grinding.
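To illustrate how serial correlation enters the fit, the sketch below solves the penalised equations with an AR(1) correlation matrix R in place of the identity. It is a simplification under stated assumptions: the autocorrelation ρ and the smoothing parameter are fixed by hand (in the mixed model approach they would be estimated jointly by REML), the basis is a truncated linear one with hypothetical knots, and the 320 observations are simulated rather than the wood profile data.

```python
import numpy as np

def ar1_corr(n, rho):
    """AR(1) correlation matrix with entries rho^|i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def fit_correlated(B, P, y, lam, R):
    """Solve (B'R^{-1}B + lam * P) a = B'R^{-1}y for a symmetric R."""
    RinvB = np.linalg.solve(R, B)
    return np.linalg.solve(B.T @ RinvB + lam * P, RinvB.T @ y)

# simulated series with AR(1) errors, for illustration only
rng = np.random.default_rng(4)
n = 320
x = np.linspace(0.0, 1.0, n)
innov = rng.normal(scale=0.3, size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.6 * e[t - 1] + innov[t]
y = np.sin(2 * np.pi * x) + e

knots = np.linspace(0.05, 0.95, 19)   # hypothetical knots
B = np.column_stack([np.ones_like(x), x,
                     np.maximum(x[:, None] - knots[None, :], 0.0)])
P = np.diag(np.r_[0.0, 0.0, np.ones(len(knots))])

a_indep = fit_correlated(B, P, y, lam=1.0, R=np.eye(n))
a_ar1 = fit_correlated(B, P, y, lam=1.0, R=ar1_corr(n, 0.6))
```

Comparing `B @ a_indep` with `B @ a_ar1` shows how much of the apparent wiggliness is attributed to the correlated errors once R is included.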

[Figure: wood profile against sampling distance.]

[Figure: autocorrelation function (ACF) of the residuals against lag, AR(1) errors.]

[Figure: wood profile against sampling distance.]

[Figure: ACF of the residuals against lag, AR(2) errors.]

Other examples in Durban and Currie (2003), Computational Statistics.

Smoothing and heteroscedasticity (Currie and Durban, 2002)

Simulated experiment to test crash helmets: 133 head accelerations and times after impact

[Figure: acceleration (g) against time (ms).]

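Fit \(y = Ba + \varepsilon\) with \(\operatorname{Var}(\varepsilon) = \sigma^2 V\), \(V = W^{-1}\) and \(W = \operatorname{diag}(w_1, \ldots, w_n)\).

Use P-splines to smooth \(R_i = \log r_i^2\), where \(r_i^2 = (y_i - \hat{y}_i)^2/\sigma^2\), and set \(w_i^{-1} \propto \exp(R_i)\).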
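One possible reading of this scheme is an alternating loop: smooth the mean with the current weights, smooth the log squared residuals, and update the weights. The sketch below follows that reading under several assumptions that are not on the slide: fixed smoothing parameters for both smooths, a truncated linear basis, a crude estimate of \(\sigma^2\), a small constant inside the logarithm to avoid log(0), a fixed number of iterations, and simulated data in place of the helmet data.

```python
import numpy as np

def design(x, n_knots=15):
    """Truncated linear design [1, x, (x - kappa_k)_+] and its ridge penalty."""
    knots = np.linspace(x.min(), x.max(), n_knots + 2)[1:-1]
    C = np.column_stack([np.ones_like(x), x,
                         np.maximum(x[:, None] - knots[None, :], 0.0)])
    pen = np.diag(np.r_[0.0, 0.0, np.ones(n_knots)])
    return C, pen

def smooth(C, pen, y, lam, w=None):
    """Weighted penalised least squares with W = diag(w)."""
    w = np.ones(len(y)) if w is None else w
    CtW = C.T * w
    coef = np.linalg.solve(CtW @ C + lam * pen, CtW @ y)
    return C @ coef

# simulated data whose error standard deviation grows with x
rng = np.random.default_rng(3)
n = 200
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(size=n) * (0.1 + 2.0 * x)

C, pen = design(x)
w = np.ones(n)
for _ in range(5):
    fit = smooth(C, pen, y, lam=1.0, w=w)
    res2 = (y - fit) ** 2
    sigma2 = np.sum(w * res2) / n              # crude scale estimate (assumption)
    R = np.log(res2 / sigma2 + 1e-12)          # R_i = log r_i^2
    R_hat = smooth(C, pen, R, lam=10.0)        # smoothed log squared residuals
    w = np.exp(-R_hat)                         # w_i^{-1} proportional to exp(R_i)
    w = w / w.mean()                           # fix the arbitrary scale
```

Since the weights are only defined up to proportionality, rescaling them to mean one does not change the fitted mean.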

[Figure: squared residuals against time (ms), and inverse weights against time (ms).]

Generalised additive models: Count data

The one-parameter exponential family model, with canonical link, has joint density

\[ f(y \mid \eta) = \exp\{\, y'\eta - 1'b(\eta) + 1'c(y) \,\} \]

With linear predictor \(\eta = Ba\), the mixed model representation of P-splines lets us rewrite \(Ba = X\beta + Zu\), so that for count data

\[ f(y \mid u) = \exp\{\, y'(X\beta + Zu) - 1'\exp(X\beta + Zu) - 1'\log\Gamma(y + 1) \,\} \]

and \(u \sim N(0, \sigma_u^2 I)\).

Iterate between penalised quasi-likelihood (PQL) of Breslow and Clayton (1993), to estimate \(\beta\) and \(u\), and REML, to estimate the variance components.

In the case of count data \(\lambda = 1/\sigma_u^2\).
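For a fixed value of \(\sigma_u^2\), the step that estimates β and u amounts to penalised iteratively reweighted least squares with a ridge penalty \(\lambda = 1/\sigma_u^2\) on u. The sketch below shows only that inner step for Poisson counts with a log link; the REML update of \(\sigma_u^2\) is left out. The truncated linear basis, the knots, the value of \(\sigma_u^2\), and the simulated counts are all assumptions made for illustration.

```python
import numpy as np

def poisson_pspline(X, Z, y, lam, n_iter=50, tol=1e-8):
    """Penalised IRLS for a Poisson log-linear model; only u is penalised."""
    C = np.hstack([X, Z])
    p = X.shape[1]
    pen = np.diag(np.r_[np.zeros(p), np.full(Z.shape[1], lam)])
    coef = np.zeros(C.shape[1])
    coef[0] = np.log(y.mean() + 1e-8)    # assumes the first column is the intercept
    for _ in range(n_iter):
        eta = C @ coef
        mu = np.exp(eta)
        z = eta + (y - mu) / mu          # working response
        CtW = C.T * mu                   # working weights W = diag(mu)
        new = np.linalg.solve(CtW @ C + pen, CtW @ z)
        converged = np.max(np.abs(new - coef)) < tol
        coef = new
        if converged:
            break
    return coef, np.exp(C @ coef)

# simulated counts, for illustration only
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 200)
y = rng.poisson(np.exp(1.0 + np.sin(2 * np.pi * x))).astype(float)

knots = np.linspace(0.05, 0.95, 19)      # hypothetical knots
X = np.column_stack([np.ones_like(x), x])
Z = np.maximum(x[:, None] - knots[None, :], 0.0)

sigma2_u = 1.0                           # hypothetical variance component
coef, mu = poisson_pspline(X, Z, y, lam=1.0 / sigma2_u)
```

At convergence the unpenalised part satisfies \(X'(y - \mu) = 0\), so the fitted means add up to the observed total count.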

The data

Male policyholders, source: Continuous Mortality Investigation Bureau (CMIB).

For each calendar year (1947-1999) and each age (11-100) we have:

• Number of years lived (the exposure).

• Number of policy claims (deaths).

Mortality of male policyholders has improved rapidly over the last 30 years

⇓

Model mortality trends over time and dependence on age.

Additive model: Fitted curves for Ages 34 and 60

[Figure: log(mu) against year, additive model; panels for age 34 and age 60.]

Tensor model: Fitted curves for Ages 34 and 60

[Figure: log(mu) against year, tensor model; panels for age 34 and age 60.]

[Figure: panels titled Age-Period, Age-Period-Cohort and Tensor.]

Forecasting with P-splines

Treat the forecasting of future values as a missing value problem.

• We have data for \(n_y\) years and \(n_a\) ages and wish to forecast \(n_f\) years ahead

• Define a weight matrix \(V = \operatorname{blockdiag}(I, 0)\), where \(I\) is an identity matrix of size \(n_y n_a\) and \(0\) is a square zero matrix of size \(n_a n_f\)

• Define a new basis \(\tilde{B} = VB\) and proceed as before
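The missing-value idea can be illustrated in one dimension, for a single age. The future years get arbitrary dummy responses and weight zero, the basis covers observed and future years, and the penalty alone determines the coefficients beyond the data. The assumptions here are a cubic B-spline basis, a second-order difference penalty (which extrapolates linearly), a fixed smoothing parameter, and a simulated log-mortality series; the helper names are hypothetical.

```python
import math
import numpy as np

def bbase(x, xl, xr, nseg=10, deg=3):
    """B-spline basis on [xl, xr] with equally spaced knots."""
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    T = np.maximum(x[:, None] - knots[None, :], 0.0) ** deg
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0)
    D = D / (math.factorial(deg) * dx ** deg)
    return (-1) ** (deg + 1) * (T @ D.T)

def forecast_fit(B, y_all, v, lam, pord=2):
    """Weighted penalised fit; v is the 0/1 diagonal of the weight matrix V."""
    D = np.diff(np.eye(B.shape[1]), n=pord, axis=0)
    BtV = B.T * v
    a = np.linalg.solve(BtV @ B + lam * (D.T @ D), BtV @ y_all)
    return B @ a

years = np.arange(1947, 2000)                 # observed calendar years
n_y, n_f = len(years), 10                     # forecast 10 years ahead
t = np.arange(n_y + n_f, dtype=float)         # years since 1947

# simulated log-mortality for one age, for illustration only
rng = np.random.default_rng(6)
log_mu = -5.0 - 0.02 * t[:n_y] + rng.normal(scale=0.03, size=n_y)

y_all = np.concatenate([log_mu, np.zeros(n_f)])        # dummy future values
v = np.concatenate([np.ones(n_y), np.zeros(n_f)])      # weight 0 for the future
B = bbase(t, t[0], t[-1], nseg=12)

fit_all = forecast_fit(B, y_all, v, lam=10.0)
forecast = fit_all[n_y:]
```

The order of the penalty matters for forecasting: with second-order differences the forecast continues the final trend linearly, while other orders extrapolate differently.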

Forecast

[Figure: log(mu) against year for age 34 and age 60; legend: True, Prediction, C.I.]

P-splines for longitudinal data

The data

Objective: determine the effect of 4 surgical treatments on coronary sinus potassium in dogs

• 36 dogs

• 4 treatments

• 7 measurements per dog

[Figure: potassium against time, one panel for each of Groups 1 to 4.]

Models for longitudinal data

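Basic Model: \( y_{ij} = \alpha_0 + \alpha_1 t_{ij} + \beta_{i0} + \varepsilon_{ij}, \quad 1 \le j \le 7, \quad 1 \le i \le 36 \)

⇓ Relax linearity assumption

Model A: \( y_{ij} = f_{gr(i)}(t_{ij}) + \beta_{i0} + \varepsilon_{ij}, \quad 1 \le gr(i) \le 4 \)

⇓ Add random slope + general covariance matrix

Model B: \( y_{ij} = f_{gr(i)}(t_{ij}) + \beta_{i0} + \beta_{i1} t_{ij} + \varepsilon_{ij} \)

⇓ Subject-specific curves

Model C: \( y_{ij} = f_{gr(i)}(t_{ij}) + g_i(t_{ij}) + \varepsilon_{ij} \)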

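The mixed model associated with Model A is:

\[ y = X\beta + Zu + \varepsilon, \qquad \operatorname{Cov}\begin{bmatrix} u \\ \varepsilon \end{bmatrix} = \begin{bmatrix} \Sigma_{gr} & 0 & 0 \\ 0 & \sigma_{\beta_0}^2 I & 0 \\ 0 & 0 & \sigma^2 I \end{bmatrix} \]

\[ X = \begin{bmatrix} X_{time} \\ \vdots \\ X_{time} \end{bmatrix}, \qquad X_{time} = \begin{bmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_7 \end{bmatrix} \]

\[ Z = \bigl[\, \operatorname{blockdiag}(Z_1, Z_2, Z_3, Z_4) \;:\; I_{36} \otimes 1_7 \,\bigr], \qquad Z_{gr(i)} = \begin{bmatrix} Z_{time} \\ \vdots \\ Z_{time} \end{bmatrix} \]

where \(1_7\) is a column of seven ones, so the second block holds one random-intercept column per dog, and

\[ \Sigma_{gr} = \operatorname{blockdiag}(\sigma_1^2 I,\; \sigma_2^2 I,\; \sigma_3^2 I,\; \sigma_4^2 I) \]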
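The block structure of X and Z for Model A can be assembled with Kronecker products, as in the following sketch. Several details are assumptions made only so the example runs: equal group sizes of 9 dogs, measurement times 1, 3, ..., 13, a truncated linear basis for \(Z_{time}\), and the knot locations.

```python
import numpy as np

n_dogs, n_times, n_groups = 36, 7, 4
times = np.arange(1.0, 14.0, 2.0)            # hypothetical measurement times
knots = np.array([3.0, 5.0, 7.0, 9.0, 11.0])  # hypothetical knots

# dog -> group label; equal group sizes are an assumption
group = np.repeat(np.arange(n_groups), n_dogs // n_groups)

X_time = np.column_stack([np.ones(n_times), times])
Z_time = np.maximum(times[:, None] - knots[None, :], 0.0)

# fixed effects: X_time stacked once per dog
X = np.tile(X_time, (n_dogs, 1))

# group-specific spline blocks: dog i contributes Z_time to the
# columns of its own group, zeros elsewhere
G = np.eye(n_groups)[group]                  # 36 x 4 group indicators
Z_spline = np.kron(G, Z_time)

# random intercepts: one column of ones per dog
Z_int = np.kron(np.eye(n_dogs), np.ones((n_times, 1)))

Z = np.hstack([Z_spline, Z_int])
```

Replacing the column of ones in `Z_int` by `X_time` gives the random intercept and slope block of Model B, and appending `np.kron(np.eye(n_dogs), Z_time)` adds the subject-specific curves of Model C.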

[Figure: potassium against time, four panels.]

The mixed model associated with Model B is:

\[ y = X\beta + Zu + \varepsilon, \qquad \operatorname{Cov}\begin{bmatrix} u \\ \varepsilon \end{bmatrix} = \begin{bmatrix} \Sigma_{gr} & 0 & 0 \\ 0 & \operatorname{blockdiag}(\Sigma) & 0 \\ 0 & 0 & \sigma^2 I \end{bmatrix} \]

\[ Z = \bigl[\, \operatorname{blockdiag}(Z_1, Z_2, Z_3, Z_4) \;:\; I_{36} \otimes X_{time} \,\bigr] \]

The mixed model associated with Model C is:

\[ y = X\beta + Zu + \varepsilon, \qquad \operatorname{Cov}\begin{bmatrix} u \\ \varepsilon \end{bmatrix} = \begin{bmatrix} \Sigma_{gr} & 0 & 0 & 0 \\ 0 & \operatorname{blockdiag}(\Sigma) & 0 & 0 \\ 0 & 0 & \sigma_c^2 I & 0 \\ 0 & 0 & 0 & \sigma^2 I \end{bmatrix} \]

\[ Z = \bigl[\, \operatorname{blockdiag}(Z_1, Z_2, Z_3, Z_4) \;:\; I_{36} \otimes X_{time} \;:\; I_{36} \otimes Z_{time} \,\bigr] \]

[Figure: potassium against time, eight panels.]

Conclusions and work in progress

• P-splines are a useful tool for modelling data in many situations

• P-splines as mixed models

• Easy to implement in standard software

• Model selection

References

• Currie, I., Durban, M. and Eilers, P. (2003). Smoothing and forecasting mortality rates.

• Currie, I. and Durban, M. (2002). Flexible smoothing with P-splines: a unified approach. Statistical Modelling 2.

• Durban, M. and Currie, I. (2003). A note on P-spline additive models with correlated errors. Computational Statistics 18.

• Durban, M., Harezlak, J., Carroll, R. and Wand, M. (2003). Simple fitting of subject-specific curves for longitudinal data.

• Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statist. Sci. 11.

• Ruppert, D., Wand, M.P. and Carroll, R.J. (2003). Semiparametric Regression. Cambridge University Press.

• Wand, M.P. (2003). Smoothing and mixed models. Comput. Stat. 18.
