Missing data in time series: A note on the equivalence of the dummy variable and the skipping approaches


Statistics & Probability Letters 78 (2008) 257–264

www.elsevier.com/locate/stapro


Tommaso Proietti

Dipartimento S.E.F. e ME.Q, Via Columbia 2, 00133 Rome, Italy

Received 13 March 2006; received in revised form 22 February 2007; accepted 23 May 2007

Available online 16 June 2007

Abstract

This note shows the equivalence of the dummy variable approach and the skipping approach for the treatment of missing observations in state space models. The equivalence holds when the coefficient of the dummy variable is considered as a diffuse rather than a fixed effect. The equivalence concerns both likelihood inference and smoothed inferences.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Kalman filter; Smoothing; Influence; Cross-validation

1. Introduction

A well-known result is that estimating a missing observation by skipping the Kalman filter (KF) updating step is equivalent to introducing a dummy variable (additive outlier) in the measurement equation and filling the missing value arbitrarily. This result appears, in different frameworks, in a number of papers: Sargan and Drettakis (1974), Bruce and Martin (1989), Ljung (1993). A detailed discussion can be found in Fuller (1996, Section 8.7). However, if the additive outlier is treated as a fixed effect, with zero covariance matrix, the likelihood is defined differently and a correction has to be computed in the second case; see Gomez et al. (1999). The correction factor is related to the determinantal term of the likelihood and depends in a simple fashion on quantities computed under the model for the complete observations, requiring a single run of the KF and smoothing filter.

To our knowledge, a proof of the equivalence of the skipping approach and the dummy variable approach for the definition of the likelihood and for smoothing is not available. This note aims to bridge the gap by providing a simple proof that, when the additive outlier is treated as diffuse, with arbitrarily large covariance matrix, the correction to the likelihood takes place automatically. This is convenient, as no extra programming effort is necessary once a programme handling diffuse initial conditions and regression effects has been implemented.

The equivalence also carries over to smoothed inferences, concerning the estimation of the states and the disturbances. The derivation of analytical expressions for the influence of an observation on these quantities, made in De Jong (1996), is greatly simplified in the dummy variable setup, as they depend in a simple fashion on the output of the KF and smoother run on intervention variables.


dx.doi.org/10.1016/j.spl.2007.05.031


The plan of the paper is as follows. Section 2 introduces the dummy variable approach for stationary state space models with no regression effects, under fixed and diffuse conditions, and derives the prediction error decomposition form of the likelihood under the latter. In Section 3 we present the alternative strategy for handling missing observations, known as the skipping approach, and prove that the likelihood for this model is equivalent to the dummy variable one. In Section 4 the equivalence is extended to smoothed estimates of the states and the disturbances, and measures of the influence of an observation are given, which depend in a simple way on the output of the KF and smoothing filter run on the intervention variable.

2. The Dummy variable approach

Let $y_t$ denote a vector stationary time series with $N$ elements; the state space model is

\[ y_t = Z_t \alpha_t + G_t \varepsilon_t, \qquad t = 1, 2, \ldots, T, \tag{1} \]

\[ \alpha_{t+1} = T_t \alpha_t + H_t \varepsilon_t, \qquad t = 1, 2, \ldots, T, \tag{2} \]

with $\alpha_1 \sim N(a_1, \sigma^2 P_1)$, where $a_1$ and $\sigma^2 P_1$ denote the unconditional mean and covariance matrix of $\alpha_t$, and $\varepsilon_t \sim \mathrm{NID}(0, \sigma^2 I)$. The system matrices, $Z_t, G_t, T_t, H_t$, are functionally related to a vector of hyperparameters, $\theta$.

The Kalman filter (KF) is a well-known recursive algorithm for computing the minimum mean square estimator of $\alpha_t$ and its mean square error (MSE) matrix conditional on $Y_{t-1} = \{y_1, y_2, \ldots, y_{t-1}\}$. Defining

\[ a_t = E(\alpha_t \mid Y_{t-1}), \qquad \mathrm{MSE}(a_t) = \sigma^2 P_t = E[(\alpha_t - a_t)(\alpha_t - a_t)' \mid Y_{t-1}], \]

the filter consists of the following recursions:

\[
\begin{aligned}
\nu_t &= y_t - Z_t a_t, & F_t &= Z_t P_t Z_t' + G_t G_t', \\
q_t &= q_{t-1} + \nu_t' F_t^{-1} \nu_t, & K_t &= (T_t P_t Z_t' + H_t G_t') F_t^{-1}, \\
a_{t+1} &= T_t a_t + K_t \nu_t, & P_{t+1} &= T_t P_t L_t' + H_t J_t',
\end{aligned}
\tag{3}
\]

with $L_t = T_t - K_t Z_t$ and $J_t = H_t - K_t G_t$; $\nu_t = y_t - E(y_t \mid Y_{t-1})$ are the filter innovations, with MSE matrix $\sigma^2 F_t$. The filter is started off with $a_1 = 0$, $P_1 = H_0 H_0'$ and $q_0 = 0$. The log-likelihood for the model is, apart from a constant term,

\[ \mathcal{L}(y_1, \ldots, y_T; \theta) = -\frac{1}{2} \left[ NT \ln \sigma^2 + \sum_{t=1}^{T} \ln |F_t| + \sigma^{-2} q_T \right], \tag{4} \]

where $q_T = \sum_{t=1}^{T} \nu_t' F_t^{-1} \nu_t$.
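As an illustration of recursions (3) and the log-likelihood (4), here is a minimal Python sketch for a scalar special case. The model (an AR(1) signal observed with noise), the function name `kf_loglik` and the parameter names `phi`, `g2`, `h2` are assumptions made for this example and are not part of the paper; it sets $\sigma^2 = 1$, $Z_t = 1$, $T_t = \phi$, $G_t G_t' = g^2$, $H_t H_t' = h^2$ and $H_t G_t' = 0$.

```python
import math

def kf_loglik(y, phi, g2, h2):
    """Log-likelihood (4), up to a constant, for the assumed scalar model
    y_t = alpha_t + noise (variance g2), alpha_{t+1} = phi*alpha_t + noise (variance h2)."""
    # Stationary start: unconditional mean 0 and variance h2 / (1 - phi^2).
    a, P = 0.0, h2 / (1.0 - phi ** 2)
    q, logdet = 0.0, 0.0
    for yt in y:
        v = yt - a                      # nu_t = y_t - Z a_t
        F = P + g2                      # F_t = Z P_t Z' + G G'
        K = phi * P / F                 # K_t = (T P_t Z' + H G') / F_t, with H G' = 0
        q += v * v / F                  # q_t = q_{t-1} + nu_t' F_t^{-1} nu_t
        logdet += math.log(F)
        a = phi * a + K * v             # a_{t+1} = T a_t + K_t nu_t
        P = phi * P * (phi - K) + h2    # P_{t+1} = T P_t L_t' + H J_t', with H J_t' = h2 here
    return -0.5 * (logdet + q)          # the term N T ln(sigma^2) vanishes for sigma^2 = 1

print(kf_loglik([0.3, -1.2, 0.5], phi=0.7, g2=0.5, h2=1.0))
```

For two observations the value can be checked by hand against the bivariate normal density with variances $P_1 + g^2$ and covariance $\phi P_1$.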

Suppose that an intervention is included at $t = i$ so that the measurement equation becomes

\[ y_t = Z_t \alpha_t + I_t(i)\,\delta + G_t \varepsilon_t, \tag{5} \]

where $I_t(i)$ is an indicator variable taking value 1 for $t = i$ and 0 elsewhere. For its statistical treatment, the KF (3) is augmented from $t = i$ by the following recursions:

\[
\begin{aligned}
V_t^{+} &= I_t(i) I - Z_t A_t^{+}, \\
A_{t+1}^{+} &= T_t A_t^{+} + K_t V_t^{+} = K_i I_t(i) + L_t A_t^{+}, \\
S_t^{+} &= S_{t-1}^{+} + V_t^{+\prime} F_t^{-1} V_t^{+}, \\
s_t^{+} &= s_{t-1}^{+} + V_t^{+\prime} F_t^{-1} \nu_t,
\end{aligned}
\tag{6}
\]

for $t = i, \ldots, T$, with starting conditions $A_i^{+} = 0$, $S_{i-1}^{+} = 0$ and $s_{i-1}^{+} = 0$. This amounts to applying the KF to the intervention signature $I_t(i) I$.


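When $\delta$ is treated as a fixed effect, the log-likelihood can be written as (Rosenberg, 1973)

\[ -\frac{1}{2} \left[ NT \ln \sigma^2 + \sum_{t=1}^{T} \ln |F_t| + \sigma^{-2} \left( q_T - 2 s_T^{+\prime} \delta + \delta' S_T^{+} \delta \right) \right]. \]

The MLE of $\delta$ is thus $\hat{\delta} = (S_T^{+})^{-1} s_T^{+}$ and the concentrated likelihood is

\[ -\frac{1}{2} \left[ NT \ln \sigma^2 + \sum_{t=1}^{T} \ln |F_t| + \sigma^{-2} \left( q_T - s_T^{+\prime} (S_T^{+})^{-1} s_T^{+} \right) \right]. \]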

There is, however, a conceptual difficulty with the fixed effects model, as was clearly pointed out by Bell (1989, p. 408), in that "the use of an indicator variable lets the mean at a given point be anything while still assuming that the observation is normal with the same variance and covariances as other observations, whereas omitting observations makes no assumption at all about it".
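To make the fixed-effect treatment concrete, the following Python sketch runs the KF (3) together with the intervention recursions (6) and returns the MLE $(S_T^{+})^{-1} s_T^{+}$ and the concentrated quadratic form $q_T - s_T^{+\prime}(S_T^{+})^{-1}s_T^{+}$. It assumes the same illustrative scalar model as before (AR(1) signal plus noise, $\sigma^2 = 1$, $Z_t = 1$); the function name `fixed_effect_dummy` and its parameters are hypothetical.

```python
def fixed_effect_dummy(y, i, phi, g2, h2):
    """Return (delta_hat, q_T - s_T^2 / S_T) for a dummy at 0-based index i,
    under the assumed scalar model with Z = 1, T = phi, GG' = g2, HH' = h2."""
    a, P = 0.0, h2 / (1.0 - phi ** 2)
    A = 0.0                 # A_t^+ stays at zero until the intervention date
    S = s = q = 0.0
    for t, yt in enumerate(y):
        v = yt - a
        F = P + g2
        K = phi * P / F
        V = (1.0 if t == i else 0.0) - A   # V_t^+ = I_t(i) - Z A_t^+
        S += V * V / F                     # S_t^+ = S_{t-1}^+ + V' F^{-1} V
        s += V * v / F                     # s_t^+ = s_{t-1}^+ + V' F^{-1} nu_t
        q += v * v / F
        a = phi * a + K * v
        A = phi * A + K * V                # A_{t+1}^+ = T A_t^+ + K_t V_t^+
        P = phi * P * (phi - K) + h2
    return s / S, q - s * s / S

y = [0.4, -0.3, 1.1, 0.2, -0.8]
shifted = list(y)
shifted[2] += 5.0           # change the value at the dummy date only
print(fixed_effect_dummy(y, 2, 0.6, 0.5, 1.0))
print(fixed_effect_dummy(shifted, 2, 0.6, 0.5, 1.0))
```

Because the innovations are linear in the data, adding a constant to $y_i$ should shift the estimate by that constant and leave the concentrated quadratic form unchanged, which is the sense in which the indicator variable lets the mean at that point be anything.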

In the sequel, $\delta$ is treated as a diffuse effect, that is, $[\mathrm{Cov}(\delta)]^{-1}$ converges to zero in the Euclidean norm (see De Jong, 1991), e.g. $\delta \sim N(0, \kappa I)$, $\kappa \to \infty$; this is equivalent to making no assumption on the covariance of the $i$th observation.

De Jong (1991) has shown that $\delta$ can be concentrated out of the likelihood function, so that $\hat{\delta} = (S_T^{+})^{-1} s_T^{+}$ and $\mathrm{MSE}(\hat{\delta}) = \sigma^2 (S_T^{+})^{-1}$. The diffuse log-likelihood function is

\[ \mathcal{L}_{DV}(y_1, \ldots, y_T; \theta) = -\frac{1}{2} \left[ N(T-1) \ln \sigma^2 + \sum_{t=1}^{T} \ln |F_t| + \ln |S_T^{+}| + \sigma^{-2} \left( q_T - s_T^{+\prime} (S_T^{+})^{-1} s_T^{+} \right) \right]. \tag{7} \]

This function is the likelihood for a rank $T - 1$ transformation of the observations, which makes the data invariant to $\delta$.

The following theorem is a restatement of Theorem 2 in De Jong and Penzer (1998).

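Theorem 1. The estimate of $\delta$ and the diffuse LF can be written as

\[ \hat{\delta} = M_i^{-1} u_i, \qquad \mathrm{MSE}(\hat{\delta}) = \sigma^2 M_i^{-1}, \tag{8} \]

\[ \mathcal{L}_{DV} = -\frac{1}{2} \left[ N(T-1) \ln \sigma^2 + \sum_{t=1}^{T} \ln |F_t| + \ln |M_i| + \sigma^{-2} \left( q_T - u_i' M_i^{-1} u_i \right) \right], \tag{9} \]

where $u_i$ and $M_i$ are the output at $t = i$ of the smoothing filter

\[
\begin{aligned}
u_t &= F_t^{-1} \nu_t - K_t' r_t, & M_t &= F_t^{-1} + K_t' N_t K_t, \\
r_{t-1} &= Z_t' F_t^{-1} \nu_t + L_t' r_t, & N_{t-1} &= Z_t' F_t^{-1} Z_t + L_t' N_t L_t,
\end{aligned}
\tag{10}
\]

started with $r_T = 0$ and $N_T = 0$.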

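Proof. We begin by noting that $V_i^{+} = I$ and $V_t^{+} = -Z_t L_{t,i+1} K_i$ for $t = i+1, \ldots, T$, with $L_{t,i+1} = L_{t-1} \cdots L_{i+1}$ and $L_{i+1,i+1} = I$. Hence

\[
\begin{aligned}
s_T^{+} &= \sum_{t=i}^{T} V_t^{+\prime} F_t^{-1} \nu_t \\
&= F_i^{-1} \nu_i - K_i' \sum_{t=i+1}^{T} L_{t,i+1}' Z_t' F_t^{-1} \nu_t \\
&= F_i^{-1} \nu_i - K_i' r_i \\
&= u_i,
\end{aligned}
\]

\[
\begin{aligned}
S_T^{+} &= \sum_{t=i}^{T} V_t^{+\prime} F_t^{-1} V_t^{+} \\
&= F_i^{-1} + K_i' \left( \sum_{t=i+1}^{T} L_{t,i+1}' Z_t' F_t^{-1} Z_t L_{t,i+1} \right) K_i \\
&= F_i^{-1} + K_i' N_i K_i \\
&= M_i.
\end{aligned}
\]

Replacing into the expressions for $\hat{\delta}$ and (7) yields the result. □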
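The two identities used in the proof, $s_T^{+} = u_i$ and $S_T^{+} = M_i$, can be checked numerically. The sketch below does so for an assumed scalar AR(1)-plus-noise model ($\sigma^2 = 1$, $Z_t = 1$, $T_t = \phi$, $H_t G_t' = 0$); the function names `forward_pass` and `smoothing_filter` are hypothetical and the code is only an illustration of recursions (3), (6) and (10).

```python
def forward_pass(y, i, phi, g2, h2):
    """KF (3) plus intervention recursions (6); returns per-period (nu, F, K), s_T^+, S_T^+."""
    a, P, A = 0.0, h2 / (1.0 - phi ** 2), 0.0
    S = s = 0.0
    store = []
    for t, yt in enumerate(y):
        v = yt - a
        F = P + g2
        K = phi * P / F
        V = (1.0 if t == i else 0.0) - A
        S += V * V / F
        s += V * v / F
        store.append((v, F, K))
        a = phi * a + K * v
        A = phi * A + K * V
        P = phi * P * (phi - K) + h2
    return store, s, S

def smoothing_filter(store, phi):
    """Backward recursions (10), started with r_T = 0 and N_T = 0; returns lists u_t, M_t."""
    n = len(store)
    u, M = [0.0] * n, [0.0] * n
    r = N = 0.0
    for t in range(n - 1, -1, -1):
        v, F, K = store[t]
        L = phi - K                  # L_t = T - K_t Z
        u[t] = v / F - K * r         # u_t uses r_t
        M[t] = 1.0 / F + K * K * N   # M_t uses N_t
        r = v / F + L * r            # r_{t-1}
        N = 1.0 / F + L * L * N      # N_{t-1}
    return u, M

y = [0.4, -0.3, 1.1, 0.2, -0.8]
store, s_T, S_T = forward_pass(y, 2, 0.6, 0.5, 1.0)
u, M = smoothing_filter(store, 0.6)
print(s_T, u[2])
print(S_T, M[2])
```

One practical reading of the theorem is that a single backward pass delivers $u_t$ and $M_t$ for every $t$, so the dummy-variable quantities for any intervention date are available without rerunning the augmented filter.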

Using a different argument, De Jong and Penzer (1998) show that

\[ y_t - E(y_t \mid y_1, \ldots, y_{t-1}, y_{t+1}, \ldots, y_T) = M_t^{-1} u_t. \]
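The leave-one-out interpretation of $M_t^{-1} u_t$ can be illustrated on a case small enough to verify by hand. The sketch below assumes a scalar AR(1)-plus-noise model with $\sigma^2 = 1$ (an assumption for this example, as are the function and parameter names) and computes $M_t^{-1} u_t$ for every $t$ from one forward and one backward pass.

```python
def deletion_residuals(y, phi, g2, h2):
    """Return M_t^{-1} u_t for every t, using KF (3) followed by the smoothing filter (10)."""
    a, P = 0.0, h2 / (1.0 - phi ** 2)
    fwd = []
    for yt in y:
        v = yt - a
        F = P + g2
        K = phi * P / F
        fwd.append((v, F, K))
        a = phi * a + K * v
        P = phi * P * (phi - K) + h2
    r = N = 0.0
    out = [0.0] * len(y)
    for t in range(len(y) - 1, -1, -1):
        v, F, K = fwd[t]
        L = phi - K
        u = v / F - K * r            # u_t
        M = 1.0 / F + K * K * N      # M_t
        out[t] = u / M
        r = v / F + L * r
        N = 1.0 / F + L * L * N
    return out

# With three observations, E(y_2 | y_1, y_3) has a closed form for this model.
phi, g2, h2 = 0.5, 1.0, 0.75
y = [0.9, -0.4, 1.3]
P1 = h2 / (1.0 - phi ** 2)
direct = y[1] - phi * P1 * (y[0] + y[2]) / (P1 + g2 + phi ** 2 * P1)
print(deletion_residuals(y, phi, g2, h2)[1], direct)
```

The closed form used for comparison follows from the autocovariances of the assumed model, $\mathrm{Var}(y_t) = P_1 + g^2$ and $\mathrm{Cov}(y_t, y_{t-k}) = \phi^k P_1$.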

The next theorem provides an alternative expression for the likelihood function, based on the one-step-ahead prediction error decomposition. This will prove useful in the comparison with that arising from the skipping approach.

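Theorem 2. For model (5), let $\nu_t^{*} = \nu_t - E(\nu_t \mid Y_{t-1})$ and $\sigma^2 F_t^{*} = \mathrm{MSE}(\nu_t \mid Y_{t-1})$, where $F_t^{*} = F_t + V_t^{+} (S_{t-1}^{+})^{-1} V_t^{+\prime}$ and $\nu_t^{*} = \nu_t - V_t^{+} (S_{t-1}^{+})^{-1} s_{t-1}^{+}$. Then,

\[ \mathcal{L}_{DV} = -\frac{1}{2} \left[ N(T-1) \ln \sigma^2 + \sum_{t=1}^{i-1} \ln |F_t| + \sum_{t=i+1}^{T} \ln |F_t^{*}| + \sigma^{-2} \left( q_{i-1} + \sum_{t=i+1}^{T} \nu_t^{*\prime} F_t^{*-1} \nu_t^{*} \right) \right]. \tag{11} \]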

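Proof. To show that the determinantal part of the LF is as stated we provide the following recursion for $|S_T^{+}|$, with $F_T^{*} = F_T + V_T^{+} (S_{T-1}^{+})^{-1} V_T^{+\prime}$ as in the statement of the theorem:

\[
\begin{aligned}
|S_T^{+}| &= |S_{T-1}^{+} + V_T^{+\prime} F_T^{-1} V_T^{+}| \\
&= |S_{T-1}^{+}| \, |I + (S_{T-1}^{+})^{-1} V_T^{+\prime} F_T^{-1} V_T^{+}| \\
&= |S_{T-1}^{+}| \, |I + F_T^{-1} V_T^{+} (S_{T-1}^{+})^{-1} V_T^{+\prime}| \\
&= |S_{T-1}^{+}| \, |F_T|^{-1} |F_T^{*}|.
\end{aligned}
\]

Iterating this result for $t = T-1, T-2, \ldots, i+1$ and recalling that $S_i^{+} = F_i^{-1}$ produces

\[ \ln |S_T^{+}| = \sum_{t=i+1}^{T} \ln |F_t^{*}| - \sum_{t=i}^{T} \ln |F_t|. \]

Moreover,

\[ q_T - s_T^{+\prime} (S_T^{+})^{-1} s_T^{+} = q_{T-1} - s_{T-1}^{+\prime} (S_{T-1}^{+})^{-1} s_{T-1}^{+} + \nu_T^{*\prime} F_T^{*-1} \nu_T^{*}, \]

which, applied recursively, yields

\[ q_T - s_T^{+\prime} (S_T^{+})^{-1} s_T^{+} = q_i - s_i^{+\prime} (S_i^{+})^{-1} s_i^{+} + \sum_{t=i+1}^{T} \nu_t^{*\prime} F_t^{*-1} \nu_t^{*}. \]

Now, as $q_i = q_{i-1} + \nu_i' F_i^{-1} \nu_i$ and $s_i^{+\prime} (S_i^{+})^{-1} s_i^{+} = \nu_i' F_i^{-1} \nu_i$, result (11) follows directly. □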
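The determinant recursion in the proof rests on two standard matrix facts, spelled out here as a short worked step. The shorthand $S = S_{T-1}^{+}$, $V = V_T^{+}$, $F = F_T$ is introduced only for this illustration, and $F^{*}$ stands for $F + V S^{-1} V'$.

```latex
% Sylvester's determinant identity: |I_m + AB| = |I_n + BA| for A (m x n), B (n x m).
% Take A = S^{-1} V' and B = F^{-1} V.
\begin{aligned}
|S + V' F^{-1} V|
  &= |S| \, |I + S^{-1} V' F^{-1} V|
     && \text{(factor out } S\text{)} \\
  &= |S| \, |I + F^{-1} V S^{-1} V'|
     && \text{(Sylvester's identity)} \\
  &= |S| \, |F^{-1} (F + V S^{-1} V')|
     && \text{(factor out } F^{-1}\text{)} \\
  &= |S| \, |F|^{-1} \, |F^{*}|.
\end{aligned}
```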

3. The Skipping approach

When the $i$th observation is missing, the KF is forced to skip the updating step at time $t = i$, so that

\[ a_{i+1}^{(m)} = T_i a_i, \qquad P_{i+1}^{(m)} = T_i P_i T_i' + H_i H_i', \tag{12} \]

and $q_i^{(m)} = q_{i-1}$.

From time $i + 1$ on, the KF (3) is run with $\nu_t, a_t, q_t, K_t, P_{t+1}, L_t, J_t$ replaced by $\nu_t^{(m)}, a_t^{(m)}, q_t^{(m)}, K_t^{(m)}, P_{t+1}^{(m)}, L_t^{(m)}, J_t^{(m)}$. See Harvey et al. (1998).


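The log-likelihood function $\mathcal{L}^{(m)} = \mathcal{L}(y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_T; \theta)$ is

\[ \mathcal{L}^{(m)} = -\frac{1}{2} \left[ N(T-1) \ln \sigma^2 + \sum_{t=1}^{i-1} \ln |F_t| + \sum_{t=i+1}^{T} \ln |F_t^{(m)}| + \sigma^{-2} \left( \sum_{t=1}^{i-1} \nu_t' F_t^{-1} \nu_t + \sum_{t=i+1}^{T} \nu_t^{(m)\prime} F_t^{(m)-1} \nu_t^{(m)} \right) \right]. \tag{13} \]

Theorem 3. $\mathcal{L}^{(m)} = \mathcal{L}_{DV}$.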

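Proof. The KF resulting from the skipping approach is related to the full sample KF (3) by the following equations:

\[
\begin{aligned}
\nu_t^{(m)} &= \nu_t - V_t^{+} (S_{t-1}^{+})^{-1} s_{t-1}^{+}, & F_t^{(m)} &= F_t + V_t^{+} (S_{t-1}^{+})^{-1} V_t^{+\prime}, \\
K_t^{(m)} &= K_t - A_{t+1}^{+} (S_t^{+})^{-1} V_t^{+\prime} F_t^{-1}, \\
a_{t+1}^{(m)} &= a_{t+1} - A_{t+1}^{+} (S_t^{+})^{-1} s_t^{+}, & P_{t+1}^{(m)} &= P_{t+1} + A_{t+1}^{+} (S_t^{+})^{-1} A_{t+1}^{+\prime},
\end{aligned}
\tag{14}
\]

where $A_t^{+} = L_{t,i+1} K_i$. These relations hold for $t = i + 1$: from (12),

\[ a_{i+1}^{(m)} = a_{i+1} - K_i \nu_i = a_{i+1} - A_{i+1}^{+} (S_i^{+})^{-1} s_i^{+} \]

and

\[ P_{i+1}^{(m)} = P_{i+1} + K_i F_i K_i' = P_{i+1} + A_{i+1}^{+} (S_i^{+})^{-1} A_{i+1}^{+\prime}. \]

Hence, since $V_{i+1}^{+} = -Z_{i+1} A_{i+1}^{+}$,

\[ \nu_{i+1}^{(m)} = y_{i+1} - Z_{i+1} a_{i+1}^{(m)} = \nu_{i+1} + Z_{i+1} A_{i+1}^{+} (S_i^{+})^{-1} s_i^{+} = \nu_{i+1} - V_{i+1}^{+} (S_i^{+})^{-1} s_i^{+} \]

and

\[ F_{i+1}^{(m)} = Z_{i+1} P_{i+1}^{(m)} Z_{i+1}' + G_{i+1} G_{i+1}' = F_{i+1} + V_{i+1}^{+} (S_i^{+})^{-1} V_{i+1}^{+\prime}. \]

The formula for the gain matrix is obtained by noticing that

\[ F_{i+1}^{(m)-1} = F_{i+1}^{-1} - F_{i+1}^{-1} V_{i+1}^{+} (S_{i+1}^{+})^{-1} V_{i+1}^{+\prime} F_{i+1}^{-1}, \]

whence

\[
\begin{aligned}
K_{i+1}^{(m)} &= (T_{i+1} P_{i+1}^{(m)} Z_{i+1}' + H_{i+1} G_{i+1}') F_{i+1}^{(m)-1} \\
&= (K_{i+1} F_{i+1} - T_{i+1} K_i F_i V_{i+1}^{+\prime}) F_{i+1}^{(m)-1} \\
&= K_{i+1} - A_{i+2}^{+} (S_{i+1}^{+})^{-1} V_{i+1}^{+\prime} F_{i+1}^{-1}.
\end{aligned}
\]

In conclusion, $\nu_t^{(m)} = \nu_t^{*}$ and $F_t^{(m)} = F_t^{*}$, where $\nu_t^{*} = \nu_t - V_t^{+} (S_{t-1}^{+})^{-1} s_{t-1}^{+}$ and $F_t^{*} = F_t + V_t^{+} (S_{t-1}^{+})^{-1} V_t^{+\prime}$ are the quantities defined in Theorem 2. □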
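Theorem 3 lends itself to a direct numerical check. The Python sketch below evaluates the skipping log-likelihood (13) and the diffuse dummy-variable log-likelihood (7) for an assumed scalar AR(1)-plus-noise model with $\sigma^2 = 1$; the model, the function names and the parameter names are assumptions for this illustration only.

```python
import math

def skipping_loglik(y, i, phi, g2, h2):
    """Log-likelihood (13): the KF skips the update at 0-based index i, as in (12)."""
    a, P = 0.0, h2 / (1.0 - phi ** 2)
    total = 0.0
    for t, yt in enumerate(y):
        if t == i:
            a, P = phi * a, phi * phi * P + h2   # prediction step only
            continue
        v = yt - a
        F = P + g2
        K = phi * P / F
        total += math.log(F) + v * v / F
        a = phi * a + K * v
        P = phi * P * (phi - K) + h2
    return -0.5 * total

def diffuse_dummy_loglik(y, i, phi, g2, h2):
    """Log-likelihood (7): full-sample KF (3) augmented by the recursions (6)."""
    a, P, A = 0.0, h2 / (1.0 - phi ** 2), 0.0
    S = s = q = logdet = 0.0
    for t, yt in enumerate(y):
        v = yt - a
        F = P + g2
        K = phi * P / F
        V = (1.0 if t == i else 0.0) - A
        S += V * V / F
        s += V * v / F
        q += v * v / F
        logdet += math.log(F)
        a = phi * a + K * v
        A = phi * A + K * V
        P = phi * P * (phi - K) + h2
    return -0.5 * (logdet + math.log(S) + q - s * s / S)

y = [0.4, -0.3, 1.1, 0.2, -0.8]
filled = list(y)
filled[2] = 7.0   # arbitrary fill-in value at the missing date
print(skipping_loglik(y, 2, 0.6, 0.5, 1.0))
print(diffuse_dummy_loglik(filled, 2, 0.6, 0.5, 1.0))
```

The second call uses an arbitrary value at the missing date, reflecting the remark in the Introduction that the missing value may be filled arbitrarily under the dummy variable approach.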

When $\delta$ is treated as a fixed effect, the correction that has to be applied to the determinantal part of the likelihood is $-0.5 \ln |M_i|$, which is available from a run of the smoothing filter (10).

4. Influence and deletion diagnostics

In this section we use the previous results from a different perspective. Assuming that the full sample is available, we aim at computing measures of influence on smoothed inferences. De Jong (1989) proved that the smoothed estimate of the state at $t$, $\tilde{\alpha}_t = E(\alpha_t \mid Y_T)$, and its MSE matrix, $\sigma^2 \tilde{P}_t = E[(\alpha_t - \tilde{\alpha}_t)(\alpha_t - \tilde{\alpha}_t)' \mid Y_T]$, are

\[ \tilde{\alpha}_t = a_t + P_t r_{t-1}, \qquad \tilde{P}_t = P_t - P_t N_{t-1} P_t, \]

where $r_{t-1}$ and $N_{t-1}$ are given in the second line of (10).

Now, let $\tilde{\alpha}_t^{(m)} = E(\alpha_t \mid Y_T^{(i)})$, where $Y_T^{(i)} = (y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_T)$ is the information at $T$ excluding $y_i$. Moreover, let $\sigma^2 \tilde{P}_t^{(m)} = E[(\alpha_t - \tilde{\alpha}_t^{(m)})(\alpha_t - \tilde{\alpha}_t^{(m)})' \mid Y_T^{(i)}]$.


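Theorem 4.

\[ \tilde{\alpha}_t = \tilde{\alpha}_t^{(m)} + (A_t^{+} + P_t R_{t-1}^{+}) M_i^{-1} u_i, \]

\[ \tilde{P}_t = \tilde{P}_t^{(m)} - (A_t^{+} + P_t R_{t-1}^{+}) M_i^{-1} (A_t^{+} + P_t R_{t-1}^{+})', \]

where

\[ R_{t-1}^{+} = Z_t' F_t^{-1} V_t^{+} + L_t' R_t^{+}, \tag{15} \]

with $A_t^{+} = 0$ for $t < i + 1$. Also, for $t < i$, $V_t^{+} = 0$ and $R_{t-1}^{+} = L_t' R_t^{+} = L_{i,t}' R_{i-1}^{+}$.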

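Proof. The orthogonal set $\{\nu_1, \ldots, \nu_T\}$ is a linear transformation of the set

\[ \left\{ \nu_1, \ldots, \nu_{i-1}, \nu_{i+1}^{(m)}, \ldots, \nu_T^{(m)}, u_i \right\}. \]

This set is orthogonal too, since $u_i$ and $\nu_t^{(m)}$ depend only on $\{\nu_j, j \geq i\}$, and

\[ \mathrm{Cov}(\nu_t^{(m)}, u_i) = \mathrm{Cov}\left( \nu_t - V_t^{+} (S_{t-1}^{+})^{-1} s_{t-1}^{+}, \; \sum_{j=i}^{T} V_j^{+\prime} F_j^{-1} \nu_j \right) = 0, \qquad \forall t > i. \]

Thus, applying a standard result on linear projection onto uncorrelated variables:

\[
\begin{aligned}
\tilde{\alpha}_t &= \tilde{\alpha}_t^{(m)} + \sigma^{-2} \mathrm{Cov}(\alpha_t - a_t^{(m)}, u_i) M_i^{-1} u_i \\
&= \tilde{\alpha}_t^{(m)} + \sigma^{-2} \mathrm{Cov}(\alpha_t - a_t + A_t^{+} (S_{t-1}^{+})^{-1} s_{t-1}^{+}, u_i) M_i^{-1} u_i \\
&= \tilde{\alpha}_t^{(m)} + \left( P_t \sum_{j=i}^{T} L_{j,t}' Z_j' F_j^{-1} V_j^{+} + A_t^{+} \right) M_i^{-1} u_i.
\end{aligned}
\]

The formula for the MSE matrix is derived similarly from

\[ \mathrm{MSE}(\alpha_t \mid Y_T) = \mathrm{MSE}(\alpha_t \mid Y_T^{(i)}) - \mathrm{Cov}(\alpha_t, u_i) M_i^{-1} \mathrm{Cov}(\alpha_t, u_i)'. \qquad \square \]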

Hence,

\[ \tilde{\alpha}_t - \tilde{\alpha}_t^{(m)} = (A_t^{+} + P_t R_{t-1}^{+}) M_i^{-1} u_i \tag{16} \]

provides the measure of influence of the $i$th observation on the state estimate at time $t$. The quantities on the right-hand side are readily available from the augmented KF for model (5); $\sigma^{-2} \mathrm{Cov}(\alpha_t, u_i) = A_t^{+} + P_t R_{t-1}^{+}$ is the leverage of $y_i$ on $\tilde{\alpha}_t$ (De Jong, 1996).
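The influence measure (16) can be compared with the brute-force difference between the full-sample smoother and the smoother run with the $i$th observation skipped. The sketch below does this for an assumed scalar AR(1)-plus-noise model with $\sigma^2 = 1$; the function names, the parameter names and the starting value $R_T^{+} = 0$ (chosen by analogy with $r_T = 0$) are assumptions for this illustration.

```python
def state_influence(y, i, phi, g2, h2):
    """Full-sample smoothed states and the influence (16) of y[i] on them (0-based i)."""
    n = len(y)
    a, P, A = 0.0, h2 / (1.0 - phi ** 2), 0.0
    fwd = []
    for t in range(n):
        v, F = y[t] - a, P + g2
        K = phi * P / F
        V = (1.0 if t == i else 0.0) - A
        fwd.append((v, F, K, V, a, P, A))
        a, A, P = phi * a + K * v, phi * A + K * V, phi * P * (phi - K) + h2
    r = N = R = 0.0                      # r_T, N_T and (assumed) R_T^+ start at zero
    smoothed, leverage = [0.0] * n, [0.0] * n
    for t in range(n - 1, -1, -1):
        v, F, K, V, a, P, A = fwd[t]
        L = phi - K
        if t == i:
            u_i, M_i = v / F - K * r, 1.0 / F + K * K * N
        r, N, R = v / F + L * r, 1.0 / F + L * L * N, V / F + L * R   # (10) and (15)
        smoothed[t] = a + P * r          # a_t + P_t r_{t-1}
        leverage[t] = A + P * R          # A_t^+ + P_t R_{t-1}^+
    return smoothed, [lev * u_i / M_i for lev in leverage]

def smoothed_states_without(y, i, phi, g2, h2):
    """Smoothed states when y[i] is treated as missing (skipping approach)."""
    n = len(y)
    a, P = 0.0, h2 / (1.0 - phi ** 2)
    fwd = []
    for t in range(n):
        if t == i:
            fwd.append((0.0, 1.0, 0.0, a, P))   # v = 0, K = 0 encode "no update"
            a, P = phi * a, phi * phi * P + h2
        else:
            v, F = y[t] - a, P + g2
            K = phi * P / F
            fwd.append((v, F, K, a, P))
            a, P = phi * a + K * v, phi * P * (phi - K) + h2
    r = 0.0
    out = [0.0] * n
    for t in range(n - 1, -1, -1):
        v, F, K, a, P = fwd[t]
        r = v / F + (phi - K) * r        # reduces to r_{t-1} = T' r_t at the missing date
        out[t] = a + P * r
    return out

y = [0.4, -0.3, 1.1, 0.2, -0.8]
full, infl = state_influence(y, 2, 0.6, 0.5, 1.0)
skip = smoothed_states_without(y, 2, 0.6, 0.5, 1.0)
for t in range(len(y)):
    print(t, full[t] - skip[t], infl[t])
```

The point of (16) is that the first function needs no second filtering pass: the leverage comes from the same forward and backward sweep that produces the smoothed states.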

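We now show that

\[ R_{i-1}^{+} = Z_i' M_i - T_i' N_i K_i. \tag{17} \]

Indeed,

\[
\begin{aligned}
R_{i-1}^{+} &= Z_i' F_i^{-1} + L_i' Z_{i+1}' F_{i+1}^{-1} V_{i+1}^{+} + \cdots + L_{T,i}' Z_T' F_T^{-1} V_T^{+} \\
&= Z_i' F_i^{-1} - (L_i' Z_{i+1}' F_{i+1}^{-1} Z_{i+1} K_i + \cdots + L_{T,i}' Z_T' F_T^{-1} Z_T L_{T,i+1} K_i) \\
&= Z_i' F_i^{-1} - L_i' N_i K_i \\
&= Z_i' F_i^{-1} - T_i' N_i K_i + Z_i' K_i' N_i K_i \\
&= Z_i' M_i - T_i' N_i K_i,
\end{aligned}
\]

which proves (17). Therefore, since $A_i^{+} = 0$, (16) recovers the result derived in De Jong (1996), $\tilde{\alpha}_i - \tilde{\alpha}_i^{(m)} = P_i (Z_i' M_i - T_i' N_i K_i) M_i^{-1} u_i$.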

Let $\hat{\varepsilon}_t = E(\varepsilon_t \mid Y_T)$ denote the smoothed disturbance. Koopman (1993) shows that

\[ \hat{\varepsilon}_t = G_t' F_t^{-1} \nu_t + J_t' r_t. \]

We are now interested in assessing the influence of the $i$th observation on the smoothed estimate of the disturbance $\varepsilon_t$.


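For this purpose, we denote $\hat{\varepsilon}_t^{(m)} = E(\varepsilon_t \mid Y_T^{(i)})$ and, taking the expectation of both sides of (1) conditional on $Y_T$ and on $Y_T^{(i)}$ in turn,

\[ y_t = Z_t \tilde{\alpha}_t + G_t \hat{\varepsilon}_t, \]

\[ y_t = Z_t \tilde{\alpha}_t^{(m)} + G_t \hat{\varepsilon}_t^{(m)}, \]

we write

\[
\begin{aligned}
Z_t (\tilde{\alpha}_t - \tilde{\alpha}_t^{(m)}) &= -G_t (\hat{\varepsilon}_t - \hat{\varepsilon}_t^{(m)}), \qquad t \neq i, \\
y_i - E(y_i \mid Y_T^{(i)}) &= Z_i (\tilde{\alpha}_i - \tilde{\alpha}_i^{(m)}) + G_i (\hat{\varepsilon}_i - \hat{\varepsilon}_i^{(m)}).
\end{aligned}
\tag{18}
\]

Moreover, from the transition equation (2),

\[ \tilde{\alpha}_{t+1} - \tilde{\alpha}_{t+1}^{(m)} = T_t (\tilde{\alpha}_t - \tilde{\alpha}_t^{(m)}) + H_t (\hat{\varepsilon}_t - \hat{\varepsilon}_t^{(m)}). \tag{19} \]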

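Theorem 5. Defining

\[ E_t^{+} = G_t' F_t^{-1} V_t^{+} + J_t' R_t^{+}, \tag{20} \]

the scaled smoothed disturbances of the transition equation are given by

\[ H_t (\hat{\varepsilon}_t - \hat{\varepsilon}_t^{(m)}) = H_t E_t^{+} M_i^{-1} u_i. \tag{21} \]

Moreover,

\[ H_i (\hat{\varepsilon}_i - \hat{\varepsilon}_i^{(m)}) = H_i (G_i' M_i - H_i' N_i K_i) M_i^{-1} u_i. \]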

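Proof.

\[
\begin{aligned}
H_t (\hat{\varepsilon}_t - \hat{\varepsilon}_t^{(m)}) &= (\tilde{\alpha}_{t+1} - \tilde{\alpha}_{t+1}^{(m)}) - T_t (\tilde{\alpha}_t - \tilde{\alpha}_t^{(m)}) \\
&= [(A_{t+1}^{+} + P_{t+1} R_t^{+}) - T_t (A_t^{+} + P_t R_{t-1}^{+})] M_i^{-1} u_i \\
&= [K_t V_t^{+} + (T_t P_t L_t' + H_t J_t') R_t^{+} - T_t P_t (Z_t' F_t^{-1} V_t^{+} + L_t' R_t^{+})] M_i^{-1} u_i \\
&= [K_t V_t^{+} + H_t J_t' R_t^{+} - T_t P_t Z_t' F_t^{-1} V_t^{+}] M_i^{-1} u_i \\
&= H_t (G_t' F_t^{-1} V_t^{+} + J_t' R_t^{+}) M_i^{-1} u_i.
\end{aligned}
\]

Note that, for $t < i$, $E_t^{+} = J_t' R_t^{+}$. Also, $H_0 (\hat{\varepsilon}_0 - \hat{\varepsilon}_0^{(m)}) = H_0 H_0' R_0^{+} M_i^{-1} u_i$. In order to prove the last statement, we first note that

\[
\begin{aligned}
R_i^{+} &= Z_{i+1}' F_{i+1}^{-1} V_{i+1}^{+} + L_{i+1}' Z_{i+2}' F_{i+2}^{-1} V_{i+2}^{+} + \cdots + L_{T,i+1}' Z_T' F_T^{-1} V_T^{+} \\
&= -(Z_{i+1}' F_{i+1}^{-1} Z_{i+1} K_i + L_{i+1}' Z_{i+2}' F_{i+2}^{-1} Z_{i+2} L_{i+1} K_i + \cdots + L_{T,i+1}' Z_T' F_T^{-1} Z_T L_{T,i+1} K_i) \\
&= -N_i K_i,
\end{aligned}
\]

which gives

\[
\begin{aligned}
E_i^{+} &= G_i' F_i^{-1} V_i^{+} + J_i' R_i^{+} \\
&= G_i' F_i^{-1} + H_i' R_i^{+} - G_i' K_i' R_i^{+} \\
&= G_i' M_i - H_i' N_i K_i. \qquad \square
\end{aligned}
\]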

Theorem 6. For $t \neq i$ the scaled smoothed measurement disturbance is given as follows:

\[ G_t (\hat{\varepsilon}_t - \hat{\varepsilon}_t^{(m)}) = G_t E_t^{+} M_i^{-1} u_i. \tag{22} \]


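Proof. From (18),

\[
\begin{aligned}
G_t (\hat{\varepsilon}_t - \hat{\varepsilon}_t^{(m)}) &= -Z_t (\tilde{\alpha}_t - \tilde{\alpha}_t^{(m)}) \\
&= -Z_t (A_t^{+} + P_t R_{t-1}^{+}) M_i^{-1} u_i \\
&= [-Z_t A_t^{+} - Z_t P_t (Z_t' F_t^{-1} V_t^{+} + L_t' R_t^{+})] M_i^{-1} u_i \\
&= [V_t^{+} - (F_t - G_t G_t') F_t^{-1} V_t^{+} - Z_t P_t L_t' R_t^{+}] M_i^{-1} u_i \\
&= G_t (G_t' F_t^{-1} V_t^{+} + J_t' R_t^{+}) M_i^{-1} u_i \\
&= G_t E_t^{+} M_i^{-1} u_i.
\end{aligned}
\]

In the derivation we used the easily established relation $Z_t P_t L_t' = -G_t J_t'$. □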
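For completeness, the relation used in the last step can be verified in three lines from the definitions in (3). The only ingredients are $L_t = T_t - K_t Z_t$, $J_t = H_t - K_t G_t$, $F_t = Z_t P_t Z_t' + G_t G_t'$ and the transposed gain equation $F_t K_t' = Z_t P_t T_t' + G_t H_t'$.

```latex
\begin{aligned}
Z_t P_t L_t'
  &= Z_t P_t T_t' - Z_t P_t Z_t' K_t' \\
  &= (F_t K_t' - G_t H_t') - (F_t - G_t G_t') K_t' \\
  &= -G_t (H_t' - G_t' K_t') = -G_t J_t'.
\end{aligned}
```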

Theorem 7. The influence of $y_i$ on the smoothed disturbance is

\[ \hat{\varepsilon}_t - \hat{\varepsilon}_t^{(m)} = E_t^{+} M_i^{-1} u_i. \tag{23} \]

Proof. The proof is immediate, as the matrix $[G_t' \; H_t']'$ has full column rank. □
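A numerical illustration of (23), again for an assumed scalar AR(1)-plus-noise model with $\sigma^2 = 1$: the disturbance vector is taken to have two independent components, so that $G_t = (g, 0)$ and $H_t = (0, h)$, and the smoothed disturbances are obtained from Koopman's formula $G_t' F_t^{-1} \nu_t + J_t' r_t$. The function name, the parameter names and the starting value $R_T^{+} = 0$ are assumptions for this sketch.

```python
import math

def disturbance_influence(y, i, phi, g2, h2):
    """Return (direct, predicted): brute-force differences of smoothed disturbances
    with and without y[i], and the values E_t^+ M_i^{-1} u_i from (20) and (23)."""
    n, g, h = len(y), math.sqrt(g2), math.sqrt(h2)
    a, P, A = 0.0, h2 / (1.0 - phi ** 2), 0.0
    am, Pm = a, P                                # skipping-approach filter
    full, skip = [], []
    for t in range(n):
        v, F = y[t] - a, P + g2
        K = phi * P / F
        V = (1.0 if t == i else 0.0) - A
        full.append((v, F, K, V))
        a, A, P = phi * a + K * v, phi * A + K * V, phi * P * (phi - K) + h2
        if t == i:
            skip.append((0.0, 1.0, 0.0))         # v = 0, K = 0 encode "no update"
            am, Pm = phi * am, phi * phi * Pm + h2
        else:
            vm, Fm = y[t] - am, Pm + g2
            Km = phi * Pm / Fm
            skip.append((vm, Fm, Km))
            am, Pm = phi * am + Km * vm, phi * Pm * (phi - Km) + h2
    r = N = R = rm = 0.0
    direct, lever = [None] * n, [None] * n
    for t in range(n - 1, -1, -1):
        v, F, K, V = full[t]
        vm, Fm, Km = skip[t]
        u, um = v / F - K * r, vm / Fm - Km * rm
        if t == i:
            u_i, M_i = u, 1.0 / F + K * K * N
        # Smoothed disturbances are (g * u_t, h * r_t) since J_t = (-K_t g, h).
        direct[t] = (g * (u - um), h * (r - rm))
        lever[t] = (g * (V / F - K * R), h * R)  # E_t^+ = G' F^{-1} V_t^+ + J_t' R_t^+
        L = phi - K
        r, N, R = v / F + L * r, 1.0 / F + L * L * N, V / F + L * R
        rm = vm / Fm + (phi - Km) * rm
    predicted = [(e1 * u_i / M_i, e2 * u_i / M_i) for e1, e2 in lever]
    return direct, predicted

direct, predicted = disturbance_influence([0.4, -0.3, 1.1, 0.2, -0.8], 2, 0.6, 0.5, 1.0)
for d, p in zip(direct, predicted):
    print(d, p)
```

At the deleted date the measurement disturbance is taken to have zero conditional mean under the skipping approach, since in this assumed model it is independent of all other observations.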

In conclusion, the computation of the influence for $\varepsilon_t$ and $\alpha_t$ via the forward recursion (19) depends on quantities readily available from a run of the smoothing filter on the dummy variable. $E_t^{+}$ provides a measure of the leverage of $y_i$ on $\hat{\varepsilon}_t$.

5. Conclusions

The paper has shown the equivalence between the skipping approach and the dummy variable (additive outlier) approach for both likelihood and smoothed inferences, and has used the latter to derive suitable algorithms for computing deletion diagnostics. The extension to the class of nonstationary state space models is available from the author.

References

Bell, W.R., 1989. Discussion of the paper by Bruce and Martin: leave-k-out diagnostics for time series. J. Roy. Statist. Soc. Ser. B 51, 408–409.

Bruce, A.G., Martin, R.D., 1989. Leave-k-out diagnostics for time series. J. Roy. Statist. Soc. Ser. B 51, 363–424 (with discussion).

De Jong, P., 1989. Smoothing and interpolation with the state space model. J. Amer. Statist. Assoc. 84, 1085–1088.

De Jong, P., 1991. The diffuse Kalman filter. Ann. Statist. 19, 1073–1083.

De Jong, P., 1996. Fixed interval smoothing. Working paper, London School of Economics.

De Jong, P., Penzer, J., 1998. Diagnosing shocks in time series. J. Amer. Statist. Assoc. 93, 796–806.

Fuller, W.A., 1996. Introduction to Statistical Time Series. Wiley Series in Probability and Statistics. Wiley, New York.

Gomez, V., Maravall, A., Pena, D., 1999. Missing observations in ARIMA models: skipping approach versus additive outlier approach. J. Econometrics 88, 341–363.

Harvey, A.C., Koopman, S.J., Penzer, J., 1998. Messy time series. In: Fomby, T.B., Hill, R.C. (Eds.), Advances in Econometrics, vol. 13. JAI Press, New York.

Koopman, S.J., 1993. Disturbance smoother for state space models. Biometrika 80, 117–126.

Ljung, G.M., 1993. On outlier detection in time series. J. Roy. Statist. Soc. Ser. B 55, 559–567.

Rosenberg, B., 1973. Random coefficient models: the analysis of a cross-section of time series by stochastically convergent parameter regression. Ann. Econom. Social Measurement 2, 399–428.

Sargan, J.D., Drettakis, E.G., 1974. Missing data in an autoregressive model. Internat. Econom. Rev. 15, 39–58.
