Journal of Statistical Computation and Simulation, 2013, Vol. 83, No. 8, 1429–1439, http://dx.doi.org/10.1080/00949655.2012.662502

Cross-validation approximation in functional linear regression

Mohammad Hosseini-Nasab*

Department of Statistics, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran

(Received 27 April 2011; final version received 27 January 2012)

Cross-validation has been widely used in the context of statistical linear models and multivariate data analysis. Recently, technological advancements have made it possible to collect new types of data that are in the form of curves. Statistical procedures for analysing these data, which are of infinite dimension, are provided by functional data analysis. In functional linear regression, estimation of the slope and intercept parameters via statistical smoothing is generally based on functional principal components analysis (FPCA), which allows for a finite-dimensional analysis of the problem. The estimators of the slope and intercept parameters in this context, proposed by Hall and Hosseini-Nasab [On properties of functional principal components analysis, J. R. Stat. Soc. Ser. B: Stat. Methodol. 68 (2006), pp. 109–126], are based on FPCA and depend on a smoothing parameter that can be chosen by cross-validation. The cross-validation criterion given there is time-consuming and hard to compute. In this work, we approximate this cross-validation criterion by another criterion that, in some sense, turns the problem into one of multivariate data analysis. We then evaluate its performance numerically. We also treat a real dataset, consisting of two variables, temperature and the amount of precipitation, and estimate the regression coefficients for the former variable in a model predicting the latter.

Keywords: cross-validation; eigenfunction; eigenvalue; functional data analysis; regression; stochastic expansion

MSC: 65C60; 68U20

1. Introduction

Functional data analysis (FDA) is a relatively new and rapidly growing area of statistics, as technological advancements have made it possible to collect new types of data that are in the form of curves. These data are of infinite dimension and should be treated by special statistical tools provided by FDA. The field is distinctly non-parametric in character, and depends on concepts and theorems of operator theory. The standard approach for estimating the slope and intercept in functional linear regression is based explicitly on functional principal components analysis (FPCA), allowing for a finite-dimensional analysis of the problem.

Cross-validation has been widely used in the context of classical regression models and classification problems, and it is also used in the context of FDA. Amato et al. [1] used cross-validation for dimension reduction in functional regression. Costanzo et al. (2006) proposed cross-validation for determining the number of partial least-squares components in regression; see also Ramsay and Silverman [2,3], Yao et al. [4], Viviani et al. [5], Nerini and Ghattas [6], Matsui et al. [7], Shin [8], and He et al. [9,10] for further details.

*Email: [email protected]

© 2013 Taylor & Francis

Hall and Hosseini-Nasab [11] gave stochastic expansions for estimators of eigenvalues and eigenfunctions, providing not only a new understanding of the effects of truncating to a finite number of principal components, but also pointing to new methodology such as bootstrap confidence statements for eigenvalues and eigenfunctions. The paper also provided new insights into more conventional FDA methods, including those used for functional linear regression. In particular, Hall and Hosseini-Nasab [11] proposed estimators of the slope and intercept parameters in functional linear regression. The estimators depend on a smoothing parameter that can be chosen by cross-validation. However, the cross-validation method they gave in the context of FDA is time-consuming and hard to compute. In this work, in Section 2, we first introduce functional linear regression models and the corresponding estimators for the slope and intercept of the model. Then, in Section 3, we approximate the cross-validation criterion given by Hall and Hosseini-Nasab [11] by another such criterion, used in the context of classical linear regression. Section 4 contains numerical results comparing the performance of the cross-validation criteria for choosing the smoothing parameter in functional linear regression.

2. Functional linear regression model

The functional simple linear regression model is expressed as

Yi = a + ∫I b Xi + εi,  1 ≤ i ≤ n,  (1)

where b and Xi are square-integrable functions from a bounded interval, say I, to the real line; a, Yi and εi are scalars; a and b are deterministic; the pairs (X1, ε1), . . . , (Xn, εn) are independent and identically distributed; the random functions Xi are independent of the errors εi; σ² = E(ε²) < ∞, E(ε) = 0 and ∫I E(X²) < ∞, where ε and X are distributed as εi and Xi, respectively.

Estimation of b is intrinsically an infinite-dimensional problem. Therefore, in functional linear regression, the problem involves using smoothing or regularization methods that enable us to reduce dimension. This is one aspect in which functional linear regression differs from more familiar linear models. Depending on the purpose for which the estimator b̂ is used, the amount of smoothness needed is different [12,13]. When the aim is estimating b, optimal smoothness of b̂ will usually result in ∫ b̂x, which is used for prediction, being over-smoothed for estimating ∫ bX given a value x of X. This is due to the integration involved in computing ∫ b̂x from b̂, which introduces additional smoothness. Therefore, the manner in which b̂ is used for prediction is different from that for estimating b.

However, these two problems are not entirely separate, because knowing the behaviour of the slope function b, for example at what points b(t) takes large or small values, gives useful information about the role of the functional explanatory variables in the model, which in turn provides information regarding the locations at which a future observation x of X will have greatest influence on the value of ∫ bx. Thus, the problem of estimating the slope b is a prologue to estimating ∫ bx, and perhaps it is because of this that, unlike the case of classical linear regression, there is significant interest in estimating b in its own right in this field [12]. See, for example, Ferraty and Vieu [14], Cuevas et al. [15], Cardot et al. [16], Ramsay and Silverman [3], Li and Hsing [17], Hall and Horowitz [18], and Hall and Hosseini-Nasab [13], in which estimation of b and, in particular, convergence rates of the estimator b̂ to b are discussed.


2.1. Coefficient estimation

Let E be the class of square-integrable functions from I to the real line, and let the bivariate function K with K(u, v) = cov{X(u), X(v)} be defined on I × I. We define the operator K such that it takes ψ ∈ E to Kψ as follows:

(Kψ)(u) = ∫I K(u, v)ψ(v) dv.  (2)

The spectral decomposition of the operator K can be written as

K(u, v) = ∑_{j=1}^{∞} θj ψj(u) ψj(v),  (3)

where the eigenvalues θj and their corresponding eigenfunctions ψj satisfy the eigenequation Kψj = θjψj, and the eigenvalues are ordered such that

θ1 ≥ θ2 ≥ · · · ≥ 0.  (4)

Based on a set of independent and identically distributed observations of X(t), say X1(t), . . . , Xn(t), the empirical estimator of K, denoted by K̂, is defined by

K̂(u, v) = (1/n) ∑_{i=1}^{n} {Xi(u) − X̄(u)}{Xi(v) − X̄(v)},  (5)

where X̄(·) = n⁻¹ ∑_{i=1}^{n} Xi(·). Similar to Equation (2), we can define the operator K̂ with kernel K̂(u, v), which admits an expansion analogous to Equation (3):

K̂(u, v) = ∑_{j=1}^{∞} θ̂j ψ̂j(u) ψ̂j(v),

where, analogous to Equation (4), θ̂1 ≥ θ̂2 ≥ · · · ≥ 0 is an enumeration of the eigenvalues of K̂. In particular, K̂ψ̂j = θ̂jψ̂j. The two operators K and K̂ are symmetric, positive semi-definite Hilbert–Schmidt operators on E.

If we express Xi and b in terms of the orthonormal basis ψ1, ψ2, . . ., then we have

Xi = ∑_{j=1}^{∞} ξij ψj,  b = ∑_{j=1}^{∞} bj ψj,  (6)

where ξij = ∫ Xi ψj and bj = ∫ b ψj denote the generalized Fourier coefficients of Xi and b, respectively. Then, model (1) can be equivalently written as

Yi = a + ∑_{j=1}^{∞} bj ξij + εi.  (7)

Estimation of the slope and intercept parameters in functional linear regression is generally based on statistical smoothing, leading to dimension reduction.

The true value, (a0, b0) say, of (a, b) may be estimated by the least-squares method through minimizing

∑_{i=1}^{n} (Yi − a − ∑_{j=1}^{r} bj ξij)²,  (8)


with respect to a, b1, b2, . . . , br, and taking b̂j = 0 for j ≥ r + 1. Hall and Hosseini-Nasab [11] show that the estimators are

â = Ȳ − ∑_{j=1}^{r} b̂j ξ̄j,  b̂(u) = ∑_{j=1}^{r} b̂j ψ̂j(u) = ∑_{j=1}^{r} θ̂j⁻¹ ĝj ψ̂j(u),  (9)

where ĝj = (1/n) ∑_{i=1}^{n} (Yi − Ȳ)(ξij − ξ̄j), Ȳ = n⁻¹ ∑_i Yi and ξ̄j = n⁻¹ ∑_i ξij.

3. Cross-validation approximation

The smoothing parameter r in Equation (9) can be chosen by cross-validation. In the context of FDA, the predictive cross-validation criterion is given by

CV1(r) = (1/n) ∑_{i=1}^{n} {Yi − â^(−i;r) − ∫I b̂^(−i;r) Xi}².  (10)

Here, (â^(−i;r), b̂^(−i;r)) denotes the least-squares estimator of (a, b) obtained by confining attention to the set Zi, say, of all data pairs (Xj, Yj) excluding the ith; both â^(−i;r) and b̂^(−i;r) use the empirical Karhunen–Loève expansion of length r computed from Zi. More generally, we mark every quantity obtained from all data pairs except the ith with the index (−i). In the context of classical linear regression, however, we can use

CV2(r) = (1/n) ∑_{i=1}^{n} (Yi − Ŷi)² / (1 − Hii)²,  (11)

where Ŷ = HY is the n × 1 vector of predicted responses, the n × n matrix H denotes the hat matrix, and Hii is its ith diagonal element. This cross-validation criterion is easy to compute, as there is no need to calculate the integral in CV1. The two criteria are clearly not the same: in computing CV1, when the ith observation is excluded, we have to obtain the ψ̂j from only the n − 1 remaining observations, and then obtain the coefficients ξij using these re-estimated eigenfunctions; this is not the case for CV2. However, if, when computing CV1, we neglect the effect of excluding the ith observation on the calculation of the coefficients ξij, the two criteria coincide.
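The computational appeal of Equation (11) rests on the standard leave-one-out identity Yi − Ŷi^(−i) = (Yi − Ŷi)/(1 − Hii), which delivers all n deleted residuals from a single fit. The following sketch, using only numpy and our own illustrative names on toy data, computes CV2 and verifies the identity against an explicit refit.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 3
Z = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design with intercept
y = Z @ rng.normal(size=p + 1) + rng.normal(size=n)

H = Z @ np.linalg.solve(Z.T @ Z, Z.T)        # hat matrix H
e = y - H @ y                                 # ordinary residuals Y_i - Yhat_i
cv2 = np.mean((e / (1 - np.diag(H))) ** 2)    # Equation (11)

# Check the leave-one-out identity for observation i = 0 by explicit refitting
i, keep = 0, np.arange(n) != 0
beta_minus_i = np.linalg.lstsq(Z[keep], y[keep], rcond=None)[0]
loo_residual = y[i] - Z[i] @ beta_minus_i
assert np.isclose(loo_residual, e[i] / (1 - H[i, i]))
```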

If we truncate the two series b(t) and Xi(t) at N, then we have the linear model Y = Ξ(1;N) b(N) + ε, where Y = (Y1, . . . , Yn)ᵀ is the vector of responses, Ξ(1;N) is the n × (N + 1) matrix of covariates whose ith row is (1, ξi1, . . . , ξiN), b(N) = (a, b1, . . . , bN)ᵀ is the vector of parameters, ε = (ε1, . . . , εn)ᵀ is the vector of errors, and ξij = ∫ Xi ψ̂j. To emphasize the truncation of the two series at N, we use the index N for the vectors and matrices obtained from those N terms. By the ordinary least-squares method, we have

b̂(N) = (Ξᵀ(1;N) Ξ(1;N))⁻¹ Ξᵀ(1;N) Y.  (12)

Let

Y* = (Y1, . . . , Yi−1, ξi^(N)ᵀ b̂^(−i;N), Yi+1, . . . , Yn)ᵀ,

where ξi^(N) = (1, ξi1, . . . , ξiN)ᵀ and b̂^(−i;N) = (â^(−i;N), b̂1^(−i;N), . . . , b̂N^(−i;N))ᵀ. Here, b̂^(−i;N) denotes the least-squares estimator of b(N) computed using all data except the ith. Therefore,


b̂^(−i;N) = (Ξᵀ(1;N) Ξ(1;N))⁻¹ Ξᵀ(1;N) Y*, and Ŷ^(−i;N) = H(N) Y*, where H(N) = Ξ(1;N) (Ξᵀ(1;N) Ξ(1;N))⁻¹ Ξᵀ(1;N). We have

Yi − Ŷi^(−i;N) = Yi − Ŷi^(N) + Hii^(N) (Yi − Ŷi^(−i;N)).  (13)

Hence, Yi − Ŷi^(−i;N) = (Yi − Ŷi^(N))/(1 − Hii^(N)), and we can write CV1 as follows:

CV1(r) ≈ (1/n) ∑_{i=1}^{n} (Yi − â^(−i;r) − ∑_{j=1}^{r} ξij b̂j^(−i;r))² = (1/n) ∑_{i=1}^{n} (Yi − Ŷi^(−i;r))² = (1/n) ∑_{i=1}^{n} (Yi − Ŷi^(r))² / (1 − Hii^(N))².  (14)

Note that the approximation in Equation (14) has been obtained by applying Xi = ∑_{j=1}^{∞} ξij ψj and b̂^(−i;r) = ∑_{j=1}^{r} b̂j^(−i;r) ψ̂j^(−i), in which the ψ̂j^(−i) are obtained by confining attention to the set of all data excluding the ith. Moreover, we have used the approximation ⟨ψ̂j^(−i), ψ̂k⟩ ≈ δjk, where δjk is the Kronecker delta. Let us assign all elements of the matrix Ξ(1;N), except those in its first column, to the matrix Ξ(N), so that we have Ξ(1;N) = [1, Ξ(N)], where 1n×1 = (1, . . . , 1)ᵀ. Then Ŷ^(N) is obtained as follows:

Ŷ^(N) = H(N) Y,

where

H(N) = (1/n) 11ᵀ + (I − (1/n) 11ᵀ) Ξ(N) Λ⁻¹ Ξᵀ(N) (I − (1/n) 11ᵀ),

Λ = Ξᵀ(N) (I − (1/n) 11ᵀ) Ξ(N) = n diag(θ̂1, . . . , θ̂N).  (15)
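A useful consequence of Equation (15) is that, when the scores are computed from the empirical eigenfunctions, the centred score matrix has exactly orthogonal columns, so Λ is diagonal and H(N) requires no genuine matrix inversion. The sketch below, again with illustrative names and toy curves of our own, checks this numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 40, 101, 4
grid = np.linspace(0, 1, m)
h = grid[1] - grid[0]
# Toy curves: random combinations of a few smooth cosine basis functions
basis = np.sqrt(2) * np.cos(np.outer(np.arange(1, 7), np.pi * grid))   # 6 x m
X = (rng.normal(size=(n, 6)) / np.arange(1, 7)) @ basis                # n x m

Xc = X - X.mean(axis=0)
evals, evecs = np.linalg.eigh(Xc.T @ Xc / n * h)
order = np.argsort(evals)[::-1]
theta = evals[order][:N]                       # empirical eigenvalues
psi = evecs[:, order][:, :N] / np.sqrt(h)      # empirical eigenfunctions
Xi = X @ psi * h                               # score matrix Xi_(N)

C = np.eye(n) - np.ones((n, n)) / n            # centring matrix I - (1/n)11'
Lam = Xi.T @ C @ Xi                            # Lambda of Equation (15)
print(np.allclose(Lam, n * np.diag(theta)))    # True: Lambda = n diag(theta_hat)

# Diagonality makes the "inverse" a reciprocal, so H(N) is cheap to form
H_N = np.ones((n, n)) / n + C @ Xi @ np.diag(1 / (n * theta)) @ Xi.T @ C
```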

Analogous to classical linear regression, we can also use generalized cross-validation, obtained by substituting the average of the diagonal elements of H, instead of the ith diagonal element, into the denominator of CV2:

GCV2(r) = (1/n) ∑_{i=1}^{n} (Yi − Ŷi)² / (1 − (1/n) tr(H))²,  (16)

where tr(H) denotes the trace of the matrix H.
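Putting Equations (11) and (16) together, a single least-squares fit gives both criteria. The helper below is a minimal sketch under the same illustrative conventions as before, where Z plays the role of the design matrix [1, Ξ(N)].

```python
import numpy as np

def cv2_gcv2(Z, y):
    """Return (CV2, GCV2) of Equations (11) and (16) from one fit.

    Z: n x (N + 1) design matrix [1, scores]; y: n-vector of responses.
    """
    n = len(y)
    H = Z @ np.linalg.solve(Z.T @ Z, Z.T)       # hat matrix
    e = y - H @ y                               # residuals
    cv2 = np.mean((e / (1 - np.diag(H))) ** 2)  # Equation (11)
    gcv2 = np.mean(e ** 2) / (1 - np.trace(H) / n) ** 2  # Equation (16)
    return cv2, gcv2
```

In practice one would evaluate cv2_gcv2 for r = 1, 2, . . . on designs built from the first r score columns and pick the minimizing r.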

4. Numerical results

The numerical work undertaken here involves a substantial simulation study of data that are in the form of functions rather than vectors. This includes inverting high-dimensional matrices, the use of special methods to solve high-dimensional equations, and extensive Monte Carlo simulation.

The purpose of this simulation study is to provide a numerical assessment of what we discussed in Section 3 and, especially, to compare the performance of the three cross-validation criteria for choosing the smoothing parameter of the slope and intercept estimators.

As Hall and Hosseini-Nasab [11] pointed out, although the cross-validation criterion introduced in Equation (10) is avowedly designed for prediction, it also works well in estimating the slope function, which, as explained before, is expected to require significantly more smoothing than prediction. It should be noted that when there is more than one local minimum of the cross-validation criterion, the smallest value of r is chosen, following recommendations in more conventional settings (see, for example, [19,20]).


4.1. Models used in simulation study

In the work reported here, each Xi is distributed as X = ∑_{j≥1} ξj ψj and is defined on I = [0, 1], with ψj(t) = 2^{1/2} cos(jπt), where the ξj denote independent variables with zero means and respective variances θj = j^{−2α}, for α = 1, 2, or 3. The latter three cases will be referred to as models (i), (ii), and (iii), respectively. The distributions of the ξj are either normal or centred exponential with mean zero and variance θj. Furthermore, the errors εi are N(0, 1), and we take a = 0 and b(t) = π²(t² − 1/3) = ∑_j (−1)^j 2^{3/2} j^{−2} ψj(t).
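For reference, a data-generation sketch for these designs is given below; it truncates the infinite series at J terms, and the name alpha for the decay parameter taking the values 1, 2, 3 that distinguish models (i)-(iii) is our own notational assumption.

```python
import numpy as np

def simulate_model(n, alpha=1, dist="normal", m=101, J=50, seed=0):
    """Simulate (X_i, Y_i) from the Section 4.1 design, truncated at J terms.

    theta_j = j**(-2*alpha), alpha = 1, 2, 3 for models (i)-(iii);
    psi_j(t) = sqrt(2) cos(j pi t); a = 0; b(t) = pi^2 (t^2 - 1/3).
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, m)
    j = np.arange(1, J + 1)
    psi = np.sqrt(2.0) * np.cos(np.outer(j, np.pi * t))    # J x m basis
    sd = j ** (-float(alpha))                              # sqrt(theta_j)
    if dist == "normal":
        xi = rng.normal(size=(n, J)) * sd
    else:                                                  # centred exponential
        xi = (rng.exponential(size=(n, J)) - 1.0) * sd
    X = xi @ psi                                           # curves on the grid
    b_j = (-1.0) ** j * 2.0 ** 1.5 / j ** 2                # Fourier coefficients of b
    Y = xi @ b_j + rng.normal(size=n)                      # integral b X + N(0,1) error
    return t, X, Y
```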

All estimated quantities shown in the graphs below are computed over 1000 simulated datasets. Each panel contains three kinds of lines. The solid lines graph the first quartile, median (second quartile) and third quartile of the distributions of ‖b̂ − b‖² and |â − a|, when b̂ and â are computed with r = r̂, the minimizer of CV1(r). The three quartiles of the distribution of ‖b̂ − b‖², referred to as the integrated squared error (ISE), are plotted in the first row of the panels in Figures 1–3 for each of the cases (i), (ii), and (iii) separately, and those of |â − a| are graphed in the second row. The dashed lines are produced in the same way as the solid lines, except that r = r̂ is chosen by CV2(r) instead of CV1(r); for the dotted lines, the minimizer of GCV2(r) is taken as r = r̂. We have also obtained kernel density estimates of the ISE for small sample sizes (n = 50, 100, and 200) and for the models where there are noticeable differences between the performances of the cross-validation criteria (Figures 4–6).

For model (i) and the first quartile of ISE, as the graphs show, there are no significant differences among the three lines obtained by the three cross-validation criteria (Figure 1). With the second quartile, however, GCV2 and then CV2 have slightly better performance when the sample size is small (Figures 4–6). Regarding the third quartile, we also see better performance of CV2 and GCV2 than of CV1. When n = 50, for example, CV2 is as good as GCV2, but for n = 100, GCV2 does a better job when compared with CV2 (this is also clear from Figures 4 and 5). Furthermore, there is no significant difference between the performances of the three cross-validation criteria in estimating |â − a|. However, CV1 tends to have slightly better performance on the first quartile of |â − a| when n ≤ 200.


Figure 1. Performance of the distribution of the integrated squared error (ISE) and of estimators of a, when r is chosen by cross-validation. The solid lines graph the three quartiles of the distributions of ‖b̂ − b‖² and |â − a|, when b̂ and â are computed with r = r̂, the minimizer of CV1(r). The dashed and dotted lines graph the three quartiles when r = r̂ is selected by CV2(r) and GCV2(r), respectively. The sample sizes are n = 50, 100, 200, 500, and 1000, and the underlying model is model (i).



Figure 2. Performance of the distribution of ISE and of estimators of a, when r is chosen by cross-validation. The solid lines graph the three quartiles of the distributions of ‖b̂ − b‖² and |â − a|, when b̂ and â are computed with r = r̂, the minimizer of CV1(r). The dashed and dotted lines graph the three quartiles when r = r̂ is selected by CV2(r) and GCV2(r), respectively. The sample sizes are n = 50, 100, 200, 500, and 1000, and the underlying model is model (ii).


Figure 3. Performance of the distribution of ISE and of estimators of a, when r is chosen by cross-validation. The solid lines graph the three quartiles of the distributions of ‖b̂ − b‖² and |â − a|, when b̂ and â are computed with r = r̂, the minimizer of CV1(r). The dashed and dotted lines graph the three quartiles when r = r̂ is selected by CV2(r) and GCV2(r), respectively. The sample sizes are n = 50, 100, 200, 500, and 1000, and the underlying model is model (iii).



Figure 4. Estimated distribution of ISE when r is chosen by the three cross-validation criteria. The solid lines show the distribution of ‖b̂ − b‖², when b̂ is computed with r = r̂, the minimizer of CV1(r). The dashed and dotted lines show the distribution of ISE when r = r̂ is selected by CV2(r) and GCV2(r), respectively. The sample size is n = 50 and the underlying models are (i), (ii), and (iii).


Figure 5. Estimated distribution of ISE when r is chosen by the three cross-validation criteria. The solid lines show the distribution of ‖b̂ − b‖², when b̂ is computed with r = r̂, the minimizer of CV1(r). The dashed and dotted lines show the distribution of ISE when r = r̂ is selected by CV2(r) and GCV2(r), respectively. The sample size is n = 100 and the underlying models are (i), (ii), and (iii).


Figure 6. Estimated distribution of ISE when r is chosen by the three cross-validation criteria. The solid lines show the distribution of ‖b̂ − b‖², when b̂ is computed with r = r̂, the minimizer of CV1(r). The dashed and dotted lines show the distribution of ISE when r = r̂ is selected by CV2(r) and GCV2(r), respectively. The sample size is n = 200 and the underlying models are (i), (ii), and (iii).


For model (ii), when considering the first quartile, GCV2 works better; moreover, the solid and dashed lines obtained by CV1 and CV2 tend to coincide. With the median, GCV2 has slightly better performance for all sample sizes. With regard to the third quartile, for n = 50, GCV2 and then CV2 work better than CV1, whereas for n = 100, CV1 gives a smaller value of ISE when compared with the other two criteria (Figures 2 and 5). Furthermore, when n is large, GCV2 is best. Besides, as the graphs show, there is no significant difference among the performances of the three CV criteria for the distribution of |â − a| (Figure 2).

Under model (iii), there are no significant differences among the performances of the three CV criteria from the viewpoint of the first and second quartiles of ISE, although GCV2 tends to perform slightly better on the median of ISE when the sample size is very large (Figure 3). With regard to the third quartile, GCV2 performs better than the other two for all sample sizes, except for n = 100 and 500, where the other two perform better (Figure 5). Moreover, the solid and dashed lines corresponding to CV1 and CV2 tend to coincide. When considering |â − a|, we do not see any significant differences among the performances of CV1, CV2, and GCV2 (Figure 3).

4.2. Performance of the cross-validation criteria on a real dataset

We have treated Ramsay’s Canadian weather station temperature dataset [3], consisting of two variables, temperature and the amount of precipitation, and estimated the regression coefficients for the former variable in a model predicting the latter. The data were collected every month at each of n = 35 stations. To explain variation in the total annual precipitation based on the pattern of temperature variation through the year, we have constructed the linear regression model Yi = a + ∫I b(t) Xi(t) dt + εi, where Y and X(t) are the total annual precipitation and the temperature, respectively. Because the stations are distributed across different climatic areas, the total annual precipitation is highly variable from one weather station to another; thus, we have used its logarithm as the dependent variable. Specifically, we regarded the dependent variable Y as

Yi = log10(∑_{j=1}^{12} precij) − (1/35) ∑_{i=1}^{35} log10(∑_{j=1}^{12} precij),

where precij denotes the amount of precipitation reported by station i during month j. Then, we estimated the slope b and intercept a from the data using the estimators proposed in Equation (9).
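Constructing this response is a one-liner once the monthly totals are in a matrix; the sketch below uses randomly generated stand-in values, since Ramsay's Canadian weather data are not reproduced here.

```python
import numpy as np

# Hypothetical stand-in for the 35 x 12 matrix of monthly precipitation totals,
# prec[i, j] = precipitation at station i in month j (the real values come from
# Ramsay's Canadian weather dataset, not reproduced here).
rng = np.random.default_rng(0)
prec = rng.gamma(shape=5.0, scale=20.0, size=(35, 12))

log_annual = np.log10(prec.sum(axis=1))   # log10 of total annual precipitation
Y = log_annual - log_annual.mean()        # centred response Y_i of Section 4.2
```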


Figure 7. The left panel gives the estimated regression slope b̂(t), by month, when taking the log total annual precipitation as the dependent variable; the smoothing parameter, chosen by CV2 and GCV2, was r = 2, and the estimated intercept was â = −0.2413092. The right panel displays the predicted log total annual precipitation values against the observed log annual precipitation values.


Then, we have chosen the smoothing parameter r in the slope of the regression model by CV1, CV2, and GCV2, obtaining r = 1 by CV1 and r = 2 by both CV2 and GCV2. Taking r = 2 gives an estimate of b in a two-dimensional subspace, which is more informative than the one-dimensional estimate given by r = 1; therefore, we take r = 2. We then used the model to predict the log total annual precipitation, and summarized the fit in terms of the conventional coefficient of determination R² (Figure 7). Here R² = 0.86, indicating a relatively successful fit.

5. Conclusion

We have explored the performance of cross-validation criteria for selecting the smoothing parameter in the functional linear regression model. The cross-validation criterion (CV1) proposed by Hall and Hosseini-Nasab [11] is time-consuming and difficult to compute, especially for large sample sizes, because of the integral that must be calculated in its formula. On the other hand, as the results show, for small sample sizes CV2 and GCV2, which approximate CV1 and are easy to compute, perform better than CV1 for slope estimation. Moreover, for small sample sizes, the values of ISE computed with r = r̂ chosen by CV1(r) generally show higher variation than those computed with the other two cross-validation criteria. For estimating the intercept, however, there is no significant difference between the performances of the three criteria. Therefore, we recommend the use of CV2 or GCV2, instead of CV1, for estimating the parameters of the model.

Acknowledgements

The author is very grateful to Professors M. G. Vahidi-Asl and M. Ganjali for the English editing of the article, and to two reviewers for helpful comments.

References

[1] U. Amato, A. Antoniadis, and I. De Feis, Dimension reduction in functional regression with applications, Comput. Statist. Data Anal. 50 (2006), pp. 2422–2446.
[2] J.O. Ramsay and B.W. Silverman, Applied Functional Data Analysis, Springer, New York, 2002.
[3] J.O. Ramsay and B.W. Silverman, Functional Data Analysis, 2nd ed., Springer, New York, 2005.
[4] F. Yao, H.G. Müller, and J.-L. Wang, Functional linear regression analysis for longitudinal data, Ann. Statist. 33(6) (2005), pp. 2873–2903.
[5] R. Viviani, G. Grön, and M. Spitzer, Functional principal component analysis of fMRI data, Human Brain Mapping 24 (2005), pp. 109–129.
[6] D. Nerini and B. Ghattas, Classifying densities using functional regression trees: Applications in oceanology, Comput. Statist. Data Anal. 51 (2007), pp. 4984–4993.
[7] H. Matsui, Y. Araki, and S. Konishi, Multivariate regression modeling for functional data, J. Data Sci. 6 (2008), pp. 313–331.
[8] H. Shin, Partial functional linear regression, J. Statist. Plann. Inference 139 (2009), pp. 3405–3418.
[9] G. He, H.G. Müller, and J.-L. Wang, Functional linear regression via canonical analysis, Bernoulli 16(3) (2010), pp. 705–729.
[10] G. He, H.G. Müller, J.-L. Wang, and W. Yang, Functional linear regression via canonical analysis, Bernoulli 16(3) (2010), pp. 707–729.
[11] P. Hall and M. Hosseini-Nasab, On properties of functional principal components analysis, J. R. Stat. Soc. Ser. B: Stat. Methodol. 68 (2006), pp. 109–126.
[12] T. Cai and P. Hall, Prediction in functional linear regression, Ann. Statist. 34 (2006), pp. 2159–2179.
[13] P. Hall and M. Hosseini-Nasab, Theory for high-order bounds in functional principal components analysis, Math. Proc. Camb. Phil. Soc. 149 (2009), pp. 225–256.
[14] F. Ferraty and P. Vieu, Fractal dimensionality and regression estimation in semi-normed vectorial spaces, C. R. Acad. Sci. Paris Sér. I 330 (2000), pp. 139–142.
[15] A. Cuevas, M. Febrero, and R. Fraiman, Linear functional regression: The case of fixed design and functional response, Canad. J. Statist. 30 (2002), pp. 285–300.


[16] H. Cardot, F. Ferraty, and P. Sarda, Spline estimators for the functional linear model, Statist. Sinica 13 (2003), pp. 571–591.
[17] Y. Li and T. Hsing, On rates of convergence in functional linear regression, J. Multivariate Anal. 98(9) (2007), pp. 1782–1804.
[18] P. Hall and J.L. Horowitz, Methodology and convergence rates for functional linear regression, Ann. Statist. 35 (2007), pp. 70–91.
[19] B.U. Park and J.S. Marron, Comparison of data-driven bandwidth selectors, J. Amer. Statist. Assoc. 85 (1990), pp. 66–72.
[20] P. Hall and J.S. Marron, Local minima in cross-validation functions, J. R. Statist. Soc. B 53 (1991), pp. 245–252.
