
Statistics, Vol. 45, No. 1, February 2011, 101–120

Estimation of dynamic models on the factors of marginal principal component analysis

Nathanaël Mayo*

SAMM, Université Paris-1, France

(Received 30 October 2010; final version received 2 November 2010)

In this paper, we legitimate the use of parametric time series models such as ARCH and hidden Markov models on the factors of marginal principal component analysis. More generally, we study the maximum of approximate likelihood. It is usually inefficient itself, but shows good properties under weaker assumptions than the classical maximum likelihood estimator. Its main advantage is a crucial gain in computational complexity, since the maximization procedure reduces to elementary and separable maximization steps. Moreover, we provide efficient and robust upgrades of the procedure that strongly advocate the use of the MALE.

Keywords: PCA, dynamic factor, high dimension time series, estimated parameters

1. Introduction

Factor models are general and powerful statistical tools designed for vector data. They model data as linear combinations of a few hidden factors, i.e. distinct realities that explain most of the structure in the data. Factor models are popular in many fields, especially when the ambient dimension is high: image and sound treatment, survey analysis, financial series or DNA-related data. However, most multivariate time series contain both coordinatewise and time dependence, while the temporal dimension is not taken into account by factor models. These are usually based on marginal quantities and hence designed for i.i.d. data. Time series are often the inputs of these models, and estimating the factors' dynamics is an issue that naturally arises.

Risk measurement in finance illustrates this issue well. On the one hand, classical approaches (Markowitz optimization in portfolio management [1], the Capital Asset Pricing Model [2]) rely on a covariance matrix between assets; on the other hand, the dynamic properties of returns are also of interest (in the ARCH setting for instance, see [3]). Such models are easily overparameterized, so that estimation by maximum likelihood often behaves poorly in practice. No doubt this becomes a far more crucial issue when the dimension of the data is high!

The pragmatic way (see, e.g. [4,5]) to access dynamic properties is to apply a dynamic model on the factors obtained via some marginal factor model. The main drawback of this two-step procedure is that preestimation of the factors induces a perturbation on the usual behaviour of the dynamic model! But in most cases, the statistical implications of this perturbation are simply not discussed. This paper studies this estimation procedure on a frequentist basis. Our results strongly advocate the pragmatic procedure: estimating dynamic models on factors estimated with marginal principal component analysis (PCA).

*Email: [email protected]

ISSN 0233-1888 print / ISSN 1029-4910 online © 2011 Taylor & Francis. DOI: 10.1080/02331888.2010.541603. http://www.informaworld.com

In Section 2, we present the independent dynamic factor (IDF) model, a generalization of factor models to the temporal case. In the case of SWARCH factors, we derive assumptions under which the maximum likelihood estimator (MLE) of IDF is asymptotically consistent, normal and efficient. Yet the resulting likelihood is typically multimodal and overparametrized. Its maximization behaves very badly in practice, so there is a need for an alternative estimation procedure. We study the MALE procedure in Section 3. It relies on plugging into the likelihood a preestimator of a subset of parameters. Once plugged, the maximization over the temporal parameters becomes separable over the factors. The dimension of each maximization is greatly reduced and the procedure is much more stable in practice. We study the asymptotic behaviour of the MALE in Section 4. The cost to pay for a tractable procedure is usually an increase in asymptotic variance. Nevertheless, we provide an upgrade of the MALE that reaches efficiency, as well as an efficiency theorem in the Markovian centred case. Finally, in Section 5, the procedure is applied to simulated and real data.

2. Mixing time and coordinatewise dependences

2.1. The IDF model

Factor models (Note 1) are useful when dealing with data in high dimension. They rely on the linear relations $Y = ZB + \varepsilon$ that explain observed variables $Y = (Y^1, \ldots, Y^N)$ using unobserved independent factors $Z$, with an invertible mixing matrix $B$. PCA is a basic factor model used when $\varepsilon = 0$ and $B \in O_N(\mathbb{R})$. It consists of the diagonalization of the covariance matrix of $Y$ in an orthogonal basis. We also study this noiseless model, as more complex models require it. Indeed, PCA provides them with an estimated number of factors, a filtering step and/or an initial parameter value. Although designed in the i.i.d. setting, the following model leaves PCA meaningful and allows a dynamic modelling of the factors.

Definition 1 (IDF model) $Y_t \sim$ IDF if it is a stationary process in $\mathbb{R}^N$ (row vector) and $\exists B \in O_N(\mathbb{R})$ such that $(Z^1, \ldots, Z^N) = Z = YB^{-1}$ are independent processes.

Each factor $Z^n$ is now endowed with a dynamic model, the parameters of which are to be estimated. IDF are useful for time series forecasting, since one only needs to forecast the prevailing factors to obtain a forecast for the whole vector $Y$. We model factors as autoregressive time series models. These are random sequences $\xi_t$ such that, conditionally on $\bar\xi_t = (\xi_{t-1}, \ldots, \xi_{t-M})$, the $\xi_t$ are drawn independently from some parametric law $g(\xi_t; \theta(\bar\xi_t))$. We specifically have in mind the ARCH case.

Definition 2 (ARCH (Autoregressive Conditional Heteroskedasticity)) Let $\varepsilon$ be a centred unit-variance strong white noise with density $\phi$. $\xi_t \sim$ ARCH($M$) if $\xi_t = \sigma_t(\bar\xi_t)\,\varepsilon_t$, with $\sigma_t^2(\bar\xi_t) = \beta_0 + \sum_{m=1}^{M} \beta_m \xi_{t-m}^2$. Its log-likelihood writes $\ln g(\xi_t; \theta(\bar\xi_t)) = \sum_{t=1}^{T} [\ln \phi(\xi_t/\sigma_t(\bar\xi_t)) - \ln \sigma_t(\bar\xi_t)]$.
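As an illustration, here is a minimal Python sketch (ours, not the paper's; parameter values are arbitrary) that simulates an ARCH(1) path and evaluates the sample log-likelihood above with a Gaussian innovation density $\phi$:

```python
import numpy as np

def simulate_arch(T, beta, rng):
    """Simulate an ARCH(M) path xi_t = sigma_t * eps_t with Gaussian innovations.
    beta = (beta_0, beta_1, ..., beta_M); illustrative values only."""
    M = len(beta) - 1
    xi = np.zeros(T + M)
    for t in range(M, T + M):
        sigma2 = beta[0] + sum(beta[m] * xi[t - m] ** 2 for m in range(1, M + 1))
        xi[t] = np.sqrt(sigma2) * rng.standard_normal()
    return xi[M:]

def arch_loglik(xi, beta):
    """Sample log-likelihood sum_t [ln phi(xi_t / sigma_t) - ln sigma_t], Gaussian phi."""
    M = len(beta) - 1
    ll = 0.0
    for t in range(M, len(xi)):
        sigma2 = beta[0] + sum(beta[m] * xi[t - m] ** 2 for m in range(1, M + 1))
        ll += -0.5 * np.log(2 * np.pi * sigma2) - 0.5 * xi[t] ** 2 / sigma2
    return ll

rng = np.random.default_rng(0)
xi = simulate_arch(1000, (0.1, 0.4), rng)   # an ARCH(1) path
print(arch_loglik(xi, (0.1, 0.4)))
```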

In practice though, the parametric form $\theta(\bar\xi_t)$ is deterministic and may not be flexible enough to account for the behaviour of empirical series, which could contain sudden evolutions of the parameter. A possible answer is to allow $\theta$ to be defined conditionally on a hidden process.


The wide range of resulting distributions can capture the sudden evolution of parameters, and allows for non-Markovianity as well.

Definition 3 (AHMM (Autoregressive Hidden Markov Model, Note 2)) $\xi_t \sim$ AHMM if there exists a Markov chain $x_t$ such that the $(\xi_t \mid \bar\xi_t, x_t)$ are drawn independently from some parametric law $g_{x_t}(\xi_t; \theta_{x_t}(\bar\xi_t))$.

The processes $x$ and $\xi$ take values in $\mathcal{X}$ and $\mathcal{Y}$, with $\mathrm{card}\,\mathcal{X} = P < \infty$. Denote $q_\theta(x, x')$ and $\pi_\theta$ the transition matrix and invariant probability (Note 3) of the chain $x$. In this parametric setting, each density $g$ has parameters depending on the state, i.e. potentially indexed by $x$. Their collection is denoted $\theta = (\theta_x)_{x \in \mathcal{X}}$ and $\Theta$ is the parameter space. We identify the parameters $\theta$ with the function $\bar\xi \mapsto \theta(\bar\xi)$, and we can rewrite $g_\theta(\xi_t \mid \bar\xi_t, x_t) = g_{x_t}(\xi_t; \theta_{x_t}(\bar\xi_t))$. Define also $M = \max_x M_x$, where the Markov orders $M_x$ in each state are assumed to be finite. This model is denoted AHMM($M$, $q_\theta$, $g_\theta$, $\theta$).

The extensive definition is given in [9], with assumptions under which the MLE is asymptotically efficient. Further applications to testing the value of $P$ can be found in [10]. We use the same notations and refer to [9] for any results or definitions on AHMM more precisely stated. The likelihood of $\xi$ is a linear combination of autoregressive Markovian models' likelihoods. When the chain starts in its invariant probability (Note 4), the Bayes formula leads to the likelihood

$$L_\xi(\xi_1, \ldots, \xi_T; \theta) = \sum_{x_1, \ldots, x_T} \pi_\theta(x_1)\, q_\theta(x_1, x_2) \cdots q_\theta(x_{T-1}, x_T) \prod_{t=1}^{T} g_{x_t}(\xi_t; \theta_{x_t}(\bar\xi_t)).$$
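Evaluated naively, this sum has $P^T$ terms, but the standard HMM forward recursion computes it in $O(TP^2)$ operations. A minimal Python sketch (ours, not the paper's; it assumes the conditional densities $g_{x}(\xi_t; \theta_x(\bar\xi_t))$ have already been tabulated in an array `dens`):

```python
import numpy as np

def hmm_likelihood(dens, q, pi):
    """Likelihood L_xi = sum over hidden paths, via the forward recursion.
    dens[t, x] = g_x(xi_t; theta_x(bar xi_t)), q = transition matrix, pi = invariant law."""
    T, P = dens.shape
    alpha = pi * dens[0]                # alpha_1(x) = pi(x) g_x(xi_1)
    for t in range(1, T):
        alpha = (alpha @ q) * dens[t]   # alpha_t(x') = sum_x alpha_{t-1}(x) q(x, x') g_x'(xi_t)
    return alpha.sum()
```

In practice, for long series the recursion is run with rescaling (or in log space) to avoid numerical underflow.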

We can now complete the definition of the IDF model: the factors $Z = (Z^1, \ldots, Z^N)$ are such that each $Z^n$ is an AHMM($M_n$, $q^n_{\theta_n}$, $g^n_{\theta_n}$, $\theta_n$) with likelihood $L_{Z^n}$ of the form $L_\xi$ above. Let $M = \max_n M_n$, while $\bar Y_t = (Y_{t-1}, \ldots, Y_{t-M})'$ denotes the $M$ last observations of $Y$, gathered in an $[M, N]$ matrix (rows are observations). Define $\tilde Y_t = (Y_t, \ldots, Y_{t-M})'$ the same way. Finally, denote $B_n$ the transpose of the $n$th line of $B$. An elementary but crucial property is:

Lemma 1 An IDF model is itself an AHMM($M$, $Q_\theta$, $G_{\boldsymbol\theta}$, $\boldsymbol\theta$), with hidden chain $X = (X^1, \ldots, X^N)$ of transition $Q_\theta(X, X') = \prod_n q^n_{\theta_n}(X^n, X'^n)$, parameters $\boldsymbol\theta = (B, \theta)$, where $\theta = (\theta_1, \ldots, \theta_N)$, and
$$G_{B,\theta}(Y_t \mid \bar Y_t, X_t) = \prod_{n=1}^{N} g^n_{\theta_n}(Y_t B_n \mid \bar Y_t B_n, X^n_t).$$

Proof Because of independence across factors, it is obvious that $Z$ is an AHMM with chain $X = (X^1, \ldots, X^N)$, where $G^Z_\theta(\bar Z) = \prod_{n=1}^{N} g^n_{\theta_n}(\bar Z^n)$. To conclude, we apply the change of variable $\bar Z \mapsto \bar Z B$. This involves no Jacobian term because, once gathered in a single vector of size $[1, M \cdot T]$, the transformation writes $\mathrm{vec}(Y) = \mathrm{vec}(Z)\,(\mathrm{Id} \otimes B)$, and $|\det(\mathrm{Id} \otimes B)| = |\det(B)|^{M+1} = 1$ since $B \in O_N(\mathbb{R})$ ($\otimes$ is the Kronecker product and $\mathrm{vec}$ the line concatenation operator). □

This leads us to define $\theta^X$, $G^X$ and $M^X$ as the counterparts of $\theta$, $G$ and $M$ restricted to the joint state $X$, as well as $\theta^{n,X}$, $G^{n,X}$ and $M^{n,X}$ when restricted to factor $n$ and state $X$. For the IDF however, the coordinates of $Z$ are independent, so that its likelihood simply writes
$$L_Y(Y_1, \ldots, Y_T; \theta, B) = \prod_{n=1}^{N} L_{Z^n}[Y_1 B_n, \ldots, Y_T B_n; \theta_n].$$


2.2. Estimation of IDF models with MLE

Under usual conditions, both AHMM and factorial models can be well estimated with maximum likelihood. Knowing whether this still holds for the IDF model is the first issue. Our results on the MLE are restricted to the SWARCH case, but the approach can be extended to other usual families. $\beta^{n,X}_m$ denotes the parameter $\beta_m$ of the SWARCH, for factor $n$ and hidden joint state $X$. The three following results are proved in the appendices.

Lemma 2 Let $Y \sim$ IDF-SWARCH. The full process $w_t = (Y_t, \ldots, Y_{t-M}, X_t)$ is an irreducible and aperiodic Harris positive Markov chain when

(B2) (a) Each factor has an innovation with probability density $\phi$, and $\{\phi > 0\}$ is a connected open set containing 0.
(b) $\exists X^* \in \mathcal{X}$ such that $\forall n$, $\sum_{m>0,\,X} Q_\theta(X^*, X)\,\beta^{n,X}_m < 1$.

Note that we do not require the dynamics in each regime to be stationary (Note 5). As soon as explosive regimes have a low enough probability of being hit, Harris positivity is ensured by (B2b). An AHMM is called regular when assumptions (A1)–(A8) in [9] hold. Regularity ensures consistency of the MLE, and its efficiency when, in addition, the asymptotic Fisher Information $I$ is positive. Proving this positiveness is usually difficult, and this is the main theoretical limitation on the MLE (Note 6). In IDF, we prove that the additional parameter $B$ cannot itself lower the rank. However, positiveness could fail because derivatives that mix $B$ and $\theta$ possibly lower the rank of the Information matrix. Sometimes, the theory of regular multivariate models gives the result (in the multivariate ARCH case, for instance, see [11]).

Theorem 1 Under assumptions (B1)–(B7) (given in appendices), the SIDF model is regular.

Proposition 1 Let $I_B$ be the asymptotic Fisher Information of the IDF, the factors of which are Markovian, uncorrelated (in time) and without unknown dynamic parameters (only $B$ is to be estimated). Under the assumptions of Lemma A7 and (A6)–(A8), we have $I_B > 0$ when
$$\text{(B9)}: \quad \forall n, \ \mathrm{Tr}\left(\frac{\partial^2 g^n(\bar y)}{\partial \bar y^2}\right) > 0.$$

2.3. Practical issues

The ICA is a more general model than PCA, since the matrix $B$ need not be orthogonal. Even though most of our results can be adapted to the ICA, its parameter set is not compact ($O_N(\mathbb{R})$ was), and this could raise technical issues of compactification of the parameter set. Nevertheless, many solutions, coming from the dynamic ICA models, are available to perform the maximization of the likelihood. They usually use an EM algorithm, where both factors and states are hidden variables. The case of AR sources (Note 7) has been studied in [12], while the case of HMM sources is treated in [13,14]. The E-step can be performed with Monte-Carlo methods, such as Markov Chain Monte-Carlo or particle filters: see [15,16] for instance. These algorithms could certainly be adapted to the IDF model as well.

However, the instability of the MLE in both factor models and AHMM has been reported (see e.g. [3,7]). This occurs when the number of parameters increases, because the density is a mixture and hence multimodal. Searches for local extrema need to be performed from too many initial parameters, and the trade-off between computation costs and stability is a crucial issue. It reaches a critical level in the multivariate IDF model! For instance, the activity of many financial agents (such as brokers) is exposed to more than 10 000 stocks (Note 8). In these cases, the computation of the MLE seems unrealistic. Also recall that the optimization needs to be performed additional times if one wants to test for the nullity of some parameters, to avoid local maxima or to compare some models using backtests.

We now propose a two-step estimator, the main goal of which is a small computation cost. But even though the full likelihood procedure may remain preferable, our estimator can still be used as an initial parameter value in the procedures above (see Section 4.2).

3. The pragmatic procedure

The pragmatic solution uses a preestimator of a subset of parameters that we plug into the likelihood (maximum of approximate likelihood). The key idea is to use a preestimator of $B$ in order to separate the objective function, allowing us to work separately on each factor. Compared with the MLE, the complexity drops to the sum of the complexities of each monovariate model. For IDF models, this two-step procedure turns to:

(1) Preestimate $B$ and plug this estimator into the joint likelihood of the IDF.
(2) Maximize this approximate likelihood with respect to the other parameters.

Definition 4 (Approximate likelihood) If $L_Y = L(b, Y, \theta)$ for an unknown parameter $b$ estimated by $\hat b$, define $\hat L_Y = L(\hat b, Y, \theta)$ and $\hat\theta_{\mathrm{MALE}}$ its argmax.

Step (2) is separable and reduces to $N$ independent maximizations, assuming
$$\text{(H1)}: \quad \Theta = \Theta_B \times \Theta_1 \times \cdots \times \Theta_N,$$
that is, disjoint parameters across factors (no constraint exists across all $\theta_n$ and $B$). (H1) is not really a strong assumption, as it is very consistent with the factor model perspective (Note 9). Finally, if $B$ were known, the program $[\max_\theta L_Y]$ would turn to
$$\forall n, \quad \max_{\theta_n} L_n(Y B_n, \theta_n(Y B_n)).$$
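The following Python sketch illustrates the two steps on an IDF with ARCH(1) factors. It is our own illustration under added assumptions (Gaussian innovations, ARCH(1) per factor); the function names and parametrization are ours, not the paper's:

```python
import numpy as np
from scipy.optimize import minimize

def male(Y):
    """Two-step MALE sketch: (1) preestimate B by PCA, (2) maximize each
    factor's approximate likelihood separately (here an ARCH(1) per factor)."""
    # Step 1: PCA preestimation of B from the marginal covariance of Y.
    Sigma = np.cov(Y, rowvar=False)
    _, B_hat = np.linalg.eigh(Sigma)           # columns = estimated eigenvectors b_n
    Z_hat = Y @ B_hat                          # estimated factors z_t = Y_t b_n

    # Step 2: N separate low-dimensional maximizations.
    def neg_loglik(beta, z):
        b0, b1 = np.exp(beta)                  # positivity via log-parametrization
        s2 = b0 + b1 * z[:-1] ** 2             # conditional variance of an ARCH(1)
        return 0.5 * np.sum(np.log(s2) + z[1:] ** 2 / s2)

    thetas = [np.exp(minimize(neg_loglik, x0=np.log([0.1, 0.3]),
                              args=(z,), method="Nelder-Mead").x)
              for z in Z_hat.T]
    return B_hat, thetas
```

Each inner maximization is two-dimensional, regardless of $N$, which is exactly the complexity gain the procedure is after.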

Our definition of the IDF model is restrictive as it gives the prevailing role to coordinatewise dependence ($Z_t$ is conditioned upon before $X_t$); hence some time series models, as in [17] or [11], do not fit the IDF definition if left unconstrained. We could however condition upon the chain first, and define the SIDF model when the joint hidden regime $X_t$ is not a collection of independent $X^n_t$, as well as the BSWIDF model when the matrix $B$ depends on the state $X_t$. Note first that assumption (A1) would fail if using hidden chains in such general models. Thus, the general proof of consistency could not apply. In any case, emphasizing the factor model perspective is vital in high dimension since it provides a preestimation/separation procedure, while directly computing the maxima of the full likelihood is often unrealistic.

We now choose PCA as a preestimator. It is valid since, from Definition 1, the marginal covariance matrix $\Sigma = \mathrm{Var}(Y_t)$ is diagonalizable in the basis $B$. Let $I$ be the set of indices of eigenvectors of $\Sigma$ whose eigenvalues have multiplicity one in the spectrum of $\Sigma$. Denote by $\mathrm{pca}_I$ the corresponding eigenvectors and eigenvalues. Define the following assumptions:

(H2) (a) $Y_t$ is ergodic and $E\|Y_t\|^4 < \infty$.
(b) An estimator $\hat\Sigma$ of $\Sigma$ exists, for which the CLT holds.

(H3) $I = [1, \ldots, N]$: all eigenvalues of $\Sigma$ are distinct.

Proposition 2 (Preestimation) Under (H2a), the eigenvectors of $\mathrm{pca}_I$ are consistent estimators of the vectors $B_I$. They are moreover asymptotically normal under (H2b).

Corollary 1 (Identifiability of IDF) Under (H2a), and if each monovariate model on factors with indices in $I$ is identifiable, $B_I$ and $(\theta_i)_{i \in I}$ are identifiable in the IDF model.

Note that IDF is a complete factor model since it requires exactly $N$ factors. The dimension reduction is actually done after the preestimation step, due to the flexibility of PCA. Outside of the identifiable factors $I$, the basis is indeed defined randomly in eigenspaces and PCA does not capture some independent directions, although it may happen that the model remains identifiable (Note 10). (H3) is a sufficient condition for identifiability of $B$, providing identifiability of the whole IDF model. However, the MALE only requires its one factor to be identifiable, that is $n_0 \in I$.

Flexibility moreover allows us to select significant factors as a proper subset of the identifiable factors. For instance, there may be too many identifiable factors, so that only a few prevailing factors should be selected. In this case, it is not required to estimate the dynamics of all eigenvectors. One just stops extracting eigenvectors when inertia parts are too low. There is more to flexibility in the forecast setting, where one could select only well-predictable factors, which could be determined, say, by some backtest validation.

Notations and assumptions.

$Y_t$ are the multivariate data generated by an IDF model and $n_0$ is the currently estimated factor (the MALE estimates $\theta_{n_0}$). $\mathcal{M}_0$ denotes the full IDF model, but subparametrized by $\theta_{n_0}$ and $b = B_{n_0}$. All the other parameters are fixed to their true value. The index $n_0$ is dropped. $\theta$ denotes the parameters of the current factor, while $g_\theta$ and $q_\theta$ are its autoregressive density and hidden transition. $L_T(z_1, \ldots, z_T; \theta)$ and $T\,l_T = T\,l_T(z_1, \ldots, z_T; \theta)$ are its univariate likelihood and log-likelihood, $I_0$ its asymptotic Fisher information. When it is clear from the context, we drop the arguments of the functions $g_\theta$ and $l_T$. Finally, define the following assumption ($\hookrightarrow$ stands for weak convergence, and random variables are taken under the true law $(\theta^*, B^*)$):

(H4): $\sqrt{T}\,(\dot l_T(z_1, \ldots, z_T; \theta^*),\ \Delta b) \hookrightarrow \mathcal{N}(0, \Sigma)$.

Definition 5 We say that assumptions (A) hold for the current factor if they hold for the model $\mathcal{M}_0$. We say that assumption (A9) holds for the current factor if $I_0$ is full rank.

The current eigenvector $b = B_{n_0}$ is assumed identifiable via PCA (i.e. $n_0 \in I$, see Section 2) and $\hat b$ is the PCA-preestimated eigenvector. Instead of using the actual factor $z_t = Z^{n_0}_t = Z_t B^* b$, the MALE is based on the estimated factor $\hat z_t = Z_t B^* \hat b$. Dots are used to denote derivatives w.r.t. $\theta$ (for instance, $\dot f = \partial f/\partial\theta$). For a function $f$, define $\hat f = f(\hat z)$, $\Delta f = \hat f - f$, $\mathrm{Lip}_b f$ any majorant of $\sup_{b, \hat b} \|\Delta f\|/\|\Delta b\|$, and $\nabla_b f = \partial f/\partial b$. For instance, if $f$ is the identity, then $y = Yb \Rightarrow \mathrm{Lip}_b\, y = \|Y\|$.

The factors we are not estimating are still involved, but only at the true value of their parameters. If some factors are not identifiable, or if their moment conditions do not hold uniformly over compact parameter subsets, the MLE is not likely to be convergent or efficient, respectively, while the MALE is still a good estimator for the other, non-pathological factors.


4. Asymptotic results

Theorem 2 (Asymptotics of the MALE)

• If (H2a) holds and (A1)–(A5) hold for the current factor, the MALE is consistent.
• If (H2b) holds and (A1)–(A9) hold for the current factor, then $\sqrt{T}\,(\hat\theta_{\mathrm{MALE}} - \theta^*) = O_P(1)$.
• If (H4) holds and (A1)–(A9) hold for the current factor, then $\sqrt{T}\,(\hat\theta_{\mathrm{MALE}} - \theta^*) \sim \mathcal{N}(0, \Sigma_{\mathrm{MALE}})$, with $\Sigma_{\mathrm{MALE}} = C \Sigma C'$ and $C = (E\ddot l_T)^{-1}\,\mathrm{diag}(\mathrm{Id}, E\nabla_b \dot l_T)$.

Proof Under (A1)–(A4), Lemma 4 in [9] shows that $\lim_{T\to\infty} \sup_{\theta, b} |T^{-1} l_T(\theta, b) - l(\theta, b)| = 0$ a.s. for some continuous $l(\theta, b)$, so that $\lim_\infty \sup_\theta |T^{-1} \hat l_T(\theta) - l(\theta, b^*)| = 0$. But under (A5), $\theta^*$ is the well-separated maximum of $\theta \mapsto l(\theta, b^*)$. Now, under (A1)–(A9),
$$\hat\theta - \theta^* = (\ddot l_T + o(1))^{-1}\left[\dot l_T + (\nabla_b \dot l_T + o(1))\,\Delta b\right],$$
$$\hat\theta - \theta^* = \ddot l_T^{-1}\left[\dot l_T + \nabla_b \dot l_T\,\Delta b\right] + o(\Delta b) + o(\hat\theta - \theta^*),$$
the last equality being valid because Theorem 3 in [9] shows that $\ddot l_T$ is a uniformly ergodic sequence. Finally, when $\sqrt{T}\,(\dot l_T, \Delta b) \hookrightarrow \mathcal{N}(0, \Sigma)$, Slutsky's lemma yields the result because $\mathrm{diag}(\mathrm{Id}, \nabla_b \dot l_T) \xrightarrow{P} \mathrm{diag}(\mathrm{Id}, E\nabla_b \dot l_T)$. □

The additional term in the expansion usually leads to an efficiency loss, which is the price to pay for a tractable procedure. Since (H4) is hard to prove (it is based on $l_T$ rather than solely on $g_\theta$ and $q_\theta$), this theorem usually implies only a rate of convergence. Yet we underline that only the Fisher Information of the univariate model needs to be full rank, without taking care of how the additional mixing parameter $B$ could reduce it, while proving positiveness for the MLE is much more problematic.

4.1. The Markovian case

More can be said in the Markov case, when the regime of the estimated factor does not switch ($\mathcal{X}$ is reduced to a single point). (A9) is usually known and (H4) can be proved. We give a consistency theorem showing that the MALE can behave well even when the classical MLE does not, and an efficiency theorem under additional assumptions.

Proposition 3 For an IDF model with ARCH factors verifying (A), (H4) holds.

Proof lT = (1/T )∑

t ln gθ (¯Yt ) and (d ln gθ , b) is a C1 function of all empirical moments of

order 2 (centred and not) which is asymptotically normal, so that the delta-method applies. �

4.1.1. Consistency

The MALE can behave well even when the classical MLE does not. This happens because only the current factor's density is used: if some other factor is ill-estimable via MLE, it should not disturb the current factor's estimation. Our next theorem shows that when the MLE on the monovariate model is consistent, and under a Lipschitz condition on the whole model, the MALE remains consistent. Consistency is obtained when the size of these Lipschitz numbers balances a slow rate of convergence of $\hat b$.


Theorem 3 (Consistency of the MALE) For the current Markovian factor, assume that $l_T(\cdot) \to l(\cdot)$ a.s. $\theta$-uniformly and that $\theta^*$ is a well-separated maximum of $l(\cdot)$. If one of the following conditions holds:

• $\sup_\theta \mathrm{Lip}_b(l_T) = O_{\mathrm{a.s.}}(1)$ and $\hat b$ is consistent;
• $\sup_\theta \mathrm{Lip}_b(l_T) = o_{\mathrm{a.s.}}(T^{1/2})$ and $\sqrt{T}\,\|\hat b - b\|$ is bounded in probability;

then $\hat\theta_{\mathrm{MALE}} \xrightarrow{\mathrm{a.s.}} \theta^*$.

Proof Here $\sup_\theta \Delta l_T \le \sup_\theta T^{-1}\sum_{t=1}^{T} \mathrm{Lip}_b\, f(Y_t \mid \bar Y_{t-1})\,\|\Delta b\| = o(1)$. Since $l_T \to_u l$ a.s., the sequence of approximate likelihoods converges uniformly to the same limit. □

Remark 1 In the assumptions of the previous theorem, the supremum over $\theta$ and the $\theta$-uniformity need only be local, i.e. hold for $\theta$ ranging over some neighbourhood of $\theta^*$. Note that the $\hat b$ assumptions are, respectively, valid under (H2a) and (H2b). The second bound could even be proved in the non-stationary case (Note 11) with some maximal inequality.

4.1.2. Efficiency

Until now, no use has been made of specific properties of PCA. Here, $b$ and $\hat b$ lie on the unit sphere and have a linear impact on observations. This allows us to prove efficiency when the factor we are estimating is perturbed by other centred factors.

Theorem 4 (Equivalence between MLE and MALE) Assume that (A) holds for the current Markovian factor and that $EZ^r = 0$ for $r \ne n_0$. Then $\hat\theta_{\mathrm{MALE}}$ is asymptotically equivalent to $\hat\theta_{\mathrm{MLE}}$, hence consistent, normal and efficient.

Although this theorem is specific to the case of linear models such as IDF, it is remarkable: the properties of the parameter $b$ (linear impact on the data and normalized) imply the efficiency (and normality) without assuming an unpleasant condition of joint asymptotic normality of $(\dot l_T, \Delta b)$, nor the Markov character of factors other than the current one.

4.2. One step efficiency

Theorem 3 shows that the MALE trades efficiency for computation costs. It is however a sufficiently good estimator to be used as a starting point for the MLE. This procedure reaches one-step efficiency: rather than performing the Newton–Raphson descent until convergence, only one descent step (Note 12) is enough to obtain an efficient estimator. Naturally define $\hat\theta_{\mathrm{MALE}} = (\hat B, \hat\theta_{\mathrm{MALE},1}, \ldots, \hat\theta_{\mathrm{MALE},N})$ as the MALE-estimated full parameter set, where each factor has been treated separately.

Definition 6 (One descent step from the MALE) Let $\hat\theta^d_{\mathrm{MALE}}$ denote $\hat\theta_{\mathrm{MALE}}$ discretized to the grid $T^{-1/2}\mathbb{Z}^L$ ($L$ being the dimension of the parameter set). $\hat\theta_{1\mathrm{N\text{-}R}}$ is obtained with one gradient descent step from $\hat\theta^d_{\mathrm{MALE}}$, where estimators of the various derivatives are obtained by plugging in $\hat\theta^d_{\mathrm{MALE}}$, and writes
$$\hat\theta_{1\mathrm{N\text{-}R}} = \hat\theta^d_{\mathrm{MALE}} - \ddot l_T(Y_1, \ldots, Y_T, \hat\theta^d_{\mathrm{MALE}})^{-1}\, \dot l_T(Y_1, \ldots, Y_T, \hat\theta^d_{\mathrm{MALE}}).$$
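A minimal sketch of this update, assuming `score` and `hessian` are callables returning $\dot l_T$ and $\ddot l_T$ at a given parameter (how they are computed, e.g. by numerical differentiation, is left open):

```python
import numpy as np

def one_step_nr(theta_male, score, hessian, T):
    """One Newton-Raphson step from the discretized MALE, as in Definition 6."""
    theta_d = np.round(theta_male * np.sqrt(T)) / np.sqrt(T)   # discretize to the T^{-1/2} grid
    return theta_d - np.linalg.solve(hessian(theta_d), score(theta_d))
```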

Theorem 5 If assumptions (A1)–(A9) hold for the whole model, $\hat\theta_{1\mathrm{N\text{-}R}}$ is asymptotically consistent, normal and efficient.


Proof Theorem 5.48 on p. 73 in [18] gives this result in the i.i.d. case. We claim that this proof holds in the dependent setting without any modification. We now check the conditions of this theorem. $\sqrt{T}\,\dot l_T$ converges in distribution and $\ddot l_T \xrightarrow{P} \ddot l$ (proved, respectively, by Lemma A2 and Theorem 2 in [9]). Our Theorem 3 proves that $\hat\theta_{\mathrm{MALE}}$ is $\sqrt{T}$-consistent. Finally, for every non-random sequence $\theta_T = \theta^* + O(1/\sqrt{T})$, there exists $\tilde\theta_T \in [\theta_T, \theta^*]$ such that
$$\dot l_T(\theta_T) - \dot l_T(\theta^*) = \ddot l_T(\tilde\theta_T)(\theta_T - \theta^*) + o(\theta_T - \theta^*) = \ddot l_T(\theta^*)(\theta_T - \theta^*) + o(\theta_T - \theta^*),$$
so that $\sqrt{T}\,(\dot l_T(\theta_T) - \dot l_T(\theta^*)) - \sqrt{T}\,\ddot l_T(\theta^*)(\theta_T - \theta^*) = o(1)$. □

5. Applications

5.1. Robust adaptations of MALE for scale parameter

In the previous sections, we asked how the usual MLE is affected by preestimation. We now wonder whether estimators that exploit this perturbation could improve on the estimation. The MALE was treated as a preestimated-parameter problem, and the derivative of some statistics w.r.t. the observations was of constant use. It is in spirit very close to the Influence Curve in robust statistics theory (see [19] for example). Because of the bilinearity with respect to $(b, Y_t)$, the perturbation on the parameter $b$ appears as a perturbation of the data. Each univariate likelihood is taken on the noisy factor $\hat y = Z\hat b$ instead of the true factor $y = Zb$. In the IDF model, the MALE can also be seen as an issue of contaminated law. Using simulations, we now investigate i.i.d. models with two dimensions. $K_{\mathrm{ratio}}$ denotes the ratio of eigenvalues in the covariance matrix. We simulate $M$ samples of $T$ observations, and are mainly concerned with small values of $T$ and small values of $K_{\mathrm{ratio}}$.

Since the contamination impacts the location but not the scale of the full sample, robust estimators can improve scale estimation. Take the model $Z^1 \sim N(0, s^2)$ and $Z^2 \sim$ Laplace. We wish to estimate $s^2 = 1$, the variance of the first factor $Z^1$. On the first PCA-preestimated factor $\hat Z^1$, we compare the empirical counterparts of the variance and the Mean Absolute Error (m.a.e.): $\hat s_1 = \mathrm{Var}(\hat z)$ and $\hat s_2 = \pi/2\,(E|\hat z - E\hat z|)^2$. We also define the composite estimator $\hat s_3 = w_{\mathrm{ratio}}\,\hat s_1 + (1 - w_{\mathrm{ratio}})\,\hat s_2$, with $w_{\mathrm{ratio}} = \frac{1}{2}K_{\mathrm{ratio}}(1 - \frac{1}{2}K_{\mathrm{ratio}})$. This idea works quite well: the MSEs decrease by about 15% for small values of $T$ and/or small values of $K_{\mathrm{ratio}}$ (see Table 1). This simple example advocates the use of the MALE with alternative robust QMLEs.
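A Monte Carlo sketch of this experiment (ours; it assumes Gaussian innovations for the first factor and reads the grouping of $w_{\mathrm{ratio}}$ literally from the text, which is ambiguous in the printed formula):

```python
import numpy as np

rng = np.random.default_rng(1)
K_ratio, T, M = 1.2, 50, 2000                  # illustrative values
se1, se3 = [], []
for _ in range(M):
    # Z1 ~ N(0, 1) (so s^2 = 1); Z2 = Laplace with variance 1/K_ratio,
    # making K_ratio the eigenvalue ratio of the covariance matrix.
    Z = np.column_stack([rng.standard_normal(T),
                         rng.laplace(scale=1 / np.sqrt(2 * K_ratio), size=T)])
    _, vecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    z = Z @ vecs[:, -1]                        # first PCA-preestimated factor
    s1 = z.var()                               # variance-based estimate
    s2 = np.pi / 2 * np.mean(np.abs(z - z.mean())) ** 2   # m.a.e.-based estimate
    w = 0.5 * K_ratio * (1 - 0.5 * K_ratio)    # w_ratio, grouped as printed
    s3 = w * s1 + (1 - w) * s2
    se1.append((s1 - 1) ** 2); se3.append((s3 - 1) ** 2)
print(np.mean(se1) / np.mean(se3) - 1)         # relative efficiency, cf. Table 1
```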

It is more difficult to apply these ideas to a location parameter (trying to use a locally robust estimator for instance). Indeed, the structure of preestimation is not very consistent with the contamination model. This structure is more similar to error-in-variables models (see [20] for instance): rather than having only a fraction of noisy observations, the two-step procedure (preestimation) creates a small multiplicative noise on each of them. The use of error-in-variables procedures in the preestimation step and, last but not least, the extension of these ideas to a dynamic setting are deferred to future work.

Table 1. Relative efficiency $\mathrm{mse}(\hat s_1)/\mathrm{mse}(\hat s_3) - 1$ in the scale model (%).

K_ratio    1.05   1.1   1.2   1.5     2
T = 20       16    16    18    18    13
T = 50       11    14    15    13     6
T = 100       9    13    15     9     1
T = 200      10    13    13     3    −5
T = 500      11    15     9    −7   −16


5.2. Volatility forecasts

PCA usually detects one prevailing factor when applied to a set of stocks' returns. This factor is called the Market Line and usually gathers more than 50% of the inertia. The Market Line accounts for the strong positive correlations across the various stocks. This one-factor model and its variations are known as the CAPM (see, for instance, [1,2]). It is popular to compute the Cost of Capital (the expected excess drift), under the additional assumption of market efficiency. This procedure copes with the fact that trends are much more unstable than volatilities on real data.

However, the strong correlations hold not only for the returns, but also for their squares: periods of high and low volatility are often shared by different stocks. Our data consist of the daily returns of 20 stocks belonging to the CAC40 (the first 20 in alphabetical order), observed from 9 July to 30 January 2009, for a total of 400 observations. In this section, we backtest different algorithms using sliding windows. For $S = 300$ and every $t = 1, \ldots, 100 = T$, we use the observations on dates between $t - S$ and $t$ to estimate the model and derive the forecast of $\sigma^2_{t+1}$, which we compare with the squares $y^2_{t+1}$. Let $\varepsilon_t$ be the vector of forecast errors and define the Mean Square Error of the prediction $\Lambda = \hat E\,\varepsilon_t'\varepsilon_t$, where $\hat E$ stands for the empirical mean over the observations between $T + 1$ and $T + S$. For two procedures $A$ and $B$, define the overperformance
$$\gamma_{AB} = \frac{1}{N}\sum_n \frac{\Lambda^A_{n,n}}{\Lambda^B_{n,n}}.$$
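A sketch of this evaluation loop (ours; `forecaster` is a hypothetical callable returning the one-step-ahead variance forecast of whatever model is being backtested):

```python
import numpy as np

def backtest_mse(y, forecaster, S=300, T=100):
    """Sliding-window backtest: for each date, fit on the S previous observations
    and compare the variance forecast with the realized square.
    y has shape (S + T, N); forecaster(window) returns a length-N forecast of sigma^2."""
    err = []
    for t in range(T):
        window = y[t:t + S]                    # estimation window
        err.append(forecaster(window) - y[t + S] ** 2)
    err = np.asarray(err)
    return err.T @ err / T                     # empirical co-error matrix Lambda

def overperformance(L_A, L_B):
    """gamma_AB: mean over coordinates of the diagonal MSE ratios."""
    return np.mean(np.diag(L_A) / np.diag(L_B))
```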

A few monovariate backtests using ARCH($M$) were first performed, on marginal factors and on a few stocks. The value $M = 1$ always led to the smallest forecast error, for $M = 1, \ldots, 10$, although $M$ usually seems significantly greater than 1. Hence we compare the following three models:

• The IDF($N$) with ARCH(1) factors. The PCA step keeps only the $N$ first eigenvectors. This example of an IDF model is almost identical to the factor-ARCH model studied in [21].
• The MV-ARCH with fixed correlations. The conditional variance matrix writes $\Sigma_{Y_t|\bar Y_{t-1}} = \mathrm{diag}_n(\sigma_{Z^n_t|Z^n_{t-1}})\,R\,\mathrm{diag}_n(\sigma_{Z^n_t|Z^n_{t-1}})$, where the factors $Z^n_t$ are ARCH(1) processes, $\mathrm{diag}_n()$ are diagonal matrices, and $R$ is a fixed correlation matrix (see the sketch after this list). This MVARCH model is described in [22] (Note 13).
• The ARCH aggregation, where each variable is an ARCH(1) (original variables are treated independently). Note that we do not use GARCH models, since they are not consistent with the AHMM perspective: the $\sigma_{t-k}$ terms prevent $g_\theta()$ from being Markovian.
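For concreteness, the fixed-correlation conditional covariance of the second model can be assembled as follows (a sketch; `sigma` and `R` are assumed to come from the univariate ARCH fits and a fixed correlation estimate):

```python
import numpy as np

def mvarch_cov(sigma, R):
    """Conditional covariance Sigma_t = diag(sigma_t) R diag(sigma_t), where sigma_t
    stacks the univariate ARCH(1) volatility forecasts and R is a fixed correlation matrix."""
    D = np.diag(sigma)
    return D @ R @ D
```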

This setting is not in favour of the IDF. First, we did not use hidden regimes: there would be far fewer local maxima to avoid. Second, retaining only one factor is an important loss of information; using a few more factors would improve the IDF. Finally, our analysis is led in a small dimension, because backtesting even 100 observations is not realistic with multivariate models when the dimension reaches a few thousand variables. This restricted setting was chosen to avoid prohibitive computation time. As an example, the MVARCH implementation would require more than 200 h (Note 14) to perform the full backtest with $N = 50$ stocks, while the MALE would require less than 2 min.

The first important fact is that the monovariate ARCH models overperform the IDF(1) by 12% and the MVARCH by 7%. Multivariate models imply a predictive loss because either a few dimensions are used to predict the whole dataset, or they contain too many parameters. Of course, when only one stock $A$ is of interest (or even a few), different approaches should be used (Note 15). However, many financial entities' activities have exposure to the volatilities of the full market, and not only of a few stocks. When a portfolio of weights $w = (w_n)$ is diversified, the errors need to be treated jointly. This is where the multivariate models turn fully useful.


In the IDF, only the factors' parameters are efficiently estimated. Hence the overperformance should be higher for non-sparse portfolios $w$, especially those close to the factors. This holds on our data. The IDF model overperforms the MVARCH by 10% and the ARCH by 15% when predicting the first factor. When $w = (1/N, \ldots, 1/N)$, the overperformances are 8% and 13%. We can also consider the co-error matrices (Note 16). $\Lambda_{\mathrm{ARCH}} - \Lambda_{\mathrm{IDF}}$ turns out to be rather positive than negative: the positive inertia (sum of positive eigenvalues) is 4 times larger than the negative inertia. Hence the forecast error for an arbitrary portfolio is more often larger in the ARCH than in the IDF.

When comparing the IDF and the MVARCH, we note that the eigenvectors of the estimated $R$ are very close to PCA's eigenvectors. At any date, the cosine between these vectors lies above 0.8 (0.95 for the first factor). Hence there is no gain in defining the factors dynamically rather than marginally. This is an important fact in favour of the IDF approach. Finally, this backtest is slightly in favour of the IDF model against the MV-ARCH. As usual, PCA acts as a filter which reduces the number of parameters, allowing us to avoid overlearning.

5.3. Instability of the correlations

We now use the returns of the Stoxx 500 index between January 2006 and January 2009. We fit the IDF(1) model with 2-state Gaussian HMM factors, that is $Z^1|X \sim \mathcal{N}(\mu_X, \sigma_X)$. The annualized estimated parameters (using the MALE) are $\mu_1 = 7\%$, $\mu_2 = -13\%$, $\sigma_1 = 25\%$ and $\sigma_2 = 50\%$. Hence the second state corresponds to crises and periods of stress, while the first corresponds to steady periods. The market remains in each state for quite a long time, since the estimated transition probabilities are $q_{1,1} = 0.91$ and $q_{2,2} = 0.84$. See [3] for similar results.

Using the posterior probabilities (Note 17) $\hat P$ of the HMM filter, we split the sample into two parts with the rule $Y_t \in \mathcal{Y}_i$ if $\hat P(X_t = i) > 0.7$, for $i = 1, 2$. This allows us to increase the homogeneity of the estimated subsamples, while retaining 80% of the observations. We now compute $R_1$ and $R_2$, the correlation matrices over each subsample. On average over the stocks, $\mathrm{corr}_1 = 0.43$ in the first state and $\mathrm{corr}_2 = 0.76$ in the second state. Each correlation pair increases by 40% on average.
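A sketch of this splitting rule (ours; it assumes the posterior probabilities `post[t, i]` have been produced by some HMM filter or smoother):

```python
import numpy as np

def split_by_state(Y, post, threshold=0.7):
    """Split the sample by posterior state probabilities and return
    one correlation matrix per hidden state."""
    R = []
    for i in range(post.shape[1]):
        sub = Y[post[:, i] > threshold]           # dates confidently in state i
        R.append(np.corrcoef(sub, rowvar=False))  # state-wise correlation matrix
    return R
```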

This result indicates that stocks behave more alike during volatility crises, hence reinforcing the systematic risk. This is why the mixing matrix $B$ should be indexed by the hidden state $X_t$. We believe using this stylized fact could improve the predictive power of the models. Even though the preestimation of $B$ with PCA becomes irrelevant in this case, we have used posterior probabilities as a preestimator. However, assumption (A1) fails, and the general results from [9] cannot be applied.

6. Conclusion

The MALE procedure is a pragmatic approach to deal with time series in high dimension. It is on the one hand very attractive: it is tractable and fast to compute, since the complexity is the same as in monovariate models. This is the key achievement of the MALE over the classical MLE. Second, the exploratory aspect of PCA remains: one can base the choice of the factors' dynamics on how they look, or stop estimating factors with too low inertia. Finally, the MALE does not behave badly compared with the classical MLE. The rate is most often $\sqrt{T}$ and we provided efficient or robust upgrades. The MALE can even be efficient itself.

On the other hand, the MALE inherits the limitations of the preestimator it is built upon. In the case of PCA, the MALE leads to spurious results when we are not able to discriminate independent dynamics with marginal inertia. In such cases however, other factor models such as ICA or Non-negative Matrix Factorization (NMF) could lead to good preestimators of factors. The proof is likely to extend because we do not require $B$ to be orthogonal. This extension of the MALE is very attractive as it would overcome its main shortcoming.


Notes

1. We mostly consider factor models based on variance identification (PCA, see [6]) and not distribution (ICA (independent component analysis), see [7]) or positivity (NMF, see [8]).
2. It is better to use the name 'Hidden Markov Models' rather than the term 'switching autoregression' used in [9], because the latter would abbreviate to SWAR, while in the literature the word switching is devoted to the various declinations of HMM models for specific parametric dynamics, such as SWAR(p) for AR(p) processes and SWARCH(p) for ARCH(p) processes.
3. The assumption (A1) ensures its existence.
4. In [9], likelihoods with other initial conditions are also addressed. Most of them are asymptotically equivalent under the assumption of uniform forgetting of the initial state.
5. Which would be written $\forall(n, X)$, $\sum_{m>0} \beta^{n,X}_m < 1$.
6. Conversely, the MALE does not depend on the full Information Matrix; see Section 3.
7. Sources are the factors in the ICA.
8. This dimension could moreover be multiplied by the trading places.
9. Factors are distinct explanations and hence have distinct parameters, while the way they combine is fully described by $B$.
10. For instance, a mixture of a Gaussian and a non-Gaussian signal is identifiable with ICA.
11. Although outside the scope of this paper, conditions (A) can be used without stationarity assumptions, see [9].
12. In practice, a few iterations give better results.
13. We used the implementation available at http://www.kevinsheppard.com/wiki/MFE_Toolbox.
14. Under Matlab 7 and an Intel(R) Core(TM) T7200 processor.
15. A higher predictive power could be obtained with a specialization of the factors. For instance, one can use only stocks highly correlated with $A$, or use both the factor and the stock to predict $A$, or maybe include other factors.
16. Pay attention to the fact that the co-errors are relevant when one has exposure to a portfolio of volatilities. It is different from the volatility of a portfolio, because the forecast is not a linear combination of each coordinate (the variance is bilinear).
17. These are the probabilities of being in each state at each date. They are an output of the maximization.
18. Lemma A3 ensures its existence in the ARCH case, but the result holds in general.
19. Both the dimension $N$ and the hidden space cardinal $P$ are finite. These extrema are still denoted $\bar\phi$, $\bar\phi(\cdot)$ and $\underline\phi(\cdot)$.
20. These differentiations are w.r.t. $\sigma^2$ and proved by straightforward computations, for example $d\sigma^{-1} = -\frac{1}{2}\sigma^{-3}\,d\sigma^2$.
21. We allow the basis $B$ to depend on the hidden state. These parameters are denoted $B_X$.

References

[1] H.M. Markowitz, Portfolio selection, J. Finance 7(1) (1952), pp. 77–91.
[2] R. Roll, A critique of the asset pricing theory's tests, J. Financial Econ. 4 (1977), pp. 129–176.
[3] J.D. Hamilton and R. Susmel, Autoregressive conditional heteroskedasticity and changes in regime, J. Econometrics 64(1–2) (1994), pp. 307–333.
[4] J. Bialkowsky, S. Darolles, and G. Le Fol, Improving VWAP strategies: A dynamic volume approach, J. Banking Finance 32 (2008), pp. 1709–1722.
[5] V. Plerou, P. Gopikrishnan, B. Rosenow, L.A.N. Amaral, T. Guhr, and H.E. Stanley, A random matrix approach to cross-correlations in financial data, Phys. Rev. E 65 (2002), p. 066126.
[6] I.T. Jolliffe, Principal Components Analysis, Springer Series in Statistics, Springer, Berlin, 2010.
[7] S. Choi, A. Cichocki, H.M. Park, and S.Y. Lee, Blind source separation and independent component analysis: A review, Neural Inform. Process. Lett. Rev. 6 (2004), pp. 1–57.
[8] C. Févotte, N. Bertin, and J.-L. Durrieu, Nonnegative matrix factorization with the Itakura–Saito divergence: With application to music analysis, Neural Computation 21 (2009), pp. 793–830.
[9] R. Douc, E. Moulines, and T. Ryden, Asymptotic properties of the maximum likelihood estimator in autoregressive models with Markov regime, 2004. Available at http://arxiv.org/abs/math/0503681 (arXiv:math/0503681v1).
[10] R. Rios and L. Rodriguez, Estimation in autoregressive models with Markov regime, 2006. Available at http://arxiv.org/abs/math/0505081 (arXiv:math/0505081v1).
[11] J.-M. Bardet and O. Winterberg, Asymptotic normality of the quasi maximum likelihood estimator for multidimensional causal processes, 2007. Available at http://arxiv.org/abs/0712.0679 (arXiv:0712.0679v1).
[12] B. Pearlmutter and L. Parra, Maximum likelihood blind source separation: A context-sensitive generalization of ICA, in Proceedings of the 1996 Conference on Advances in Neural Information Processing Systems, Vol. 9, M.C. Mozer, M.I. Jordan and T. Petsche, eds., MIT Press, Cambridge, 1997.
[13] H. Attias and C.E. Schreiner, Blind source separation and deconvolution: The dynamic component analysis algorithm, Neural Comput. 10 (1998), pp. 1373–1424.
[14] W.D. Penny and S.J. Roberts, Hidden Markov models with extended observation densities, Technical Report, Imperial College of Science, Technology and Medicine, London, 1999.
[15] N. Murata, S. Ikeda, and A. Ziehe, Adaptive on-line learning in changing environments, Advances in Neural Information Processing Systems, Vol. 9, MIT Press, Cambridge, 1997.
[16] R. Everson and S.J. Roberts, Non-stationary independent component analysis, Technical Report TR-99-1, Proceedings of the 8th International Conference on Artificial Neural Networks, ICANN-99, Perspectives in Neural Computing, Springer-Verlag, Berlin, 1999, pp. 503–508.
[17] M. Saidane, Modèles à facteurs conditionnellement hétéroscédastiques et à structure markovienne cachée pour les séries financières, Institut de Mathématiques et de Modélisation de Montpellier, UMR CNRS 5149, 2006.
[18] A.W. van der Vaart, Asymptotic Statistics, Cambridge University Press, Cambridge, 2006.
[19] F.R. Hampel, The influence curve and its role in robust estimation, J. Amer. Statist. Assoc. 62 (1974), pp. 1179–1186.
[20] A. Chesher, The effect of measurement error, Biometrika 78 (1991), pp. 451–462.
[21] R.F. Engle, V.K. Ng, and M. Rothschild, Asset pricing with a factor-ARCH covariance structure, J. Econometrics 45 (1990), pp. 213–238.
[22] A. Silvennoinen and T. Terasvirta, Multivariate GARCH models, SSE/EFI Working Paper Series in Economics and Finance No. 669, 2008. Available at http://swopec.hhs.se/hastef/papers/hastef0669.pdf
[23] S.P. Meyn and R.L. Tweedie, Markov Chains and Stochastic Stability, Springer, Berlin, 1994.

Appendix 1: Regularity of HMAR Model

The asymptotic properties of the MLE for switching autoregressions have been studied only recently in a general setting: we use Theorems 1 and 4 in [9] to ensure asymptotic consistency and efficiency for the MLE of an AHMM. These authors provide 10 conditions (A0)–(A9), which we now restate for finitely many hidden regimes. $V$ denotes some neighbourhood of the true parameter $\theta^*$ and $w_t = (y_t, \bar y_t, x_t)$ is the full, unobserved process.

We derive the simpler set of assumptions (B) for the IDF model (Theorem 1). The idea is to aggregate regularity from the factors to the whole model. However, moment assumptions are required to control the likelihood at a linear combination of factors. To achieve this, define for any function $\psi: \mathbb{R} \to \mathbb{R}$, $\bar\psi(t) = \max_{s\in[-t,t]} |\psi(s)|$ and $\underline\psi(t) = \min_{[-t,t]} \psi$. Let $\underline\sigma^2 = \min_{n,X} \beta^{n,X}_0$ and $\tilde Z_t = (Z_t, \bar Z_t)$.

Remark A1 The exponent 16 in (B7) is only twice the eight moments required for the usual ARCH. We claim it could be lowered to 10 by computing the derivatives more precisely.

(A0) The parameter set $\Theta \subset \mathbb{R}^K$ is compact and $\theta^*$ is interior to $\Theta$.

(A1) (a) $\inf_{\theta\in\Theta}\inf_{x,x'\in\mathcal{X}} q_\theta(x, x') \ge \sigma_- > 0$ and $\sup_{\theta\in\Theta}\sup_{x,x'\in\mathcal{X}} q_\theta(x, x') < \infty$;
(b) $\inf_\theta \sum_x g_\theta(y \mid \bar y, x) > 0$ and $\sup_\theta \sum_x g_\theta(y \mid \bar y, x) < \infty$.

(A2) $\forall \theta$, $w_t$ is an irreducible aperiodic Harris-positive Markov chain.

(A3) (a) $\sup_\theta g_\theta(y \mid \bar y, x) < \infty$;
(b) $E|\ln\gamma| < \infty$, where $\gamma = \inf_\theta \sum_x g_\theta(y_t \mid \bar y_t, x_t)$.

(A4) $q$ and $g$ are continuous w.r.t. $\theta$.

(A5) The model is identifiable.

(A6) $q$ and $g$ are twice continuously differentiable w.r.t. $\theta$.

(A7) (a1) $\sup_{\theta\in V, x, x'} \|\nabla_\theta \ln q_\theta(x, x')\| < \infty$;
(a2) $\sup_{\theta\in V, x, x'} \|\nabla^2_\theta \ln q_\theta(x, x')\| < \infty$;
(b1) $E \sup_{\theta\in V, x} \|\nabla_\theta \ln g_\theta(y_t \mid \bar y_t, x_t)\|^2 < \infty$;
(b2) $E \sup_{\theta\in V, x} \|\nabla^2_\theta \ln g_\theta(y_t \mid \bar y_t, x_t)\|^2 < \infty$.

(A8) (a) $x \mapsto \sup_\theta g_\theta(y \mid \bar y, x) \in L^1$, $(y, \bar y)$-a.s.;
(b1) $y \mapsto \sup_{\theta\in V} \|\nabla_\theta g_\theta(y \mid \bar y, x)\| \in L^1$, $(\bar y, x)$-a.s.;
(b2) $y \mapsto \sup_{\theta\in V} \|\nabla^2_\theta g_\theta(y \mid \bar y, x)\| \in L^1$, $(\bar y, x)$-a.s.

(A9) The Fisher information matrix I is full rank.

(B0) (A0) holds for each factor.

(B1) (A1) holds for each factor.

(B3) (a) $E\|\tilde Z_t\| < \infty$; (b) $\forall n$, $E|\ln \underline\phi_n(\underline\sigma^{-1}\|\tilde Z_t\|_2)| < \infty$.

(B4) Each $g^n$ is continuous w.r.t. $(y, \bar y_t, \theta)$.

(B5) Each dynamic model on $Z^n$ is identifiable and (H3) holds.

(B6) Each $g^n$ is twice continuously differentiable w.r.t. $(y, \bar y_t, \theta)$.

(B7) (a) $E\|\tilde Z_t\|^{16} < \infty$;
(b) $\forall n$, $E\|\tilde Z_t\|^{16}\,(\overline{d\ln\phi} + \overline{d^2\ln\phi})^2(\underline\sigma^{-1}\|\tilde Z_t\|_2) < \infty$.

Appendix 2: Markovian properties of IDF-SWARCH

In this appendix, references for theorems, propositions and definitions are from [23], which we make extensive use of.

Lemma A1 Let $Y$ be a vector of $N$ independent ARCH. Assume that each coordinate $n$ has order $M_n$ and a density (for its innovation) $f_n$ bounded away from 0 on a neighbourhood $V_n$ of 0. Then $Y$ is forward accessible.

Proof Let $A_k(Y_0)$ denote the states accessible in $k$ steps, starting from $Y_0$. At each step, each coordinate $n$ can reach $\sigma^n_t \cdot V_n$, so that $\times_{m=1}^{M_n} \beta^n_0 \cdot V_n \subset A^n_k$ for all $Y_0$. Because the innovation sequence of each coordinate does not impact the others, we obtain $\times_n \times_{m=1}^{\max_n M_n} \beta^n_0 \cdot V_n \subset A_k$. This set is open and not empty, so that $Y$ is forward accessible. □

Lemma A2 Let $Y$ be a vector of $N$ independent ARCH. Assume that each coordinate $n$ has a density for its innovation $f_n$ whose set $V_n = \{f_n > 0\}$ takes the form $]a, b[$ with $-a, b \in \bar{\mathbb{R}}_+$. Then $Y$ is M-irreducible.

Proof We first prove the result for one ARCH coordinate. Let $c = \max(a, b)$. The sequence of innovations leading to the highest value of $\sigma_t$ lies in shrinking neighbourhoods of $c$. It leads to a volatility sequence as close to $\sigma^2_{t+1} = \beta_0 + (\sum_{m>0}\beta_m)\,c^2\sigma^2_t$ as we wish. Such a sequence of $\sigma$ converges to a constant $K$ in $\bar{\mathbb{R}}_+$. This implies that, for each initial state, $\lim_k A_k = K\cdot[a, b] = M$. Proposition 7.2.3 proves that $M$ is minimal.

We index the previous set $M$ by $n$ for each coordinate. Because the innovation sequence of each coordinate does not impact the others, $\times_n M_n$ is minimal for the whole chain $Y$. Because the limsup relation holds for any initial state (not only in $M$), it also proves that $Y$ is indecomposable. □

Lemma A3 Let $Y$ be a vector of $N$ independent ARCH under the previous assumptions. $Y$ is irreducible and aperiodic.

Proof Theorem 7.2.6 proves irreducibility, and Theorem 7.3.5 proves aperiodicity because $M$ is connected, hence aperiodic (Proposition 7.3.4). □

Remark A2 Here we proved that $M$ is connected in $\mathbb{R}$. The assumption that $\mathrm{supp}(f)$ is connected could be weakened by proving connectedness of $M$ as a subset of $\mathrm{supp}(Y)$. Our result is nevertheless sufficient for most applications.

We now extend these lemmas to IDF. We need to include both the hidden regime $X_t$ and the factorial basis $B$. For this, we assume $\exists X^* \in \mathcal{X}$, $\forall X \in \mathcal{X}$, $Q_\theta(X, X^*) \ge \sigma > 0$. This assumption is weaker than (A1). For a matrix $B \in \mathcal{M}_N(\mathbb{R})$, $\alpha_B(\cdot)$ denotes the function mapping a vector of matrices $\bar a \in \mathcal{M}_N(\mathbb{R})^M$ to the vector of linear images $(\alpha_B(\bar a))_m = a_m B$. We abbreviate the event $\{X_t = \cdots = X_{t+m} = X^*\}$ to $\{\bar X_{t+m} = X^*\}$.

Proposition A1 SIDF is irreducible.


Proof Let $X^*$ be as above. For a set $C = C^y_{X^*} \times X^*$,
$$P(w_{t+m} \in C) = P(\bar y_m \in C^y_{X^*}, X_t = X^*) \ge P(\bar Y_m \in C^y_{X^*}, \bar X_m = X^*)$$
$$= P(\alpha_{B^{-1}_{X^*}}(\bar Z_m) \in C^y_{X^*} \mid \bar X_m = X^*)\,P(\bar X_m = X^*) \ge \sigma^{m+1}\,\phi^* \circ \alpha_{B^{-1}_{X^*}} \otimes \delta_{X^*},$$
where $\phi^*$ is an irreducibility measure for $(Z_t)$ in the constant state (Note 18) $X^*$. For any $w_0$ and any set with $\phi^*(C) > 0$, there exists $m$ such that $P(\bar y_m \in C^y_{X^*} \mid X_1 = \cdots = X_m = X^*) > 0$. Finally, let $\phi = \phi^*(B^{-1}_{X^*}\cdot) \otimes \delta_{X^*}$ and $C$ with $\phi(C) > 0$. Since $\mathcal{X}$ is finite, $C$ writes $\bigcup_X C^y_X \times X$, and then $\phi^*(C^y_{X^*}) > 0$. This proves $P(w_m \in C \mid w_0) > 0$. □

Proposition A2 SIDF is aperiodic.

Proof Let $k > 0$, $K_0 = [-k; k]^{N\cdot M}$ and $\|\cdot\|$ some submultiplicative matrix norm. Let $\lambda = \max_{n,X} \|B^n_X\|$ and $K_Z = \alpha_{B^{-1}_{X^*}}(K_0)$. Define, for any set $C$, $\rho_{m,C} = P(w_{t+m} \in C \times \{X^*\} \mid w_t \in K_0 \times \{X^*\})$. Then
$$\rho_{m,C} = P(\bar Y_{t+m} \in C, X_{t+m} = X^* \mid \bar Y_t \in K_0, X_t = X^*)$$
$$\ge P(\bar Y_{t+m} \in C, \bar X = X^* \mid \bar Y_t \in K_0, X_t = X^*)$$
$$= P(\bar Y_{t+m} \in C \mid \bar Y_t \in K_0, \bar X_{t+m} = X^*)\,P(\bar X_{t+m} = X^* \mid \bar Y_t \in K_0, X_t = X^*)$$
$$\ge \sigma^m \rho^*,$$
where $\rho^* = P(\bar Y_{t+m} \in C \mid \bar Z_t \in K_Z, \bar X_{t+m} = X^*) \ge P(\bar Z_{t+m} \in C/\lambda \mid \bar Z_t \in K_Z, \bar X_{t+m} = X^*)$ can be chosen non-zero, because $K_Z$ is compact and hence $m$-small for the chain $Z_t$ remaining only in the state $X^*$ (chosen as before), for some small measure $\nu_Z$ (Theorem 5.5.3). Finally, $K = K_0 \times \mathcal{X}$ is small for the chain $w_t$ with small measure $\nu = \sigma^m \nu_Z \circ A \times \delta_{X^*}$, where $A(x) = \alpha_{B^{-1}_{X^*}}(x/\lambda)$.

More generally, let $s \in \mathbb{N}$ be such that $K$ is $k_s\nu$-small with $k_s > 0$. We prove in the same way that $K_Z = K_0/\lambda$ is $k_s\nu_Z$-small relative to the chain $Z$. Hence the g.c.d. of such indices $s$ is 1 and $Y$ is aperiodic. □

Proposition A3 Let $Y$ follow a BSWIDF-ARCH. Then $w_t = (Y_t, \bar Y_t, X_t)$ is Harris positive when
$$\text{(B2)}: \quad \exists X^* \in \mathcal{X} \ / \ \forall n \le N, \quad \sum_{0 < m \le M,\, X \in \mathcal{X}} q_{X^*,X}\,\beta^{X,n}_m < 1.$$

Proof We start with the case $B = \mathrm{Id}_N$, so that $Y = Z$. Let $\|\cdot\|$ be the 2-norm, $V(w) = (1/N)\sum_{m=0}^{M-1} a_m \|Y_{t-m}\|^2$ and $\alpha^n_m = \sum_X q_{X^*,X}\,\beta^{X,n}_m$. We have
$$N\,E[V(w_{t+1}) \mid w_t, X_{t-1} = X^*] \le \sum_n \sum_X q_{X^*,X}\left[a_0\beta^{X,n}_0 + \sum_{m=0}^{M-1}(a_m + a_0\beta^{X,n}_m)(Z^n_{t-m})^2\right],$$
so that the drift writes $N\,\Delta V(w_{t-1}) \le \sum_n \Delta V_n(w_{t-1})$, with
$$\Delta V_n(w_{t-1}) \le a_0\alpha^n_0 - a_{M-1}(Z^n_{t-M})^2 + \sum_{m=1}^{M-1}(a_m - a_{m-1} + a_0\alpha^n_m)(Z^n_{t-m})^2.$$
Fix $a_0 = 1$. The sequence $a_m = 1 - \inf_n \sum_{m'=1}^{m} \alpha^n_{m'}$ verifies $\forall n$, $a_m \le a_{m-1} - a_0\alpha^n_m$, and remains strictly positive. Hence $V$ is a norm-like function. Let $k = a_0(\alpha_0 + 1)/a_{M-1}$ and $C = [-k; k]^{N\cdot M} \times (\mathcal{X} \setminus \{x^*\})$. $C$ is a compact set outside of which $\Delta V(Z) \le -1$, and $\Delta V(Z) < \infty$. Theorem 9.4.1 proves that $w$ is non-evanescent and hence Harris recurrent (Theorem 9.0.2). Finally, Theorem 10.4.4 shows that $w$ is a positive Harris chain.

When $B \ne \mathrm{Id}_N$, we have proved that $(Z_t, \bar Z_t, X_t)$ is Harris positive, and so is the linear transform $w_t = (Z_t B^{-1}, \bar Z_t B^{-1}, X_t)$. □

Remark A3 When $B$ is allowed to depend on the joint state $X$, the same proof shows that $w$ is Harris positive, with $w_t = (Y_t, \bar Y_t, X_t, \bar X_t)$ and $V(w) = \sum_{m=0}^{M-1} a_m \|Y_{t-m} B^{-1}_{X_{t-m}}\|^2$.

Proposition A4 Let $Y \sim$ IDF-ARCH with $\forall n$, $\sum_{m>0} \beta^n_m < 1$, and without hidden regime. Then the CLT holds for the vector of moments of order 2 (centred and not).

Proof Let $w_t = (Y_t, \bar Y_t)$ and $\rho = 1 - \max_n \sum_{m>0} \beta^n_m$. The result above holds with $a_m - \rho/(M\cdot N + 1)$ (the $a_m$ are the same as before). We can also multiply the function $V$ by an arbitrary constant $C$ to obtain that, outside of some compact set, $\Delta V_n(w_{t-1}) \le -f(w)$, with $f(w) = \max(1, K\|w\|^2_2)$ for every constant $K$. □


Appendix 3: Moment assumptions for IDF

Reparametrization of the orthogonal IDF model

Definition A1 (Cayley transform) The Cayley transform of a matrix $A$ is defined as $C(A) = (\mathrm{Id} - A)(\mathrm{Id} + A)^{-1}$. It has the following properties:
• $C \circ C = \mathrm{Id}$;
• $C$ is a $C^1$-diffeomorphism between the antisymmetric matrices $\mathcal{A}_N(\mathbb{R})$ and the subset of $O_N(\mathbb{R})$ of matrices $O$ with $-1 \notin \mathrm{Sp}(O)$;
• $dC(A) = -(\mathrm{Id} + C(A))\,dA\,(\mathrm{Id} + A)^{-1}$;
• $d^2C(A) = [\mathrm{Id} + C(A)][dA_2(\mathrm{Id} + A)^{-1}dA_1(\mathrm{Id} + A)^{-1} + dA_1(\mathrm{Id} + A)^{-1}dA_2(\mathrm{Id} + A)^{-1}]$.

We use the Cayley transform, but from $-B$ rather than $-\mathrm{Id}$. We make the reparametrization
$$B = C(A)\,B^*.$$
This rotation simplifies the calculation because we study functions around $A^* = 0$. The derivatives at $A = 0$ simplify to $dC(A) = -dA$ and $d^2C(A) = \frac{1}{2}[dA_2\,dA_1 + dA_1\,dA_2]$. Also note that $\sup_{A\in K} \|dC(A)\|$ and $\sup_{A\in K} \|d^2C(A)\|$ are finite for any compact set $K$.
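A quick numerical check of the first two properties (our own illustration, not part of the proof):

```python
import numpy as np

def cayley(A):
    """C(A) = (Id - A)(Id + A)^{-1}."""
    I = np.eye(len(A))
    return (I - A) @ np.linalg.inv(I + A)

rng = np.random.default_rng(2)
N = 5
G = rng.standard_normal((N, N))
A = G - G.T                                   # an antisymmetric matrix
O = cayley(A)
print(np.allclose(O.T @ O, np.eye(N)))        # C(A) is orthogonal
print(np.allclose(cayley(O), A))              # C is an involution: C(C(A)) = A
```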

Proof of moment conditions for SIDF-SWARCH Notations and assumptions. We drop the index $t$ in the notations. We assume that, conditionally on the state $x$, each factor $n \le N$ has a conditional density of the form $g^n = \sigma^{-1}\phi^n(y/\sigma)$. We define, for any function $\psi \in \mathbb{R}^{\mathbb{R}}$, $\bar\psi = \sup_{\mathbb{R}} |\psi|$, $\bar\psi(t) = \max_{[-t,t]} |\psi|$ and $\underline\psi(t) = \min_{[-t,t]} \psi$. Note that $\underline\phi(t) = \phi(t)$ for usual (symmetric and monotonous on $\mathbb{R}_+$) densities $\phi$. When using these notations with one function $\phi^{n,X}$ for each state and each factor, we take the extremum of the previous quantities over $n \le N$ and $X \in \mathcal{X}$ (Note 19). If $\bar\beta_m = \max_{n,X} \beta^{n,X}_m$ (and $\underline\beta_m$ is the minimum), we obtain bounds on the autoregressive parameters (for any factor in any state):
$$0 < \underline\beta_0 = \underline\sigma \le \sigma_t(\bar y) \le \bar\sigma_t(\|\bar y\|_2),$$
with $\bar\sigma^2_t(y) = \bar\beta_0 + \sum_k \bar\beta_k y^2_{t-k}$. All the extrema above are well defined because $X_t$ is discrete-valued, the parameter set is compact and the number of factors is finite. Note that $\bar\sigma_t(\|\bar y\|_2) \le \bar\sigma_t(\|\bar Z\|_2)$, so that the bounds are uniform over all possible values of $B_x$.

Bounds on derivatives Define the relation, as $\|\bar y\| \to \infty$:
$$a \prec b \iff a(\bar y) = O(b(\bar y)) \iff \exists K \text{ compact},\ \exists k > 0\ /\ \forall \bar y \notin K,\ \|a(\bar y)\| \le k\,\|b(\bar y)\|.$$
Since the dimension is finite, this relation does not depend on the norm. Any norm is submultiplicative with respect to $\prec$, that is, $\|y\| = \|Zb\| \prec \|Z\|\,\|b\|$. When the values of $a$ and $b$ are quadratic forms, we often switch between norms and use the following:

Lemma A4 Let $f$ be a continuous function with $E f(\bar y)\, b(\bar y) < \infty$. If $a \prec b$, then $E f(\bar y)\, a(\bar y) < \infty$.

Lemma A5 $\|x\|^n\, \|y\|^m \prec \|(x, y)\|^{n+m}$.

For proper functions $z$, $h$ and $\psi$, define
$$\tilde f = f \circ z \circ h \circ \psi \qquad (A1)$$
and compute (we drop the original point at which derivatives are computed)
$$d\tilde f = df \circ dz \circ dh \circ d\psi,$$
$$d^2\tilde f = d^2f(dz \circ dh \circ d\psi,\ dz \circ dh \circ d\psi) + df \circ d^2z(dh \circ d\psi,\ dh \circ d\psi) + df \circ dz \circ d^2h(d\psi, d\psi) + df \circ dz \circ dh \circ d^2\psi.$$
Take $z : (y, \sigma^2) \mapsto y\,(\sigma^2)^{-1/2}$, so that
• $dz = \frac{1}{2}\sigma^{-3}|y|\, d\sigma^2 + \sigma^{-1}\, dy \prec |y| + 1$,
• $d^2z = \frac{3}{2}\sigma^{-5} y\, d\sigma^2\, d'\sigma^2 - \frac{1}{2}\sigma^{-3}\, d\sigma^2\, d'y - \frac{1}{2}\sigma^{-3}\, dy\, d'\sigma^2 \prec 1 + |y|$.
Let $\psi : (A, \theta) \mapsto (Z_t \pi_n(C(A)), \ldots, Z_{t-M}\pi_n(C(A)), \theta)$:
• $d\psi \prec \|\bar Z\|\, \pi_n(dC(A)) + d\theta$,


• $d^2\psi \prec \|\bar Z\|\, \pi_n(d^2C(dA, dA'))$,
and $h : (\theta, \bar y) \mapsto (y_t, \sigma^2(y_t, \theta))$:
• $dh \prec 1 + \|Z^n\|_t + \|(Z)^2\|_t \prec \|\bar Z\|^2$,
• $d^2h \prec 1 + \|Z^n\|_t \prec \|\bar Z\|^2$.
Because $d\sigma^2 = [0, \ldots, 2\beta_k y_{t-k}, \ldots] \cdot d\bar y + [1, \ldots, y^2_{t-k}, \ldots] \cdot d\theta$,
$$d\sigma^2 \prec \|Y^n\|^2 \prec \|\bar Z\|^2 \quad\text{and}\quad d^2\sigma^2 \prec \|Y^n\| \prec \|\bar Z\|.$$
Finally,
$$d\tilde f \prec df\, \|\bar Z\|^4,$$
$$d^2\tilde f \prec d^2f\, \|\bar Z\|^8 + df\,(\|\bar Z\|^7 + \|\bar Z\|^5 + \|\bar Z\|^5) \prec (d^2f + df)\, \|\bar Z\|^8.$$
We now apply the last bounds to each $g_n$ and $\log g_n$. We derive conditions on the $\varphi_n$, and also need the relations²⁰ $d\sigma^{-1} \prec 1$, $d^2\sigma^{-1} \prec 1$, $d\ln\sigma \prec 1$ and $d^2\ln\sigma \prec 1$. When taking $f = \sigma$ or $f = \ln\sigma$, we obtain $d\widetilde{\sigma^{-1}} \prec \|\bar Z\|^4$, $d^2\widetilde{\sigma^{-1}} \prec \|\bar Z\|^4$, $d\widetilde{\ln\sigma} \prec \|\bar Z\|^8$ and $d^2\widetilde{\ln\sigma} \prec \|\bar Z\|^8$. □

Proof of moment conditions Upper-case letters denote multivariate quantities; $n$ represents the coordinate or factor index, while $X$ denotes the state index. In IDF,²¹ we have
$$G_{B,\theta}(Y_t \mid \bar Y_t, X_t) = \prod_n g^n_{\theta_n}(Y_t B^n_{X_t} \mid \bar Y_t B^n_{X_t}, X_t) = \prod_n g^n \circ z \circ h \circ \psi(\bar Z_t, X_t, \theta_n).$$
For IDF, $B$ does not depend on $X$, and we also have $Q_\theta(X, X') = \prod_n q_{\theta_n}(X^n, X'^n)$. We assume (B3) and (B7) and prove the (A) moment assumptions.

n, X′n). We assume (B3) and (B7) and provethe (A) moment assumptions.

(A3b) Note that $\underline G = \sigma^{-N}\prod_n \varphi_n \le G_{B,\theta} \le (\max_n \sigma_n\, \max_n \bar\varphi_n)^N = \overline G$. Because $\overline G$ is constant and the bound holds for each term of the product, (A3) reduces to a tail condition and is true when each factor verifies both
• $E(-\ln \sigma_{t-1}(\|\bar Z_t\|))^+ < \infty \Leftarrow E(\ln\|\bar Z_t\|)^+ < \infty \Leftarrow E\|\bar Z_t\| < \infty$;
• $E(-\ln \varphi_n(\sigma^{-1}\|\bar Z_t\|^2))^+ < \infty \Leftarrow E|\ln \varphi_n(\sigma^{-1}\|\bar Z_t\|^2)| < \infty$.

(A7) Since $\ln G_{B,\theta} = \sum_n \ln g^n_{\theta_n}(Y b^n_X, \bar Y b^n_X, X)$ is the sum of such terms with $f = \ln g^n(\cdot)$, we only need to prove (A7) for each $g = \varphi_n$ and $g = \log\sigma^2_n$. It is proved by our previous bounds on derivatives. These are uniform in $n$ and $X$, so that the finite sum verifies them too, if (B7) holds for each $n$.

(A8b) Note in our computation that bounds based on the relation $\prec$ also hold for the relation $\prec_t$ where only $Y_t$ tends to infinity, for any value of $\bar Y_t$, even inside compact sets. The proof is then direct: because each $g^n$ is bounded by $\sigma^{-1}\bar\varphi_n$, we can bound most terms of the derivative of the product $G_{B,\theta} = \prod_n g^n_{\theta_n}(Y b_{n,X}, \bar Y b_{n,X}, X)$ by deterministic terms. Hence we only have to prove (A8b1) for each first derivative $\nabla g^n$, and (A8b2) for each second derivative $\nabla^2 g^n$ and each pair of first derivatives $\nabla g^{n_1}\nabla g^{n_2}$. Because the $g$ are bounded, $dg \prec d\ln g$ and $d^2g \prec (d\ln g)^2 + d^2\ln g$. Moreover, the term coming from $\sigma(\cdot)^{-1}$ is constant with respect to $y'$. For pairs of derivatives, we use $2|\nabla g^{n_1}\,\nabla g^{n_2}| \le (\nabla g^{n_1} + \nabla g^{n_2})^2$. This proves that (A8) holds under (B7). □

Remark A4 When the $\ln\varphi_n$ are subpolynomial (Gaussian or Laplace laws), the last implication shows that these conditions are the classical moment conditions of each dynamic, also applied to the other factors. Hence they hold automatically when the factors belong to the same parametric family in which (A3) is satisfied.

Appendix 4: Efficiency and Fisher Information

Proof of Theorem 4 For matrices $M$, let $\pi_n(M) = (M_{n,1}, \ldots, M_{n,N})'$ be the transpose of the $n$th row of $M$. Note that $\pi_n(MN) = N'\,\pi_n(M)$. Let $S[\cdot,\cdot]$ denote a bilinear form with its arguments.
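The row-transpose identity $\pi_n(MN) = N'\pi_n(M)$ is used repeatedly below; here is a short numerical sanity check (dimensions and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
dim, n = 4, 2
M1, M2 = rng.standard_normal((dim, dim)), rng.standard_normal((dim, dim))
pi = lambda M, n: M[n, :].reshape(-1, 1)   # transpose of the nth row of M
print("pi_n(M N) == N' pi_n(M):", np.allclose(pi(M1 @ M2, n), M2.T @ pi(M1, n)))
```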

Lemma A6 (Derivative of Markovian functions for one factor) For any (conveniently differentiable) function $g : \mathbb{R}^{M+1} \to \mathbb{R}$, we compute the differentials of
$$A \mapsto g(Y_t\pi_n(B), \ldots, Y_{t-M+1}\pi_n(B)) = g(Z_t\pi_n(C(A)), \ldots, Z_{t-M+1}\pi_n(C(A))):$$
$$\nabla_A g \cdot dA = \sum_{m=0,\ldots,M-1} \frac{\partial}{\partial y_m} g\ Z_{t-m}\, \pi_n(dC(A)\, dA),$$


$$\nabla^2_A g[dA_1, dA_2] = \sum_{m,m'} \frac{\partial^2}{\partial y_m\, \partial y_{m'}} g\,\big[Z_{t-m}\pi_n(dC(A)\, dA_1);\ Z_{t-m'}\pi_n(dC(A)\, dA_2)\big] + \sum_{m} \frac{\partial}{\partial y_m} g\ Z_{t-m}\, \pi_n(d^2C(A)[dA_1, dA_2]).$$

Note that $\pi_n(dC(A)\, dA) = \pi_n(dA)$ has coordinate $n$ equal to 0 (since $dA$ is antisymmetric); hence $Z_{t-m}\,\pi_n(dC(A)\, dA)$ is a combination of the $(Z^r_{t-m})_{r \ne n}$. But the $(Z^r_{t-m})_{r \ne n}$ are centred and independent of $Z^n$, so that
$$E\,\frac{\partial}{\partial z_m} g\ Z_{t-m}\pi_n(dC(A)\, dA) = E\,\frac{\partial}{\partial z_m} g\ \cdot\ E Z_{t-m}\,\pi_n(dC(A)\, dA) = 0.$$

Now consider the additional term of the local expansion in Theorem 3:
$$T^{1/2}\,\nabla_b \dot l_T(z_1, \ldots, z_T; \theta^*)\, b = \sum_m \frac{\partial}{\partial z_m} \dot l_T\ Z_{t-m}\, \pi_n(dC(A)\, dA)\, T^{1/2}A + o(T^{1/2}A).$$
This term is finally an $o(T^{1/2}A)$ because $(1/T)\sum_t g_\theta(Z^n_{t-m})\, Z^r_{t-m} \to 0$ a.s. by the ergodic theorem. This shows that this term is negligible and that $\theta_{\mathrm{MLE}}$ and $\theta_{\mathrm{MALE}}$ are equivalent. In particular, $\theta_{\mathrm{MALE}}$ is asymptotically efficient.

Lemma A7 (Pseudo-score) Let $s_m(y_t, \ldots, y_{t-M+1}) = (\partial/\partial y_{t-m})\log L(y_t \mid \bar y_t)$ be the score obtained when using observations as parameters in a Markovian model. Assume that
• $\forall m > 0$, $\int (\partial/\partial y_{t-m}) L\, dy_t = (\partial/\partial y_{t-m}) \int L\, dy_t$;
• $\forall \bar y_t$, $L(y_t \mid \bar y_t) = o(|y_t|^{-1})$ as $|y_t| \to \infty$ (so that the boundary terms below vanish).
Then $E[s_m(y_t, \ldots, y_{t-M+1}) \mid \bar y_t] = 0$ and $\mathrm{Cov}(s_m; y_{t-m}) = -\delta_{m,0}$.

Proof For a fixed coordinate $m > 0$,
$$E[s_m(y_t, \ldots, y_{t-M+1}) \mid \bar y_t] = \int_{\mathbb{R}} \frac{\partial}{\partial y_{t-m}} L\ d\mu(y) = \frac{\partial}{\partial y_{t-m}} \int L\, dy_t = 0,$$
$$E[s_0(y_t, \ldots, y_{t-M+1}) \mid \bar y_t] = \int_{\mathbb{R}} \frac{\partial}{\partial y_t} L\ d\mu(y) = [L]_{y_t \in \mathbb{R}} = 0,$$
$$\mathrm{Cov}(s_m; y_{t-m}) = E\, s_m(y_t, \ldots, y_{t-M+1})\, y_{t-m} = E\, y_{t-m}\, E[s_m(y_t, \ldots, y_{t-M+1}) \mid \bar y_t] = 0,$$
$$\mathrm{Cov}(s_0; y_t) = \int_{\mathbb{R}} \frac{\partial}{\partial y_t} L\ y_t\, d\mu(y_t) = [L\, y_t]_{y_t \in \mathbb{R}} - \int_{\mathbb{R}} L\, d\mu(y_t) = 0 - 1. \qquad\square$$
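Lemma A7 is easy to verify by simulation. A minimal sketch with a hypothetical Gaussian AR(1) model (which satisfies both integrability assumptions), where $s_0 = -(y_t - \phi y_{t-1})/s^2$ and $s_1 = \phi\,(y_t - \phi y_{t-1})/s^2$:

```python
import numpy as np

rng = np.random.default_rng(4)
phi, s2, T = 0.5, 1.0, 200_000            # illustrative parameters
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + np.sqrt(s2) * rng.standard_normal()

resid = y[1:] - phi * y[:-1]              # innovations y_t - phi * y_{t-1}
s0 = -resid / s2                          # pseudo-score in y_t
s1 = phi * resid / s2                     # pseudo-score in y_{t-1}
print("Cov(s_0, y_t)     ~ -1:", np.cov(s0, y[1:])[0, 1].round(3))
print("Cov(s_1, y_{t-1}) ~  0:", np.cov(s1, y[:-1])[0, 1].round(3))
```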

Proof of Proposition 1 We now compute the likelihood for one observation in the Markovian IDF model, and its derivatives (we denote by $B[\cdot,\cdot]$ the arguments of a bilinear form $B$):
$$l = \ln L(Y_t) = \sum_n \ln L_n(Y_t\pi_n(B), \ldots, Y_{t-M+1}\pi_n(B)) = \sum_n \ln L_n(Y_t B^{*\prime}\pi_n(C(A)), \ldots, Y_{t-M+1}B^{*\prime}\pi_n(C(A))) = \sum_n \ln L_n(Z_t\pi_n(C(A)), \ldots, Z_{t-M+1}\pi_n(C(A))),$$
$$dl \cdot dA = \sum_{n,\ m=0,\ldots,M-1} \frac{\partial}{\partial y_m} l_n\ Z_{t-m}\pi_n(dC(A)\, dA),$$
$$d^2l[dA_1, dA_2] = \sum_{n,m,m'} \frac{\partial^2}{\partial y_m\, \partial y_{m'}} l_n\,\big[Z_{t-m}\pi_n(dC(A)\,dA_1);\ Z_{t-m'}\pi_n(dC(A)\,dA_2)\big] + \sum_{n,m} \frac{\partial}{\partial y_m} l_n\ Z_{t-m}\pi_n(d^2C(A)[dA_1, dA_2]).$$
Under assumptions (A), we can permute expectation and derivation (see [9]). Moreover, in the Markovian setting, the asymptotic Fisher information is the information for one observation. Hence, for any antisymmetric matrix $dA = (a_{n,i})$,


it writes $I[dA, dA] = E\, d^2l[dA, dA] = T_1 + T_2$, with
$$T_2 = E\sum_{n,m} \frac{\partial}{\partial y_m} l_n\ Z_{t-m}\,\pi_n(dA^2) = \sum_{n,m} \mathrm{Cov}\Big[\frac{\partial}{\partial y_m} l_n;\ Z_{t-m}\Big]\pi_n(dA^2) = \sum_{n,m} \mathrm{Cov}\Big(\frac{\partial}{\partial y_m} l_n;\ Z^n_{t-m}\Big)(\pi_n(dA^2))_n = -\sum_{n,m}\Big(\sum_i a^2_{n,i}\Big)\mathrm{Cov}\Big(\frac{\partial}{\partial y_m} l_n;\ Z^n_{t-m}\Big),$$
because $\mathrm{Cov}((\partial/\partial y_m)\, l_n;\ Z^i_{t-m}) = 0$ when $i \ne n$, and $(\pi_n(dA^2))_n = -\sum_i a^2_{n,i}$; and

$$T_1 = \sum_{n,m,m'} E\,\frac{\partial^2}{\partial y_m\, \partial y_{m'}} l_n\ \pi_n(dC(A))'\, Z'_{t-m} Z_{t-m'}\, \pi_n(dC(A)) = \sum_{n,m,m'} E\,\frac{\partial^2}{\partial y_m\, \partial y_{m'}} l_n\Big(\sum_i a_{n,i} Z^i_{t-m}\Big)\Big(\sum_i a_{n,i} Z^i_{t-m'}\Big) = \sum_{n,m,m'} E\,\frac{\partial^2}{\partial y_m\, \partial y_{m'}} l_n\ E\Big(\sum_i a_{n,i} Z^i_{t-m}\Big)\Big(\sum_i a_{n,i} Z^i_{t-m'}\Big),$$

because $a_{n,n} = 0$. The second expectation term moreover writes
$$E\Big(\sum_i a_{n,i} Z^i_{t-m}\Big)\Big(\sum_i a_{n,i} Z^i_{t-m'}\Big) = \sum_{i,i'} a_{n,i}\, a_{n,i'}\, E Z^i_{t-m} Z^{i'}_{t-m'} = \sum_{i,i'} a_{n,i}\, a_{n,i'}\, E Z^i_{t-m}\, E Z^{i'}_{t-m'} + \sum_i a^2_{n,i}\, \mathrm{Cov}(Z^i_{t-m};\ Z^i_{t-m'}).$$

Finally, $T_1 = T_{11} + T_{12}$ with
$$T_{11} = \sum_{n,m,m'} E\,\frac{\partial^2}{\partial y_m\, \partial y_{m'}} l_n\ \Big(\sum_i a_{n,i}\, E(Z^i)\Big)^2,$$
$$T_{12} = \sum_{n,i,m,m'} E\,\frac{\partial^2}{\partial y_m\, \partial y_{m'}} l_n\ a^2_{n,i}\, \rho^i_{m-m'} = \sum_{n,i} a^2_{n,i}\ \mathbf{1}'\big(E\,\partial^2 l_n \odot \rho^i\big)\mathbf{1},$$

where $\rho^i$ is the autocovariance matrix (of size $[M_n, M_n]$) of the factor $Z^i$, $\odot$ denotes the Hadamard (term-by-term) product and $\mathbf{1}$ is a vector of ones. Note that $T_{11} \ge 0$ is a bias term that could vanish, so that strict positivity can only come from the other terms. Denote $\Gamma^n = -E(\partial^2/\partial y_m\, \partial y_{m'})\, l_n$. Finally, $-d^2l[dA, dA] \ge \sum_{n,\, i>n} a^2_{n,i}\, w_{n,i}$ with
$$w_{n,i} = \mathbf{1}'(\Gamma^n \odot \rho^i + \Gamma^i \odot \rho^n)\mathbf{1} + \sum_m \mathrm{Cov}\Big(\frac{\partial}{\partial y_m} l_n;\ Z^n_{t-m}\Big) + \sum_m \mathrm{Cov}\Big(\frac{\partial}{\partial y_m} l_i;\ Z^i_{t-m}\Big),$$
so that $I \succ 0 \iff \forall i > n,\ w_{n,i} > 0$. Lemma A7 further simplifies the condition to
$$I \succ 0 \iff \forall i > n,\quad \mathbf{1}'(\Gamma^n \odot \rho^i)\mathbf{1} + \mathbf{1}'(\Gamma^i \odot \rho^n)\mathbf{1} > 2.$$

The Cauchy–Schwarz inequality gives
$$1 = \mathrm{Cov}^2\Big[\frac{\partial}{\partial y_0} l_n;\ Z^n_t\Big] \le \mathrm{Var}\Big(\frac{\partial}{\partial y_0} l_n\Big)\,\mathrm{Var}(Z^n_t) \implies \Gamma^n_{1,1} \ge (\sigma^n)^{-2}.$$
When the factors have no autocorrelation (which includes ARCH factors), $\rho^i = (\sigma^i)^2\,\mathrm{Id}$ and the condition turns into
$$I \succ 0 \iff \forall i > n,\quad (\sigma^i)^2\, \mathrm{Tr}\,\Gamma^n + (\sigma^n)^2\, \mathrm{Tr}\,\Gamma^i > 2.$$

Finally,
$$\forall i \ne n,\quad \frac{\sigma^i}{\sigma^n} \ne 1 \implies \Big(\frac{\sigma^i}{\sigma^n}\Big)^2 + \Big(\frac{\sigma^n}{\sigma^i}\Big)^2 > 2 \implies (\sigma^i)^2\,\Gamma^n_{1,1} + (\sigma^n)^2\,\Gamma^i_{1,1} > 2 \implies I \succ 0. \qquad\square$$
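The final criterion reduces to a one-line numerical check. A sketch with made-up factor scales (any vector with pairwise distinct entries works):

```python
import numpy as np

sigma = np.array([1.0, 1.5, 0.7])          # hypothetical marginal scales
for n in range(len(sigma)):
    for i in range(n + 1, len(sigma)):
        r = (sigma[i] / sigma[n]) ** 2     # (sigma_i / sigma_n)^2
        print(f"pair ({n},{i}): {r + 1/r:.3f} > 2 -> identifiable:", r + 1/r > 2)
```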


Appendix 5: Other Assumptions

(A4) and (A6): the result is a direct application of the following lemma.

Lemma A8 For any function $f : y \mapsto f(y)$, let $g_Y : b \mapsto f(Yb)$ be defined for $b$ in the unit ball. Then $f$ is $C^k$ $\iff$ $g_Y$ is $C^k$ for all $Y \in \mathbb{R}^N$.

Proof $\Rightarrow$ is obvious from the composition theorem. For $\Leftarrow$, we prove that $f$ is $C^k$ around some $y$ by letting $Y = (y, 1, 0, \ldots, 0)$, $b : \beta \mapsto (\sqrt{1-\beta^2}, \beta, 0, \ldots, 0)$ and $y' : \beta \mapsto Y b(\beta) = \sqrt{1-\beta^2}\, y + \beta$. Then, on some open balls around $\beta = 0$, $b(\beta)$ and $y$, the maps $b(\cdot)$ and $y'(\cdot)$ are $C^\infty$ diffeomorphisms, so that $f = g_Y \circ b \circ y'^{-1}$ is $C^k$. □

Proposition A5 (Principal Component Analysis)

1. Diagonalization of a positive matrix
• $\forall \Sigma \in S^+_N(\mathbb{R})\ \exists O \in O(\mathbb{R}^N)\ \exists \Lambda \in D^+_N(\mathbb{R})\ /\ \Sigma = O'\Lambda O$.
• The terms of $\Lambda$ are the eigenvalues of $\Sigma$. Let $I = \{n \in 1..N\ /\ \Lambda_{n,n}$ has multiplicity $1\}$. Let $O_I$ be the rows of $O$ whose index belongs to $I$, and $\Lambda_I = (\Lambda_{i,i})_{i \in I}$.
• $(O_I, \Lambda_I)$ is unique up to the sign and permutations of the rows of $O_I$. When the eigenvalues are ordered (such an order is unique on indices in $I$), $O_I$ is unique up to the sign of its rows.

2. Marginal PCA of a (second-order) stationary process $Y$
• $\mathrm{pca}(Y) = (O, \Lambda)$ is the diagonalization of the marginal covariance matrix $\Sigma = \mathrm{Var}(Y)$, and $\mathrm{pca}_I(Y) = (O_I, \Lambda_I)$ is the identifiable part of the PCA.
• Factors are the coordinates of $Y$ in the basis $O$ and are uncorrelated.

3. PCA consistency
• The $\mathrm{pca}_I$ function is $C^1$ (the spectral projectors are polynomial in $\Sigma$ and hence infinitely differentiable; since the eigenspaces with index in $I$ have dimension 1, each of them is in bijection with a unit eigenvector, the sign of which is chosen arbitrarily).
• If (H2a) holds, the empirical covariance matrix $\hat\Sigma$ is a consistent estimator of $\Sigma$, and $\widehat{\mathrm{pca}} = \mathrm{pca}_I(\hat\Sigma)$ is a consistent estimator of $\mathrm{pca}_I(Y)$.
• If (H2b) holds, $\hat\Sigma$ is a consistent and asymptotically normal estimator of $\Sigma$.
• $\widehat{\mathrm{pca}} = \mathrm{pca}_I(\hat\Sigma)$ is then a normal and consistent estimator of $\mathrm{pca}_I(Y)$ (the delta method applies).
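The whole pipeline of Proposition A5 fits in a few lines. A minimal sketch with simulated data, where the dimensions, factor scales and the orthogonal loading matrix are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
T, N = 50_000, 3
scales = np.array([2.0, 1.0, 0.5])                 # distinct -> I = {1..N}
Z = rng.standard_normal((T, N)) * scales           # uncorrelated factors
B = np.linalg.qr(rng.standard_normal((N, N)))[0]   # orthogonal loadings
Y = Z @ B                                          # observed process

Sigma_hat = np.cov(Y, rowvar=False)                # empirical covariance
lam, vec = np.linalg.eigh(Sigma_hat)
order = np.argsort(lam)[::-1]                      # order the eigenvalues
lam, O = lam[order], vec[:, order].T               # rows of O = components
O *= np.sign(O[:, [0]])                            # fix the sign of each row

print("eigenvalues ~", (scales ** 2).round(2), ":", lam.round(2))
F = Y @ O.T                                        # recovered factors
off = np.corrcoef(F.T) - np.eye(N)
print("factors uncorrelated:", bool(np.all(np.abs(off) < 0.02)))
```

With distinct eigenvalues, the sign convention is the only remaining indeterminacy, matching the identifiability statement for $\mathrm{pca}_I$.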
