
Large-N and Large-T Properties of Panel Data Estimators and the Hausman Test∗

Seung Chan Ahn†‡
Department of Economics
Arizona State University

Hyungsik Roger Moon
Department of Economics
University of Southern California

This Version: August 2003

Abstract

This paper examines the asymptotic properties of the popular within and GLS estimators and the Hausman test for panel data models with both large numbers of cross-section (N) and time-series (T) observations. The model we consider includes regressors with deterministic trends in mean as well as time-invariant regressors. If a time-varying regressor is correlated with time-invariant regressors, the time series of the time-varying regressor is not ergodic. Our asymptotic results are obtained considering the dependence of such non-ergodic time-varying regressors. We find that the within estimator is as efficient as the GLS estimator. Despite this asymptotic equivalence, however, the Hausman statistic, which is essentially a distance measure between the two estimators, is well defined and asymptotically χ²-distributed under the random effects assumption.

∗ We would like to thank Geert Ridder for helpful discussions. We also appreciate the comments of seminar participants at Arizona State University, the University of British Columbia, and the University of California, Davis.

† Corresponding Author: Seung C. Ahn, Department of Economics, Arizona State University, Tempe, AZ 85287; tel) 480-965-6574; email) [email protected].

‡ The first author gratefully acknowledges the financial support of the College of Business and Dean's Council of 100 at Arizona State University, the Economic Club of Phoenix, and the alumni of the College of Business.


1 Introduction

Error-components models have been widely used to control for unobservable cross-sectional heterogeneity in panel data with a large number of cross-section units (N) and a small number of time-series observations (T). These models assume that stochastic error terms have two components: an unobservable time-invariant individual effect, which captures the unobservable individual heterogeneity, and the usual random noise. The most popular estimation methods for error-components models are the within and the generalized least squares (GLS) estimators. An important advantage of the within estimator (least squares on data transformed into deviations from individual means) is that it is consistent even if regressors are correlated with the individual effect. However, a serious defect of the estimator is its inability to estimate the impact of time-invariant regressors.¹ The GLS estimator is often used in the literature as a remedy for this problem, but it is not without its own defect: the consistency of GLS requires the strong assumption that no regressor is correlated with the effect (the random effects assumption). Use of the estimator thus requires a statistical test that can empirically validate this strong assumption. A Hausman (1978) statistic is commonly used for this purpose (e.g., Hausman and Taylor, 1981; Cornwell and Rupert, 1988; or Baltagi and Khanti-Akom, 1990).

This paper studies the asymptotic properties of the within and GLS estimators and the Hausman statistic for a general panel data error-components model with both large N and T. The GLS estimator has been known to be asymptotically equivalent to the within estimator for the cases with infinite N and T (see, for example, Hsiao, Chapter 3, 1986; Matyas and Sevestre, Chapter 4, 1992; and Baltagi, Chapter 2, 1995). This asymptotic equivalence result has been obtained using a sequential limit method (T → ∞ followed by N → ∞) and some strong assumptions such as fixed regressors. This result naturally raises several questions regarding the asymptotic properties of the Hausman test. First, does the equivalence result hold for more general cases? Second, does the equivalence result indicate that the Hausman statistic, which is essentially a distance measure between the within and GLS estimators, should have a degenerate or nonstandard asymptotic distribution under the random effects assumption? Third, does the equivalence result also imply that the Hausman test would have low power to detect any violation of the random effects assumption when T is large? This paper aims to answer these questions.

Panel data with a large number of time-series observations have become increasingly available in recent years in many economic fields such as international finance, finance, industrial organization, and economic growth. Furthermore, popular panel data, such as the Panel Study of Income Dynamics (PSID) and

¹ Estimation of the effect of a certain time-invariant variable on a dependent variable could be an important task in a broad range of empirical research. Examples would be the labor studies about the effects of schooling or gender on individual workers' earnings, and the macroeconomic studies about the effect of a country's geographic location (e.g., whether the country is located in Europe or Asia) on its economic growth. The within estimator is inappropriate for such studies.


the National Longitudinal Surveys (NLS), contain increasingly more time-series observations as they are updated regularly over the years. Consistent with this trend, some recent studies have examined the large-N and large-T properties of the within and GLS estimators for error-component models.² For example, Phillips and Moon (1999) and Kao (1999) establish the asymptotic normality of the within estimator for the cases in which regressors follow unit root processes. Extending these studies, Choi (1998) considers a general random effects model which contains both unit-root and covariance-stationary regressors. For this model, he derives the asymptotic distributions of both the within and GLS estimators.

This paper is different from the previous studies in three respects. First, the model we consider contains both time-varying and time-invariant regressors. The time-varying regressors are cross-sectionally heterogeneous or homogeneous. Analyzing this model, we study how cross-sectional heterogeneity and the covariance structure between the time-varying and time-invariant regressors, as well as time trends in regressors, would affect the convergence rates of the panel data estimators. Second, we examine how the large-N and large-T asymptotic equivalence of the within and GLS estimators influences the asymptotic and finite-sample performances of the Hausman test. Third, and perhaps less importantly, we address a technical problem of obtaining limiting distributions when both N and T are large: when time-varying regressors are autocorrelated with arbitrary autocovariance structure and correlated with time-invariant regressors, they could fail to be ergodic. For such cases, we consider sufficient conditions under which limiting distributions of the panel data estimators and the Hausman test can be obtained.

This paper uses the joint limit approach developed by Phillips and Moon (1999). Using this approach, we obtain asymptotic results that hold as N, T → ∞ regardless of the relative sizes of N and T, under some high-level assumptions (such as uniform convergence conditions). Thus, the theories developed in this paper supplement the previous approaches based on sequential limits.

The main findings of this paper are as follows. First, consistent with the previous studies, we find that the within and GLS estimators (of the coefficients on time-varying regressors) are asymptotically equivalent under quite general conditions. Nonetheless, the Hausman statistic has well-defined asymptotic distributions under the random effects assumption, or under its local alternatives. Second, the convergence rate of the Hausman statistic is closely related to that of the between estimator (least squares on data transformed into individual means). Not surprisingly, the convergence rates of the within, GLS and between estimators depend on whether or not time-varying regressors contain

² Some other studies have considered different panel data models with large N and large T. For example, Levin and Lin (1992, 1993), Quah (1994), Pesaran, Shin and Im (1997), and Higgins and Zakrajsek (1999) develop unit-root tests for data with large N and large T. Alvarez and Arellano (1998) and Hahn and Kuersteiner (2000) examine the large-N and large-T properties of generalized method of moments (GMM) and within estimators for stationary dynamic panel data models.


trends. But we find that the convergence rate of the between estimator, as well as that of the Hausman statistic, depends on two other factors: (i) whether means of time-varying regressors are cross-sectionally heterogeneous or homogeneous, and (ii) how the time-varying and time-invariant regressors are correlated. Finally, we find that the power of the Hausman test is generally positively related to the size of T except for some special cases.

This paper is organized as follows. Section 2 introduces the panel model of interest here, and defines the within, between, and GLS estimators and the Hausman test. For several simple illustrative models, we derive the asymptotic distributions of the panel data estimators and the Hausman test statistic. We show that the convergence rates of the estimators and the Hausman test statistic are sensitive to the unknown data generating structure. Section 3 reports the results from our Monte Carlo experiments. In Section 4, we provide our general asymptotic results. Concluding remarks follow in Section 5. All the technical derivations and proofs are presented in the Appendix.

2 Preliminaries

2.1 Estimation and Specification Test

The model under discussion here is given by

$$y_{it} = \beta' x_{it} + \gamma' z_i + \zeta + \varepsilon_{it} = \delta' w_{it} + \zeta + \varepsilon_{it}; \qquad \varepsilon_{it} = u_i + v_{it}, \tag{1}$$

where $i = 1, ..., N$ denotes cross-sectional (individual) observations, $t = 1, ..., T$ denotes time, $w_{it} = (x_{it}', z_i')'$, and $\delta = (\beta', \gamma')'$. In model (1), $x_{it}$ is a $k \times 1$ vector of time-varying regressors, $z_i$ is a $g \times 1$ vector of time-invariant regressors, $\zeta$ is an overall intercept term, and the error $\varepsilon_{it}$ contains a time-invariant individual effect $u_i$ and random noise $v_{it}$. We consider the case of both large numbers of individual and time series observations, so asymptotic properties of the estimators and statistics for model (1) apply as $N, T \to \infty$. The orders of convergence rates of some estimators depend on whether or not the model contains an overall intercept term. This problem will be addressed later.

We assume that data are distributed independently (but not necessarily identically) across different $i$, and that the $v_{it}$ are independently and identically distributed (i.i.d.) with $var(v_{it}) = \sigma_v^2$. We further assume that $u_i$, $x_{i1}, ..., x_{iT}$ and $z_i$ are strictly exogenous with respect to $v_{it}$; that is, $E(v_{it} \mid u_i, x_{i1}, ..., x_{iT}) = 0$ for any $i$ and $t$. This assumption rules out the cases in which the set of regressors includes lagged dependent variables or predetermined regressors. Detailed assumptions about the regressors $x_{i1}, ..., x_{iT}, z_i$ will be introduced later.

For convenience, we adopt the following notational rule: for any $p \times 1$ vector $a_{it}$, we denote

$$\bar{a}_i = \frac{1}{T}\sum_t a_{it}; \quad \tilde{a}_{it} = a_{it} - \bar{a}_i; \quad \bar{a} = \frac{1}{N}\sum_i \bar{a}_i; \quad \tilde{a}_i = \bar{a}_i - \bar{a}.$$

Thus, for example, for $w_{it} = (x_{it}', z_i')'$, we have $\bar{w}_i = (\bar{x}_i', z_i')'$; $\tilde{w}_{it} = (\tilde{x}_{it}', 0_{1 \times g})'$; $\bar{w} = (\bar{x}', \bar{z}')'$; $\tilde{w}_i = ((\bar{x}_i - \bar{x})', (z_i - \bar{z})')'$.

When the regressors are correlated with the individual effect, the OLS estimator of $\delta$ is biased and inconsistent. This problem has been traditionally addressed by the use of the within estimator (OLS on data transformed into deviations from individual means):

$$\hat{\beta}_w = \Big(\sum_{i,t} \tilde{x}_{it}\tilde{x}_{it}'\Big)^{-1} \sum_{i,t} \tilde{x}_{it}\tilde{y}_{it}.$$

Under our assumptions, the variance-covariance matrix of the within estimator is given by

$$Var(\hat{\beta}_w) = \sigma_v^2 \Big(\sum_{i,t} \tilde{x}_{it}\tilde{x}_{it}'\Big)^{-1}. \tag{2}$$

Although the within method provides a consistent estimate of $\beta$, a serious defect is its inability to identify $\gamma$, the impact of time-invariant regressors. A popular treatment of this problem is the random effects (RE) assumption under which the $u_i$ are random and uncorrelated with the regressors:

$$E(u_i \mid x_{i1}, ..., x_{iT}, z_i) = 0. \tag{3}$$

Under this assumption, all of the parameters in model (1) can be consistently estimated. For example, a simple but consistent estimator is the between estimator (OLS on data transformed into individual means):

$$\hat{\delta}_b = (\hat{\beta}_b', \hat{\gamma}_b')' = \Big(\sum_i \tilde{w}_i\tilde{w}_i'\Big)^{-1} \sum_i \tilde{w}_i\tilde{y}_i.$$

However, as Balestra and Nerlove (1966) suggest, under the RE assumption, an efficient estimator is the GLS estimator of the following form:

$$\hat{\delta}_g = \Big[\sum_{i,t} \tilde{w}_{it}\tilde{w}_{it}' + T\theta_T^2 \sum_i \tilde{w}_i\tilde{w}_i'\Big]^{-1} \Big[\sum_{i,t} \tilde{w}_{it}\tilde{y}_{it} + T\theta_T^2 \sum_i \tilde{w}_i\tilde{y}_i\Big],$$

where $\theta_T = \sqrt{\sigma_v^2/(T\sigma_u^2 + \sigma_v^2)}$. The variance-covariance matrix of this estimator is given by

$$Var(\hat{\delta}_g) = \sigma_v^2 \Big[\sum_{i,t} \tilde{w}_{it}\tilde{w}_{it}' + T\theta_T^2 \sum_i \tilde{w}_i\tilde{w}_i'\Big]^{-1}. \tag{4}$$

For notational convenience, we assume that $\sigma_u^2$ and $\sigma_v^2$ are known, while in practice they must be estimated.³

An important advantage of the GLS estimator over the within estimator is that it allows researchers to estimate $\gamma$. In addition, the GLS estimator of $\beta$ is more efficient than the within estimator of $\beta$, because $[Var(\hat{\beta}_w) - Var(\hat{\beta}_g)]$ is positive definite so long as $\theta_T > 0$. Despite these desirable properties, it is important to notice that the consistency of the GLS estimator crucially depends on the RE assumption (3). Accordingly, the legitimacy of the RE assumption should be tested to justify the use of GLS. In the literature, a Hausman (1978) test has been widely used for this purpose (e.g., Hausman and Taylor, 1981; Cornwell and Rupert, 1988; or Baltagi and Khanti-Akom, 1990). The statistic used for this test is a distance measure between the within and GLS estimators of $\beta$:

$$HM_{NT} \equiv (\hat{\beta}_w - \hat{\beta}_g)' [Var(\hat{\beta}_w) - Var(\hat{\beta}_g)]^{-1} (\hat{\beta}_w - \hat{\beta}_g). \tag{5}$$
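As a rough illustration of the estimators defined in this subsection and of the statistic in (5), the sketch below computes the within, between, and GLS estimates and the Hausman statistic on simulated data. It is a minimal sketch under stated assumptions, not the authors' code: the function name `panel_estimators`, the array layout, the simulated design, and the treatment of $\sigma_u^2$ and $\sigma_v^2$ as known are all choices made here for illustration.

```python
import numpy as np

def panel_estimators(y, x, z, sigma2_u, sigma2_v):
    """Within, between and GLS estimates plus the Hausman statistic.

    y: (N, T) outcomes; x: (N, T, k) time-varying regressors;
    z: (N, g) time-invariant regressors. Variances are treated as known.
    (Hypothetical helper written for this illustration.)
    """
    N, T, k = x.shape
    g = z.shape[1]

    # Within transformation: deviations from individual means.
    x_bar_i = x.mean(axis=1)                       # (N, k)
    y_bar_i = y.mean(axis=1)                       # (N,)
    x_w = (x - x_bar_i[:, None, :]).reshape(N * T, k)
    y_w = (y - y_bar_i[:, None]).reshape(N * T)
    A = x_w.T @ x_w
    xy_w = x_w.T @ y_w
    beta_w = np.linalg.solve(A, xy_w)
    var_w = sigma2_v * np.linalg.inv(A)

    # Between transformation: individual means minus overall means.
    w_b = np.hstack([x_bar_i - x_bar_i.mean(axis=0), z - z.mean(axis=0)])
    y_b = y_bar_i - y_bar_i.mean()
    S_b = w_b.T @ w_b
    wy_b = w_b.T @ y_b
    delta_b = np.linalg.solve(S_b, wy_b)

    # GLS: within cross-products plus T * theta_T^2 times between cross-products.
    t_theta2 = T * sigma2_v / (T * sigma2_u + sigma2_v)
    S_w = np.zeros((k + g, k + g))
    S_w[:k, :k] = A                                # z drops out of the within part
    r_w = np.concatenate([xy_w, np.zeros(g)])
    M = S_w + t_theta2 * S_b
    delta_g = np.linalg.solve(M, r_w + t_theta2 * wy_b)
    var_g = sigma2_v * np.linalg.inv(M)

    # Hausman statistic (5): quadratic form in the beta difference.
    diff = beta_w - delta_g[:k]
    hausman = float(diff @ np.linalg.solve(var_w - var_g[:k, :k], diff))
    return beta_w, delta_b, delta_g, hausman

# Simulated example under the RE assumption (illustrative parameter values).
rng = np.random.default_rng(0)
N, T = 200, 10
sigma2_u, sigma2_v = 1.0, 1.0
z = rng.normal(size=(N, 1))
x = rng.normal(size=(N, T, 1)) + rng.normal(size=(N, 1, 1))
u = rng.normal(scale=np.sqrt(sigma2_u), size=(N, 1))
v = rng.normal(scale=np.sqrt(sigma2_v), size=(N, T))
y = 1.0 * x[:, :, 0] + 0.5 * z + 2.0 + u + v

beta_w, delta_b, delta_g, hausman = panel_estimators(y, x, z, sigma2_u, sigma2_v)
print(beta_w, delta_b, delta_g, hausman)
```

With a single time-varying regressor, the GLS estimate of $\beta$ should fall between the within and between estimates, which gives a quick sanity check on the implementation.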

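³ The conventional estimates are given by
$$\hat{\sigma}_v^2 = \sum_{i,t} (\tilde{y}_{it} - \tilde{x}_{it}'\hat{\beta}_w)^2 / [N(T-1)]; \qquad \hat{\sigma}_u^2 = \sum_{i,t} (\tilde{y}_i - \tilde{w}_i'\hat{\delta}_{ols})^2 / NT - \hat{\sigma}_v^2,$$
where $\hat{\delta}_{ols}$ is the OLS estimator of $\delta$.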


For the cases in which $T$ is fixed and $N \to \infty$, the RE assumption warrants that the Hausman statistic $HM_{NT}$ is asymptotically $\chi^2$-distributed with degrees of freedom equal to $k$. This result is a direct outcome of the fact that for fixed $T$, the GLS estimator $\hat{\beta}_g$ is asymptotically more efficient than the within estimator $\hat{\beta}_w$, and that the difference between the two estimators is asymptotically normal; specifically, as $N \to \infty$,

$$\sqrt{NT}(\hat{\beta}_w - \hat{\beta}_g) \Longrightarrow N\big(0, \text{plim}_{N\to\infty} NT[Var(\hat{\beta}_w) - Var(\hat{\beta}_g)]\big), \tag{6}$$

where "$\Longrightarrow$" means "converges in distribution."

An important condition that guarantees (6) is that $\theta_T > 0$. If $\theta_T = 0$, then $\hat{\beta}_w$ and $\hat{\beta}_g$ become identical and the Hausman statistic is not defined. Observing $\theta_T \to 0$ as $T \to \infty$, we can thus easily conjecture that $\hat{\beta}_w$ and $\hat{\beta}_g$ should be asymptotically equivalent as $T \to \infty$. This observation naturally raises two issues related to the asymptotic properties of the Hausman test as $T \to \infty$. First, since the Hausman statistic is asymptotically $\chi^2$-distributed for any fixed $T$ under the RE assumption, we can conjecture that it should remain asymptotically $\chi^2$-distributed even if $T \to \infty$. Thus, we wish to understand the theoretical link between the asymptotic distribution of the Hausman statistic and the equivalence of the within and GLS estimators. Second, it is a well-known fact that the GLS estimator $\hat{\beta}_g$ is a weighted average of the within and between estimators $\hat{\beta}_w$ and $\hat{\beta}_b$ (Maddala, 1971). Thus, observing the form of the Hausman statistic, we can conjecture that the Hausman test should have the power to detect any violation of the RE assumption that causes biases in $\hat{\beta}_b$. However, the weight given to $\hat{\beta}_b$ in $\hat{\beta}_g$ decreases with $T$. So, we wish to understand how the power of the Hausman test would be related to the size of $T$. We will address these two issues in the following sections.

What makes it complex to investigate the asymptotic properties of the Hausman statistic is that its convergence rate crucially depends on data generating processes. The following subsection considers several simple cases to illustrate this point.
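The role of $\theta_T$ in the equivalence argument of this subsection can be made explicit with a line of algebra. The following is a sketch that uses only the definition of $\theta_T$ given with the GLS estimator; the order-of-magnitude remark after it additionally assumes that the within cross-products grow like $NT$ and the between cross-products like $N$, which need not hold for every data generating process.

$$T\theta_T^2 = \frac{T\sigma_v^2}{T\sigma_u^2 + \sigma_v^2} = \frac{\sigma_v^2}{\sigma_u^2 + \sigma_v^2/T} \longrightarrow \frac{\sigma_v^2}{\sigma_u^2}, \qquad \theta_T^2 = \frac{\sigma_v^2}{T\sigma_u^2 + \sigma_v^2} \longrightarrow 0 \quad (T \to \infty).$$

So the between cross-products in the GLS formula receive a weight $T\theta_T^2$ that stays bounded as $T$ grows, while the within cross-products accumulate over $NT$ observations. Under the stated assumption, the relative contribution of the between part to the GLS estimator is therefore of order $1/T$.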

2.2 Preliminary Results

This section considers several simple examples demonstrating that the convergence rate of the Hausman statistic crucially depends on (i) whether or not the means of the time-varying regressors are cross-sectionally heterogeneous, and (ii) how the time-varying and time-invariant regressors are correlated.

For model (1), we can easily show that

$$\hat{\beta}_w - \beta = A_{NT}^{-1} a_{NT}; \tag{7}$$

$$\hat{\beta}_b - \beta = (B_{NT} - C_{NT}H_N^{-1}C_{NT}')^{-1}[b_{NT} - C_{NT}H_N^{-1}c_{NT}]; \tag{8}$$

$$\hat{\beta}_g - \beta = [A_{NT} + T\theta_T^2(B_{NT} - C_{NT}H_N^{-1}C_{NT}')]^{-1} \times [A_{NT}(\hat{\beta}_w - \beta) + T\theta_T^2(B_{NT} - C_{NT}H_N^{-1}C_{NT}')(\hat{\beta}_b - \beta)]; \tag{9}$$

$$\hat{\beta}_w - \hat{\beta}_g = [A_{NT} + T\theta_T^2(B_{NT} - C_{NT}H_N^{-1}C_{NT}')]^{-1} \times T\theta_T^2(B_{NT} - C_{NT}H_N^{-1}C_{NT}')[(\hat{\beta}_w - \beta) - (\hat{\beta}_b - \beta)]; \tag{10}$$

$$Var(\hat{\beta}_w) - Var(\hat{\beta}_g) = \sigma_v^2\big\{A_{NT}^{-1} - [A_{NT} + T\theta_T^2(B_{NT} - C_{NT}H_N^{-1}C_{NT}')]^{-1}\big\}, \tag{11}$$

where

$$A_{NT} = \sum_{i,t} \tilde{x}_{it}\tilde{x}_{it}'; \quad B_{NT} = \sum_i \tilde{x}_i\tilde{x}_i'; \quad C_{NT} = \sum_i \tilde{x}_i\tilde{z}_i'; \quad H_N = \sum_i \tilde{z}_i\tilde{z}_i';$$

$$a_{NT} = \sum_{i,t} \tilde{x}_{it}v_{it}; \quad b_{NT} = \sum_i \tilde{x}_i(u_i + \bar{v}_i); \quad c_{NT} = \sum_i \tilde{z}_i(u_i + \bar{v}_i).$$
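The weighted-average structure behind (9) and (10) can be checked numerically in the scalar case (one time-varying and one time-invariant regressor). The sketch below is an illustration under assumptions made here, not part of the original derivation: the simulated design, the parameter values, and all variable names are hypothetical. It computes the GLS estimate directly from the stacked normal equations and compares $\hat{\beta}_w - \hat{\beta}_g$ with the right-hand side of (10).

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 50, 8
beta, gamma = 1.0, 0.5
sigma2_u, sigma2_v = 2.0, 1.0

# Illustrative design: x is correlated with z, errors satisfy the RE assumption.
z = rng.normal(size=N)
x = rng.normal(size=(N, T)) + 0.7 * z[:, None]
u = rng.normal(scale=np.sqrt(sigma2_u), size=N)
v = rng.normal(scale=np.sqrt(sigma2_v), size=(N, T))
y = beta * x + gamma * z[:, None] + u[:, None] + v

# Within (deviations from individual means) and between (demeaned individual means) data.
x_bar_i, y_bar_i = x.mean(axis=1), y.mean(axis=1)
xw, yw = x - x_bar_i[:, None], y - y_bar_i[:, None]
xb, zb, yb = x_bar_i - x_bar_i.mean(), z - z.mean(), y_bar_i - y_bar_i.mean()

# Scalar versions of A_NT, B_NT, C_NT, H_N.
A, B, C, H = (xw**2).sum(), (xb**2).sum(), (xb * zb).sum(), (zb**2).sum()
t_theta2 = T * sigma2_v / (T * sigma2_u + sigma2_v)   # T * theta_T^2
D = t_theta2 * (B - C**2 / H)

beta_w = (xw * yw).sum() / A
beta_b = ((xb * yb).sum() - C * (zb * yb).sum() / H) / (B - C**2 / H)

# GLS for (beta, gamma) from the stacked normal equations.
Wb = np.column_stack([xb, zb])
M = np.array([[A, 0.0], [0.0, 0.0]]) + t_theta2 * (Wb.T @ Wb)
rhs = np.array([(xw * yw).sum(), 0.0]) + t_theta2 * (Wb.T @ yb)
beta_g = np.linalg.solve(M, rhs)[0]

lhs_10 = beta_w - beta_g
rhs_10 = D / (A + D) * (beta_w - beta_b)   # scalar form of (10)
print(lhs_10, rhs_10)
```

The two printed numbers should agree up to floating-point error, since (10) is an exact algebraic identity rather than an asymptotic approximation.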

Equation (10) provides some insight into the convergence rate of the Hausman test statistic. Note that $(\hat{\beta}_w - \hat{\beta}_g)$ depends on both $(\hat{\beta}_w - \beta)$ and $(\hat{\beta}_b - \beta)$. The between estimator $\hat{\beta}_b$ exploits only $N$ between-individual variations, while the within estimator $\hat{\beta}_w$ is computed based on $N(T-1)$ within-individual variations. Accordingly, $(\hat{\beta}_b - \beta)$ converges to a zero vector in probability much more slowly than $(\hat{\beta}_w - \beta)$ does. Thus, we can conjecture that the convergence rate of $(\hat{\beta}_w - \hat{\beta}_g)$ will depend on that of $(\hat{\beta}_b - \beta)$, not $(\hat{\beta}_w - \beta)$. Indeed, we justify this conjecture below.

In this subsection, we only consider a simple model which has a single time-varying regressor ($x_{it}$) and a single time-invariant regressor ($z_i$). Accordingly, all of the terms defined in (7)-(11) are scalars. We consider asymptotics under the RE assumption (3). The power properties of the Hausman test will be discussed in the following subsection. To save space, we only consider the estimators of $\beta$ and the Hausman test. The asymptotic distributions of the estimators of $\gamma$ will be discussed in Section 4. Throughout the examples below, we assume that the $z_i$ are i.i.d. over different $i$ with $N(0, \sigma_z^2)$. In addition, we introduce the notation $e_{it}$ to denote a white noise component in the time-varying regressor $x_{it}$. We assume that the $e_{it}$ are i.i.d. over different $i$ and $t$ with $N(0, \sigma_e^2)$, and are uncorrelated with the $z_i$.

CASE A: We here consider a case in which the time-varying regressor $x_{it}$ is stationary without trend. Specifically, we assume:

$$x_{it} = \Theta_{a,i} + \Psi_{a,t} + e_{it}, \tag{12}$$

where $\Theta_{a,i}$ and $\Psi_{a,t}$ are fixed individual-specific and time-specific effects, respectively. Define $\bar{\Theta}_a = \frac{1}{N}\sum_i \Theta_{a,i}$ and $\bar{\Psi}_a = \frac{1}{T}\sum_t \Psi_{a,t}$; and let

$$p_{a,1} = \lim_{N\to\infty} \bar{\Theta}_a; \qquad p_{a,2} = \lim_{N\to\infty} \frac{1}{N}\sum_i (\Theta_{a,i} - \bar{\Theta}_a)^2;$$

$$q_{a,1} = \lim_{T\to\infty} \bar{\Psi}_a; \qquad q_{a,2} = \lim_{T\to\infty} \frac{1}{T}\sum_t (\Psi_{a,t} - \bar{\Psi}_a)^2.$$

We can allow $\Theta_{a,i}$ and $\Psi_{a,t}$ to be random without changing our results, but at the cost of analytical complexity. We consider two possible cases: one in which the parameters $\Theta_{a,i}$ are heterogeneous, and the other in which they are constant over different individuals. Allowing the $\Theta_{a,i}$ to differ across individuals, we allow the means of $x_{it}$ to be cross-sectionally heterogeneous. In contrast, if the $\Theta_{a,i}$ are constant over different $i$, the means of $x_{it}$ become cross-sectionally homogeneous. As we show below, the convergence rates of the between estimator and Hausman test statistic are different in the two cases.

To be more specific, consider the three terms $B_{NT}$, $C_{NT}$, and $b_{NT}$ defined below (8). Straightforward algebra reveals that

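$$B_{NT} = \sum_i (\Theta_{a,i} - \bar{\Theta}_a)^2 + 2\sum_i (\Theta_{a,i} - \bar{\Theta}_a)(\bar{e}_i - \bar{e}) + \sum_i (\bar{e}_i - \bar{e})^2;$$

$$C_{NT} = \sum_i (\Theta_{a,i} - \bar{\Theta}_a)(z_i - \bar{z}) + \sum_i (\bar{e}_i - \bar{e})(z_i - \bar{z});$$

$$b_{NT} = \sum_i (\Theta_{a,i} - \bar{\Theta}_a)u_i + \sum_i (\bar{e}_i - \bar{e})u_i + \sum_i (\Theta_{a,i} - \bar{\Theta}_a)\bar{v}_i + \sum_i (\bar{e}_i - \bar{e})\bar{v}_i.$$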

It can be shown that the terms including $(\Theta_{a,i} - \bar{\Theta}_a)$ will be the dominant factors determining the asymptotic properties of $B_{NT}$, $C_{NT}$, and $b_{NT}$. However, if the parameters $\Theta_{a,i}$ are constant over different individuals so that $\Theta_{a,i} - \bar{\Theta}_a = 0$, none of $B_{NT}$, $C_{NT}$, and $b_{NT}$ depends on $(\Theta_{a,i} - \bar{\Theta}_a)$. For this case, the asymptotic properties of the three terms depend on $(\bar{e}_i - \bar{e})$. This result indicates that the asymptotic distribution of the between estimator $\hat{\beta}_b$, which is a function of $B_{NT}$, $C_{NT}$, and $b_{NT}$, will depend on whether the parameters $\Theta_{a,i}$ are cross-sectionally heterogeneous or homogeneous.⁴

We now consider the asymptotic distributions of the within, between, and GLS estimators and the Hausman statistic under the two alternative assumptions about the parameters $\Theta_{a,i}$.

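CASE A.1: Assume that the $\Theta_{a,i}$ vary across different $i$; that is, $p_{a,2} \neq 0$. With this assumption, we can easily show that as $(N, T \to \infty)$:⁵

$$\sqrt{NT}(\hat{\beta}_w - \beta) \Longrightarrow N\Big(0, \frac{\sigma_v^2}{q_{a,2} + \sigma_e^2}\Big); \tag{13a}$$

$$\sqrt{N}(\hat{\beta}_b - \beta) \Longrightarrow N\Big(0, \frac{\sigma_u^2}{p_{a,2}}\Big); \tag{14}$$

$$\sqrt{NT}(\hat{\beta}_g - \beta) = \sqrt{NT}(\hat{\beta}_w - \beta) + \frac{1}{\sqrt{T}}\frac{\sigma_v^2}{\sigma_u^2}\frac{p_{a,2}}{q_{a,2} + \sigma_e^2}\sqrt{N}(\hat{\beta}_b - \beta) + o_p\Big(\frac{1}{\sqrt{T}}\Big); \tag{15}$$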

⁴ Somewhat interestingly, however, the distinction between these two cases becomes unimportant when the model has no intercept term ($\zeta = 0$) and is estimated with this restriction. For such a case, $B_{NT}$, $C_{NT}$ and $b_{NT}$ depend on $\bar{x}_i$ instead of $\tilde{x}_i$. With $\bar{x}_i$, the terms $(\Theta_{a,i} - \bar{\Theta}_a)$ and $(\bar{e}_i - \bar{e})$ in $B_{NT}$, $C_{NT}$, and $b_{NT}$ are replaced by $\Theta_{a,i}$ and $\bar{e}_i$, respectively. Thus, the terms containing the $\Theta_{a,i}$ remain a dominating factor whether or not the $\Theta_{a,i}$ are heterogeneous.

⁵ These results are obtained utilizing the fact that the limits of $\frac{1}{NT}A_{NT}$, $\frac{1}{N}B_{NT}$, $\frac{1}{N}C_{NT}$, and $\frac{1}{N}H_N$ are finite, while $\frac{1}{\sqrt{NT}}a_{NT}$, $\frac{1}{\sqrt{N}}b_{NT}$, and $\frac{1}{\sqrt{N}}c_{NT}$ are asymptotically normal. More detailed calculations can be found in an earlier version of this paper.


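$$\sqrt{NT^2}(\hat{\beta}_w - \hat{\beta}_g) = -\frac{\sigma_v^2}{\sigma_u^2}\frac{p_{a,2}}{q_{a,2} + \sigma_e^2}\sqrt{N}(\hat{\beta}_b - \beta) + o_p(1) \Longrightarrow N\Big(0, \frac{\sigma_v^4}{\sigma_u^2}\frac{p_{a,2}}{(q_{a,2} + \sigma_e^2)^2}\Big); \tag{16}$$

$$\text{plim}_{N,T\to\infty} NT^2[Var(\hat{\beta}_w) - Var(\hat{\beta}_g)] = \frac{\sigma_v^4}{\sigma_u^2}\frac{p_{a,2}}{(q_{a,2} + \sigma_e^2)^2}. \tag{17}$$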
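The limit in (17) can be traced back to (11) in the scalar case. The following is a heuristic sketch rather than a proof. It writes $D_{NT} = T\theta_T^2(B_{NT} - C_{NT}^2/H_N)$ (a shorthand introduced here, not in the original), takes the error variance in both variance formulas to be $\sigma_v^2$, and assumes the limits $A_{NT}/(NT) \to q_{a,2} + \sigma_e^2$, $(B_{NT} - C_{NT}^2/H_N)/N \to p_{a,2}$, and $T\theta_T^2 \to \sigma_v^2/\sigma_u^2$, so that $D_{NT}/(NT) \to 0$:

$$NT^2[Var(\hat{\beta}_w) - Var(\hat{\beta}_g)] = NT^2\sigma_v^2\Big[\frac{1}{A_{NT}} - \frac{1}{A_{NT} + D_{NT}}\Big] = \sigma_v^2\,\frac{D_{NT}/N}{\big(A_{NT}/NT\big)\big((A_{NT} + D_{NT})/NT\big)} \longrightarrow \sigma_v^2\,\frac{(\sigma_v^2/\sigma_u^2)\,p_{a,2}}{(q_{a,2} + \sigma_e^2)^2}.$$

The right-hand side equals $\frac{\sigma_v^4}{\sigma_u^2}\frac{p_{a,2}}{(q_{a,2} + \sigma_e^2)^2}$, which is the expression in (17) and also the asymptotic variance in (16). This is why the quadratic form in (5) has a non-degenerate limit even though both variances shrink to zero.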

Two remarks follow. First, consistent with previous studies, we find from (15) that the within and GLS estimators, $\hat{\beta}_w$ and $\hat{\beta}_g$, are $\sqrt{NT}$-equivalent in the sense that $(\hat{\beta}_w - \hat{\beta}_g)$ is $o_p(1/\sqrt{NT})$. This is so because the second term on the right-hand side of (15) is $O_p(1/\sqrt{T})$. Nonetheless, from (16), we can see that $(\hat{\beta}_w - \hat{\beta}_g)$ is $O_p(1/\sqrt{NT^2})$ and asymptotically normal. These results indicate that the within and GLS estimators are equivalent to each other by the order of $\sqrt{NT}$, but not by the order of $\sqrt{NT^2}$. Second, from (16) and (17), we can see that the Hausman statistic is asymptotically $\chi^2$-distributed with the convergence rate equal to $\sqrt{NT^2}$.⁶ In particular, (16) indicates that the asymptotic distribution of the Hausman statistic is closely related to the asymptotic distribution of the between estimator $\hat{\beta}_b$.

CASE A.2: Now, we consider the case in which the $\Theta_{a,i}$ are constant over different $i$ (equal to $\bar{\Theta}_a$); that is, $p_{a,2} = 0$. It can be shown that the asymptotic distributions of the within and GLS estimators are the same under both CASE A.1 and CASE A.2. However, the asymptotic distributions of the between estimator $\hat{\beta}_b$ and the Hausman statistic are different under CASEs A.1 and A.2.⁷ Specifically, for CASE A.2, we can show that as $(N, T \to \infty)$,

$$\sqrt{\frac{N}{T}}(\hat{\beta}_b - \beta) \Longrightarrow N\Big(0, \frac{\sigma_u^2}{\sigma_e^2}\Big); \tag{18}$$

$$\sqrt{NT^3}(\hat{\beta}_w - \hat{\beta}_g) = -\frac{\sigma_v^2}{\sigma_u^2}\frac{\sigma_e^2}{q_{a,2} + \sigma_e^2}\sqrt{\frac{N}{T}}(\hat{\beta}_b - \beta) + o_p(1) \Longrightarrow N\Big(0, \frac{\sigma_v^4}{\sigma_u^2}\frac{\sigma_e^2}{(q_{a,2} + \sigma_e^2)^2}\Big); \tag{19}$$

$$\text{plim}_{N,T\to\infty} NT^3[Var(\hat{\beta}_w) - Var(\hat{\beta}_g)] = \frac{\sigma_v^4\sigma_e^2}{\sigma_u^2(q_{a,2} + \sigma_e^2)^2}. \tag{20}$$
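A heuristic for the slower rate in (18) follows from the expansions of $B_{NT}$ and $b_{NT}$ given earlier for CASE A. The sketch below is not a proof: it ignores the partialling-out of $z_i$ (which should not change the order of magnitude when $z_i$ is uncorrelated with $e_{it}$) and keeps only leading terms. With $\Theta_{a,i} - \bar{\Theta}_a = 0$, only the averaged noise $\bar{e}_i$, whose variance is $\sigma_e^2/T$, generates between-individual variation in the regressor:

$$B_{NT} = \sum_i (\bar{e}_i - \bar{e})^2 \approx \frac{N\sigma_e^2}{T}, \qquad Var\Big(\sum_i (\bar{e}_i - \bar{e})u_i\Big) \approx \frac{N\sigma_e^2\sigma_u^2}{T}, \qquad\text{so}\qquad Var(\hat{\beta}_b) \approx \frac{N\sigma_e^2\sigma_u^2/T}{(N\sigma_e^2/T)^2} = \frac{T}{N}\,\frac{\sigma_u^2}{\sigma_e^2}.$$

Multiplying $(\hat{\beta}_b - \beta)$ by $\sqrt{N/T}$ therefore gives a variance of $\sigma_u^2/\sigma_e^2$, in line with (18). The terms involving $\bar{v}_i$ are of smaller order because $\bar{v}_i$ has variance $\sigma_v^2/T$.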

⁶ As we see clearly from (16) and (17), the Hausman statistic does not depend on $\sqrt{NT^2}$, because it cancels out. Nonetheless, we say that the convergence rate of the Hausman test statistic equals $\sqrt{NT^2}$, because the asymptotic $\chi^2$ result for the statistic is obtained based on the fact that $(\hat{\beta}_w - \hat{\beta}_g)$ is $\sqrt{NT^2}$-consistent.

⁷ If the model contains no intercept term ($\zeta = 0$) and it is estimated with this restriction, all of the results (13a)-(17) are still valid with $\bar{\Theta}_a$ replacing $p_{a,2}$. Thus, the convergence rates of the within, between, and GLS estimators and the Hausman statistic are the same under both CASEs A.1 and A.2.


Several comments follow. First, observe that, in contrast to CASE A.1, the between estimator $\hat{\beta}_b$ is now $\sqrt{N/T}$-consistent. An interesting result is obtained when $N/T \to c < \infty$. For this case, the between estimator is inconsistent, although it is still asymptotically unbiased. This implies that the between estimator is an inconsistent estimator for the analysis of cross-sectionally homogeneous panel data unless $N$ is substantially larger than $T$. Second, the convergence rate of $(\hat{\beta}_w - \hat{\beta}_g)$, as well as that of the Hausman statistic, differs between CASE A.1 and CASE A.2. Notice that the convergence rate of $(\hat{\beta}_w - \hat{\beta}_g)$ is $\sqrt{NT^3}$ for CASE A.2, while it is $\sqrt{NT^2}$ for CASE A.1. Thus, $(\hat{\beta}_w - \hat{\beta}_g)$ converges in probability to zero much faster in CASE A.2 than in CASE A.1. Nonetheless, the Hausman statistic is asymptotically $\chi^2$-distributed in both cases, though with different convergence rates.
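The contrast between CASE A.1 and CASE A.2 for the between estimator can be explored with a small simulation. The sketch below is only an illustration under assumptions chosen here, not the Monte Carlo design of the paper: all variances are set to one, the time effects are set to zero, and the names `between_beta` and `simulate_var` as well as the sample sizes and number of replications are hypothetical. It compares the simulated variance of $\hat{\beta}_b$ with $\sigma_u^2/(N p_{a,2})$ for heterogeneous $\Theta_{a,i}$ and with $T\sigma_u^2/(N\sigma_e^2)$ for constant $\Theta_{a,i}$.

```python
import numpy as np

def between_beta(y, x, z):
    """Between estimate of beta: OLS of individual means of y on those of x and on z."""
    xb = x.mean(axis=1)
    yb = y.mean(axis=1)
    xb, yb, zb = xb - xb.mean(), yb - yb.mean(), z - z.mean()
    B, C, H = xb @ xb, xb @ zb, zb @ zb
    return (xb @ yb - C * (zb @ yb) / H) / (B - C**2 / H)

def simulate_var(theta_i, N, T, reps, rng, beta=1.0, gamma=0.5):
    """Simulated variance of the between estimator for fixed individual means theta_i."""
    draws = np.empty(reps)
    for r in range(reps):
        z = rng.normal(size=N)
        x = theta_i[:, None] + rng.normal(size=(N, T))   # x_it = Theta_i + e_it
        u = rng.normal(size=N)
        v = rng.normal(size=(N, T))
        y = beta * x + gamma * z[:, None] + u[:, None] + v
        draws[r] = between_beta(y, x, z)
    return draws.var()

rng = np.random.default_rng(42)
N, T, reps = 200, 50, 300
theta_het = rng.normal(size=N)   # heterogeneous means (CASE A.1)
theta_hom = np.zeros(N)          # constant means (CASE A.2)
p_a2 = theta_het.var()           # finite-sample analogue of p_{a,2}

var_a1 = simulate_var(theta_het, N, T, reps, rng)
var_a2 = simulate_var(theta_hom, N, T, reps, rng)

# Ratios of simulated variances to the asymptotic approximations (all variances = 1).
ratio_a1 = var_a1 / (1.0 / (N * p_a2))   # sigma_u^2 / (N p_{a,2})
ratio_a2 = var_a2 / (T / N)              # T sigma_u^2 / (N sigma_e^2)
print(ratio_a1, ratio_a2, var_a2 / var_a1)
```

If the asymptotic approximations are reasonable at these sample sizes, both ratios should be close to one, and the between estimator should be far more dispersed when the individual means are homogeneous.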

We can obtain results similar to those in CASE A even if the time-varying regressor $x_{it}$ contains a time trend. For example, consider a case in which $x_{it}$ contains a time trend of order $m$:

$$x_{it} = \Theta_i t^m + e_{it}, \tag{21}$$

where the parameters $\Theta_i$ are fixed.⁸ Not surprisingly, for this case, we can show that the within and GLS estimators are superconsistent and $T^m\sqrt{NT}$-equivalent. However, the convergence rates of the between estimator $\hat{\beta}_b$ and the Hausman test statistic crucially depend on whether the parameters $\Theta_i$ are heterogeneous or homogeneous. When the parameters $\Theta_i$ are heterogeneous over different $i$, the between estimator $\hat{\beta}_b$ is $T^m\sqrt{N}$-consistent, while the convergence rate of the Hausman statistic equals $T^m\sqrt{NT^2}$. In contrast, somewhat surprisingly, when the parameters $\Theta_i$ are homogeneous over different $i$, the estimator $\hat{\beta}_b$ is no longer superconsistent. Instead, it is $\sqrt{N/T}$-consistent as in CASE A.2. The convergence rate of the Hausman statistic changes to $T^{2m}\sqrt{NT^3}$.⁹ This example demonstrates that the convergence rates of the between estimator and the Hausman statistic crucially depend on whether means of time-varying regressors are cross-sectionally heterogeneous or not.

CASE B: So far, we have considered the cases in which the time-varying regressor $x_{it}$ and the time-invariant regressor $z_i$ are uncorrelated. We now examine the cases in which this assumption is relaxed. The degree of the correlation between $x_{it}$ and $z_i$ may vary over time. As we demonstrate below, the asymptotic properties of the panel data estimators and the Hausman test statistic depend on how the correlation varies over time. The basic model we consider here is given by

$$x_{it} = \Pi_i z_i/t^m + e_{it}, \tag{22}$$

⁸ We can consider a more general case: for example, $x_{it} = a_i t^m + \Theta_i + \Theta_t + b_i z_i + e_{it}$. However, the same asymptotic results apply to this general model. This is so because the trend term ($t^m$) dominates asymptotics.

⁹ Detailed asymptotic results can be found in an earlier version of this paper.


where the $\Pi_i$ are individual-specific fixed parameters, and $m$ is a non-negative real number.¹⁰ Observe that because of the presence of the $\Pi_i$, the $x_{it}$ are not i.i.d. over different $i$.¹¹ The correlation between $x_{it}$ and $z_i$ decreases over time if $m > 0$. In contrast, $m = 0$ implies that the correlation remains constant over time. For CASE B, the within and GLS estimators are always $\sqrt{NT}$-consistent regardless of the size of $m$. Thus, we only report the asymptotic results for the between estimator $\hat{\beta}_b$ and the Hausman statistic.

We examine three possible cases: $m \in (\frac{1}{2}, \infty]$, $m = \frac{1}{2}$, and $m \in [0, \frac{1}{2})$. We do so because, depending on the size of $m$, one (or both) of the two terms $e_{it}$ and $\Pi_i z_i/t^m$ in $x_{it}$ becomes a dominating factor in determining the convergence rates of the between estimator $\hat{\beta}_b$ and the Hausman statistic $HM_{NT}$.

CASE B.1: Assume that $m \in (\frac{1}{2}, \infty]$. This is the case where the correlation between $x_{it}$ and $z_i$ fades away quickly over time. Thus, one could expect that the correlation between $x_{it}$ and $z_i$ (through the term $\Pi_i z_i/t^m$) would not play any important role in asymptotics. Indeed, straightforward algebra, which is not reported here, justifies this conjecture: the term $e_{it}$ in $x_{it}$ dominates $\Pi_i z_i/t^m$ in the asymptotics, and thus this is essentially the same case as CASE A.2.¹²

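CASE B.2: We now assume $m = \tfrac{1}{2}$. For this case, define $\bar\Pi = \frac{1}{N}\sum_i \Pi_i$;

$$p_{b,1} = \lim_{N\to\infty} \bar\Pi; \qquad p_{b,2} = \lim_{N\to\infty} \frac{1}{N}\sum_i \left(\Pi_i - \bar\Pi\right)^2,$$

and $q_b = \lim_{T\to\infty} \frac{1}{T^{1-m}}\sum_t t^{-m} = \int_0^1 r^{-m}\,dr = \frac{1}{1-m}$ for $m \le \tfrac{1}{2}$. With this notation, a little algebra shows that as $(N,T\to\infty)$,

$$\sqrt{\frac{N}{T}}\,(\hat\beta_b - \beta) \Rightarrow N\left(0,\ \frac{\sigma_u^2}{p_{b,2}\,q_b^2\,\sigma_z^2 + \sigma_e^2}\right).$$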
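The limit $q_b$ and the fact used in footnote 12 can be checked numerically. The sketch below is only an illustration, under the assumption that $q_b$ is read as the limit of the scaled partial sum $T^{-(1-m)}\sum_{t=1}^{T} t^{-m}$, which $\int_0^1 r^{-m}dr$ approximates; the function names are hypothetical and not part of the paper.

```python
import numpy as np

def scaled_trend_sum(T, m):
    """Return T^{-(1-m)} * sum_{t=1}^T t^{-m}, the finite-T analogue of q_b."""
    t = np.arange(1, T + 1, dtype=float)
    return np.sum(t ** (-m)) / T ** (1.0 - m)

def root_T_trend_sum(T, m):
    """Return T^{-1/2} * sum_{t=1}^T t^{-m}; it shrinks with T when m > 1/2."""
    t = np.arange(1, T + 1, dtype=float)
    return np.sum(t ** (-m)) / np.sqrt(T)

if __name__ == "__main__":
    # Compare the finite-T sum with 1 / (1 - m) for slowly decaying trends.
    for m in (0.0, 0.25, 0.5):
        print(m, scaled_trend_sum(10**6, m), 1.0 / (1.0 - m))
    # For m > 1/2 the root-T scaled sum should decrease toward zero.
    for T in (10**2, 10**4, 10**6):
        print(T, root_T_trend_sum(T, 0.75))
```

For $m = \tfrac{1}{2}$ the first quantity approaches 2, which is where the factor $q_b^2 = 4$ in CASE B.2 comes from.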

Observe that the asymptotic variance of the between estimator $\hat\beta_b$ depends on both the terms $\sigma_e^2$ and $p_{b,2}q_b^2\sigma_z^2$. That is, both the terms $e_{it}$ and $\Pi_i z_i/t^m$ in $x_{it}$ are important in the asymptotics of the between estimator $\hat\beta_b$. This implies that the correlation between the $x_{it}$ and $z_i$, when it decreases reasonably slowly over time, matters for the asymptotic distribution of the between estimator $\hat\beta_b$. Nonetheless, the convergence rate of $\hat\beta_b$ is the same as that of $\hat\beta_b$ for CASEs A.2 and B.1. We can also show

$$\sqrt{NT^3}\,(\hat\beta_w - \hat\beta_g) = -\frac{\sigma_v^2}{\sigma_u^2}\,\frac{p_{b,2}q_b^2\sigma_z^2 + \sigma_e^2}{\sigma_e^2}\,\sqrt{\frac{N}{T}}\,(\hat\beta_b - \beta) + o_p(1) \Rightarrow N\left(0,\ \frac{\sigma_v^4}{\sigma_u^2}\,\frac{p_{b,2}q_b^2\sigma_z^2 + \sigma_e^2}{\sigma_e^4}\right);$$

$$\operatorname*{plim}_{N,T\to\infty} NT^3\left[Var(\hat\beta_w) - Var(\hat\beta_g)\right] = \frac{\sigma_v^4}{\sigma_u^2}\,\frac{p_{b,2}q_b^2\sigma_z^2 + \sigma_e^2}{\sigma_e^4},$$

both of which imply that the Hausman statistic is asymptotically $\chi^2$-distributed.

10 We can consider a more general model:

$$x_{it} = \Theta_{b,i} + \Psi_{b,t} + \Pi_i z_i/t^m + e_{it},$$

where the $\Theta_{b,i}$ and $\Psi_{b,t}$ are individual- and time-specific fixed parameters, respectively. When the parameters $\Theta_{b,i}$ are cross-sectionally heterogeneous, the asymptotic results are essentially the same as those we obtain for CASE A.1. This is so because the terms $\Theta_{b,i}$ dominate and the terms $\Pi_i z_i/t^m$ become irrelevant in the asymptotics. Thus, we set $\Theta_{b,i} = 0$ for all $i$. In addition, we set $\Psi_{b,t} = 0$ for all $t$, because the presence of time effects is irrelevant for the convergence rates of panel data estimators and the Hausman statistic.

11 We here assume that the $\Pi_i$ are cross-sectionally heteroskedastic. For the cases in which the $\Pi_i$ are the same for all $i$, $\hat\beta_b$ does not depend on $\Pi_i z_i/t^m$, and we obtain exactly the same asymptotic results as those for CASE A.2. This is due to the fact that the individual mean of the time-varying regressor, $\bar{x}_i$, becomes a linear function of the time invariant regressor $z_i$ if the $\Pi_i$ are the same for all $i$.

12 We can obtain this result using the fact that $\lim_{T\to\infty} \frac{1}{\sqrt{T}}\sum_t t^{-m} = 0$ if $m > \tfrac{1}{2}$.
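To see why the two limits for CASE B.2 deliver a $\chi^2$ limit, it may help to write the statistic out. The following is a sketch only, assuming the scalar form of the Hausman statistic, $HM_{NT} = (\hat\beta_w - \hat\beta_g)^2 / [Var(\hat\beta_w) - Var(\hat\beta_g)]$, and writing $V$ for the common limit above (the symbol $V$ is introduced here for illustration):

$$HM_{NT} = \frac{\left[\sqrt{NT^3}\,(\hat\beta_w - \hat\beta_g)\right]^2}{NT^3\left[Var(\hat\beta_w) - Var(\hat\beta_g)\right]} \Rightarrow \frac{\left[N(0, V)\right]^2}{V} = \chi^2_1, \qquad V = \frac{\sigma_v^4}{\sigma_u^2}\,\frac{p_{b,2}q_b^2\sigma_z^2 + \sigma_e^2}{\sigma_e^4}.$$

The normalizing factor $NT^3$ cancels between numerator and denominator, so the statistic itself needs no knowledge of the convergence rate.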

CASE B.3: Finally, we consider the case in which $m \in [0, \tfrac{1}{2})$, where the correlation between $x_{it}$ and $z_i$ decays slowly over time. Note that the correlation remains constant over time if $m = 0$. We can show

$$\sqrt{\frac{N}{T^{2m}}}\,(\hat\beta_b - \beta) \Rightarrow N\left(0,\ \frac{\sigma_u^2}{p_{b,2}\,q_b^2\,\sigma_z^2}\right).$$

Observe that the asymptotic distribution of $\hat\beta_b$ no longer depends on $\sigma_e^2$. This implies that the term $\Pi_i z_i/t^m$ in $x_{it}$ dominates $e_{it}$ in the asymptotics for $\hat\beta_b$. Furthermore, the convergence rate of $\hat\beta_b$ now depends on $m$. Specifically, so long as $m < \tfrac{1}{2}$, the convergence rate increases as $m$ decreases. In particular, when the correlation between $x_{it}$ and $z_i$ remains constant over time ($m = 0$), the between estimator $\hat\beta_b$ is $\sqrt{N}$-consistent as in CASE A.1. Finally, the following results indicate that the convergence rate of the Hausman statistic $HM_{NT}$ also depends on $m$:

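$$\sqrt{NT^{2m+2}}\,(\hat\beta_w - \hat\beta_g) = -\frac{\sigma_v^2}{\sigma_u^2}\,\frac{p_{b,2}q_b^2\sigma_z^2}{\sigma_e^2}\,\sqrt{\frac{N}{T^{2m}}}\,(\hat\beta_b - \beta) + o_p(1) \Rightarrow N\left(0,\ \frac{\sigma_v^4}{\sigma_u^2}\,\frac{p_{b,2}q_b^2\sigma_z^2}{\sigma_e^4}\right);$$

$$\operatorname*{plim}_{N,T\to\infty} NT^{2m+2}\left[Var(\hat\beta_w) - Var(\hat\beta_g)\right] = \frac{\sigma_v^4}{\sigma_u^2}\,\frac{p_{b,2}q_b^2\sigma_z^2}{\sigma_e^4}.$$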
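The CASE B.3 rate for the between estimator can be illustrated with a small Monte Carlo experiment. The sketch below is not the authors' simulation design: it assumes standard normal $z_i$, $u_i$, $e_{it}$ and $v_{it}$, places the $\Pi_i$ on an evenly spaced grid, and draws the time averages of $e_{it}$ and $v_{it}$ directly. All function names and default values are hypothetical.

```python
import numpy as np

def between_beta(N, T, m, rng, beta=1.0, gamma=1.0):
    """One draw of the between estimator of beta for x_it = Pi_i z_i / t^m + e_it."""
    Pi = np.linspace(0.0, 4.0, N)            # assumed heterogeneous Pi_i
    z = rng.standard_normal(N)
    t = np.arange(1, T + 1, dtype=float)
    trend_mean = np.mean(t ** (-m))          # (1/T) * sum_t t^{-m}
    # Time averages of e_it and v_it are drawn directly as N(0, 1/T); with
    # i.i.d. normal errors this matches averaging a simulated N x T panel.
    x_bar = Pi * z * trend_mean + rng.standard_normal(N) / np.sqrt(T)
    v_bar = rng.standard_normal(N) / np.sqrt(T)
    u = rng.standard_normal(N)
    y_bar = beta * x_bar + gamma * z + u + v_bar
    # Between regression: OLS of individual means on (1, x_bar_i, z_i).
    W = np.column_stack([np.ones(N), x_bar, z])
    coef = np.linalg.lstsq(W, y_bar, rcond=None)[0]
    return coef[1]

def simulated_scaled_variance(N, T, m, reps=400, seed=0, beta=1.0):
    """Monte Carlo mean square of sqrt(N / T^{2m}) * (beta_b - beta)."""
    rng = np.random.default_rng(seed)
    draws = np.array([between_beta(N, T, m, rng, beta=beta) for _ in range(reps)])
    return N / T ** (2.0 * m) * np.mean((draws - beta) ** 2)

def limiting_variance(N, m):
    """sigma_u^2 / (p_{b,2} q_b^2 sigma_z^2) with sigma_u^2 = sigma_z^2 = 1."""
    p_b2 = np.var(np.linspace(0.0, 4.0, N))  # cross-sectional variance of Pi_i
    q_b = 1.0 / (1.0 - m)
    return 1.0 / (p_b2 * q_b ** 2)

if __name__ == "__main__":
    for m in (0.0, 0.25):
        print(m, simulated_scaled_variance(1000, 400, m), limiting_variance(1000, m))
```

With finite $T$ the simulated value still contains small contributions from $\sigma_e^2/T$ and $\sigma_v^2/T$, so it should be close to, but not exactly equal to, the limiting variance.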

2.3 Power Properties of the Hausman Test

In this section, we consider the asymptotic power properties of the Hausman test for the special cases discussed in the previous subsection. To do so, we need to specify a sequence of local alternative hypotheses for each case. Among many, we consider the alternative hypotheses under which the conditional mean of $u_i$ is a linear function of the regressors $\tilde{x}_i$ and $\tilde{z}_i$.


We list in Table 1 our local alternatives and asymptotic results for Cases A.1-B.3. For all cases, we assume that $var(u_i \mid \tilde{x}_i, \tilde{z}_i) = \sigma_u^2$ for all $i$. The parameters $\lambda_x$ and $\lambda_z$ are nonzero real numbers. Observe that for Cases A.2-B.3, we use $\sqrt{T}\tilde{x}_i$ or $T^m\tilde{x}_i$ instead of $\tilde{x}_i$. We do so because, for those cases, $\operatorname{plim}_{T\to\infty}\tilde{x}_i = 0$.

The third column indicates the convergence rates of the Hausman test, which are the same as those obtained under the RE assumption. Under the local alternatives, the Hausman statistic asymptotically follows a noncentral $\chi^2$ distribution. The noncentrality parameters of the test for individual cases are listed in the fourth column of Table 1.

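Table 1

| Case | Local alternatives | Convergence rate | Noncentrality parameter |
|---|---|---|---|
| A.1 | $E(u_i \mid \tilde{x}_i, \tilde{z}_i) = \tilde{x}_i \frac{\lambda_x}{\sqrt{N}} + \tilde{z}_i \frac{\lambda_z}{\sqrt{N}}$ $^{13}$ | $\sqrt{NT^2}$ | $\dfrac{p_{a,2}\lambda_x^2}{\sigma_u^2}$ |
| A.2 | $E(u_i \mid \tilde{x}_i, \tilde{z}_i) = \sqrt{T}\tilde{x}_i \frac{\lambda_x}{\sqrt{N}} + \tilde{z}_i \frac{\lambda_z}{\sqrt{N}}$ | $\sqrt{NT^3}$ | $\dfrac{\sigma_e^2\lambda_x^2}{\sigma_u^2}$ |
| B.1 | $E(u_i \mid \tilde{x}_i, \tilde{z}_i) = \sqrt{T}\tilde{x}_i \frac{\lambda_x}{\sqrt{N}} + \tilde{z}_i \frac{\lambda_z}{\sqrt{N}}$ | $\sqrt{NT^3}$ | $\dfrac{\sigma_e^2\lambda_x^2}{\sigma_u^2}$ |
| B.2 | $E(u_i \mid \tilde{x}_i, \tilde{z}_i) = \sqrt{T}\tilde{x}_i \frac{\lambda_x}{\sqrt{N}} + \tilde{z}_i \frac{\lambda_z}{\sqrt{N}}$ | $\sqrt{NT^3}$ | $\dfrac{(4p_{b,2}\sigma_z^2 + \sigma_e^2)\lambda_x^2}{\sigma_u^2}$ |
| B.3 | $E(u_i \mid \tilde{x}_i, \tilde{z}_i) = T^m\tilde{x}_i \frac{\lambda_x}{\sqrt{N}} + \tilde{z}_i \frac{\lambda_z}{\sqrt{N}}$ | $\sqrt{NT^{2m+2}}$ | $\dfrac{\left(\left(\frac{1}{1-m}\right)^2 p_{b,2}\sigma_z^2\right)\lambda_x^2}{\sigma_u^2}$ |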
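The entries in the fourth column of Table 1 translate directly into asymptotic local power. The sketch below is illustrative only: it assumes a scalar time-varying regressor, so that the Hausman statistic has one degree of freedom, and a 5% test; the function names and default parameter values are hypothetical.

```python
import math

def noncentrality(case, lam_x, sigma_u2, sigma_e2=1.0, sigma_z2=1.0,
                  p_a2=1.0, p_b2=1.0, m=0.0):
    """Noncentrality parameter from the fourth column of Table 1 (scalar x_it)."""
    if case == "A.1":
        scale = p_a2
    elif case in ("A.2", "B.1"):
        scale = sigma_e2
    elif case == "B.2":
        scale = 4.0 * p_b2 * sigma_z2 + sigma_e2
    elif case == "B.3":
        if not 0.0 <= m < 0.5:
            raise ValueError("CASE B.3 requires 0 <= m < 1/2")
        scale = p_b2 * sigma_z2 / (1.0 - m) ** 2
    else:
        raise ValueError(f"unknown case: {case}")
    return scale * lam_x ** 2 / sigma_u2

def asymptotic_power(delta, z_crit=1.959964):
    """P(chi2_1(delta) > z_crit^2), using chi2_1(delta) = (Z + sqrt(delta))^2."""
    def std_normal_cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    root = math.sqrt(delta)
    return 1.0 - std_normal_cdf(z_crit - root) + std_normal_cdf(-z_crit - root)

if __name__ == "__main__":
    for case in ("A.1", "A.2", "B.1", "B.2", "B.3"):
        delta = noncentrality(case, lam_x=1.0, sigma_u2=1.0)
        print(case, delta, round(asymptotic_power(delta), 3))
```

Note that `lam_z` does not appear as an argument: no entry in the fourth column depends on $\lambda_z$.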

A couple of comments follow. First, the noncentrality parameter of the Hausman statistic does not depend on $\lambda_z$ in any of the cases. This indicates that the Hausman test has no power to detect nonzero correlation between the effect $u_i$ and the time invariant regressor $z_i$. In fact, we can easily show that this result holds regardless of the size of $T$. This restrictive result is due to our choice of local alternatives. To see this, assume that $E(u_i \mid \tilde{x}_i, \tilde{z}_i) = \tilde{z}_i\lambda_z$; that is, the $\tilde{x}_i$ are not correlated with $u_i$, conditional on $z_i$. For this case, the between estimators of $\beta$ and $\gamma$, $\hat\beta_b$ and $\hat\gamma_b$, are equivalent to least squares on the model

$$\tilde{y}_i = \beta'\tilde{x}_i + (\gamma + \lambda_z)'\tilde{z}_i + (u_i^* + \tilde{v}_i),$$

where $u_i^* = u_i - E(u_i \mid \tilde{x}_i, \tilde{z}_i)$. From this, we can easily see that $\hat\beta_b$ and $\hat\gamma_b$ are asymptotically unbiased estimators of $\beta$ and $(\gamma + \lambda_z)$, respectively. That is, $\hat\gamma_b$ is not an asymptotically unbiased estimator of $\gamma$. As we have discussed in Section 2.2, the asymptotic distribution of the Hausman statistic depends on that of $\hat\beta_b$, not of $\hat\gamma_b$. Thus, the Hausman test does not have power to detect violations of the random effects assumption that do not bias $\hat\beta_b$ (regardless of the size of $T$). Accordingly, under our local alternative assumptions, the Hausman test possesses no power to detect nonzero correlations between $z_i$ and $u_i$. This problem arises of course because we assume that the conditional mean of the effect $u_i$ is a linear function of $\tilde{x}_i$ and $\tilde{z}_i$. When the conditional mean of the effect is a nonlinear function of $\tilde{x}_i$ and $\tilde{z}_i$, the Hausman test can possess power to detect nonzero correlations between $u_i$ and $z_i$.$^{14}$

13 This sequence of local alternatives can be replaced by

$$E(u_i \mid \bar{x}_i, z_i) = \lambda_o + \bar{x}_i\frac{\lambda_x}{\sqrt{N}} + z_i\frac{\lambda_z}{\sqrt{N}},$$

where $\lambda_o$ is any constant scalar. The asymptotic results remain the same with this replacement.

Second and more importantly, the results for Case A show that the large-$T$ and large-$N$ power properties of the Hausman test may depend on (i) what components of $x_{it}$ are correlated with the effect $u_i$ and (ii) whether the means of $x_{it}$ are cross-sectionally heterogeneous. For Case A, the time-invariant and time-varying parts of $x_{it}$, $\Theta_i$ and $e_{it}$, can be viewed as permanent and transitory components, respectively. For fixed $T$, it does not matter to the Hausman test which of these two components of $x_{it}$ is correlated with the individual effect $u_i$. The Hausman test has power to detect any kind of correlation between $x_{it}$ and $u_i$. In contrast, for the cases with large $T$, the same test can have power for specific correlations only. To see why, observe that for Case A.1, the noncentrality parameter of the Hausman statistic depends only on the variations in the permanent component $\Theta_i$, not on those in the transitory component $e_{it}$. This implies that for Case A.1 (where the permanent components $\Theta_i$ are cross-sectionally heterogeneous), the Hausman test has power for nonzero correlation between the effect $u_i$ and the permanent component $\Theta_i$, but no power for nonzero correlation between the effect and the transitory component $e_{it}$. In contrast, for Case A.2 (where the permanent components $\Theta_i$ are the same for all $i$), the noncentrality parameter depends on the variations in $e_{it}$. That is, for Case A.2, the Hausman test does have power to detect nonzero correlation between the effect and the transitory component of $x_{it}$.$^{15}$

Similar results are obtained from the analysis of Case B. The results reported in the fourth column of Table 1 show that when the correlation between $x_{it}$ and $z_i$ decays quickly over time ($m \ge \tfrac{1}{2}$), the Hausman test has power to detect nonzero correlation between the individual effect and the transitory component $e_{it}$, even if $T$ is large. In contrast, when $m < \tfrac{1}{2}$, the same test has no power to detect such nonzero correlations when $T$ is large. These results indicate that the asymptotic power of the Hausman test can depend on the size of $T$. These findings will be elaborated further in Section 3.

14 Even if the conditional mean of $u_i$ is linear in $\tilde{w}_i$, the Hausman test may have power to detect non-zero $\lambda_z$, if $\lambda_x$ and $\lambda_z$ are not functionally independent. For example, consider a model with scalar $x_{it}$ and $z_i$. Suppose that $x_{it}$ and $z_i$ have a common factor $f_i$; that is, $x_{it} = f_i + e_{it}$ and $z_i = f_i + \eta_i$. (This is the case discussed below Assumption 5.) Assume $E(u_i \mid f_i, \eta_i, e_i) = c\eta_i$. Assume that $f_i$, $\eta_i$ and $e_{it}$ are normal, mutually independent, and i.i.d. over different $i$ and $t$ with zero means, and variances $\sigma_f^2$, $\sigma_\eta^2$, and $\sigma_e^2$, respectively. Note that under the given assumptions, $x_{it}$ is not correlated with $u_i$, while $z_i$ is. For this case, however, we can show that

$$E(u_i \mid \bar{x}_i, z_i) = \bar{x}_i\lambda_x + z_i\lambda_z,$$

where $d = (\sigma_f^2 + \sigma_e^2/T)(\sigma_f^2 + \sigma_\eta^2) - \sigma_f^4$, $\lambda_x = -c\sigma_f^2\sigma_\eta^2/d$, and $\lambda_z = c(\sigma_f^2 + \sigma_e^2/T)\sigma_\eta^2/d$. Observe that $\lambda_x \neq 0$ if $\lambda_z \neq 0$ ($c \neq 0$). Thus, $\lambda_x$ is functionally related to $\lambda_z$. In addition, it is easy to show that $\operatorname{plim}_{N\to\infty}\hat\beta_b = \beta + \lambda_x \neq \beta$ if $\lambda_z \neq 0$ ($c \neq 0$). Thus, nonzero $\lambda_z$ biases $\hat\beta_b$.

15 This point can be better presented if we choose the following sequence of local alternative hypotheses for Case A:

$$E(u_i \mid \tilde{x}_i, \tilde{z}_i) = E(\tilde{x}_i)\frac{\lambda_{x,1}}{\sqrt{N}} + \left(\tilde{x}_i - E(\tilde{x}_i)\right)\frac{\lambda_{x,2}}{\sqrt{N}} + \tilde{z}_i\frac{\lambda_z}{\sqrt{N}}.$$

Observe that for CASE A, $E(\tilde{x}_i) = \Theta_i$, and $(\tilde{x}_i - E(\tilde{x}_i)) = \tilde{e}_i$. Under the local alternative hypotheses, it can be shown that the noncentrality parameter of the Hausman test depends on either $\lambda_{x,1}$ or $\lambda_{x,2}$, but not both. When $E(\tilde{x}_i) \neq 0$ (Case A.1), the noncentrality parameter of the test depends only on $\lambda_{x,1}$. In contrast, when $E(\tilde{x}_i) = 0$, the noncentrality parameter depends only on $\lambda_{x,2}$.
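The projection coefficients stated in footnote 14 can be verified numerically. The sketch below is an illustration under the footnote's own assumptions (mutually independent zero-mean $f_i$, $\eta_i$, $e_{it}$); it solves the population normal equations for the linear projection of $u_i$ on the individual mean of $x_{it}$ and on $z_i$, and compares the solution with the closed form. The function names are hypothetical.

```python
import numpy as np

def projection_coefficients(c, s2_f, s2_e, s2_eta, T):
    """Solve Var(x_bar, z) * (lam_x, lam_z)' = Cov((x_bar, z), u)."""
    var_matrix = np.array([
        [s2_f + s2_e / T, s2_f],           # Var(x_bar_i),      Cov(x_bar_i, z_i)
        [s2_f,            s2_f + s2_eta],  # Cov(x_bar_i, z_i), Var(z_i)
    ])
    # E(u_i | f_i, eta_i, e_i) = c * eta_i implies Cov(x_bar_i, u_i) = 0
    # and Cov(z_i, u_i) = c * s2_eta.
    cov_with_u = np.array([0.0, c * s2_eta])
    return np.linalg.solve(var_matrix, cov_with_u)

def closed_form(c, s2_f, s2_e, s2_eta, T):
    """Closed-form lam_x and lam_z as stated in footnote 14."""
    d = (s2_f + s2_e / T) * (s2_f + s2_eta) - s2_f ** 2
    lam_x = -c * s2_f * s2_eta / d
    lam_z = c * (s2_f + s2_e / T) * s2_eta / d
    return np.array([lam_x, lam_z])

if __name__ == "__main__":
    args = dict(c=0.7, s2_f=1.5, s2_e=2.0, s2_eta=0.5, T=10)
    print(projection_coefficients(**args), closed_form(**args))
```

Because $\lambda_x$ is proportional to $-c$, any nonzero $c$ produces a nonzero $\lambda_x$, which is the functional dependence the footnote points to.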

3 Monte Carlo Experiments

So far, we have considered several simple cases to demonstrate how the convergence rates of the popular panel data estimators and the Hausman test are sensitive to data generating processes. For these simple cases, all of the relevant asymptotics can be obtained in a straightforward manner. In the following section, we will show that the main results obtained from this section apply to more general cases in which regressors are serially dependent with arbitrary covariance structures.

4 Main Results

This section derives for the general model (1) the asymptotic distributions of the within, between, and GLS estimators and the Hausman statistic. In Section 2, we have considered independently several simple models in which regressors are of particular characteristics. The general model we consider in this section contains all of the different types of regressors analyzed in Section 2. More detailed assumptions are introduced below.

From now on, the following notation is repeatedly used. The expression "$\to_p$" means "converges in probability," while "$\Rightarrow$" means "converges in distribution" as in Section 2.2. For any matrix $A$, the norm $\|A\|$ signifies $\sqrt{tr(AA')}$. When $B$ is a random matrix with $E\|B\|^p < \infty$, then $\|B\|_p$ denotes $(E\|B\|^p)^{1/p}$. We use $E_{\mathcal{F}}(\cdot)$ to denote the conditional expectation operator with respect to a sigma field $\mathcal{F}$. We also define $\|B\|_{\mathcal{F},p} = (E_{\mathcal{F}}\|B\|^p)^{1/p}$. The notation $x_N \sim a_N$ indicates that there exist $n$ and finite constants $d_1$ and $d_2$ such that $\inf_{N\ge n}\frac{x_N}{a_N} \ge d_1$ and $\sup_{N\ge n}\frac{x_N}{a_N} \le d_2$. We also use the following notation for relevant sigma-fields: $\mathcal{F}_{x_i} = \sigma(x_{i1}, ..., x_{iT})$; $\mathcal{F}_{z_i} = \sigma(z_i)$; $\mathcal{F}_z = \sigma(\mathcal{F}_{z_1}, ..., \mathcal{F}_{z_N})$; $\mathcal{F}_{w_i} = \sigma(\mathcal{F}_{x_i}, \mathcal{F}_{z_i})$; and $\mathcal{F}_w = \sigma(\mathcal{F}_{w_1}, ..., \mathcal{F}_{w_N})$. The $x_{it}$ and $z_i$ are now $k\times 1$ and $g\times 1$ vectors, respectively.

As in Section 2, we assume that the regressors $(x_{i1}', ..., x_{iT}', z_i')'$ are independently distributed across different $i$. In addition, we make the following assumption about the composite error terms $u_i$ and $v_{it}$:

Assumption 1 (about $u_i$ and $v_{it}$): For some $q > 1$,

(i) the $u_i$ are independent over different $i$ with $\sup_i E|u_i|^{4q} < \infty$.

(ii) The $v_{it}$ are i.i.d. with mean zero and variance $\sigma_v^2$ across different $i$ and $t$, and are independent of $x_{is}$, $z_i$ and $u_i$, for all $i$, $t$, and $s$. Also, $\|v_{it}\|_{4q} \equiv \kappa_v$ is finite.

Assumption 1(i) is a standard regularity condition for error-components models. Assumption 1(ii) indicates that all of the regressors and the individual effect are strictly exogenous with respect to the error terms $v_{it}$.$^{16}$

We now make the assumptions about regressors. We here consider three different types of time-varying regressors: we partition the $k\times 1$ vector $x_{it}$ into three subvectors, $x_{1,it}$, $x_{2,it}$, and $x_{3,it}$, which are $k_1\times 1$, $k_2\times 1$, and $k_3\times 1$, respectively. The vector $x_{1,it}$ consists of the regressors with deterministic trends. We may think of three different types of trends: (i) cross-sectionally heterogeneous nonstochastic trends in mean (but not in variance or covariances); (ii) cross-sectionally homogeneous nonstochastic trends; and (iii) stochastic trends (trends in variance) such as unit-root time series. In Section 2, we have considered the first two cases while discussing the cases related with (21). The latter case is materially similar to CASE 2.2, except that the convergence rates of estimators and test statistics are different under these two cases. Thus, we here only consider the case (i). We do not cover the cases of stochastic trends (iii), leaving the analysis of such cases to future study.

The two subvectors $x_{2,it}$ and $x_{3,it}$ are random regressors with no trend in mean. The partition of $x_{2,it}$ and $x_{3,it}$ is made based on their correlatedness with $z_i$. Specifically, we assume that the $x_{2,it}$ are not correlated with $z_i$, while the $x_{3,it}$ are. In addition, in order to accommodate CASEs A.1 and A.2, we also partition the subvector $x_{2,it}$ into $x_{21,it}$ and $x_{22,it}$, which are $k_{21}\times 1$ and $k_{22}\times 1$, respectively. Similarly to CASE A.1, the regressor vector $x_{21,it}$ is heterogeneous over different $i$, as well as different $t$, with different means $\Theta_{21,it}$. In contrast, $x_{22,it}$ is homogeneous cross-sectionally with means $\Theta_{22,t}$ for all $i$ for given $t$ (Case A.2). We also incorporate CASEs B.1, B.2 and B.3 into the model by partitioning $x_{3,it}$ into $x_{31,it}$, $x_{32,it}$, and $x_{33,it}$, which are $k_{31}\times 1$, $k_{32}\times 1$, and $k_{33}\times 1$, respectively, depending on how fast their correlations with $z_i$ decay over time. The more detailed assumptions on the regressors $x_{it}$ and $z_i$ follow:

Assumption 2 (about $x_{1,it}$):

(i) For some $q > 1$, $\kappa_{x_1} \equiv \sup_{i,t}\|x_{1,it} - Ex_{1,it}\|_{4q} < \infty$.

(ii) Let $x_{h,1,it}$ be the $h$th element of $x_{1,it}$. Then, $Ex_{h,1,it} \sim t^{m_{h,1}}$ for all $i$ and $h = 1, ..., k_1$, where $m_{h,1} > 0$.

Assumption 3 (about $x_{2,it}$): For some $q > 1$,

(i) $E(x_{21,it}) = \Theta_{21,it}$ and $E(x_{22,it}) = \Theta_{22,t}$, where $\sup_{i,t}\|\Theta_{21,it}\|$, $\sup_t\|\Theta_{22,t}\| < \infty$, and $\Theta_{21,it} \neq \Theta_{21,jt}$ if $i \neq j$.

(ii) $\kappa_{x_2} \equiv \sup_{i,t}\|x_{2,it} - Ex_{2,it}\|_{4q} < \infty$.

Assumption 4 (about $x_{3,it}$): For some $q > 1$,

(i) $E(x_{3,it}) = \Theta_{3,it}$, where $\sup_{i,t}\|\Theta_{3,it}\| < \infty$.

(ii) $E\left(\sup_{i,t}\left\|x_{3,it} - E_{\mathcal{F}_{z_i}}x_{3,it}\right\|^{8q}_{\mathcal{F}_{z_i},4q}\right) < \infty$.

(iii) Let $x_{h,3k,it}$ be the $h$th element of $x_{3k,it}$, where $k = 1, 2, 3$. Then, conditional on $z_i$,

(iii.1) $\left(E_{\mathcal{F}_{z_i}}x_{h,31,it} - Ex_{h,31,it}\right) \sim t^{-m_{h,31}}$ a.s., where $\tfrac{1}{2} < m_{h,31} \le \infty$ for $h = 1, ..., k_{31}$ (here, $m_{h,31} = \infty$ implies that $E_{\mathcal{F}_{z_i}}x_{h,31,it} - Ex_{h,31,it} = 0$ a.s.);

(iii.2) $\left(E_{\mathcal{F}_{z_i}}x_{h,32,it} - Ex_{h,32,it}\right) \sim t^{-1/2}$ a.s. for $h = 1, ..., k_{32}$;

(iii.3) $\left(E_{\mathcal{F}_{z_i}}x_{h,33,it} - Ex_{h,33,it}\right) \sim t^{-m_{h,33}}$ a.s., where $0 \le m_{h,33} < \tfrac{1}{2}$ for $h = 1, ..., k_{33}$.

Assumption 5 (about $z_i$): $\{z_i\}_i$ is i.i.d. over $i$ with $E(z_i) = \Theta_z$, and $\|z_i\|_{4q} < \infty$ for some $q > 1$.

16 As discussed in Section 2.1, this assumption rules out weakly exogenous or predetermined regressors.

Panel data estimators of individual coefficients have different convergence rates depending on the types of the corresponding regressors. To address these differences, we define:

$$D_{x,T} = diag(D_{1T}, D_{2T}, D_{3T}); \qquad D_T = diag(D_{x,T}, I_g),$$

where

$$D_{1T} = diag\left(T^{-m_{1,1}}, ..., T^{-m_{k_1,1}}\right);$$

$$D_{2T} = diag(D_{21T}, D_{22T}) = diag\left(I_{k_{21}}, \sqrt{T}I_{k_{22}}\right);$$

$$D_{3T} = diag(D_{31T}, D_{32T}, D_{33T}) = diag\left(\sqrt{T}I_{k_{31}}, \sqrt{T}I_{k_{32}}, T^{m_{1,33}}, ..., T^{m_{k_{33},33}}\right).$$

Observe that $D_{1T}$, $D_{2T}$, and $D_{3T}$ are conformable to the regressor vectors $x_{1,it}$, $x_{2,it}$, and $x_{3,it}$, respectively, while $D_{x,T}$ and $I_g$ are conformable to $x_{it}$ and $z_i$, respectively. The diagonal matrix $D_T$ is chosen so that $\operatorname{plim}_{N\to\infty}\frac{1}{N}\sum_i D_T\tilde{w}_i\tilde{w}_i'D_T$ is well defined and finite. For future use, we also define

$$G_{x,T} = diag(D_{1T}, I_{k_{21}}, I_{k_{22}}, I_{k_3}); \qquad J_{x,T} = diag(I_{k_1}, I_{k_{21}}, D_{22T}, D_{3T}),$$

so that $D_{x,T} = G_{x,T}J_{x,T}$.
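The factorization $D_{x,T} = G_{x,T}J_{x,T}$ is easy to confirm numerically. The sketch below builds the three diagonal matrices for user-supplied trend exponents and block sizes; the function name, argument names, and example values are hypothetical and chosen only for illustration.

```python
import numpy as np

def scaling_matrices(T, m1, k21, k22, k31, k32, m33):
    """Build D_{x,T}, G_{x,T} and J_{x,T} from the definitions in the text.

    m1  : trend exponents m_{h,1} of the x_1 block (length k_1)
    m33 : decay exponents m_{h,33} of the x_33 block (length k_33)
    """
    T = float(T)
    d1 = np.array([T ** (-m) for m in m1])                  # diagonal of D_{1T}
    d3 = np.concatenate([np.sqrt(T) * np.ones(k31 + k32),   # D_{31T} and D_{32T}
                         np.array([T ** m for m in m33])])  # D_{33T}
    k1, k3 = len(d1), len(d3)
    D_x = np.diag(np.concatenate([d1, np.ones(k21), np.sqrt(T) * np.ones(k22), d3]))
    G_x = np.diag(np.concatenate([d1, np.ones(k21 + k22 + k3)]))
    J_x = np.diag(np.concatenate([np.ones(k1 + k21), np.sqrt(T) * np.ones(k22), d3]))
    return D_x, G_x, J_x

if __name__ == "__main__":
    D_x, G_x, J_x = scaling_matrices(100, [1.0, 2.0], 1, 2, 1, 1, [0.25])
    print(np.allclose(G_x @ J_x, D_x))
```

The split separates the trend scaling ($G_{x,T}$, which enters the within and GLS results) from the scaling that matters only for the between estimator ($J_{x,T}$).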

Using this notation, we make the following regularity assumptions on the unconditional and conditional means of regressors:


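Assumption 6 (convergence as $T\to\infty$): Defining $t = [Tr]$, we assume that the following restrictions hold as $T\to\infty$.

(i) Let $\tau_1(r) = diag(r^{m_{1,1}}, ..., r^{m_{k_1,1}})$, where $m_{h,1}$ is defined in Assumption 2. Then,

$$D_{1T}E(x_{1,it}) \to \tau_1(r)\Theta_{1,i}$$

uniformly in $i$ and $r\in[0,1]$, for some $\Theta_{1,i} = (\Theta_{1,1,i}, ..., \Theta_{k_1,1,i})'$ with $\sup_i\|\Theta_{1,i}\| < \infty$.

(ii) $\Theta_{21,it}\to\Theta_{21,i}$ and $\Theta_{3,it}\to\Theta_{3,i}$ uniformly in $i$ with $\sup_i\|\Theta_{21,i}\| < \infty$ and $\sup_i\|\Theta_{3,i}\| < \infty$.

(iii) Uniformly in $i$ and $r\in[0,1]$,

$$D_{31T}\left(E_{\mathcal{F}_{z_i}}x_{31,it} - Ex_{31,it}\right) \to 0_{k_{31}\times 1} \text{ a.s.};$$

$$D_{32T}\left(E_{\mathcal{F}_{z_i}}x_{32,it} - Ex_{32,it}\right) \to \frac{1}{\sqrt{r}}I_{k_{32}}\,g_{32,i}(z_i) \text{ a.s.};$$

$$D_{33T}\left(E_{\mathcal{F}_{z_i}}x_{33,it} - Ex_{33,it}\right) \to \tau_{33}(r)\,g_{33,i}(z_i) \text{ a.s.},$$

where

$$g_{32,i} = (g_{1,32,i}, ..., g_{k_{32},32,i})'; \qquad g_{33,i} = (g_{1,33,i}, ..., g_{k_{33},33,i})',$$

and $g_{32,i}(z_i)$ and $g_{33,i}(z_i)$ are zero-mean functions of $z_i$ with

$$0 < E\sup_i\|g_{3k,i}(z_i)\|^{4q} < \infty, \text{ for some } q > 1,$$

and $g_{3k,i} \neq g_{3k,j}$ for $i\neq j$, and $\tau_{33}(r) = diag(r^{-m_{1,33}}, ..., r^{-m_{k_{33},33}})$.

(iv) There exist $\tilde\tau(r)$ and $\tilde{G}_i(z_i)$ such that

$$\left\|D_{3T}\left(E_{\mathcal{F}_{z_i}}x_{3,it} - Ex_{3,it}\right)\right\| \le \tilde\tau(r)\,\tilde{G}_i(z_i),$$

where $\int\tilde\tau(r)^{4q}\,dr < \infty$ and $E\sup_i\tilde{G}_i(z_i)^{4q} < \infty$ for some $q > 1$.

(v) Uniformly in $(i,j)$ and $r\in[0,1]$,

$$D_{31T}(Ex_{31,it} - Ex_{31,jt}) \to 0_{k_{31}\times 1},$$

$$D_{32T}(Ex_{32,it} - Ex_{32,jt}) \to \frac{1}{\sqrt{r}}I_{k_{32}}\left(\mu_{g_{32},i} - \mu_{g_{32},j}\right),$$

$$D_{33T}(Ex_{33,it} - Ex_{33,jt}) \to \tau_{33}(r)\left(\mu_{g_{33},i} - \mu_{g_{33},j}\right),$$

with $\sup_i\|\mu_{g_{32},i}\|$, $\sup_i\|\mu_{g_{33},i}\| < \infty$.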

Some remarks would be useful to understand Assumption 6. First, to have an intuition about what the assumption implies, we consider, as an illustrative example, the simple model in CASE 3 in Section 2.2, in which $x_{3,it} = \Pi_i z_i/t^m + e_{it}$, where $e_{it}$ is independent of $z_i$ and i.i.d. across $i$. For this case,

$$D_{3T}\left(E_{\mathcal{F}_{z_i}}x_{3,it} - Ex_{3,it}\right) = D_{3T}\Pi_i(z_i - Ez_i)/t^m;$$

$$D_{3T}(Ex_{3,it} - Ex_{3,jt}) = D_{3T}(\Pi_i Ez_i - \Pi_j Ez_j)/t^m.$$

Thus,

$$g_{3k,i}(z_i) = \Pi_i(z_i - Ez_i); \qquad \mu_{g_{3k},i} = \Pi_i Ez_i.$$

Second, Assumption 6(iii) makes the restriction that $E\sup_i\|g_{3k,i}(z_i)\|^{4q}$ is strictly positive, for $k = 2, 3$. This restriction is made to warrant that $g_{3k,i}(z_i) \neq 0$ a.s. If $g_{3k,i}(z_i) = 0$ a.s.,$^{17}$ then

$$D_{3kT}E_{\mathcal{F}_{z_i}}(x_{3k,it} - Ex_{3k,it}) \sim \tau_{3k}(r)\,g_{3k,i}(z_i) = 0 \text{ a.s.},$$

and the correlations between $x_{3,it}$ and $z_i$ no longer play any important role in the asymptotics. Assumption 6(iii) rules out such cases.

Assumption 6 is about the asymptotic properties of means of regressors as $T\to\infty$. We also need additional regularity assumptions on the means of regressors that apply as $N\to\infty$. Define

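$$H_1 = \int_0^1\tau_1(r)\,dr; \qquad H_{32} = \left(\int_0^1\frac{1}{\sqrt{r}}\,dr\right)I_{k_{32}}; \qquad H_{33} = \int_0^1\tau_{33}(r)\,dr;$$

and

$$E\left[\begin{pmatrix}H_{32}\,g_{32,i}(z_i)\\ H_{33}\,g_{33,i}(z_i)\\ z_i - Ez_i\end{pmatrix}\begin{pmatrix}H_{32}\,g_{32,i}(z_i)\\ H_{33}\,g_{33,i}(z_i)\\ z_i - Ez_i\end{pmatrix}'\right] = \begin{pmatrix}\Gamma_{g_{32},g_{32},i} & \Gamma_{g_{32},g_{33},i} & \Gamma_{g_{32},z,i}\\ \Gamma_{g_{32},g_{33},i}' & \Gamma_{g_{33},g_{33},i} & \Gamma_{g_{33},z,i}\\ \Gamma_{g_{32},z,i}' & \Gamma_{g_{33},z,i}' & \Gamma_{zz,i}\end{pmatrix}.$$

With this notation, we assume the following: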

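Assumption 7 (convergence as $N\to\infty$): Define $\tilde\Theta_{1,i} = \Theta_{1,i} - \frac{1}{N}\sum_i\Theta_{1,i}$; $\tilde\Theta_{21,i} = \Theta_{21,i} - \frac{1}{N}\sum_i\Theta_{21,i}$; $\tilde\mu_{g_{32},i} = \mu_{g_{32},i} - \frac{1}{N}\sum_i\mu_{g_{32},i}$; and $\tilde\mu_{g_{33},i} = \mu_{g_{33},i} - \frac{1}{N}\sum_i\mu_{g_{33},i}$. As $N\to\infty$,

(i) $$\frac{1}{N}\sum_i\begin{pmatrix}H_1\tilde\Theta_{1,i}\\ \tilde\Theta_{21,i}\\ H_{32}\tilde\mu_{g_{32},i}\\ H_{33}\tilde\mu_{g_{33},i}\end{pmatrix}\begin{pmatrix}H_1\tilde\Theta_{1,i}\\ \tilde\Theta_{21,i}\\ H_{32}\tilde\mu_{g_{32},i}\\ H_{33}\tilde\mu_{g_{33},i}\end{pmatrix}' \to \begin{pmatrix}\Gamma_{\Theta_1,\Theta_1} & \Gamma_{\Theta_1,\Theta_{21}} & \Gamma_{\Theta_1,\mu_{32}} & \Gamma_{\Theta_1,\mu_{33}}\\ \Gamma_{\Theta_1,\Theta_{21}}' & \Gamma_{\Theta_{21},\Theta_{21}} & \Gamma_{\Theta_{21},\mu_{32}} & \Gamma_{\Theta_{21},\mu_{33}}\\ \Gamma_{\Theta_1,\mu_{32}}' & \Gamma_{\Theta_{21},\mu_{32}}' & \Gamma_{\mu_{32},\mu_{32}} & \Gamma_{\mu_{32},\mu_{33}}\\ \Gamma_{\Theta_1,\mu_{33}}' & \Gamma_{\Theta_{21},\mu_{33}}' & \Gamma_{\mu_{32},\mu_{33}}' & \Gamma_{\mu_{33},\mu_{33}}\end{pmatrix}.$$

(ii) $$\frac{1}{N}\sum_i\begin{pmatrix}\Gamma_{g_{32},g_{32},i} & \Gamma_{g_{32},g_{33},i} & \Gamma_{g_{32},z,i}\\ \Gamma_{g_{32},g_{33},i}' & \Gamma_{g_{33},g_{33},i} & \Gamma_{g_{33},z,i}\\ \Gamma_{g_{32},z,i}' & \Gamma_{g_{33},z,i}' & \Gamma_{zz,i}\end{pmatrix} \to \begin{pmatrix}\Gamma_{g_{32},g_{32}} & \Gamma_{g_{32},g_{33}} & \Gamma_{g_{32},z}\\ \Gamma_{g_{32},g_{33}}' & \Gamma_{g_{33},g_{33}} & \Gamma_{g_{33},z}\\ \Gamma_{g_{32},z}' & \Gamma_{g_{33},z}' & \Gamma_{z,z}\end{pmatrix}.$$

(iii) The limit of $\frac{1}{N}\sum_i\Theta_{1,i}\Theta_{1,i}'$ exists.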

17 An example is the case in which $x_{3,it} = e_{it}\Pi_i z_i/t^m$, where $e_{it}$ is independent of $z_i$ with mean zero.


Apparently, by Assumptions 6 and 7, we assume the sequential convergence of the means of regressors as $T\to\infty$ followed by $N\to\infty$. However, this by no means implies that our asymptotic analysis is a sequential one. Instead, the uniformity conditions in Assumption 6 allow us to obtain our asymptotic results using the joint limit approach that applies as $(N,T\to\infty)$ simultaneously.$^{18}$ Joint limit results can be obtained under an alternative set of conditions that assume uniform limits of the means of regressors sequentially as $N\to\infty$ followed by $T\to\infty$. Nonetheless, we adopt Assumptions 6 and 7, because they are much more convenient for handling the trends in the regressors $x_{1,it}$ and $x_{3,it}$ in the asymptotics.

The following notation is for conditional or unconditional covariances among time-varying regressors. Define

$$\tilde\Gamma_i(t,s) = [\tilde\Gamma_{jl,i}(t,s)]_{jl},$$

where $\tilde\Gamma_{jl,i}(t,s) = E\left(x_{j,it} - E_{\mathcal{F}_{z_i}}x_{j,it}\right)\left(x_{l,is} - E_{\mathcal{F}_{z_i}}x_{l,is}\right)$, for $j, l = 2, 3$. Essentially, $\tilde\Gamma_i$ is the unconditional mean of the conditional variance-covariance matrix of $(x_{2,it}', x_{3,it}')'$. We also define the unconditional variance-covariance matrix of $(x_{1,it}', x_{2,it}', x_{3,it}')'$ by

$$\Gamma_i(t,s) = [\Gamma_{jl,i}(t,s)]_{jl},$$

where $\Gamma_{jl,i}(t,s) = E(x_{j,it} - Ex_{j,it})(x_{l,is} - Ex_{l,is})$, for $j, l = 1, 2, 3$. Observe that $\tilde\Gamma_{22,i}(t,s) = \Gamma_{22,i}(t,s)$, since $x_{2,it}$ and $z_i$ are independent. With this notation, we make the following assumption on the convergence of variances and covariances:

Assumption 8 (convergence of covariances): As $(N,T\to\infty)$,

(i) $$\frac{1}{N}\sum_i\frac{1}{T}\sum_t\sum_s\begin{pmatrix}\tilde\Gamma_{22,i}(t,s) & \tilde\Gamma_{23,i}(t,s)\\ \tilde\Gamma_{23,i}'(t,s) & \tilde\Gamma_{33,i}(t,s)\end{pmatrix} \to \begin{pmatrix}\Gamma_{22} & \Gamma_{23}\\ \Gamma_{23}' & \Gamma_{33}\end{pmatrix}.$$

(ii) $$\frac{1}{N}\sum_i\frac{1}{T}\sum_t\Gamma_i(t,t) \to \Phi.$$

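Note that the variance matrix $[\Gamma_{jl}]_{j,l=2,3}$ is the cross section average of the long-run variance-covariance matrix of $(x_{2,it}', x_{3,it}')'$. For future use, we partition the two limits in the assumption conformably to $(x_{21,it}', x_{22,it}', x_{31,it}', x_{32,it}', x_{33,it}')'$ as follows:

$$\begin{pmatrix}\Gamma_{22} & \Gamma_{23}\\ \Gamma_{23}' & \Gamma_{33}\end{pmatrix} = \begin{pmatrix}\Gamma_{21,21} & \Gamma_{21,22} & \Gamma_{21,31} & \Gamma_{21,32} & \Gamma_{21,33}\\ \Gamma_{21,22}' & \Gamma_{22,22} & \Gamma_{22,31} & \Gamma_{22,32} & \Gamma_{22,33}\\ \Gamma_{21,31}' & \Gamma_{22,31}' & \Gamma_{31,31} & \Gamma_{31,32} & \Gamma_{31,33}\\ \Gamma_{21,32}' & \Gamma_{22,32}' & \Gamma_{31,32}' & \Gamma_{32,32} & \Gamma_{32,33}\\ \Gamma_{21,33}' & \Gamma_{22,33}' & \Gamma_{31,33}' & \Gamma_{32,33}' & \Gamma_{33,33}\end{pmatrix};$$

$$\Phi = \begin{pmatrix}\Phi_{11} & \Phi_{12} & \Phi_{13}\\ \Phi_{12}' & \Phi_{22} & \Phi_{23}\\ \Phi_{13}' & \Phi_{23}' & \Phi_{33}\end{pmatrix}.$$

18 For the details on the relationship between the sequential and joint approaches, see Apostol (1974, Theorems 8.39 and 9.16) for the cases of double indexed real number sequences, and Phillips and Moon (1999) for the cases of random sequences.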


Assumption 9 Let $\mathcal{F}_z^\infty = \sigma(z_1, ..., z_N, ...)$. For a generic constant $M$ that is independent of $N$ and $T$, the following hold:

(i) $\sup_{i,T}\left\|\frac{1}{T}\sum_t(x_{1,it} - Ex_{1,it})\right\|_4 < M$,

(ii) $\sup_{i,T}\left\|\frac{1}{\sqrt{T}}\sum_t(x_{2,it} - Ex_{2,it})\right\|_4 < M$,

(iii) $\sup_{i,T}\left\|\frac{1}{\sqrt{T}}\sum_t\left(x_{3,it} - E_{\mathcal{F}_{z_i}}x_{3,it}\right)\right\|_4 < M$.

Assumption 9 assumes that the fourth moments of the sums of the regressors $x_{1,it}$, $x_{2,it}$, and $x_{3,it}$ are uniformly bounded. This assumption is satisfied under mild restrictions on the moments of $x_{it}$ and on the temporal dependence of $x_{2,it}$ and $x_{3,it}$. For sufficient conditions for Assumption 9, refer to Ahn and Moon (2001).

Finally, we make a formal definition of the random effects assumption, which is a more rigorous version of (3).

Assumption 10 (random effects): Conditional on $\mathcal{F}_w$, $\{u_i\}_{i=1,...,N}$ is i.i.d. with mean zero, variance $\sigma_u^2$ and finite $\kappa_u \equiv \|u_i\|_{\mathcal{F}_w,4}$.

To investigate the power property of the Hausman test, we also need to define an alternative hypothesis which states a particular direction of model misspecification. Among many alternatives, we here consider a simpler one. Specifically, we consider an alternative hypothesis under which the conditional mean of $u_i$ is a linear function of $D_T\tilde{w}_i$. Abusing the conventional definition of fixed effects (that indicates nonzero correlations between $w_i = (x_{it}', z_i')'$ and $u_i$), we refer to this alternative as the fixed effects assumption:

Assumption 11 (fixed effects): Conditional on $\mathcal{F}_w$, $\{u_i\}_{i=1,...,N}$ is i.i.d. with mean $\tilde{w}_i'D_T\lambda$ and variance $\sigma_u^2$, where $\lambda$ is a $(k+g)\times 1$ nonrandom nonzero vector.

Here, $D_T\tilde{w}_i = [(D_{x,T}\tilde{x}_i)', \tilde{z}_i']'$ can be viewed as a vector of detrended regressors. Thus, Assumption 11 indicates non-zero correlations between the effect $u_i$ and detrended regressors. The term $\tilde{w}_i'D_T\lambda$ can be replaced by $\lambda_o + w_i'D_T\lambda$, where $\lambda_o$ is any constant scalar. We use the term $\tilde{w}_i'D_T\lambda$ instead of $\lambda_o + w_i'D_T\lambda$ simply for convenience.

A sequence of local versions of the fixed effects hypothesis is given:

Assumption 12 (local alternatives to random effects): Conditional on $\mathcal{F}_w$, the sequence $\{u_i\}_{i=1,...,N}$ is i.i.d. with mean $\tilde{w}_i'D_T\lambda/\sqrt{N}$, variance $\sigma_u^2$, and $\kappa_u^4 = E_{\mathcal{F}_w}(u_i - E_{\mathcal{F}_w}u_i)^4 < \infty$, where $\lambda \neq 0_{(k+g)\times 1}$ is a nonrandom vector in $\mathbb{R}^{k+g}$.

Under this Assumption, $E(D_T\tilde{w}_iu_i) = \frac{1}{\sqrt{N}}E\left(D_T\tilde{w}_i\tilde{w}_i'D_T\right)\lambda \to 0_{(k+g)\times 1}$, as $(N,T\to\infty)$. Observe that these local alternatives are of the forms introduced in Table 1.


The following lemmas provide some results that are useful to derive the asymptotic distributions of the within, between, and GLS estimators of $\beta$ and $\gamma$.

Lemma 1 Under Assumptions 1-8, we obtain the following results as $(N,T\to\infty)$. For some positive semidefinite matrices $\Psi_x$ and $\Xi$ (defined in the Appendix),

(a) $\frac{1}{N}\sum_i\frac{1}{T}\sum_t G_{x,T}x_{it}x_{it}'G_{x,T} \to_p \Psi_x$;

(b) $\frac{1}{\sqrt{N}}\sum_i\frac{1}{\sqrt{T}}\sum_t G_{x,T}x_{it}v_{it} \Rightarrow N\left(0, \sigma_v^2\Psi_x\right)$;

(c) $\frac{1}{N}\sum_i D_T\tilde{w}_i\tilde{w}_i'D_T \to_p \Xi$;

(d) $\frac{1}{\sqrt{N}}\sum_i D_Tw_iv_i \to_p 0_{(k+g)\times 1}$.

Lemma 2 Under Assumptions 1-8 and Assumption 12 (local alternatives to random effects), as $(N,T\to\infty)$,

$$\frac{1}{\sqrt{N}}\sum_i D_T\tilde{w}_i\tilde{u}_i \Rightarrow N\left(\Xi\lambda, \sigma_u^2\Xi\right).$$

Lemma 3 Under Assumptions 1-8 and Assumption 11 (fixed effects),

$$\frac{1}{N}\sum_i D_T\tilde{w}_i\tilde{u}_i \to_p \Xi\lambda,$$

as $(N,T\to\infty)$.

The following assumption is required for identification of the within and between estimators of $\beta$ and $\gamma$.

Assumption 13 The matrices $\Psi_x$ and $\Xi$ are positive definite.

Two remarks on this assumption follow. First, this assumption is also sufficient for identification of the GLS estimation. Second, while the positive definiteness of the matrix $\Xi$ is required for identification of the between estimators, it is not a necessary condition for the asymptotic distribution of the Hausman statistic obtained below. We can obtain the same asymptotic results for the Hausman test even if we alternatively assume that within estimation can identify $\beta$ (positive definite $\Psi_x$) and between estimation can identify $\gamma$ given $\beta$ (the part of $\Xi$ corresponding to $\tilde{z}_i$ is positive definite).$^{19}$ Nonetheless, we assume that $\Xi$ is invertible for convenience.

We now consider the asymptotic distributions of the within, between and GLS estimators of $\beta$ and $\gamma$:

19 This claim can be checked with the following simple example. Consider a simple model with one time-varying regressor $x_{it}$ and one time invariant regressor $z_i$. Assume that $x_{it} = az_i + e_{it}$, where the $e_{it}$ are i.i.d. over different $i$ and $t$. For this model, it is straightforward to show that the matrix $\Xi$ fails to be invertible. Nonetheless, under the random effects assumption, the Hausman statistic can be shown to follow a $\chi^2$ distribution with one degree of freedom.


Theorem 4 (asymptotic distribution of the within estimator): Under Assumptions 1-8 and Assumption 13, as $(N,T\to\infty)$,

$$\sqrt{NT}\,G_{x,T}^{-1}(\hat\beta_w - \beta) \Rightarrow N\left(0, \sigma_v^2\Psi_x^{-1}\right).$$

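Theorem 5 (asymptotic distribution of the between estimator): Suppose that Assumptions 1-8 and 13 hold. As $(N,T\to\infty)$,

(a) under Assumption 10 (random effects),

$$D_T^{-1}\sqrt{N}\begin{pmatrix}\hat\beta_b - \beta\\ \hat\gamma_b - \gamma\end{pmatrix} = \begin{pmatrix}D_{x,T}^{-1}\sqrt{N}(\hat\beta_b - \beta)\\ \sqrt{N}(\hat\gamma_b - \gamma)\end{pmatrix} \Rightarrow N\left(0, \sigma_u^2\Xi^{-1}\right);$$

(b) under Assumption 12 (local alternatives to random effects),

$$D_T^{-1}\sqrt{N}\begin{pmatrix}\hat\beta_b - \beta\\ \hat\gamma_b - \gamma\end{pmatrix} = \begin{pmatrix}D_{x,T}^{-1}\sqrt{N}(\hat\beta_b - \beta)\\ \sqrt{N}(\hat\gamma_b - \gamma)\end{pmatrix} \Rightarrow N\left(\lambda, \sigma_u^2\Xi^{-1}\right),$$

where the mean follows from premultiplying the limit in Lemma 2 by $\Xi^{-1}$ (Lemma 1(c)): $\Xi^{-1}\Xi\lambda = \lambda$.

Theorem 6 (asymptotic distribution of the GLS estimator of $\beta$): Suppose that Assumptions 1-8 and 13 hold.

(a) Under Assumption 12 (local alternatives to random effects),

$$\sqrt{NT}\,G_{x,T}^{-1}(\hat\beta_g - \beta) = \sqrt{NT}\,G_{x,T}^{-1}(\hat\beta_w - \beta) + o_p(1),$$

as $(N,T\to\infty)$.

(b) Suppose that Assumption 11 (fixed effects) holds. Partition $\lambda = (\lambda_x', \lambda_z')'$ conformably to the sizes of $x_{it}$ and $z_i$. Assume that $\lambda_x \neq 0_{k\times 1}$. If $N/T \to c < \infty$ and the included regressors are only of the $x_{22,it}$- and $x_{3,it}$-types (no trends and no cross-sectional heteroskedasticity in $x_{it}$), then

$$\sqrt{NT}\,G_{x,T}^{-1}(\hat\beta_g - \beta) = \sqrt{NT}\,G_{x,T}^{-1}(\hat\beta_w - \beta) + o_p(1).$$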

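Theorem 7 (asymptotic distribution of the GLS estimator of $\gamma$): Suppose that Assumptions 1-8 and 13 hold. Define $l_z' = \left(0_{g\times k} \;\vdots\; I_g\right)$. Then, the following statements hold as $(N,T\to\infty)$.

(a) Under Assumption 12 (local alternatives to random effects),

$$\sqrt{N}(\hat\gamma_g - \gamma) = \left(\frac{1}{N}\sum_i\tilde{z}_i\tilde{z}_i'\right)^{-1}\left(\frac{1}{\sqrt{N}}\sum_i\tilde{z}_i\tilde{u}_i\right) + o_p(1) \Rightarrow N\left((l_z'\Xi l_z)^{-1}l_z'\Xi\lambda,\ \sigma_u^2(l_z'\Xi l_z)^{-1}\right).$$

(b) Under Assumption 11 (fixed effects),

$$(\hat\gamma_g - \gamma) \to_p (l_z'\Xi l_z)^{-1}l_z'\Xi\lambda.$$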


Several remarks follow. First, all of the asymptotic results given in Theorems 4-7, except for Theorem 6(b), hold as $(N,T \to \infty)$ without any particular restriction on the convergence rates of N and T. The relative size of N and T does not matter for the results, so long as both N and T are large. Second, one can easily check that the convergence rates of the panel data estimates of individual β coefficients (on the $x_{2,it}$- and $x_{3,it}$-type regressors) reported in Theorems 4-7 are consistent with those from Section 2.2. Third, Theorem 5 shows that under Assumption 10 (random effects), the between estimator of γ, $\hat{\gamma}_b$, is $\sqrt{N}$-consistent regardless of the characteristics of the time-varying regressors. Fourth, the between estimators of both β and γ are asymptotically biased under the sequence of local alternatives (Assumption 12). Fifth, as Theorem 6(a) indicates, the within and GLS estimators of β are asymptotically equivalent not only under the random effects assumption, but also under the local alternatives. Furthermore, the GLS estimator of β is asymptotically unbiased under the local alternatives, while the between estimator of β is not. The asymptotic equivalence between the within and GLS estimation under the random effects assumption is nothing new. Previous studies have shown this equivalence based on a naive sequential limit method (T → ∞ followed by N → ∞) and some strong assumptions such as fixed regressors. Theorem 6(a) and (b) confirm the same equivalence result, but with a more rigorous joint limit approach as $(N,T \to \infty)$ simultaneously. It is also intriguing to see that the GLS and within estimators are equivalent even under the local alternative hypotheses.

Sixth, somewhat surprisingly, as Theorem 6(b) indicates, even under the fixed effects assumption (Assumption 11), the GLS estimator of β could be asymptotically unbiased (and consistent) and equivalent to the within counterpart, (i) if the size (N) of the cross-section units does not excessively dominate the size (T) of the time series in the limit (N/T → c < ∞), and (ii) if the model does not contain trended or cross-sectionally heterogeneous time-varying regressors. This result indicates that when the two conditions are satisfied, the biases in GLS caused by fixed effects are generally much smaller than those in the between estimator. If at least one of these two conditions is violated, that is, if N/T → ∞, or if the other types of regressors are included, the limit of $(\hat{\beta}_g - \hat{\beta}_w)$ is determined by how fast N/T → ∞ and how fast the trends in the regressors increase or decrease.²⁰

Finally, Theorem 7(a) indicates that under the local alternative hypotheses, the GLS estimator $\hat{\gamma}_g$ is $\sqrt{N}$-consistent and asymptotically normal, but asymptotically biased. The limiting distribution of $\hat{\gamma}_g$, in this case, is equivalent to the limiting distribution of the OLS estimator of γ in the panel model with known coefficients on the time-varying regressors $x_{it}$ (OLS on $\tilde{y}_{it} - \beta'\tilde{x}_{it} = \gamma'\tilde{z}_i + (u_i + \tilde{v}_{it})$). Clearly, the GLS estimator $\hat{\gamma}_g$ is asymptotically more efficient than the between estimator $\hat{\gamma}_b$. On the other hand, under the fixed effects assumption, unlike the GLS estimator of β, $\hat{\beta}_g$, the GLS estimator $\hat{\gamma}_g$ is not consistent as $(N,T \to \infty)$. The asymptotic bias of $\hat{\gamma}_g$ is given in Theorem 7(b).

²⁰ In this case, without specific assumptions on the convergence rates of N/T and the trends, it is hard to generalize the limits of the difference of the within and the GLS estimators.
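The equivalence of the within and GLS estimators of β noted in the fifth remark can be illustrated with a small numerical sketch. The code below is not part of the paper. It assumes a balanced panel with a single exogenous regressor, known variance components, and the textbook quasi-demeaning form of random effects GLS with weight θ = 1 − (σ_v²/(σ_v² + Tσ_u²))^{1/2}. All function names are hypothetical.

```python
import random


def simulate_panel(n, t, beta=1.0, sigma_u=1.0, sigma_v=1.0, seed=0):
    """Simulate y_it = beta * x_it + u_i + v_it for a balanced panel."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        u = rng.gauss(0.0, sigma_u)  # individual effect
        xs = [rng.gauss(0.0, 1.0) for _ in range(t)]
        ys = [beta * x + u + rng.gauss(0.0, sigma_v) for x in xs]
        panel.append((xs, ys))
    return panel


def within_slope(panel):
    """Within estimator: OLS on deviations from individual means."""
    sxy = sxx = 0.0
    for xs, ys in panel:
        xbar = sum(xs) / len(xs)
        ybar = sum(ys) / len(ys)
        sxy += sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
        sxx += sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx


def gls_theta(t, sigma_u=1.0, sigma_v=1.0):
    """Quasi-demeaning weight of random effects GLS (variances assumed known)."""
    return 1.0 - (sigma_v ** 2 / (sigma_v ** 2 + t * sigma_u ** 2)) ** 0.5


def quasi_demeaned_slope(panel, theta):
    """OLS slope (with intercept) after subtracting theta times the unit mean."""
    xq, yq = [], []
    for xs, ys in panel:
        xbar = sum(xs) / len(xs)
        ybar = sum(ys) / len(ys)
        xq.extend(x - theta * xbar for x in xs)
        yq.extend(y - theta * ybar for y in ys)
    mx = sum(xq) / len(xq)
    my = sum(yq) / len(yq)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xq, yq))
    sxx = sum((a - mx) ** 2 for a in xq)
    return sxy / sxx


if __name__ == "__main__":
    for t in (2, 20, 200):
        panel = simulate_panel(n=50, t=t)
        theta = gls_theta(t)
        gap = quasi_demeaned_slope(panel, theta) - within_slope(panel)
        print(t, round(theta, 4), gap)
```

As T grows, θ approaches one and the quasi-demeaned regression coincides with the within regression, which is the mechanism behind the equivalence in this simplified setting.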


Lastly, the following theorem gives the asymptotic distribution of the Hausman test statistic under the random effects assumption and the local alternatives:

Theorem 8 Suppose that Assumptions 1-8 and 13 hold. Corresponding to the size of $(x_i', z_i')'$, partition Ξ and λ, respectively, as follows:

$$\Xi = \begin{pmatrix} \Xi_{xx} & \Xi_{xz} \\ \Xi_{xz}' & \Xi_{zz} \end{pmatrix}; \qquad \lambda = \begin{pmatrix} \lambda_x \\ \lambda_z \end{pmatrix}.$$

Then, as $(N,T \to \infty)$,
(a) under Assumption 10 (random effects),

$$HM_{NT} \Rightarrow \chi^2_k;$$

(b) under Assumption 12 (local alternatives to random effects),

$$HM_{NT} \Rightarrow \chi^2_k(\eta),$$

where $\eta = \lambda_x'(\Xi_{xx} - \Xi_{xz}\Xi_{zz}^{-1}\Xi_{xz}')\lambda_x/\sigma_u^2$ is the noncentrality parameter.

Theorem 8 shows that under the random effects assumption, the Hausman statistic is asymptotically χ²-distributed with degrees of freedom equal to k (the number of time-varying regressors). Furthermore, Theorem 8(b) shows that the Hausman statistic has local power to detect any correlation between the time-varying regressors $x_{it}$ and the effect $u_i$. It is easy to check that Theorem 8(b) is a generalization of the results reported in Table 1 in Section 2.3. The noncentrality parameter η does not depend on $\lambda_z$, indicating that the Hausman test has no power to detect nonzero correlations between the time invariant regressors $z_i$ and the individual effect $u_i$ in the direction of our local alternative hypotheses (Assumption 12). This result holds even if T is finite and fixed. As discussed earlier, this is due to the fact that the conditional mean of the effect $u_i$ is a linear function of the regressors under our local alternative hypotheses. When the conditional mean is not linear, the Hausman test could have power to detect nonzero correlations of the effect $u_i$ with the time invariant regressors. However, the power of the Hausman test against such correlations is generally limited. This is so because the Hausman test can detect such correlations only if they cause a large bias in the between estimator of β, the coefficient vector on the time-varying regressors (see Ahn and Low, 1996).
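To make the role of the noncentrality parameter concrete, the following sketch evaluates η and the implied asymptotic local power for the special case of one time-varying regressor and one time invariant regressor (k = 1, scalar blocks of Ξ). This restriction, the function names, and the numerical inputs are illustrative assumptions and are not taken from the paper. For k = 1, a χ²₁(η) variable is distributed as (Z + √η)² with Z standard normal, so the rejection probability can be written with the normal distribution function alone.

```python
from statistics import NormalDist


def eta(lam_x, xi_xx, xi_xz, xi_zz, sigma_u2):
    """Noncentrality parameter of Theorem 8(b) for scalar blocks.

    lam_z does not appear among the arguments: the formula does not use it.
    """
    return lam_x ** 2 * (xi_xx - xi_xz ** 2 / xi_zz) / sigma_u2


def local_power_k1(eta_value, alpha=0.05):
    """P(chi2_1(eta) > critical value) for a test of nominal size alpha."""
    nd = NormalDist()
    c = nd.inv_cdf(1.0 - alpha / 2.0)  # square root of the chi2_1 critical value
    d = eta_value ** 0.5
    # (Z + d)^2 > c^2  <=>  Z > c - d  or  Z < -c - d
    return (1.0 - nd.cdf(c - d)) + nd.cdf(-c - d)


if __name__ == "__main__":
    for lam_x in (0.0, 0.5, 1.0, 2.0):
        e = eta(lam_x, xi_xx=2.0, xi_xz=0.5, xi_zz=1.0, sigma_u2=1.0)
        print(lam_x, e, local_power_k1(e))
```

With λ_x = 0 the power equals the nominal size whatever the value of λ_z, which mirrors the statement that the test has no local power against correlation between z_i and u_i.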

5 Conclusion

This paper has considered the large-N and large-T asymptotic properties of the popular panel data estimators and the Hausman test. We find that the convergence rates of the between estimator and the Hausman test statistic are closely related. In particular, we find that the convergence rates of the between estimator crucially depend on whether the data are cross-sectionally heteroskedastic or homoskedastic. Despite the different convergence rates, under the random effects assumption, the between estimator is consistent and asymptotically normal, and the Hausman test is asymptotically χ²-distributed.

In this paper, we have restricted our attention to the asymptotic properties of the existing estimators and tests when panel data contain both large numbers of cross-section and time-series observations. Thus, this paper does not propose any new estimator or test. However, it makes several contributions to the literature. First, for the cases with both large N and T, we provide a theoretical link between the asymptotic equivalence of the within and GLS estimators and the asymptotic distribution of the Hausman test. Second, we have shown that cross-sectional heterogeneity in the time-varying regressors can play an important role in asymptotics. Previous studies have often assumed that data are cross-sectionally i.i.d. Our findings suggest that future studies should pay attention to cross-sectional heterogeneity. Third, we find that the power of the Hausman test depends on T. In particular, when the time-varying regressors contain individual-specific permanent components or trends, the test's power for nonzero correlation between the unobservable individual effect and such components is positively related to T, while its power for nonzero correlation between the effect and the transitory components of the time-varying regressors is negatively related to T. This implies that the Hausman test results obtained using an entire data set could be different from the results obtained using only a subset of the same data. Our results indicate that the different test results from a data set with large T and its subset with small T may provide some information about how the individual effect might be correlated with the time-varying regressors. Rejection by the large data set but acceptance by the small one would indicate that the effect is correlated with the permanent components of the time-varying regressors, but that the degrees of the correlations are low. In contrast, acceptance by the large data set but rejection by the small one may indicate that the effect is correlated with the transitory components of the time-varying regressors.

An obvious extension of our paper is the instrumental variables estimation of Hausman and Taylor (1981), Amemiya and MaCurdy (1986), and Breusch, Mizon and Schmidt (1989). For an intermediate model between fixed effects and random effects, these studies propose several instrumental variables estimators by which the coefficients on both time-varying and time invariant regressors can be consistently estimated. It would be interesting to investigate the large-N and large-T properties of these instrumental variables estimators and of the Hausman tests based on them.


6 Appendix A: Preliminary Results

We here provide some preliminary lemmas that are useful for proving the main results in Section 4. From now on, we use the notation M to denote a generic constant, if no explanation follows.

Lemma 9 Let $f_{i,T}$, $f_i$, and $g_i$ be integrable functions in the probability space $(\Omega, \mathcal{F}, P)$. If $f_{i,T} \to f_i$ a.s. uniformly in i as T → ∞, and there exists $g_i$ such that $|f_{i,T}| \le g_i$ for all i and T with $E \sup_i g_i < \infty$, then $E f_{i,T} \to E f_i$ uniformly in i as T → ∞.

Proof
Let $h_{i,T} = |f_{i,T} - f_i|$. Under the given assumptions, $0 \le \sup_i h_{i,T} \le 2 \sup_i g_i$ and $h_{i,T} \to 0$ uniformly in i as T → ∞. Then, by Fatou's Lemma,

$$2E \sup_i g_i = E\left(\liminf_{T\to\infty}\left(2\sup_i g_i - \sup_i h_{i,T}\right)\right) \le \liminf_{T\to\infty} E\left(2\sup_i g_i - \sup_i h_{i,T}\right) = 2E\sup_i g_i - \limsup_{T\to\infty} E \sup_i h_{i,T},$$

from which we can deduce

$$\limsup_{T\to\infty} E \sup_i h_{i,T} \le 0.$$

Then, since

$$0 \le \limsup_{T\to\infty} \sup_i |E(f_{i,T} - f_i)| \le \limsup_{T\to\infty} \sup_i E h_{i,T} \le \limsup_{T\to\infty} E \sup_i h_{i,T} \le 0,$$

we have the required result: $E f_{i,T} \to E f_i$ uniformly in i as T → ∞. ■

The following lemma is a uniform version of the Toeplitz lemma.

Lemma 10 Let $a_{it}$ be a sequence of real numbers such that $a_{it} \to a_i$ uniformly in i as t → ∞ with $\sup_i |a_i| < M$. Then, (a) $\frac{1}{T}\sum_t a_{it} \to a_i$ uniformly in i, and (b) $\frac{1}{T}\sum_t a_{it}^2 \to a_i^2$ uniformly in i.

Proof
From the uniform convergence of $a_{it}$, for a given ε > 0, we can choose $t_0$ such that $t \ge t_0$ implies that

$$\sup_i |a_{it} - a_i| < \varepsilon.$$

Then, Part (a) follows because $t \ge t_0$ implies

$$\sup_i \left|\frac{1}{T}\sum_t (a_{it} - a_i)\right| \le \frac{1}{T}\sum_t \sup_i |a_{it} - a_i| < \varepsilon.$$

For Part (b), notice that

$$\sup_i \left|a_{it}^2 - a_i^2\right| \le \sup_i |a_{it} + a_i| \, \sup_i |a_{it} - a_i| \le M \sup_i |a_{it} - a_i| \to 0$$

as t → ∞, where the last inequality holds because $\sup_i |a_i| < \infty$ and $\sup_i |a_{it} - a_i| \to 0$. Then, by Part (a), we can obtain the desired result. ■
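A quick numerical check can make the uniform Toeplitz property tangible. The sketch below uses a hypothetical array a_it = a_i + c_i/t with bounded a_i and c_i, which satisfies the hypotheses of Lemma 10, and reports the worst-case (over i) gap between the time average and the limit. The specific sequence and the function name are illustrative assumptions.

```python
def sup_error_of_time_average(limits, decay, T):
    """Return sup_i | (1/T) * sum_{t=1..T} a_it - a_i | for a_it = a_i + c_i / t."""
    worst = 0.0
    for a_i, c_i in zip(limits, decay):
        avg = sum(a_i + c_i / t for t in range(1, T + 1)) / T
        worst = max(worst, abs(avg - a_i))
    return worst


if __name__ == "__main__":
    limits = [0.5 * i for i in range(-4, 5)]     # bounded limits a_i
    decay = [(-1) ** i * 2.0 for i in range(9)]  # bounded coefficients c_i
    for T in (10, 100, 1000):
        print(T, sup_error_of_time_average(limits, decay, T))
```

In this example the gap equals |c_i| times the T-th harmonic number divided by T, so it shrinks at rate log(T)/T uniformly in i.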

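Lemma 11 Suppose that $X_{i,T}$ and $X_i$ are sequences of random vectors. Suppose that $X_{i,T} \to X_i$ in probability (or almost surely) uniformly in i as T → ∞, and $\frac{1}{N}\sum_i X_i \to X$ in probability (or almost surely) as N → ∞. Then, as $(N,T \to \infty)$,

$$\frac{1}{N}\sum_i X_{i,T} \to X$$

in probability (or almost surely).

Proof
We only prove the lemma for the case of convergence in probability, because the almost sure convergence case can be proven in a similar fashion. Since $\frac{1}{N}\sum_i X_i \to_p X$ as N → ∞ and $X_{i,T} \to_p X_i$ uniformly in i as T → ∞, for given ε, δ > 0, we can choose $N_0$ and $T_0$ such that

$$P\left\{\left\|\frac{1}{N}\sum_i X_i - X\right\| > \frac{\varepsilon}{2}\right\} \le \frac{\delta}{2}; \qquad P\left\{\sup_i \|X_{i,T} - X_i\| > \frac{\varepsilon}{2}\right\} \le \frac{\delta}{2},$$

whenever $N \ge N_0$ and $T \ge T_0$. Now, suppose that $N \ge N_0$ and $T \ge T_0$. Then,

$$\begin{aligned}
P\left\{\left\|\frac{1}{N}\sum_i X_{i,T} - X\right\| > \varepsilon\right\}
&\le P\left\{\left\|\frac{1}{N}\sum_i (X_{i,T} - X_i)\right\| > \frac{\varepsilon}{2}\right\} + P\left\{\left\|\frac{1}{N}\sum_i X_i - X\right\| > \frac{\varepsilon}{2}\right\} \\
&\le P\left\{\sup_i \|X_{i,T} - X_i\| > \frac{\varepsilon}{2}\right\} + \frac{\delta}{2} \le \delta. \quad ■
\end{aligned}$$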

Lemma 12 Assume that $Q_{iT}$ (k × k) is a sequence of independent random matrices across i satisfying

$$\sup_{i,T} E \|Q_{iT}\| 1\{\|Q_{iT}\| > M\} \to 0, \tag{23}$$

as M → ∞, where 1{·} is the indicator function that equals one if the statement in the braces is true, and equals zero otherwise. Then, as $(N,T \to \infty)$,

$$\frac{1}{N}\sum_i (Q_{iT} - EQ_{iT}) \to_p 0.$$

Proof of Lemma 12
For any ε > 0, we need to show that

$$P\left\{\left\|\frac{1}{N}\sum_i (Q_{iT} - EQ_{iT})\right\| > \varepsilon\right\} \to 0,$$

which follows from the conditional Markov inequality if we can show that

$$E\left\|\frac{1}{N}\sum_i (Q_{iT} - EQ_{iT})\right\| \to 0 \text{ a.s.} \tag{24}$$

To show (24), define:

$$P_{iT} = Q_{iT} 1\{\|Q_{iT}\| \le M\}; \qquad R_{iT} = Q_{iT} 1\{\|Q_{iT}\| > M\}.$$

Then,

$$\begin{aligned}
E\left\|\frac{1}{N}\sum_i (Q_{iT} - EQ_{iT})\right\|
&\le E\left\|\frac{1}{N}\sum_i (P_{iT} - EP_{iT})\right\| + E\left\|\frac{1}{N}\sum_i (R_{iT} - ER_{iT})\right\| \\
&\le \left(E\left\|\frac{1}{N}\sum_i (P_{iT} - EP_{iT})\right\|^2\right)^{1/2} + 2\,\frac{1}{N}\sum_i E\|R_{iT}\|.
\end{aligned}$$

By definition,

$$\frac{1}{N}\sum_i E\|R_{iT}\| \le \sup_{i,T} E\|R_{iT}\| \le \sup_{i,T} E\|Q_{iT}\| 1\{\|Q_{iT}\| > M\}. \tag{25}$$

Also,

$$\left(E\left\|\frac{1}{N}\sum_i (P_{iT} - EP_{iT})\right\|^2\right)^{1/2} = \left(\frac{1}{N^2}\sum_i E\|P_{iT} - EP_{iT}\|^2\right)^{1/2} \le \frac{1}{\sqrt{N}}\left(\sup_{i,T} E\|P_{iT}\|^2\right)^{1/2} \le \frac{1}{\sqrt{N}} M. \tag{26}$$

Choose $M = N^a$, where $0 < a < \frac{1}{2}$. Then, (25), (26) → 0 a.s. Consequently, we have (24). ■
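The role of the truncation level can be made explicit. Substituting $M = N^a$ into the two bounds used in the proof gives the following chain, written out here as a reading aid rather than as part of the original argument:

```latex
\begin{align*}
E\Bigl\| \frac{1}{N}\sum_i (Q_{iT} - EQ_{iT}) \Bigr\|
  &\le \frac{M}{\sqrt{N}}
     + 2 \sup_{i,T} E\|Q_{iT}\|\, 1\{\|Q_{iT}\| > M\} \\
  &= N^{a - 1/2}
     + 2 \sup_{i,T} E\|Q_{iT}\|\, 1\{\|Q_{iT}\| > N^{a}\}.
\end{align*}
```

The first term vanishes because a < 1/2, and the second vanishes by condition (23) because $N^a \to \infty$ when a > 0. Neither term depends on T, which is why no restriction on the relative rates of N and T is needed.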

7 Appendix B: Proofs of Main Results

Before we start proving the lemmas and theorems in Section 4, we introduce some additional lemmas that are used repeatedly below. Recall that $w_{it} = (x_{1,it}', x_{2,it}', x_{3,it}', z_i')'$. We also repeatedly use the diagonal matrix $D_T$ defined in Section 4.

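Lemma 13 Suppose that Assumptions 1-8 hold. Define $\Xi = \Xi_1 + \Xi_2$, where

$$\Xi_1 = \operatorname{diag}\left(0_{k_1},\ 0_{k_{21}},\ \begin{pmatrix} \Gamma_{22,22} & \Gamma_{22,31} & \Gamma_{22,32} \\ \Gamma_{22,31}' & \Gamma_{31,31} & \Gamma_{31,32} \\ \Gamma_{22,32}' & \Gamma_{31,32}' & \Gamma_{32,32} \end{pmatrix},\ 0_{k_{33}},\ 0_{k_z}\right);$$

$$\Xi_2 = \begin{pmatrix}
\Gamma_{\Theta_1,\Theta_1} & \Gamma_{\Theta_1,\Theta_{21}} & 0 & 0 & \Gamma_{\Theta_1,\mu_{32}} & \Gamma_{\Theta_1,\mu_{33}} & 0 \\
\Gamma_{\Theta_1,\Theta_{21}}' & \Gamma_{\Theta_{21},\Theta_{21}} & 0 & 0 & \Gamma_{\Theta_{21},\mu_{32}} & \Gamma_{\Theta_{21},\mu_{33}} & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\Gamma_{\Theta_1,\mu_{32}}' & \Gamma_{\Theta_{21},\mu_{32}}' & 0 & 0 & \Gamma_{g_{32},g_{32}} + \Gamma_{\mu_{32},\mu_{32}} & \Gamma_{g_{32},g_{33}} + \Gamma_{\mu_{32},\mu_{32}} & \Gamma_{g_{32},z} \\
\Gamma_{\Theta_1,\mu_{33}}' & \Gamma_{\Theta_{21},\mu_{33}}' & 0 & 0 & \Gamma_{g_{32},g_{33}}' + \Gamma_{\mu_{32},\mu_{32}}' & \Gamma_{g_{33},g_{33}} + \Gamma_{\mu_{33},\mu_{33}} & \Gamma_{g_{33},z} \\
0 & 0 & 0 & 0 & \Gamma_{g_{32},z}' & \Gamma_{g_{33},z}' & \Gamma_{z,z}
\end{pmatrix}.$$

Then, under Assumption 12, as $(N,T \to \infty)$, the following hold.
(a) $\frac{1}{N}\sum_i D_T w_i w_i' D_T \to_p \Xi$.
(b) $\sup_{N,T} \sup_{1\le i\le N} E\|D_T w_i\|^4 < M$, for some constant M < ∞.
(c) $\frac{1}{\sqrt{N}}\sum_i D_T w_i u_i \Rightarrow N(\Xi\lambda, \sigma_u^2 \Xi)$.
(d) $\frac{1}{\sqrt{N}}\sum_i D_T w_i v_i \to_p 0$.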

Proof
Part (a)
To find the limit of

$$\frac{1}{N}\sum_i D_T w_i w_i' D_T = \frac{1}{N}\sum_i D_T (w_i - w)(w_i - w)' D_T,$$

we define

$$E_i^* w_i = \left(E x_{1,i}',\ E x_{2,i}',\ E_{\mathcal{F}_{z_i}} x_{3,i}',\ z_i'\right)'; \qquad E^* w = \left(E x_1',\ E x_2',\ E_{\mathcal{F}_z} x_3',\ z'\right)'.$$

With this notation, we write

$$\begin{aligned}
\frac{1}{N}\sum_i D_T (w_i - w)(w_i - w)' D_T
&= \frac{1}{N}\sum_i D_T \left[(w_i - E_i^* w_i) + (E_i^* w_i - E^* w) + (E^* w - w)\right] \\
&\qquad \times \left[(w_i - E_i^* w_i) + (E_i^* w_i - E^* w) + (E^* w - w)\right]' D_T \\
&= \frac{1}{N}\sum_i (I_{1,i,NT} + I_{2,i,NT} + I_{3,NT})(I_{1,i,NT} + I_{2,i,NT} + I_{3,NT})', \text{ say.}
\end{aligned} \tag{27}$$

We complete the proof by showing the following:

$$\frac{1}{N}\sum_i I_{1,i,NT} I_{1,i,NT}' \to_p \Xi_1; \tag{28}$$

$$\frac{1}{N}\sum_i I_{2,i,NT} I_{2,i,NT}' \to_p \Xi_2; \tag{29}$$

$$I_{3,NT} \to_p 0; \tag{30}$$

and

$$\frac{1}{N}\sum_i I_{1,i,NT} I_{2,i,NT}', \quad \frac{1}{N}\sum_i I_{1,i,NT} I_{3,NT}', \quad \frac{1}{N}\sum_i I_{2,i,NT} I_{3,NT}' \to_p 0. \tag{31}$$

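Proof of (28): Write

$$\frac{1}{N}\sum_i I_{1,i,NT} I_{1,i,NT}' = \begin{pmatrix}
I_{1,1} & \begin{pmatrix} I_{1,21} & I_{1,22} & I_{1,31} & I_{1,32} & I_{1,33} \end{pmatrix} & 0 \\
\begin{pmatrix} I_{1,21}' \\ I_{1,22}' \\ I_{1,31}' \\ I_{1,32}' \\ I_{1,33}' \end{pmatrix} &
\begin{pmatrix}
I_{21,21} & I_{21,22} & I_{21,31} & I_{21,32} & I_{21,33} \\
I_{21,22}' & I_{22,22} & I_{22,31} & I_{22,32} & I_{22,33} \\
I_{21,31}' & I_{22,31}' & I_{31,31} & I_{31,32} & I_{31,33} \\
I_{21,32}' & I_{22,32}' & I_{31,32}' & I_{32,32} & I_{32,33} \\
I_{21,33}' & I_{22,33}' & I_{31,33}' & I_{32,33}' & I_{33,33}
\end{pmatrix} & 0 \\
0 & 0 & 0
\end{pmatrix},$$

where the partition is made conformable to the size of $(x_{1,it}', x_{21,it}', x_{22,it}', x_{31,it}', x_{32,it}', x_{33,it}', z_i')'$.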

We now consider each element in $\frac{1}{N}\sum_i I_{1,i,NT} I_{1,i,NT}'$. For $I_{1,1}$, notice that by Assumption 9(a),

$$\left\|\frac{1}{N}\sum_i (x_{1,i} - Ex_{1,i})(x_{1,i} - Ex_{1,i})'\right\| \le \frac{1}{N}\sum_i \|x_{1,i} - Ex_{1,i}\|^2 \le \frac{1}{N}\sum_i \frac{1}{T}\sum_t \|x_{1,it} - Ex_{1,it}\|^2 = O_p(1).$$

Since each element in the diagonal matrix $D_{1T}$ tends to zero, we have

$$I_{1,1} = \frac{1}{N}\sum_i D_{1T}(x_{1,i} - Ex_{1,i})(x_{1,i} - Ex_{1,i})' D_{1T} = D_{1T}\, O_p(1)\, D_{1T} = o_p(1).$$

Next we consider the second diagonal block of $\frac{1}{N}\sum_i I_{1,i,NT} I_{1,i,NT}'$. Define

$$q_{iT} = \sqrt{T}\begin{pmatrix} x_{2,i} - Ex_{2,i} \\ x_{3,i} - E_{\mathcal{F}_{z_i}} x_{3,i} \end{pmatrix}; \qquad Q_{iT} = q_{iT} q_{iT}'.$$

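Then, by Assumption 9(b) and (c), for some constant q > 1,

$$\sup_{i,T} E\|Q_{iT}\|^{2q} = \sup_{i,T} E\|q_{iT}\|^{4q} < \infty,$$

which verifies the condition (23) of Lemma 12. In consequence, from Lemma 12 and Assumption 8(i), we have

$$\begin{aligned}
&\frac{T}{N}\sum_i \begin{pmatrix}
(x_{2,i} - Ex_{2,i})(x_{2,i} - Ex_{2,i})' & (x_{2,i} - Ex_{2,i})(x_{3,i} - E_{\mathcal{F}_{z_i}} x_{3,i})' \\
(x_{3,i} - E_{\mathcal{F}_{z_i}} x_{3,i})(x_{2,i} - Ex_{2,i})' & (x_{3,i} - E_{\mathcal{F}_{z_i}} x_{3,i})(x_{3,i} - E_{\mathcal{F}_{z_i}} x_{3,i})'
\end{pmatrix} \\
&= \frac{1}{N}\sum_i \frac{1}{T}\sum_t \sum_s \begin{pmatrix}
(x_{2,it} - Ex_{2,it})(x_{2,is} - Ex_{2,is})' & (x_{2,it} - Ex_{2,it})(x_{3,is} - E_{\mathcal{F}_{z_i}} x_{3,is})' \\
(x_{3,it} - E_{\mathcal{F}_{z_i}} x_{3,it})(x_{2,is} - Ex_{2,is})' & (x_{3,it} - E_{\mathcal{F}_{z_i}} x_{3,it})(x_{3,is} - E_{\mathcal{F}_{z_i}} x_{3,is})'
\end{pmatrix} \\
&= \frac{1}{N}\sum_i Q_{iT} \to_p \lim_N \frac{1}{N}\sum_i EQ_{iT} = \begin{pmatrix} \Gamma_{22} & \Gamma_{23} \\ \Gamma_{23}' & \Gamma_{33} \end{pmatrix},
\end{aligned}$$

as $(N,T \to \infty)$. Now, recall that

$$D_{2T} = \operatorname{diag}(D_{21T}, D_{22T}) = \operatorname{diag}\left(I_{k_{21}}, \sqrt{T} I_{k_{22}}\right);$$

$$D_{3T} = \operatorname{diag}(D_{31T}, D_{32T}, D_{33T}) = \operatorname{diag}\left(\sqrt{T} I_{k_{31}}, \sqrt{T} I_{k_{32}}, (T^{m_{h,33}})_h\right),$$

where $0 \le m_{h,33} < \frac{1}{2}$ for all $h = 1, ..., k_{33}$. Therefore, as $(N,T \to \infty)$,

$$\begin{pmatrix}
I_{21,21} & I_{21,22} & I_{21,31} & I_{21,32} & I_{21,33} \\
I_{21,22}' & I_{22,22} & I_{22,31} & I_{22,32} & I_{22,33} \\
I_{21,31}' & I_{22,31}' & I_{31,31} & I_{31,32} & I_{31,33} \\
I_{21,32}' & I_{22,32}' & I_{31,32}' & I_{32,32} & I_{32,33} \\
I_{21,33}' & I_{22,33}' & I_{31,33}' & I_{32,33}' & I_{33,33}
\end{pmatrix} \to_p \begin{pmatrix}
0 & 0 & 0 & 0 & 0 \\
0 & \Gamma_{22,22} & \Gamma_{22,31} & \Gamma_{22,32} & 0 \\
0 & \Gamma_{22,31}' & \Gamma_{31,31} & \Gamma_{31,32} & 0 \\
0 & \Gamma_{22,32}' & \Gamma_{31,32}' & \Gamma_{32,32} & 0 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}.$$

Finally, by the Cauchy-Schwarz inequality, the off-diagonal block component

$$\begin{pmatrix} I_{1,21} & I_{1,22} & I_{1,31} & I_{1,32} & I_{1,33} \end{pmatrix} \to_p 0,$$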

as $(N,T \to \infty)$.

Proof of (29): Recall that

$$E_i^* w_i = \left(E x_{1,i}',\ E x_{2,i}',\ E_{\mathcal{F}_{z_i}} x_{3,i}',\ z_i'\right)'; \qquad E^* w = \left(E x_1',\ E x_2',\ E_{\mathcal{F}_z} x_3',\ z'\right)'.$$

Write

$$I_{2,i,NT} = \begin{pmatrix}
D_{1T}(Ex_{1,i} - Ex_1) \\
Ex_{21,i} - Ex_{21} \\
0 \\
D_{31T}\left((E_{\mathcal{F}_{z_i}} x_{31,i} - Ex_{31,i}) - (E_{\mathcal{F}_z} x_{31} - Ex_{31}) + (Ex_{31,i} - Ex_{31})\right) \\
D_{32T}\left((E_{\mathcal{F}_{z_i}} x_{32,i} - Ex_{32,i}) - (E_{\mathcal{F}_z} x_{32} - Ex_{32}) + (Ex_{32,i} - Ex_{32})\right) \\
D_{33T}\left((E_{\mathcal{F}_{z_i}} x_{33,i} - Ex_{33,i}) - (E_{\mathcal{F}_z} x_{33} - Ex_{33}) + (Ex_{33,i} - Ex_{33})\right) \\
z_i - z
\end{pmatrix}.$$

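To obtain the required result, we use Lemma 11. By Assumption 6, as T → ∞, we have

$$I_{2,i,NT} \to I_{2,i,N} \text{ a.s. uniformly in } i, \tag{32}$$

where

$$I_{2,i,N} = \begin{pmatrix}
H_1 \Theta_{1,i} \\
\Theta_{21,i} \\
0 \\
0 \\
H_{32}\left[g_{32,i}(z_i) - \frac{1}{N}\sum_i g_{32,i}(z_i)\right] + H_{32}\left[\mu_{g_{32},i} - \frac{1}{N}\sum_i \mu_{g_{32},i}\right] \\
H_{33}\left[g_{33,i}(z_i) - \frac{1}{N}\sum_i g_{33,i}(z_i)\right] + H_{33}\left[\mu_{g_{33},i} - \frac{1}{N}\sum_i \mu_{g_{33},i}\right] \\
z_i - \frac{1}{N}\sum_i z_i
\end{pmatrix}.$$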

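Write

$$\frac{1}{N}\sum_i I_{2,i,N} I_{2,i,N}' = \begin{pmatrix}
II_{1,1} & II_{1,21} & 0 & 0 & II_{1,32} & II_{1,33} & II_{1,z} \\
II_{1,21}' & II_{21,21} & 0 & 0 & II_{21,32} & II_{21,33} & II_{21,z} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
II_{1,32}' & II_{21,32}' & 0 & 0 & II_{32,32} & II_{32,33} & II_{32,z} \\
II_{1,33}' & II_{21,33}' & 0 & 0 & II_{32,33}' & II_{33,33} & II_{33,z} \\
II_{1,z}' & II_{21,z}' & 0 & 0 & II_{32,z}' & II_{33,z}' & II_{z,z}
\end{pmatrix}. \tag{33}$$

By Assumption 7(i), as N → ∞,

$$\begin{pmatrix} II_{1,1} & II_{1,21} \\ II_{1,21}' & II_{21,21} \end{pmatrix} \to \begin{pmatrix} \Gamma_{\Theta_1,\Theta_1} & \Gamma_{\Theta_1,\Theta_{21}} \\ \Gamma_{\Theta_1,\Theta_{21}}' & \Gamma_{\Theta_{21},\Theta_{21}} \end{pmatrix}. \tag{34}$$

By the weak law of large numbers (WLLN) and Assumption 7, as N → ∞, we have

$$\begin{pmatrix} II_{32,32} & II_{32,33} & II_{32,z} \\ II_{32,33}' & II_{33,33} & II_{33,z} \\ II_{32,z}' & II_{33,z}' & II_{z,z} \end{pmatrix} \to_p \begin{pmatrix}
\Gamma_{g_{32},g_{32}} + \Gamma_{\mu_{32},\mu_{32}} & \Gamma_{g_{32},g_{33}} + \Gamma_{\mu_{32},\mu_{32}} & \Gamma_{g_{32},z} \\
\Gamma_{g_{32},g_{33}}' + \Gamma_{\mu_{32},\mu_{32}}' & \Gamma_{g_{33},g_{33}} + \Gamma_{\mu_{33},\mu_{33}} & \Gamma_{g_{33},z} \\
\Gamma_{g_{32},z}' & \Gamma_{g_{33},z}' & \Gamma_{z,z}
\end{pmatrix}. \tag{35}$$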

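From Assumption 7 and WLLN with the assumption that $Eg_{32,i}(z_i) = Eg_{33,i}(z_i) = 0$, it follows that

$$\begin{pmatrix} II_{1,32} & II_{1,33} \\ II_{21,32} & II_{21,33} \end{pmatrix} \to_p \begin{pmatrix} \Gamma_{\Theta_1,\mu_{32}} & \Gamma_{\Theta_1,\mu_{33}} \\ \Gamma_{\Theta_{21},\mu_{32}} & \Gamma_{\Theta_{21},\mu_{33}} \end{pmatrix}, \tag{36}$$

as N → ∞. In addition, by WLLN with Assumption 5,

$$\begin{pmatrix} II_{1,z} \\ II_{21,z} \end{pmatrix} \to_p \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \tag{37}$$

as N → ∞. The results (33)-(37) indicate that

$$\frac{1}{N}\sum_i I_{2,i,N} I_{2,i,N}' \to_p \Xi_2, \tag{38}$$

as N → ∞. Finally, by Lemma 11, (32) and (38) imply (29).

Proof of (30): Notice that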

$$\begin{aligned}
E\left\|\frac{1}{N}\sum_i D_{1T}\frac{1}{T}\sum_t (x_{1,it} - Ex_{1,it})\right\|^2
&= \frac{1}{N^2}\sum_i E\left\|D_{1T}\frac{1}{T}\sum_t (x_{1,it} - Ex_{1,it})\right\|^2 \\
&\le \frac{1}{N}\|D_{1T}\|^2 \sup_{i,t} E\left\|\frac{1}{T}\sum_t (x_{1,it} - Ex_{1,it})\right\|^2 \to 0,
\end{aligned} \tag{39}$$

where the last convergence result holds by Assumption 9(a). Similarly, by Assumption 9(b),

$$E\left\|\frac{1}{N}\sum_i D_{2kT}\frac{1}{T}\sum_t (x_{2k,it} - Ex_{2k,it})\right\|^2 \le \frac{1}{N}\sup_{i,t} E\left\|\frac{1}{\sqrt{T}}\sum_t (x_{2k,it} - Ex_{2k,it})\right\|^2 \to 0, \tag{40}$$

for k = 1, 2. Finally, by Assumption 9(c), we have

$$E\left\|\frac{1}{N}\sum_i D_{3kT}\frac{1}{T}\sum_t \left(x_{3k,it} - E_{\mathcal{F}_{z_i}} x_{3k,it}\right)\right\|^2 \le \frac{1}{N}\sup_{i,T}\left\|\frac{1}{\sqrt{T}} D_{3kT}\right\|^2 E\left\|\frac{1}{\sqrt{T}}\sum_t \left(x_{3k,it} - E_{\mathcal{F}_{z_i}} x_{3k,it}\right)\right\|^2 \to 0, \tag{41}$$

for k = 1, 2, 3. The results (39), (40) and (41) imply that $I_{3,NT} \to_p 0$.

Proof of (31): Notice that by (28) and (30),

$$\left\|\left(\frac{1}{N}\sum_i I_{1,i,NT}\right) I_{3,NT}'\right\|^2 = \left\|\frac{1}{N}\sum_i I_{1,i,NT}\right\|^2 \|I_{3,NT}\|^2 = O_p(1)\, o_p(1);$$

$$\left\|\left(\frac{1}{N}\sum_i I_{2,i,NT}\right) I_{3,NT}'\right\|^2 = \left\|\frac{1}{N}\sum_i I_{2,i,NT}\right\|^2 \|I_{3,NT}\|^2 = O_p(1)\, o_p(1).$$

Thus, as $(N,T \to \infty)$,

$$\left(\frac{1}{N}\sum_i I_{1,i,NT}\right) I_{3,NT}', \quad \left(\frac{1}{N}\sum_i I_{2,i,NT}\right) I_{3,NT}' \to_p 0.$$

We now consider the (k, l)th term of $\frac{1}{N}\sum_i I_{1,i,NT} I_{2,i,NT}'$. By the Cauchy-Schwarz inequality,

$$\left[\left(\frac{1}{N}\sum_i I_{1,i,NT} I_{2,i,NT}'\right)_{k,l}\right]^2 \le \left(\frac{1}{N}\sum_i \left[(I_{1,i,NT})_k\right]^2\right)\left(\frac{1}{N}\sum_i \left[(I_{2,i,NT})_l\right]^2\right),$$

where $A_{k,l}$ and $a_k$ denote the (k, l)th and the kth elements of matrix A and vector a, respectively. In view of (28) and (29) (the limits of $\frac{1}{N}\sum_i I_{1,i,NT} I_{1,i,NT}'$ and $\frac{1}{N}\sum_i I_{2,i,NT} I_{2,i,NT}'$), we can see that all of the elements in $\frac{1}{N}\sum_i I_{1,i,NT} I_{2,i,NT}'$ converge to zero, except for

$$\frac{1}{N}\sum_i D_{32T}\left(x_{32,i} - E_{\mathcal{F}_{z_i}} x_{32,i}\right)\left(E_{\mathcal{F}_{z_i}} x_{32,i} - E_{\mathcal{F}_z} x_{32}\right)' D_{32T}.$$

Thus, we can complete the proof by showing that this term converges in probability to a conformable zero matrix.

Let

$$Q_{1,iT} = D_{32T}\left(x_{32,i} - E_{\mathcal{F}_{z_i}} x_{32,i}\right);$$

$$Q_{2,iT} = D_{32T}\left(E_{\mathcal{F}_{z_i}} x_{32,i} - E_{\mathcal{F}_z} x_{32}\right);$$

$$Q_{2,i} = H_{32}\left(g_{32,i}(z_i) - \frac{1}{N}\sum_i g_{32,i}(z_i) + \mu_{g_{32},i} - \frac{1}{N}\sum_i \mu_{g_{32},i}\right).$$

By the Cauchy-Schwarz inequality and Lemma 11,

$$\begin{aligned}
\left\|\frac{1}{N}\sum_i Q_{1,iT}(Q_{2,iT} - Q_{2,i})'\right\|^2
&\le \left(\frac{1}{N}\sum_i \|Q_{1,iT}\|^2\right)\left(\frac{1}{N}\sum_i \|Q_{2,iT} - Q_{2,i}\|^2\right) \\
&\le \left(\frac{1}{N}\sum_i \|Q_{1,iT}\|^2\right)\sup_i \|Q_{2,iT} - Q_{2,i}\|^2 \\
&= O_p(1)\, o(1) = o_p(1),
\end{aligned}$$

where the last line holds since

$$E\left(\frac{1}{N}\sum_i \|Q_{1,iT}\|^2\right) < M$$

by Assumption 9(c) and Assumption 6. Thus,

$$\frac{1}{N}\sum_i Q_{1,iT} Q_{2,iT}' = \frac{1}{N}\sum_i Q_{1,iT} Q_{2,i}' + o_p(1). \tag{42}$$

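Next, notice that $EQ_{1,iT} Q_{2,i}' = 0$. Then, by the Cauchy-Schwarz inequality,

$$\begin{aligned}
E\left\|\frac{1}{N}\sum_i \operatorname{vec}\left(Q_{1,iT} Q_{2,i}'\right)\right\|^2
&= E\left\|\frac{1}{N}\sum_i (Q_{2,i} \otimes Q_{1,iT})\right\|^2 \\
&= \frac{1}{N^2}\sum_i E\left(Q_{2,i}' Q_{2,i}\, Q_{1,iT}' Q_{1,iT}\right) \\
&\le \frac{1}{N^2}\sum_i \left[E\|Q_{2,i}\|^4\, E\|Q_{1,iT}\|^4\right]^{1/2}.
\end{aligned}$$

By Assumption 9(c),

$$\sup_{i,T} E\|Q_{1,iT}\|^4 < M,$$

and by Assumption 6(iii) and (v),

$$E\|Q_{2,i}\|^4 < M.$$

Thus,

$$\frac{1}{N^2}\sum_i \left[E\|Q_{2,i}\|^4\, E\|Q_{1,iT}\|^4\right]^{1/2} \to 0,$$

which implies

$$E\left\|\frac{1}{N}\sum_i \operatorname{vec}\left(Q_{1,iT} Q_{2,i}'\right)\right\|^2 \to 0.$$

By Chebyshev's inequality, then, we have

$$\frac{1}{N}\sum_i Q_{1,iT} Q_{2,i}' \to_p 0,$$

as $(N,T \to \infty)$. Finally, in view of (42) we have the desired result that as $(N,T \to \infty)$,

$$\frac{1}{N}\sum_i Q_{1,iT} Q_{2,iT}' \to_p 0. \quad ■$$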

Part (b)
From (27), we write

$$\begin{aligned}
\sup_{N,T}\sup_{1\le i\le N} E\|D_T(w_i - w)\|^4
&= \sup_{N,T}\sup_{1\le i\le N} E\|I_{1,i,NT} + I_{2,i,NT} + I_{3,i,NT}\|^4 \\
&\le M_1\left(\sup_{N,T}\sup_{1\le i\le N} E\|I_{1,i,NT}\|^4 + \sup_{N,T}\sup_{1\le i\le N} E\|I_{2,i,NT}\|^4 + \sup_{N,T}\sup_{1\le i\le N} E\|I_{3,i,NT}\|^4\right),
\end{aligned} \tag{43}$$

for some constant $M_1$. Thus, we can complete the proof by showing that each of the three terms on the right-hand side of the inequality (43) is bounded.

For some constant $M_2$,

$$\sup_{N,T}\sup_{1\le i\le N} E\|I_{1,i,NT}\|^4 \le M_2\left(\sup_{i,T} E\|D_{1T}(x_{1,i} - Ex_{1,i})\|^4 + \sup_{i,T} E\left\|\frac{D_{2T}}{\sqrt{T}}\sqrt{T}(x_{2,i} - Ex_{2,i})\right\|^4 + \sup_{i,T} E\left\|\frac{D_{3T}}{\sqrt{T}}\sqrt{T}\left(x_{3,i} - E_{\mathcal{F}_{z_i}} x_{3,i}\right)\right\|^4\right). \tag{44}$$

By Assumption 9 and the definitions of $D_{1T}$, $D_{2T}$, and $D_{3T}$, each term in (44) is finite. Thus,

$$\sup_{N,T}\sup_{1\le i\le N} E\|I_{1,i,NT}\|^4 < \infty.$$

Next, we consider the second term on the right-hand side of the inequality (43). For some constant $M_3$,

$$\sup_{N,T}\sup_{1\le i\le N} E\|I_{2,i,NT}\|^4 \le M_3\begin{pmatrix}
\sup_{N,T}\sup_{1\le i\le N}\|D_{1T}(Ex_{1,i} - Ex_1)\|^4 \\
+ \sup_{N,T}\sup_{1\le i\le N}\|Ex_{21,i} - Ex_{21}\|^4 \\
+ \sup_{i,T} E\left\|D_{3T}\left(E_{\mathcal{F}_{z_i}} x_{3,i} - Ex_{3,i}\right)\right\|^4 \\
+ \sup_{N,T}\sup_{1\le i\le N} E\|D_{3T}(E_{\mathcal{F}_z} x_3 - Ex_3)\|^4 \\
+ \sup_{N,T}\sup_{1\le i\le N}\|D_{3T}(Ex_{3,i} - Ex_3)\| \\
+ \sup_N \sup_{1\le i\le N} E\|z_i - z\|^4
\end{pmatrix}. \tag{45}$$

For the required result, we need to show that all of the terms on the right-hand side of the inequality (45) are bounded. Notice that for some finite constant $M_4$,

$$\|D_{1T}(Ex_{1,i} - Ex_1)\|^4 \to \left\|H_1\left(\Theta_{1,i} - \frac{1}{N}\sum_i \Theta_{1,i}\right)\right\|^4 \ (\text{uniformly in } i \text{ as } T \to \infty) \le 2\|H_1\|^4 \sup_i \|\Theta_{1,i}\|^4 < M_4,$$

and

$$\|Ex_{21,i} - Ex_{21}\|^4 \to \left\|\Theta_{21,i} - \frac{1}{N}\sum_i \Theta_{21,i}\right\|^4 \ (\text{uniformly in } i \text{ as } T \to \infty) \le 2\sup_i \|\Theta_{21,i}\|^4 < M_4.$$

So,

$$\sup_{N,T}\sup_{1\le i\le N}\|D_{1T}(Ex_{1,i} - Ex_1)\|^4, \quad \sup_{N,T}\sup_{1\le i\le N}\|Ex_{21,i} - Ex_{21}\|^4 < \infty. \tag{46}$$

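By Assumption 6(iii), the following hold uniformly in i almost surely as T → ∞:

$$\left\|D_{31T}\left(E_{\mathcal{F}_{z_i}} x_{31,i} - Ex_{31,i}\right)\right\|^4 \to 0,$$

$$\left\|D_{32T}\left(E_{\mathcal{F}_{z_i}} x_{32,i} - Ex_{32,i}\right)\right\|^4 \to \|H_{32}\, g_{32,i}(z_i)\|^4,$$

$$\left\|D_{33T}\left(E_{\mathcal{F}_{z_i}} x_{33,i} - Ex_{33,i}\right)\right\|^4 \to \|H_{33}\, g_{33,i}(z_i)\|^4.$$

Notice that since $\left\|D_{3T}\left(E_{\mathcal{F}_{z_i}} x_{3,it} - Ex_{3,it}\right)\right\|^4 \le \tau(r)^4 G(z_i)^4$ by Assumption 6(iv), we have $\sup_i \left\|D_{3kT}\left(E_{\mathcal{F}_{z_i}} x_{3k,i} - Ex_{3k,i}\right)\right\|^4 \le \left(\int_0^1 \tau(r)^4 dr\right)\sup_i G(z_i)^4$ for k = 1, 2, and 3. Therefore, by Lemma 9, we have

$$E\left\|D_{3kT}\left(E_{\mathcal{F}_{z_i}} x_{3k,i} - Ex_{3k,i}\right)\right\|^4 \to E\left\|g_{3k,i}(z_i)\int_0^1 \tau_{3k}(r)\,dr\right\|^4 \ (\text{uniformly in } i) \le \left(E\sup_i \|g_{3k,i}(z_i)\|^4\right)\left(\left\|\int_0^1 \tau_{3k}(r)\,dr\right\|^4\right) < \infty, \text{ for } k = 1, 2, 3,$$

where $\tau_{31}(r) = 0$. So,

$$\sup_{i,T} E\left\|D_{3T}\left(E_{\mathcal{F}_{z_i}} x_{3,i} - Ex_{3,i}\right)\right\|^4 < \infty. \tag{47}$$

Similarly, it follows that

$$\sup_{N,T}\sup_{1\le i\le N} E\|D_{3T}(E_{\mathcal{F}_z} x_3 - Ex_3)\|^4 < \infty. \tag{48}$$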

In addition, notice that

$$\begin{aligned}
\sup_{N,T}\sup_{1\le i\le N}\|D_{3T}(Ex_{3,i} - Ex_3)\|
&\le \sup_{N,T}\sup_{1\le i\le N}\left\|\begin{pmatrix} D_{31T}(Ex_{31,i} - Ex_{31}) \\ D_{32T}(Ex_{32,i} - Ex_{32}) - H_{32}\mu_{g_{32},i} \\ D_{33T}(Ex_{33,i} - Ex_{33}) - H_{33}\mu_{g_{33},i} \end{pmatrix}\right\| \\
&\quad + \sup_N \sup_{1\le i\le N}\left\|\begin{pmatrix} 0 \\ H_{32}\mu_{g_{32},i} \\ H_{33}\mu_{g_{33},i} \end{pmatrix}\right\|.
\end{aligned} \tag{49}$$

By Assumption 6(v), as T → ∞,

$$\begin{aligned}
&\sup_N \sup_{1\le i\le N}\left\|\begin{pmatrix} D_{31T}(Ex_{31,i} - Ex_{31}) \\ D_{32T}(Ex_{32,i} - Ex_{32}) - H_{32}\mu_{g_{32},i} \\ D_{33T}(Ex_{33,i} - Ex_{33}) - H_{33}\mu_{g_{33},i} \end{pmatrix}\right\| \\
&= \sup_N \sup_{1\le i\le N}\frac{1}{N}\sum_{j=1}^N \left\|\begin{pmatrix} D_{31T}(Ex_{31,i} - Ex_{31,j}) \\ D_{32T}(Ex_{32,i} - Ex_{32,j}) - H_{32}\left(\mu_{g_{32},i} - \mu_{g_{32},j}\right) \\ D_{33T}(Ex_{33,i} - Ex_{33,j}) - H_{33}\left(\mu_{g_{33},i} - \mu_{g_{33},j}\right) \end{pmatrix}\right\| \\
&\le \sup_{i,j}\left\|\begin{pmatrix} D_{31T}(Ex_{31,i} - Ex_{31,j}) \\ D_{32T}(Ex_{32,i} - Ex_{32,j}) - H_{32}\left(\mu_{g_{32},i} - \mu_{g_{32},j}\right) \\ D_{33T}(Ex_{33,i} - Ex_{33,j}) - H_{33}\left(\mu_{g_{33},i} - \mu_{g_{33},j}\right) \end{pmatrix}\right\| \to 0.
\end{aligned}$$

Therefore, the first term in (49) is finite. The second term in (49) is also finite since $\sup_i \|\mu_{g_{32},i}\|, \sup_i \|\mu_{g_{33},i}\| < \infty$ as assumed in Assumption 6(v). Therefore,

$$\sup_{N,T}\sup_{1\le i\le N}\|D_{3T}(Ex_{3,i} - Ex_3)\| < \infty.$$

In addition, by Assumption 5,

$$\sup_N \sup_{1\le i\le N} E\|z_i - z\|^4 < \infty. \tag{50}$$

Therefore, from (46)-(50) we have

$$\sup_{N,T}\sup_{1\le i\le N} E\|I_{2,i,NT}\|^4 < \infty.$$

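Finally, since $I_{3,NT} = \frac{1}{N}\sum_i I_{1,i,NT}$,

$$\begin{aligned}
\sup_{N,T} E\|I_{3,NT}\|^4
&\le \sup_{N,T} E\left(\frac{1}{N}\sum_i \|I_{1,i,NT}\|\right)^4 && (\text{by the triangle inequality}) \\
&\le \sup_{N,T}\sup_{1\le i\le N} E\|I_{1,i,NT}\|^4 && (\text{by Hölder's inequality}) \\
&< \infty. \quad ■
\end{aligned}$$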

Part (c)
Recall that under Assumption 12, conditional on $\mathcal{F}_w$, $u_i$ is independently distributed with mean $\frac{w_i' D_T}{\sqrt{N}}\lambda$, variance $\sigma_u^2$, and $\kappa_u^4 = E_{\mathcal{F}_w}(u_i - E_{\mathcal{F}_w} u_i)^4 < \infty$, where λ is a nonrandom vector in $R^{k+g}$. So, $E(D_T w_i u_i) = \frac{1}{\sqrt{N}} E\left(D_T w_i w_i' D_T\right)\lambda$. Define $Q_{i,T} = D_T w_i (u_i - E_{\mathcal{F}_w} u_i)$; and let $\iota \in R^{k+g}$ with $\|\iota\| = 1$. Then, we can complete the proof by showing that as $(N,T \to \infty)$,

$$\frac{1}{\sqrt{N}}\sum_i \iota' Q_{i,T} \Rightarrow N\left(0, \sigma_u^2 \iota' \Xi \iota\right). \tag{51}$$

This is so because this condition, together with the Cramér-Wold device, Assumption 12 and Part (a), implies that

$$\begin{aligned}
\frac{1}{\sqrt{N}}\sum_i D_T w_i u_i
&= \frac{1}{\sqrt{N}}\sum_i D_T w_i E_{\mathcal{F}_w} u_i + \frac{1}{\sqrt{N}}\sum_i Q_{i,T} \\
&= \frac{1}{N}\sum_i D_T w_i w_i' D_T \lambda + \frac{1}{\sqrt{N}}\sum_i Q_{i,T} \\
&\Rightarrow N\left(\Xi\lambda, \sigma_u^2 \Xi\right).
\end{aligned}$$

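Now we start the proof of (51). Let $s_{i,T}^2 = E(\iota' Q_{i,T})^2$ and $S_{N,T}^2 = \sum_i s_{i,T}^2$. Under Assumption 12, we have

$$\begin{aligned}
s_{i,T}^2 = E(\iota' Q_{i,T})^2
&= \iota' E\left[\left(E_{\mathcal{F}_w}(u_i - E_{\mathcal{F}_w} u_i)^2\right) D_T w_i w_i' D_T\right]\iota \\
&= \sigma_u^2 E\left[\iota' D_T w_i w_i' D_T \iota\right].
\end{aligned}$$

By Part (a),

$$\frac{1}{N}\sum_i \iota' D_T w_i w_i' D_T \iota \to_p \iota' \Xi \iota > 0,$$

as $(N,T \to \infty)$. By Part (b),

$$\sup_{N,T}\sup_{1\le i\le N} E\|\iota' D_T w_i\|^2\, 1\{\|\iota' D_T w_i\| > M\} \to 0,$$

as M → ∞, and so $\|\iota' D_T w_i\|^2$ is uniformly integrable in N, T. Then, by Vitali's lemma, it follows that

$$\frac{1}{N} S_{N,T}^2 \to \sigma_u^2 \iota' \Xi \iota > 0,$$

as $(N,T \to \infty)$. Thus, for our required result of (51), it is sufficient to show

$$\sum_i \frac{\iota' Q_{i,T}}{S_{N,T}} \Rightarrow N(0, 1), \tag{52}$$

as $(N,T \to \infty)$. Let $P_{i,NT} = \frac{\iota' Q_{i,T}}{S_{N,T}}$. Note that, under Assumption 10, $E(P_{i,NT}) = 0$ and $\sum_i E P_{i,NT}^2 = 1$. According to Theorem 2 of Phillips and Moon (1999), the weak convergence in (52) follows if we can show that

$$\sum_i E P_{i,NT}^2\, 1\left\{\left|P_{i,NT}^2\right| > \varepsilon\right\} \to 0 \text{ for all } \varepsilon > 0, \tag{53}$$

as $(N,T \to \infty)$. Since

$$\begin{aligned}
\sup_{N,T}\sup_{1\le i\le N} E\|Q_{i,T}\|^4
&\le \|\iota\|^8 \sup_{N,T}\sup_{1\le i\le N} E\left(\|D_T w_i\|^4\left(E_{\mathcal{F}_w}(u_i - E_{\mathcal{F}_w} u_i)^4\right)\right) \\
&\le \kappa_u^4 \sup_{N,T}\sup_{1\le i\le N} E\|D_T w_i\|^4 < \infty,
\end{aligned}$$

the Lindeberg-Feller condition (53) follows, and we have all the desired results. ■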


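Part (d)
Since $v_i$ is independent of $w_i$, we have

$$\begin{aligned}
E\left\|\frac{1}{\sqrt{N}}\sum_i D_T w_i v_i\right\|^2
&= \frac{\sigma_v^2}{T}\operatorname{tr}\left(\frac{1}{N}\sum_i E D_T w_i w_i' D_T\right) \\
&\le \frac{\sigma_v^2}{T}\sup_{N,T}\sup_{1\le i\le N} E\|D_T w_i\|^2 \to 0,
\end{aligned}$$

as $(N,T \to \infty)$, where the last convergence holds because Part (b) warrants

$$\sup_{N,T}\sup_{1\le i\le N} E\|D_T w_i\|^2 < \infty. \quad ■$$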
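The second-moment equality used in Part (d) can be spelled out. The steps below are a reading aid under assumptions that are stated here and not taken verbatim from this section: $v_i$ is read as the time average of errors $v_{it}$ that have mean zero and variance $\sigma_v^2$, are uncorrelated over t, and are independent across i and of the regressors.

```latex
\begin{align*}
E\Bigl\|\frac{1}{\sqrt{N}}\sum_i D_T w_i v_i\Bigr\|^2
  &= \frac{1}{N}\sum_i\sum_j
     E\bigl[(D_T w_i)'(D_T w_j)\bigr]\,E[v_i v_j] \\
  &= \frac{1}{N}\sum_i E\|D_T w_i\|^2\,E[v_i^2]
   = \frac{\sigma_v^2}{T}\,
     \operatorname{tr}\Bigl(\frac{1}{N}\sum_i E\,D_T w_i w_i' D_T\Bigr).
\end{align*}
```

The first line uses independence between the errors and the regressors, the cross terms with i ≠ j drop out because $E[v_i v_j] = 0$, and the factor 1/T comes from $E[v_i^2] = \sigma_v^2/T$ for a time average of T uncorrelated errors.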

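Proof of Lemma 1
Part (a)
By definition, we have

$$\begin{aligned}
&\frac{1}{N}\sum_i \frac{1}{T}\sum_t G_{xT}(x_{it} - x_i)(x_{it} - x_i)' G_{xT} \\
&= \frac{1}{N}\sum_i \frac{1}{T}\sum_t G_{xT}(x_{it} - E_i x_{it} + E_i x_{it} - E_i x_i + E_i x_i - x_i) \\
&\qquad \times (x_{it} - E_i x_{it} + E_i x_{it} - E_i x_i + E_i x_i - x_i)' G_{xT} \\
&= \frac{1}{N}\sum_i \frac{1}{T}\sum_t (III_{1,iT} + III_{2,iT} + III_{3,iT})(III_{1,iT} + III_{2,iT} + III_{3,iT})', \text{ say.}
\end{aligned}$$

First, we show that

$$\begin{aligned}
\frac{1}{N}\sum_i \frac{1}{T}\sum_t III_{1,iT} III_{1,iT}'
&= \frac{1}{N}\sum_i \frac{1}{T}\sum_t \begin{pmatrix} D_{1T}(x_{1,it} - Ex_{1,it}) \\ x_{2,it} - Ex_{2,it} \\ x_{3,it} - Ex_{3,it} \end{pmatrix}\begin{pmatrix} D_{1T}(x_{1,it} - Ex_{1,it}) \\ x_{2,it} - Ex_{2,it} \\ x_{3,it} - Ex_{3,it} \end{pmatrix}' \\
&\to_p \begin{pmatrix} 0 & 0 & 0 \\ 0 & \Phi_{22} & \Phi_{23} \\ 0 & \Phi_{23}' & \Phi_{33} \end{pmatrix}.
\end{aligned} \tag{54}$$

For this, set

$$Q_{iT} = \frac{1}{T}\sum_t \begin{pmatrix} D_{1T}(x_{1,it} - Ex_{1,it}) \\ x_{2,it} - Ex_{2,it} \\ x_{3,it} - Ex_{3,it} \end{pmatrix}\begin{pmatrix} D_{1T}(x_{1,it} - Ex_{1,it}) \\ x_{2,it} - Ex_{2,it} \\ x_{3,it} - Ex_{3,it} \end{pmatrix}'.$$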

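By the Cauchy-Schwarz inequality,

$$\begin{aligned}
\|Q_{iT}\|^2
&\le \left[\frac{1}{T}\sum_t \left\|\begin{pmatrix} D_{1T}(x_{1,it} - Ex_{1,it}) \\ x_{2,it} - Ex_{2,it} \\ x_{3,it} - Ex_{3,it} \end{pmatrix}\right\|^2\right]^2 \\
&\le M_1\left(\frac{1}{T}\sum_t \left\{\|D_{1T}(x_{1,it} - Ex_{1,it})\|^4 + \|x_{2,it} - Ex_{2,it}\|^4 + \|x_{3,it} - Ex_{3,it}\|^4\right\}\right).
\end{aligned}$$

Notice that by Assumptions 2(i), 3(ii),

$$\sup_{i,t} E\|x_{1,it} - Ex_{1,it}\|^4, \quad \sup_{i,t} E\|x_{2,it} - Ex_{2,it}\|^4 < \infty, \tag{55}$$

and by Assumption 4(ii) and Assumption 6(iii),

$$\sup_{i,t} E\|x_{3,it} - Ex_{3,it}\|^4 \le M_1\left(\sup_{i,t} E_{\mathcal{F}_{z_i}}\left\|x_{3,it} - E_{\mathcal{F}_{z_i}} x_{3,it}\right\|^4\right) + M_1 E\left\|E_{\mathcal{F}_{z_i}} x_{3,it} - Ex_{3,it}\right\|^4 < \infty. \tag{56}$$

Thus, we have

$$\sup_{i,T} E\|Q_{iT}\|^2 < \infty.$$

From this,

$$\sup_{i,T} E\|Q_{iT}\|\, 1\{\|Q_{iT}\| > M\} \le \frac{\sup_{i,T} E\|Q_{iT}\|^2}{M} \to 0,$$

as M → ∞. By Lemma 12 and Assumption 8(ii), then,

$$\frac{1}{N}\sum_i \frac{1}{T}\sum_t III_{1,iT} III_{1,iT}' = \frac{1}{N}\sum_i Q_{iT} \to_p \lim_{N,T}\frac{1}{N}\sum_i EQ_{iT} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & \Phi_{22} & \Phi_{23} \\ 0 & \Phi_{23}' & \Phi_{33} \end{pmatrix}.$$

Next, by Assumption 6(i) and (ii), Lemma 10, and Lemma 11, we have

$$\begin{aligned}
\frac{1}{N}\sum_i \frac{1}{T}\sum_t III_{2,iT} III_{2,iT}'
&= \frac{1}{N}\sum_i \frac{1}{T}\sum_t \begin{pmatrix} D_{1T}(Ex_{1,it} - Ex_{1,i}) \\ Ex_{2,it} - Ex_{2,i} \\ Ex_{3,it} - Ex_{3,i} \end{pmatrix}\begin{pmatrix} D_{1T}(Ex_{1,it} - Ex_{1,i}) \\ Ex_{2,it} - Ex_{2,i} \\ Ex_{3,it} - Ex_{3,i} \end{pmatrix}' \\
&\to \begin{pmatrix} \int_0^1 \left(\tau_1 - \int \tau_1\right)\left(\lim_N \frac{1}{N}\sum_i \Theta_{1,i}\Theta_{1,i}'\right)\left(\tau_1 - \int \tau_1\right)' dr & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
\end{aligned} \tag{57}$$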

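In addition, we have

$$\begin{aligned}
E\left\|\frac{1}{N}\sum_i \frac{1}{T}\sum_t III_{3,iT} III_{3,iT}'\right\|
&= E\left\|\frac{1}{N}\sum_i \frac{1}{T}\sum_t \begin{pmatrix} D_{1T}(x_{1,it} - Ex_{1,it}) \\ (x_{2,it} - Ex_{2,it}) \\ (x_{3,it} - Ex_{3,it}) \end{pmatrix}\right\|^2 \\
&\le \frac{1}{N^2}\sum_i \frac{1}{T}\sum_t E\left\|\begin{pmatrix} D_{1T}(x_{1,it} - Ex_{1,it}) \\ (x_{2,it} - Ex_{2,it}) \\ (x_{3,it} - Ex_{3,it}) \end{pmatrix}\right\|^4 \to 0,
\end{aligned} \tag{58}$$

as $(N,T \to \infty)$, as shown in (55) and (56). Thus,

$$\frac{1}{N}\sum_i \frac{1}{T}\sum_t III_{3,iT} III_{3,iT}' \to_p 0.$$

From the Cauchy-Schwarz inequality, (54), (57) and (58), we have

$$\frac{1}{N}\sum_i \frac{1}{T}\sum_t III_{1,iT} III_{2,iT}' \to_p 0;$$

$$\frac{1}{N}\sum_i \frac{1}{T}\sum_t III_{1,iT} III_{3,iT}' \to_p 0;$$

$$\frac{1}{N}\sum_i \frac{1}{T}\sum_t III_{2,iT} III_{3,iT}' \to_p 0,$$

as $(N,T \to \infty)$. Combining all of these, we have

$$\frac{1}{N}\sum_i \frac{1}{T}\sum_t G_{xT}(x_{it} - x_i)(x_{it} - x_i)' G_{xT} \to_p \begin{pmatrix} \int_0^1 \left(\tau_1 - \int \tau_1\right)\left(\lim_N \frac{1}{N}\sum_i \Theta_{1,i}\Theta_{1,i}'\right)\left(\tau_1 - \int \tau_1\right)' dr & 0 & 0 \\ 0 & \Phi_{22} & \Phi_{23} \\ 0 & \Phi_{23}' & \Phi_{33} \end{pmatrix} \equiv \Psi_x,$$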

as $(N,T \to \infty)$.

Part (b)
First, let $Q_{i,T} = \frac{1}{\sqrt{T}}\sum_t G_{xT}(x_{it} - x_i) v_{it}$; and let $\iota \in R^k$ with $\|\iota\| = 1$. If we can show that as $(N,T \to \infty)$,

$$\frac{1}{\sqrt{N}}\sum_i \iota' Q_{i,T} \Rightarrow N\left(0, \sigma_v^2 \iota' \Psi_x \iota\right), \tag{59}$$

then the Cramér-Wold device implies our desired result. Now let $s_{i,T}^2 = E(\iota' Q_{i,T})^2$ and $S_{NT}^2 = \sum_i s_{i,T}^2$. Using arguments similar to those for (54)-(58), it is possible to show that

$$\frac{1}{N} S_{NT}^2 = \sigma_v^2 \iota' E\left(\frac{1}{N}\sum_i \frac{1}{T}\sum_t G_{xT}(x_{it} - x_i)(x_{it} - x_i)' G_{xT}\right)\iota \to \sigma_v^2 \iota' \Psi_x \iota > 0, \tag{60}$$

as $(N,T \to \infty)$. So, the asymptotic normality in (59) holds if

$$\sum_i \frac{\iota' Q_{i,T}}{S_{NT}} \Rightarrow N(0, 1), \tag{61}$$

as $(N,T \to \infty)$. Let $P_{i,NT} = \frac{\iota' Q_{i,T}}{S_{N,T}}$. Then, $E(P_{i,NT}) = 0$ and $\sum_i E P_{i,NT}^2 = 1$. Thus, by the central limit theorem for double indexed processes (e.g., see Theorem 2 in Phillips and Moon, 1999), we can claim that (61) holds, if we can show that

$$\sum_i E P_{i,NT}^2\, 1\left\{\left|P_{i,NT}^2\right| > \varepsilon\right\} \to 0 \text{ for all } \varepsilon > 0, \tag{62}$$

as $(N,T \to \infty)$. Now, in view of (60), condition (62) holds if

$$\sup_{i,T} E\|\iota' Q_{i,T}\|^4 \le \sup_{i,T} E\|Q_{i,T}\|^4 < \infty. \tag{63}$$

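Note for some constant $M_1$ that
\[
\begin{aligned}
\sup_{i,T} E\|Q_{i,T}\|^4
&= \sup_{i,T} E\left\| \frac{1}{\sqrt{T}}\sum_t G_{x,T}(x_{it}-\bar{x}_i)v_{it} \right\|^4 \\
&= \sup_{i,T} E\left[ \operatorname{tr}\left( \frac{1}{T^2}\sum_{t,s,p,q}
\left( G_{x,T}(x_{it}-\bar{x}_i)(x_{is}-\bar{x}_i)'G_{x,T} \otimes G_{x,T}(x_{ip}-\bar{x}_i)(x_{iq}-\bar{x}_i)'G_{x,T} \right)
\times E\left(v_{it}v_{is}v_{ip}v_{iq}\right) \right) \right] \\
&\le M_1 \sup_{i,T} \left| E\left( \frac{1}{T^2}\sum_{t,s} \operatorname{tr}
\left( G_{x,T}(x_{it}-\bar{x}_i)(x_{is}-\bar{x}_i)'G_{x,T} \otimes G_{x,T}(x_{it}-\bar{x}_i)(x_{is}-\bar{x}_i)'G_{x,T} \right) \right)
\times E\left(v_{it}^2 v_{is}^2\right) \right| \\
&\quad + M_1 \sup_{i,T} \left| E\left( \frac{1}{T^2}\sum_{t,s} \operatorname{tr}
\left( G_{x,T}(x_{it}-\bar{x}_i)(x_{is}-\bar{x}_i)'G_{x,T} \otimes G_{x,T}(x_{is}-\bar{x}_i)(x_{it}-\bar{x}_i)'G_{x,T} \right) \right)
\times E\left(v_{it}^2 v_{is}^2\right) \right| \\
&\quad + M_1 \sup_{i,T} \left| E\left( \frac{1}{T^2}\sum_{t,s} \operatorname{tr}
\left( G_{x,T}(x_{it}-\bar{x}_i)(x_{it}-\bar{x}_i)'G_{x,T} \otimes G_{x,T}(x_{is}-\bar{x}_i)(x_{is}-\bar{x}_i)'G_{x,T} \right) \right)
\times E\left(v_{it}^2 v_{is}^2\right) \right|.
\end{aligned}
\tag{64}
\]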


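Using the fact that $\operatorname{tr}(A\otimes B) = \operatorname{tr}(A)\operatorname{tr}(B)$ and the Cauchy-Schwarz inequality, we have
\[
\begin{aligned}
\frac{1}{T^2}\sum_t\sum_s \operatorname{tr}
\left( G_{x,T}(x_{it}-\bar{x}_i)(x_{is}-\bar{x}_i)'G_{x,T} \otimes G_{x,T}(x_{it}-\bar{x}_i)(x_{is}-\bar{x}_i)'G_{x,T} \right)
&= \frac{1}{T^2}\sum_t\sum_s \left\{ \operatorname{tr}\left[ G_{x,T}(x_{it}-\bar{x}_i)(x_{is}-\bar{x}_i)'G_{x,T} \right] \right\}^2 \\
&\le \left[ \frac{1}{T}\sum_t \left\| G_{x,T}(x_{it}-\bar{x}_i) \right\|^2 \right]^2 .
\end{aligned}
\]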

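Similarly,
\[
\frac{1}{T^2}\sum_t\sum_s \operatorname{tr}
\left( G_{x,T}(x_{it}-\bar{x}_i)(x_{is}-\bar{x}_i)'G_{x,T} \otimes G_{x,T}(x_{is}-\bar{x}_i)(x_{it}-\bar{x}_i)'G_{x,T} \right)
\le \left[ \frac{1}{T}\sum_t \left\| G_{x,T}(x_{it}-\bar{x}_i) \right\|^2 \right]^2,
\]
and
\[
\begin{aligned}
\frac{1}{T^2}\sum_t\sum_s \operatorname{tr}
\left( G_{x,T}(x_{it}-\bar{x}_i)(x_{it}-\bar{x}_i)'G_{x,T} \otimes G_{x,T}(x_{is}-\bar{x}_i)(x_{is}-\bar{x}_i)'G_{x,T} \right)
&= \left( \frac{1}{T}\sum_t \operatorname{tr}\left( G_{x,T}(x_{it}-\bar{x}_i)(x_{it}-\bar{x}_i)'G_{x,T} \right) \right)^2 \\
&= \left[ \frac{1}{T}\sum_t \left\| G_{x,T}(x_{it}-\bar{x}_i) \right\|^2 \right]^2 .
\end{aligned}
\]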
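The three trace bounds above can be checked numerically on arbitrary vectors. The following is a minimal sketch, not part of the original proof: the rows of the array `a` stand in for the vectors $G_{x,T}(x_{it}-\bar{x}_i)$, and the function name `kron_trace_bounds` is hypothetical. It forms the Kronecker products explicitly, so it is only meant for small $T$ and $k$.

```python
import numpy as np

def kron_trace_bounds(a):
    """a: (T, k) array; row t plays the role of G_{x,T}(x_it - xbar_i) (illustrative only)."""
    T = a.shape[0]
    # outer[t, s] is the k x k matrix a_t a_s'
    outer = np.einsum('tk,sl->tskl', a, a)
    # (1/T^2) sum_{t,s} tr(a_t a_s' (x) a_t a_s'): the "cross" term bounded via Cauchy-Schwarz
    cross = sum(np.trace(np.kron(outer[t, s], outer[t, s]))
                for t in range(T) for s in range(T)) / T**2
    # (1/T^2) sum_{t,s} tr(a_t a_t' (x) a_s a_s'): the term that equals the bound exactly
    diag = sum(np.trace(np.kron(outer[t, t], outer[s, s]))
               for t in range(T) for s in range(T)) / T**2
    # [ (1/T) sum_t ||a_t||^2 ]^2
    bound = (np.sum(a**2) / T) ** 2
    return cross, diag, bound

rng = np.random.default_rng(0)
cross, diag, bound = kron_trace_bounds(rng.normal(size=(8, 3)))
print(cross <= bound, np.isclose(diag, bound))
```

The cross term reduces to $(a_t'a_s)^2$ by $\operatorname{tr}(A\otimes B)=\operatorname{tr}(A)\operatorname{tr}(B)$, which is why the Cauchy-Schwarz inequality applies to it term by term.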

Thus, the right-hand side of (64) is less than or equal to
\[
3M_1\kappa_v^4 \sup_{i,T}\sup_{1\le t\le T} E\left[ \frac{1}{T}\sum_t \left\| G_{x,T}(x_{it}-\bar{x}_i) \right\|^2 \right]^2 .
\]
Note that
\[
\begin{aligned}
\sup_{i,T}\sup_{1\le t\le T} E\left[ \frac{1}{T}\sum_t \left\| G_{x,T}(x_{it}-\bar{x}_i) \right\|^2 \right]^2
&\le \sup_{i,T}\sup_{1\le t\le T} E\left[ \frac{1}{T}\sum_t \left\| G_{x,T}x_{it} \right\|^2 \right]^2 \\
&\le M_2 \left( \sup_{i,T}\sup_{1\le t\le T} E\|D_{1T}x_{1,it}\|^4 + \sup_{i,T}\sup_{1\le t\le T} E\|x_{2,it}\|^4 + \sup_{i,T}\sup_{1\le t\le T} E\|x_{3,it}\|^4 \right),
\end{aligned}
\]
for some constant $M_2 < \infty$. By Assumptions 2 and 6(i), for $t = [Tr]$ and a finite constant $M_3$, we have
\[
\begin{aligned}
\sup_{i,T}\sup_{1\le t\le T} E\|D_{1T}x_{1,it}\|^4
&\le \sup_{i,T}\sup_{1\le t\le T} M_3 \left( \|D_{1T}\|^4 E\|x_{1,it}-Ex_{1,it}\|^4 + \|D_{1T}Ex_{1,it}\|^4 \right) \\
&\to \sup_i \sup_{r\in[0,1]} \|\tau_1(r)\Theta_{1,i}\|^4 < \infty .
\end{aligned}
\]

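Next, similarly, by Assumptions 3(i) and (ii),
\[
\sup_{i,T}\sup_{1\le t\le T} E\|x_{2,it}\|^4
\le \sup_{i,T}\sup_{1\le t\le T} M_3 \left[ E\|x_{2,it}-Ex_{2,it}\|^4 + \|Ex_{2,it}\|^4 \right] < \infty,
\]
and by Assumptions 4(i) and (ii) and 6(iii) and (iv),
\[
\begin{aligned}
\sup_{i,T}\sup_{1\le t\le T} E\|x_{3,it}\|^4
&\le \sup_{i,T}\sup_{1\le t\le T} M_3 \left( E\left\| x_{3,it}-E_{\mathcal{F}_{z_i}}x_{3,it} \right\|^4 + \|Ex_{3,it}\|^4 \right) \\
&\le M_2 \left[ E\left( \sup_{i,t} \left\| x_{3,it}-E_{\mathcal{F}_{z_i}}x_{3,it} \right\|^4 \right)
+ \sup_{i,t} E\left\| E_{\mathcal{F}_{z_i}}x_{3,it}-Ex_{3,it} \right\|^4
+ \sup_{i,t} \|Ex_{3,it}\|^4 \right] < \infty .
\end{aligned}
\]
Therefore,
\[
\sup_{i,T}\sup_{1\le t\le T} E\left[ \frac{1}{T}\sum_t \left\| G_{x,T}(x_{it}-\bar{x}_i) \right\|^2 \right]^2 < M,
\]
which yields (63). ∎

Part (c)
By Lemma 13(a). ∎

Part (d)
By Lemma 13(d). ∎

Proof of Lemma 2
By Lemma 13(c). ∎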

Proof of Lemma 3
Write
\[
\frac{1}{N}\sum_i D_T\tilde{w}_i\tilde{u}_i
= \frac{1}{N}\sum_i D_T\tilde{w}_i\left(u_i - E_{\mathcal{F}_w}u_i\right)
+ \left( \frac{1}{N}\sum_i D_T\tilde{w}_i w_i' D_T \right)\lambda. \tag{65}
\]
Notice that by Assumption 11,
\[
E\left( E_{\mathcal{F}_w}\left\| \frac{1}{N}\sum_i D_T\tilde{w}_i\left(u_i - E_{\mathcal{F}_w}u_i\right) \right\|^2 \right)
= \frac{\sigma_u^2}{N}\operatorname{tr} E\left( \frac{1}{N}\sum_i D_T\tilde{w}_i w_i' D_T \right).
\]
But, by Lemma 13(b),
\[
\operatorname{tr} E\left( \frac{1}{N}\sum_i D_T\tilde{w}_i w_i' D_T \right) < M,
\]
for some constant $M$. This implies that
\[
E\left\| \frac{1}{N}\sum_i D_T\tilde{w}_i\left(u_i - E_{\mathcal{F}_w}u_i\right) \right\|^2 \to 0,
\]
as $(N,T\to\infty)$. Thus, by Chebyshev's inequality,
\[
\frac{1}{N}\sum_i D_T\tilde{w}_i\left(u_i - E_{\mathcal{F}_w}u_i\right) \to_p 0, \tag{66}
\]
as $(N,T\to\infty)$. Then, Lemma 13(a), (65) and (66) imply our desired result. ∎

Proof of Theorem 4
By Lemma 1(a) and (b). ∎

Proof of Theorem 5
By Lemma 1(c), (d), and Lemma 2. ∎

Before we prove the rest of the theorems given in Section 4, we introduce the following notation:

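\[
\begin{aligned}
A_1 &= \frac{1}{N}\sum_i \frac{1}{T}\sum_t (x_{it}-\bar{x}_i)(x_{it}-\bar{x}_i)'; \\
A_2 &= \frac{1}{N}\sum_i \frac{1}{T}\sum_t (x_{it}-\bar{x}_i)(v_{it}-\bar{v}_i); \\
A_3 &= \frac{1}{N}\sum_i \bar{x}_i\bar{x}_i'; \quad A_4 = \frac{1}{N}\sum_i \bar{x}_i u_i; \quad A_5 = \frac{1}{N}\sum_i \bar{x}_i\bar{v}_i; \\
B_3 &= \frac{1}{N}\sum_i z_i z_i'; \quad B_4 = \frac{1}{N}\sum_i z_i u_i; \quad B_5 = \frac{1}{N}\sum_i z_i\bar{v}_i; \\
C &= \frac{1}{N}\sum_i \bar{x}_i z_i'; \\
F_1 &= A_3 - CB_3^{-1}C'; \quad F_2 = A_4 + A_5 - CB_3^{-1}(B_4 + B_5).
\end{aligned}
\tag{67}
\]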

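Proof of Theorem 6
Using the notation given in (67), we can express the GLS estimator $\hat{\beta}_g$ by
\[
\begin{aligned}
\sqrt{NT}\,G_{x,T}^{-1}\left(\hat{\beta}_g - \beta\right)
&= \left[ G_{x,T}A_1G_{x,T} + \theta_T^2 G_{x,T}\left\{ A_3 - CB_3^{-1}C' \right\}G_{x,T} \right]^{-1} \\
&\quad \times \sqrt{NT}\,G_{x,T}\left\{ A_2 + \theta_T^2\left[ (A_4 + A_5) - CB_3^{-1}(B_4 + B_5) \right] \right\},
\end{aligned}
\tag{68}
\]
where $\theta_T = \sqrt{\sigma_v^2/(T\sigma_u^2 + \sigma_v^2)}$.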
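The algebra behind (68) can be illustrated numerically. The sketch below rests on assumptions that are not in the paper: simulated Gaussian data, a model without intercept, known variance components, $\bar{x}_i$ and $\bar{v}_i$ read as time averages, and a hypothetical function name `gls_via_moments`. It computes the GLS slope by OLS on quasi-demeaned data and compares it with $\beta + (A_1 + \theta_T^2F_1)^{-1}(A_2 + \theta_T^2F_2)$ built from the moments in (67).

```python
import numpy as np

def gls_via_moments(N=200, T=30, k=2, sigma_u=1.0, sigma_v=0.5, seed=0):
    """Simulate y_it = x_it'beta + z_i*gamma + u_i + v_it (illustrative design only)."""
    rng = np.random.default_rng(seed)
    beta = np.arange(1.0, k + 1.0)
    x = rng.normal(size=(N, T, k)) + rng.normal(size=(N, 1, k))  # time-varying regressors
    z = rng.normal(size=(N, 1))                                   # time-invariant regressor
    u = sigma_u * rng.normal(size=N)
    v = sigma_v * rng.normal(size=(N, T))
    y = x @ beta + (z[:, 0] + u)[:, None] + v

    xbar, vbar, ybar = x.mean(axis=1), v.mean(axis=1), y.mean(axis=1)
    xd, vd, yd = x - xbar[:, None, :], v - vbar[:, None], y - ybar[:, None]

    # Moments from (67)
    A1 = np.einsum('itk,itl->kl', xd, xd) / (N * T)
    A2 = np.einsum('itk,it->k', xd, vd) / (N * T)
    A3, A4, A5 = xbar.T @ xbar / N, xbar.T @ u / N, xbar.T @ vbar / N
    B3, B4, B5 = z.T @ z / N, z.T @ u / N, z.T @ vbar / N
    C = xbar.T @ z / N
    F1 = A3 - C @ np.linalg.solve(B3, C.T)
    F2 = A4 + A5 - C @ np.linalg.solve(B3, B4 + B5)

    theta2 = sigma_v**2 / (T * sigma_u**2 + sigma_v**2)
    theta = np.sqrt(theta2)
    beta_formula = beta + np.linalg.solve(A1 + theta2 * F1, A2 + theta2 * F2)

    # GLS as OLS on quasi-demeaned data: w_it - (1 - theta) * wbar_i
    W = np.hstack([(xd + theta * xbar[:, None, :]).reshape(N * T, k),
                   np.repeat(theta * z, T, axis=0)])
    ys = (yd + theta * ybar[:, None]).reshape(N * T)
    beta_gls = np.linalg.lstsq(W, ys, rcond=None)[0][:k]

    # Within estimator, for comparison
    beta_within = np.linalg.solve(A1, np.einsum('itk,it->k', xd, yd) / (N * T))
    return beta_formula, beta_gls, beta_within

bf, bg, bw = gls_via_moments()
print(np.allclose(bf, bg), np.linalg.norm(bg - bw))
```

Because $\theta_T^2$ is of order $1/T$, the terms multiplied by $\theta_T^2$ shrink as $T$ grows, which is the mechanism the proof of Part (a) exploits.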

Part (a)
First, consider
\[
\theta_T^2 G_{x,T}\left\{ A_3 - CB_3^{-1}C' \right\}G_{x,T}
= \theta_T^2 G_{x,T}D_{x,T}^{-1}
\left\{ \frac{1}{N}\sum_i D_{x,T}\bar{x}_i\bar{x}_i'D_{x,T}
- \left( \frac{1}{N}\sum_i D_{x,T}\bar{x}_i z_i' \right)\left( \frac{1}{N}\sum_i z_i z_i' \right)^{-1}\left( \frac{1}{N}\sum_i z_i\bar{x}_i'D_{x,T} \right) \right\}
D_{x,T}^{-1}G_{x,T}.
\]
By Lemma 1,
\[
\left\{ \frac{1}{N}\sum_i D_{x,T}\bar{x}_i\bar{x}_i'D_{x,T}
- \left( \frac{1}{N}\sum_i D_{x,T}\bar{x}_i z_i' \right)\left( \frac{1}{N}\sum_i z_i z_i' \right)^{-1}\left( \frac{1}{N}\sum_i z_i\bar{x}_i'D_{x,T} \right) \right\} = O_p(1),
\]
and by definition,
\[
G_{x,T}D_{x,T}^{-1} = J_{x,T}^{-1} = O(1),
\]
as $(N,T\to\infty)$. Thus,
\[
\theta_T^2 G_{x,T}\left\{ A_3 - CB_3^{-1}C' \right\}G_{x,T} = O_p\left(\theta_T^2\right) = o_p(1). \tag{69}
\]

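Next, consider
\[
\sqrt{NT}\,G_{x,T}\left\{ \theta_T^2\left[ (A_4 + A_5) - CB_3^{-1}(B_4 + B_5) \right] \right\}
= \theta_T^2\sqrt{T}\,G_{x,T}D_{x,T}^{-1}
\left\{ \frac{1}{\sqrt{N}}\sum_i D_{x,T}\bar{x}_i(u_i + \bar{v}_i)
- \left( \frac{1}{N}\sum_i D_{x,T}\bar{x}_i z_i' \right)\left( \frac{1}{N}\sum_i z_i z_i' \right)^{-1}\left( \frac{1}{\sqrt{N}}\sum_i z_i(u_i + \bar{v}_i) \right) \right\}.
\]
By Lemma 1, under the local alternatives to random effects (Assumption 12),
\[
\left\{ \frac{1}{\sqrt{N}}\sum_i D_{x,T}\bar{x}_i(u_i + \bar{v}_i)
- \left( \frac{1}{N}\sum_i D_{x,T}\bar{x}_i z_i' \right)\left( \frac{1}{N}\sum_i z_i z_i' \right)^{-1}\left( \frac{1}{\sqrt{N}}\sum_i z_i(u_i + \bar{v}_i) \right) \right\} = O_p(1).
\]
By definition,
\[
\theta_T^2\sqrt{T}\,G_{x,T}D_{x,T}^{-1} = O\left( \frac{1}{\sqrt{T}} \right).
\]
Thus,
\[
\sqrt{NT}\,G_{x,T}\left\{ \theta_T^2\left[ (A_4 + A_5) - CB_3^{-1}(B_4 + B_5) \right] \right\}
= O_p\left( \frac{1}{\sqrt{T}} \right) = o_p(1). \tag{70}
\]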

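Substituting (69) and (70) into (68), we have
\[
\begin{aligned}
\sqrt{NT}\,G_{x,T}^{-1}\left(\hat{\beta}_g - \beta\right)
&= \left[ G_{x,T}A_1G_{x,T} + o_p(1) \right]^{-1}\left[ \sqrt{NT}\,G_{x,T}A_2 + o_p(1) \right] \\
&= \sqrt{NT}\,G_{x,T}^{-1}\left(\hat{\beta}_w - \beta\right) + o_p(1).
\end{aligned}
\]
The last equality results from Lemma 1(a), (b) and Theorem 4. ∎

Part (b)
Similarly to Part (a), we can easily show that under the assumptions given in Part (b), the denominator in (68) is
\[
\frac{1}{NT}\sum_i\sum_t G_{x,T}(x_{it}-\bar{x}_i)(x_{it}-\bar{x}_i)'G_{x,T} + o_p(1). \tag{71}
\]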

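Consider the second term of the numerator of (68):
\[
\theta_T^2\sqrt{T}\,G_{x,T}
\left\{ \frac{1}{\sqrt{N}}\sum_i \bar{x}_i(u_i + \bar{v}_i)
- \left( \frac{1}{N}\sum_i \bar{x}_i z_i' \right)\left( \frac{1}{N}\sum_i z_i z_i' \right)^{-1}\left( \frac{1}{\sqrt{N}}\sum_i z_i(u_i + \bar{v}_i) \right) \right\}. \tag{72}
\]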


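Notice that by Lemmas 1 and 3, under the fixed effect assumption (Assumption 11), the first term of (72) is
\[
\begin{aligned}
\theta_T^2\sqrt{NT}\,G_{x,T}\frac{1}{N}\sum_i \bar{x}_i(u_i + \bar{v}_i)
&= \theta_T^2\sqrt{NT}\,G_{x,T}D_{x,T}^{-1}\frac{1}{N}\sum_i D_{x,T}\bar{x}_i(u_i + \bar{v}_i) \\
&= \theta_T^2\sqrt{NT}\,G_{x,T}D_{x,T}^{-1}\,\Xi_x\lambda + o_p(1),
\end{aligned}
\]
where we partition
\[
\Xi = \begin{pmatrix} \Xi_{xx} & \Xi_{xz} \\ \Xi_{zx} & \Xi_{zz} \end{pmatrix}
\]
conformably to the sizes of $x_{it}$ and $z_i$, and set $\Xi_x = (\Xi_{xx}, \Xi_{xz})$. Similarly, by Lemmas 1 and 3, under the fixed effect assumption, the second term of (72) is
\[
\theta_T^2\sqrt{NT}\,G_{x,T}D_{x,T}^{-1}
\left\{ \left( \frac{1}{N}\sum_i D_{x,T}\bar{x}_i z_i' \right)\left( \frac{1}{N}\sum_i z_i z_i' \right)^{-1}\left( \frac{1}{N}\sum_i z_i(u_i + \bar{v}_i) \right) \right\}
= \theta_T^2\sqrt{NT}\,G_{x,T}D_{x,T}^{-1}\left\{ \Xi_{xz}\Xi_{zz}^{-1}\Xi_z\lambda + o_p(1) \right\}.
\]
Therefore, the limit of (72) is
\[
\left( \theta_T^2\sqrt{NT}\,G_{x,T}D_{x,T}^{-1} \right)
\left[ \left( \Xi_{xx} - \Xi_{xz}\Xi_{zz}^{-1}\Xi_{zx} \;\vdots\; 0 \right)\lambda + o_p(1) \right].
\]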

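Recall that it is assumed that $N/T \to c < \infty$. Also, recall that under the restrictions given in the theorem, $G_{x,T} = \operatorname{diag}(I_{k_{22}}, I_{k_3})$ and $D_{x,T} = \operatorname{diag}(\sqrt{T}I_{k_{22}}, D_{3T})$. Then, letting $\lambda_{\max}(A)$ denote the maximum eigenvalue of matrix $A$, we can have
\[
\lambda_{\max}\left( \theta_T^2\sqrt{NT}\,G_{x,T}D_{x,T}^{-1} \right)
= O(1)\sqrt{\frac{N}{T}}\;\lambda_{\max}\left( \operatorname{diag}\left( \frac{1}{\sqrt{T}}I_{k_{22}}, D_{3T}^{-1} \right) \right) \to 0.
\]
So, under the assumptions of Part (b), the probability limit of the numerator of (68) is
\[
\frac{1}{\sqrt{NT}}\sum_i\sum_t G_{x,T}x_{it}v_{it} + o_p(1). \tag{73}
\]
Combining (71) and (73), we can obtain Part (b). ∎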

Proof of Theorem 7
Using the notation in (67), we can express the GLS estimator $\hat{\gamma}_g$ by
\[
\begin{aligned}
\hat{\gamma}_g - \gamma
&= \left[ B_3 - C'\left( \frac{1}{\theta_T^2}A_1 + A_3 \right)^{-1}C \right]^{-1} \\
&\quad \times \left[ (B_4 + B_5) - C'\left( \frac{1}{\theta_T^2}A_1 + A_3 \right)^{-1}\left( \frac{1}{\theta_T^2}A_2 + (A_4 + A_5) \right) \right].
\end{aligned}
\tag{74}
\]

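Part (a)
Notice that
\[
\begin{aligned}
C'\left( \frac{1}{\theta_T^2}A_1 + A_3 \right)^{-1}C
&= C'D_{x,T}\left( \frac{1}{T\theta_T^2}\sqrt{T}J_{x,T}G_{x,T}A_1G_{x,T}J_{x,T}\sqrt{T} + D_{x,T}A_3D_{x,T} \right)^{-1}D_{x,T}C \\
&= T\theta_T^2\,(C'D_{x,T})\frac{J_{x,T}^{-1}}{\sqrt{T}}
\left( G_{x,T}A_1G_{x,T} + \frac{J_{x,T}^{-1}}{\sqrt{T}}D_{x,T}A_3D_{x,T}\frac{J_{x,T}^{-1}}{\sqrt{T}} \right)^{-1}
\frac{J_{x,T}^{-1}}{\sqrt{T}}(D_{x,T}C) \\
&= O_p\left( \frac{J_{x,T}^{-2}}{T} \right) = o_p(1).
\end{aligned}
\]
The third equality holds because the limit of $G_{x,T}A_1G_{x,T}$ is positive definite (by Lemma 1(a) and Assumption 13), $T\theta_T^2 = O(1)$, and
\[
D_{x,T}C,\; G_{x,T}A_1G_{x,T},\; D_{x,T}A_3D_{x,T} = O_p(1)
\]
(by Lemma 1(a), (c)). The last equality results from the fact that
\[
\frac{J_{x,T}^{-1}}{\sqrt{T}} = o(1).
\]
Thus, as $(N,T\to\infty)$, the denominator of (74) is
\[
B_3 - C'\left( \frac{1}{\theta_T^2}A_1 + A_3 \right)^{-1}C = B_3 + o_p(1). \tag{75}
\]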

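Next, under both the random effects assumption (Assumption 10) and the local alternatives (Assumption 12), it follows from Lemmas 1 and 2 that the second term in the numerator of (74) is
\[
\begin{aligned}
&C'\left( \frac{1}{\theta_T^2}A_1 + A_3 \right)^{-1}\left( \frac{1}{\theta_T^2}\sqrt{N}A_2 + \sqrt{N}(A_4 + A_5) \right) \\
&\quad = C'D_{x,T}\left( \frac{1}{T\theta_T^2}\sqrt{T}J_{x,T}G_{x,T}A_1G_{x,T}J_{x,T}\sqrt{T} + D_{x,T}A_3D_{x,T} \right)^{-1}
\left( \frac{\sqrt{T}}{T\theta_T^2}J_{x,T}\sqrt{NT}\,G_{x,T}A_2 + \sqrt{N}D_{x,T}(A_4 + A_5) \right) \\
&\quad = T\theta_T^2\,(C'D_{x,T})\frac{J_{x,T}^{-1}}{\sqrt{T}}
\left( G_{x,T}A_1G_{x,T} + \frac{J_{x,T}^{-1}}{\sqrt{T}}D_{x,T}A_3D_{x,T}\frac{J_{x,T}^{-1}}{\sqrt{T}} \right)^{-1}
\left( \frac{1}{T\theta_T^2}\sqrt{NT}\,G_{x,T}A_2 + \frac{J_{x,T}^{-1}}{\sqrt{T}}\sqrt{N}D_{x,T}(A_4 + A_5) \right) \\
&\quad = O_p\left( \frac{J_{x,T}^{-1}}{\sqrt{T}} \right) = o_p(1),
\end{aligned}
\]
as $(N,T\to\infty)$. Also, by Lemma 1(d), as $(N,T\to\infty)$,
\[
\sqrt{N}B_5 = \frac{1}{\sqrt{N}}\sum_i z_i\bar{v}_i = o_p(1).
\]
Therefore, the numerator of (74) is
\[
\sqrt{N}\left( (B_4 + B_5) - C'\left( \frac{1}{\theta_T^2}A_1 + A_3 \right)^{-1}\left( \frac{1}{\theta_T^2}A_2 + (A_4 + A_5) \right) \right) = \sqrt{N}B_4 + o_p(1), \tag{76}
\]
as $(N,T\to\infty)$. In view of (74), (75) and (76), we have
\[
\sqrt{N}\left( \hat{\gamma}_g - \gamma \right)
= \left( \frac{1}{N}\sum_i z_i z_i' \right)^{-1}\left( \frac{1}{\sqrt{N}}\sum_i z_i u_i \right) + o_p(1),
\]
as $(N,T\to\infty)$. Finally, by Lemma 1(c) and Lemma 2, as $(N,T\to\infty)$,
\[
\sqrt{N}\left( \hat{\gamma}_g - \gamma \right)
= \left( \frac{1}{N}\sum_i z_i z_i' \right)^{-1}\left( \frac{1}{\sqrt{N}}\sum_i z_i u_i \right)
\Rightarrow N\left( (l_z'\Xi l_z)^{-1}(l_z'\Xi\lambda),\; (l_z'\Xi l_z)^{-1} \right),
\]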

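as required. ∎

Part (b)
Under the assumptions in Part (b), as shown for the denominator of (74),
\[
B_3 - C'\left( \frac{1}{\theta_T^2}A_1 + A_3 \right)^{-1}C = \frac{1}{N}\sum_i z_i z_i' + o_p(1) \to_p l_z'\Xi l_z, \tag{77}
\]
as $(N,T\to\infty)$. Next, consider the numerator of (74),
\[
\begin{aligned}
&\left[ (B_4 + B_5) - C'\left( \frac{1}{\theta_T^2}A_1 + A_3 \right)^{-1}\left( \frac{1}{\theta_T^2}A_2 + (A_4 + A_5) \right) \right] \\
&\quad = (B_4 + B_5) - T\theta_T^2\,(C'D_{x,T})\frac{J_{x,T}^{-1}}{\sqrt{T}}
\left( G_{x,T}A_1G_{x,T} + \frac{J_{x,T}^{-1}}{\sqrt{T}}D_{x,T}A_3D_{x,T}\frac{J_{x,T}^{-1}}{\sqrt{T}} \right)^{-1} \\
&\qquad \times \left( \frac{1}{\sqrt{N}\,T\theta_T^2}\sqrt{NT}\,G_{x,T}A_2 + \frac{J_{x,T}^{-1}}{\sqrt{T}}D_{x,T}(A_4 + A_5) \right).
\end{aligned}
\]
By Lemma 1(d), (b), and (d),
\[
B_5 = o_p(1), \quad \sqrt{NT}\,G_{x,T}A_2 = O_p(1), \quad D_{x,T}A_5 = O_p(1),
\]
respectively. Under the fixed effect assumption (Assumption 11), Lemma 3 implies that
\[
D_{x,T}A_4 = O_p(1),
\]
as $(N,T\to\infty)$. Since
\[
\frac{1}{\sqrt{N}\,T\theta_T^2},\; \frac{J_{x,T}^{-1}}{\sqrt{T}} = o(1),
\]
and
\[
T\theta_T^2\,(C'D_{x,T})\frac{J_{x,T}^{-1}}{\sqrt{T}}
\left( G_{x,T}A_1G_{x,T} + \frac{J_{x,T}^{-1}}{\sqrt{T}}D_{x,T}A_3D_{x,T}\frac{J_{x,T}^{-1}}{\sqrt{T}} \right)^{-1} = o_p(1)
\]
(as shown in Part (a)), we have
\[
(B_4 + B_5) - C'\left( \frac{1}{\theta_T^2}A_1 + A_3 \right)^{-1}\left( \frac{1}{\theta_T^2}A_2 + (A_4 + A_5) \right) = B_4 + o_p(1). \tag{78}
\]
But, according to Lemma 3,
\[
B_4 = \frac{1}{N}\sum_i z_i u_i \to_p l_z'\Xi\lambda. \tag{79}
\]
Therefore, (74), (77), (78) and (79) imply
\[
\hat{\gamma}_g \to_p \gamma + (l_z'\Xi l_z)^{-1}l_z'\Xi\lambda,
\]
as $(N,T\to\infty)$. ∎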

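Proof of Theorem 8
Using the notation in (67), we can express the Hausman test statistic by
\[
\begin{aligned}
HM_{NT}
&= \left[ \left( A_1 + \theta_T^2F_1 \right)^{-1}\sqrt{NT}\left( A_2 + \theta_T^2F_2 \right) - A_1^{-1}\sqrt{NT}A_2 \right]' \\
&\quad \times \left[ \sigma_v^2A_1^{-1} - \sigma_v^2\left( A_1 + \theta_T^2F_1 \right)^{-1} \right]^{-1} \\
&\quad \times \left[ \left( A_1 + \theta_T^2F_1 \right)^{-1}\sqrt{NT}\left( A_2 + \theta_T^2F_2 \right) - A_1^{-1}\sqrt{NT}A_2 \right].
\end{aligned}
\]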


Observe that for any conformable matrices P and Q, we have

\[
(P+Q)^{-1} - P^{-1} = -P^{-1}QP^{-1} + (P+Q)^{-1}QP^{-1}QP^{-1}.
\]
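The identity is stated without proof. A short derivation, assuming only that $P$ and $P+Q$ are invertible, could run as follows: first obtain the one-step resolvent identity, then substitute it into itself once.

```latex
\begin{align*}
(P+Q)^{-1} - P^{-1}
  &= (P+Q)^{-1}\left[ P - (P+Q) \right] P^{-1}
   = -(P+Q)^{-1} Q P^{-1}, \\
\text{hence}\quad (P+Q)^{-1}
  &= P^{-1} - (P+Q)^{-1} Q P^{-1}. \\
\text{Substituting the second line into the first:}\quad
(P+Q)^{-1} - P^{-1}
  &= -\left[ P^{-1} - (P+Q)^{-1} Q P^{-1} \right] Q P^{-1} \\
  &= -P^{-1} Q P^{-1} + (P+Q)^{-1} Q P^{-1} Q P^{-1}.
\end{align*}
```

With $P = A_1$ and $Q = \theta_T^2F_1$, the first term is of order $\theta_T^2$ and the remainder of order $\theta_T^4$.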

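Using this fact, we write
\[
\left( A_1 + \theta_T^2F_1 \right)^{-1} - A_1^{-1} = -\theta_T^2A_1^{-1}F_1A_1^{-1} + \theta_T^4R_1, \tag{80}
\]
where $R_1 = \left( A_1 + \theta_T^2F_1 \right)^{-1}F_1A_1^{-1}F_1A_1^{-1}$. Define
\[
Q = \left( A_1 + \theta_T^2F_1 \right)^{-1}\sqrt{NT}\left( A_2 + \theta_T^2F_2 \right) - A_1^{-1}\sqrt{NT}A_2.
\]
Then,
\[
\begin{aligned}
Q &= \left( A_1 + \theta_T^2F_1 \right)^{-1}\sqrt{NT}\left( A_2 + \theta_T^2F_2 \right) - A_1^{-1}\sqrt{NT}\left( A_2 + \theta_T^2F_2 \right) + A_1^{-1}\sqrt{NT}\,\theta_T^2F_2 \\
&= \left\{ \left( A_1 + \theta_T^2F_1 \right)^{-1} - A_1^{-1} \right\}\sqrt{NT}\left\{ A_2 + \theta_T^2F_2 \right\} + A_1^{-1}\sqrt{NT}\,\theta_T^2F_2 \\
&= -A_1^{-1}\left( \theta_T^2F_1 \right)A_1^{-1}\sqrt{NT}\left\{ A_2 + \theta_T^2F_2 \right\} + A_1^{-1}\sqrt{NT}\,\theta_T^2F_2 + \theta_T^4R_1\sqrt{NT}\left\{ A_2 + \theta_T^2F_2 \right\} \\
&= -\theta_T^2\sqrt{NT}\left[ A_1^{-1}F_1A_1^{-1}A_2 - A_1^{-1}F_2 \right]
- \theta_T^2\sqrt{NT}\left[ \theta_T^2A_1^{-1}F_1A_1^{-1}F_2 - \theta_T^2R_1\left\{ A_2 + \theta_T^2F_2 \right\} \right] \\
&= -\theta_T^2\sqrt{NT}\left[ A_1^{-1}F_1A_1^{-1}A_2 - A_1^{-1}F_2 \right] - \theta_T^4\sqrt{NT}\,R_2,
\end{aligned}
\tag{81}
\]
where $R_2 = A_1^{-1}F_1A_1^{-1}F_2 - R_1\left\{ A_2 + \theta_T^2F_2 \right\}$.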

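In view of (80) and (81), we now can rewrite the Hausman statistic as
\[
\begin{aligned}
HM_{NT} &= Q'\left[ \sigma_v^2A_1^{-1} - \sigma_v^2\left( A_1 + \theta_T^2F_1 \right)^{-1} \right]^{-1}Q \\
&= \theta_T\sqrt{NT}\left[ A_1^{-1}F_1A_1^{-1}A_2 - A_1^{-1}F_2 + \theta_T^2R_2 \right]'
\times \left[ \sigma_v^2A_1^{-1}F_1A_1^{-1} - \sigma_v^2\theta_T^2R_1 \right]^{-1} \\
&\quad \times \theta_T\sqrt{NT}\left[ A_1^{-1}F_1A_1^{-1}A_2 - A_1^{-1}F_2 + \theta_T^2R_2 \right];
\end{aligned}
\]
or equivalently,
\[
\begin{aligned}
HM_{NT}
&= \theta_T\sqrt{NT}\left[ G_1\left( J_{x,T}^{-1}D_{x,T}F_1D_{x,T}J_{x,T}^{-1} \right)G_1G_{x,T}A_2 - G_1J_{x,T}^{-1}D_{x,T}F_2 + \theta_T^2G_{x,T}^{-1}R_2 \right]' \\
&\quad \times \left[ \sigma_v^2G_1\left( J_{x,T}^{-1}D_{x,T}F_1D_{x,T}J_{x,T}^{-1} \right)G_1 - \sigma_v^2\theta_T^2\left( G_{x,T}^{-1}R_1G_{x,T}^{-1} \right) \right]^{-1} \\
&\quad \times \theta_T\sqrt{NT}\left[ G_1\left( J_{x,T}^{-1}D_{x,T}F_1D_{x,T}J_{x,T}^{-1} \right)G_1G_{x,T}A_2 - G_1J_{x,T}^{-1}D_{x,T}F_2 + \theta_T^2G_{x,T}^{-1}R_2 \right],
\end{aligned}
\]
where $G_1 = G_{x,T}^{-1}A_1^{-1}G_{x,T}^{-1}$.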

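Notice that
\[
\theta_T^2\left( G_{x,T}^{-1}R_1G_{x,T}^{-1} \right)
= \theta_T^2G_{x,T}^{-1}\left( A_1 + \theta_T^2F_1 \right)^{-1}G_{x,T}^{-1}
\times G_{x,T}F_1G_{x,T}\,G_1\,G_{x,T}F_1G_{x,T}\,G_1.
\]
Lemma 1(a) and Assumption 13 imply
\[
G_1 = O_p(1).
\]
In addition, by Lemma 1(c),
\[
G_{x,T}F_1G_{x,T} = J_{x,T}^{-1}D_{x,T}F_1D_{x,T}J_{x,T}^{-1} = O_p(1), \tag{82}
\]
since $J_{x,T}^{-1} = O(1)$. Thus,
\[
\sigma_v^2\theta_T^2\left( G_{x,T}^{-1}R_1G_{x,T}^{-1} \right) = O_p\left( \theta_T^2 \right) = o_p(1). \tag{83}
\]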

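Now, consider
\[
\begin{aligned}
\theta_T\sqrt{NT}\,\theta_T^2G_{x,T}^{-1}R_2
&= \theta_T\sqrt{T}\left[ \sqrt{N}\theta_T^2G_{x,T}^{-1}\left( A_1^{-1}F_1A_1^{-1}F_2 \right) - \sqrt{N}\theta_T^2G_{x,T}^{-1}R_1\left( A_2 + \theta_T^2F_2 \right) \right] \\
&= \theta_T\sqrt{T}\left[
\theta_T^2\left( G_{x,T}^{-1}A_1^{-1}G_{x,T}^{-1} \right)\left( G_{x,T}F_1G_{x,T} \right)\left( G_{x,T}^{-1}A_1^{-1}G_{x,T}^{-1} \right)G_{x,T}\sqrt{N}F_2 \right. \\
&\qquad\qquad \left. - \theta_T^2\left( G_{x,T}^{-1}R_1G_{x,T}^{-1} \right)\left( \sqrt{N}G_{x,T}A_2 + \theta_T^2\sqrt{N}G_{x,T}F_2 \right) \right].
\end{aligned}
\]
Under the local alternatives (Assumption 12), we may deduce from Lemmas 1 and 2 that
\[
G_1,\; G_{x,T}F_1G_{x,T},\; G_{x,T}\sqrt{N}F_2 = O_p(1); \qquad
\theta_T^2\left( G_{x,T}^{-1}R_1G_{x,T}^{-1} \right),\; \sqrt{N}G_{x,T}A_2 = o_p(1).
\]
Since $\theta_T\sqrt{T} = O(1)$, we have
\[
\theta_T\sqrt{NT}\,\theta_T^2G_{x,T}^{-1}R_2 = O(1)\left[ \theta_T^2O_p(1) + o_p(1) \right] = o_p(1). \tag{84}
\]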

Using the results (83) and (84), we now can approximate the Hausman statistic as follows:
\[
\begin{aligned}
HM_{NT}
&= \frac{\theta_T}{\sigma_v}\sqrt{NT}\left[ \left( J_{x,T}^{-1}D_{x,T}F_1D_{x,T}J_{x,T}^{-1} \right)\left( G_1G_{x,T}A_2 \right) - J_{x,T}^{-1}D_{x,T}F_2 + o_p(1) \right]' \\
&\quad \times \left( J_{x,T}^{-1}D_{x,T}F_1D_{x,T}J_{x,T}^{-1} + o_p(1) \right)^{-1} \\
&\quad \times \frac{\theta_T}{\sigma_v}\sqrt{NT}\left[ \left( J_{x,T}^{-1}D_{x,T}F_1D_{x,T}J_{x,T}^{-1} \right)\left( G_1G_{x,T}A_2 \right) - J_{x,T}^{-1}D_{x,T}F_2 + o_p(1) \right] \\
&= \frac{\theta_T}{\sigma_v}\sqrt{NT}\left( D_{x,T}F_2 \right)'\left( D_{x,T}F_1D_{x,T} \right)^{-1}\frac{\theta_T}{\sigma_v}\sqrt{NT}\left( D_{x,T}F_2 \right) + o_p(1).
\end{aligned}
\]
Here, the last line holds because under the local alternative hypotheses,
\[
J_{x,T}^{-1}D_{x,T}F_1D_{x,T}J_{x,T}^{-1} = O_p(1),
\]
by (82), and
\[
\frac{\theta_T}{\sigma_v}G_1\left( \sqrt{NT}\,G_{x,T}A_2 \right) = O_p(\theta_T) = o_p(1),
\]
by Lemma 1(a), (b) and Assumption 13.

Finally, by Lemma 1(c), as $(N,T\to\infty)$,

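\[
\begin{aligned}
D_{x,T}F_1D_{x,T}
&= D_{x,T}A_3D_{x,T} - D_{x,T}CB_3^{-1}C'D_{x,T} \\
&= \frac{1}{N}\sum_i D_{x,T}\bar{x}_i\bar{x}_i'D_{x,T}
- \left( \frac{1}{N}\sum_i D_{x,T}\bar{x}_i z_i' \right)\left( \frac{1}{N}\sum_i z_i z_i' \right)^{-1}\left( \frac{1}{N}\sum_i D_{x,T}\bar{x}_i z_i' \right)' \\
&\to_p \Xi_{xx} - \Xi_{xz}\Xi_{zz}^{-1}\Xi_{zx} > 0.
\end{aligned}
\]
Also, Lemmas 1(d) and 2 imply that under the local alternative hypotheses, as $(N,T\to\infty)$,
\[
\begin{aligned}
\sqrt{N}D_{x,T}F_2
&= \sqrt{N}D_{x,T}A_4 + \sqrt{N}D_{x,T}A_5 - D_{x,T}CB_3^{-1}\sqrt{N}(B_4 + B_5) \\
&= \frac{1}{\sqrt{N}}\sum_{i=1}^N D_{x,T}\bar{x}_i u_i
- \left( \frac{1}{N}\sum_{i=1}^N D_{x,T}\bar{x}_i z_i' \right)\left( \frac{1}{N}\sum_{i=1}^N z_i z_i' \right)^{-1}\left( \frac{1}{\sqrt{N}}\sum_{i=1}^N z_i u_i \right) + o_p(1) \\
&\Rightarrow \begin{pmatrix} I_k & -\Xi_{xz}\Xi_{zz}^{-1} \end{pmatrix} N\left( \Xi\lambda, \sigma_u^2\Xi \right).
\end{aligned}
\]
As $(N,T\to\infty)$, $\frac{\theta_T}{\sigma_v}\sqrt{T} \to \frac{1}{\sigma_u}$. Therefore, under the hypothesis of random effects,
\[
HM_{NT} \Rightarrow \chi_k^2,
\]
a $\chi^2$ distribution with degrees of freedom equal to $k$. In contrast, under the local alternative hypotheses,
\[
HM_{NT} \Rightarrow \chi_k^2(\eta),
\]
where $\eta$ is the noncentrality parameter.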
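The $\chi_k^2$ limit under random effects can be illustrated with a small Monte Carlo experiment. This is a sketch under assumptions that go beyond the text: Gaussian errors, a model without intercept, a scalar $z_i$, and known $\sigma_u$ and $\sigma_v$ in place of estimates. The function names `hausman_known_variances` and `simulate` are hypothetical. The statistic is built from $A_1$, $F_1$ and $\theta_T$ exactly as in the expression for $HM_{NT}$ at the start of the proof.

```python
import numpy as np

def hausman_known_variances(x, z, y, sigma_u, sigma_v):
    """Hausman statistic from the moments A1, F1 with known variance components."""
    N, T, k = x.shape
    xbar, ybar = x.mean(axis=1), y.mean(axis=1)
    xd, yd = x - xbar[:, None, :], y - ybar[:, None]

    A1 = np.einsum('itk,itl->kl', xd, xd) / (N * T)
    beta_w = np.linalg.solve(A1, np.einsum('itk,it->k', xd, yd) / (N * T))  # within

    theta2 = sigma_v**2 / (T * sigma_u**2 + sigma_v**2)
    theta = np.sqrt(theta2)
    # GLS as OLS on quasi-demeaned data
    W = np.hstack([(xd + theta * xbar[:, None, :]).reshape(N * T, k),
                   np.repeat(theta * z, T, axis=0)])
    ys = (yd + theta * ybar[:, None]).reshape(N * T)
    beta_g = np.linalg.lstsq(W, ys, rcond=None)[0][:k]

    A3, B3, C = xbar.T @ xbar / N, z.T @ z / N, xbar.T @ z / N
    F1 = A3 - C @ np.linalg.solve(B3, C.T)
    # sigma_v^2 A1^{-1} - sigma_v^2 (A1 + theta^2 F1)^{-1}
    V = sigma_v**2 * (np.linalg.inv(A1) - np.linalg.inv(A1 + theta2 * F1))
    d = np.sqrt(N * T) * (beta_g - beta_w)
    return float(d @ np.linalg.solve(V, d))

def simulate(R=300, N=100, T=10, k=2, seed=0):
    """Draw R panels that satisfy the random effects assumption (illustrative design)."""
    rng = np.random.default_rng(seed)
    stats = np.empty(R)
    for r in range(R):
        x = rng.normal(size=(N, T, k)) + rng.normal(size=(N, 1, k))
        z = rng.normal(size=(N, 1))
        u = rng.normal(size=N)              # sigma_u = 1, independent of x and z
        v = 0.5 * rng.normal(size=(N, T))   # sigma_v = 0.5
        y = x @ np.ones(k) + (z[:, 0] + u)[:, None] + v
        stats[r] = hausman_known_variances(x, z, y, 1.0, 0.5)
    return stats

stats = simulate()
print(stats.mean())  # should be close to k = 2 if the chi-square approximation holds
```

A feasible version of the test would replace `sigma_u` and `sigma_v` with consistent estimates; the sketch keeps them fixed so that the comparison with $\chi_k^2$ isolates the role of the moment matrices.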


References

[1] Ahn, S.C., and S. Low, 1996, A reformulation of the Hausman test for regression models with cross-section-time-series data, Journal of Econometrics, 71, 291-307.

[2] Ahn, S.C., and H.R. Moon, 2001, Large-N and Large-T Properties of Panel Data Estimators and the Hausman Test, CLEO Discussion Papers, C01-20, University of Southern California.

[3] Alvarez, J., and M. Arellano, 1998, The time series and cross-section asymptotics of dynamic panel data estimators, mimeo, CEMFI, Spain.

[4] Amemiya, T., and T.E. MaCurdy, 1986, Instrumental-variables estimation of an error components model, Econometrica, 54, 869-880.

[5] Andrews, D., 1991, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation, Econometrica, 59, 817-858.

[6] Apostol, T., 1974, Mathematical Analysis, 2nd Ed., Addison-Wesley, Reading, MA.

[7] Balestra, P. and M. Nerlove, 1966, Pooling cross-section and time-series data in the estimation of a dynamic model: The demand for natural gas, Econometrica, 34, 585-612.

[8] Baltagi, B. and S. Khanti-Akom, 1990, On efficient estimation with panel data: An empirical comparison of instrumental variables estimators, Journal of Applied Econometrics, 5, 401-406.

[9] Baltagi, B., 1995, Econometric Analysis of Panel Data, John Wiley & Sons Ltd., New York, New York.

[10] Bosq, 1996, Nonparametric Statistics for Stochastic Processes, Springer-Verlag, New York.

[11] Breusch, T.S., G.E. Mizon, and P. Schmidt, 1989, Efficient estimation using panel data, Econometrica, 57, 695-700.

[12] Choi, I., 1998, Asymptotic analysis of a nonstationary error component model, mimeo, Kookmin University, Korea.

[13] Cornwell, C. and P. Rupert, 1988, Efficient estimation with panel data: An empirical comparison of instrumental variables estimators, Journal of Applied Econometrics, 3, 149-155.

[14] Hahn, J. and G. Kuersteiner, 2000, Asymptotically unbiased inference for a dynamic panel model with fixed effects when both n and T are large, mimeo.


[15] Hausman, J.A., 1978, Specification tests in econometrics, Econometrica, 46, 1251-1272.

[16] Hausman, J.A. and W.E. Taylor, 1981, Panel data and unobservable individual effects, Econometrica, 49, 1377-1398.

[17] Hall and Heyde, 1980, Martingale Limit Theory and Its Application, Academic Press, San Diego.

[18] Higgins, M. and E. Zakrajsek, 1999, Purchasing power parity: Three stakes through the heart of the unit root null, NBER working paper.

[19] Hsiao, C., 1986, Analysis of Panel Data, New York: Cambridge University Press.

[20] Im, K.S., M.H. Pesaran, and Y. Shin, 1997, Testing for unit roots in heterogeneous panels, mimeo.

[21] Kao, C., 1999, Spurious regression and residual-based tests for cointegration in panel data, Journal of Econometrics, 90, 1-44.

[22] Levin, A., C-F Lin, 1992, Unit root tests in panel data: Asymptotic and finite-sample properties, mimeo, University of California at San Diego.

[23] Levin, A., C-F Lin, 1993, Unit root tests in panel data: New results, mimeo, University of California at San Diego.

[24] Maddala, G.S., 1971, The use of variance components models in pooling cross section and time series data, Econometrica, 39, 341-358.

[25] Matyas, L. and P. Sevestre, 1992, The Econometrics of Panel Data, Dordrecht: Kluwer Academic Publishers.

[26] Phillips, P.C.B., and H. Moon, 1999, Linear regression limit theory for nonstationary panel data, Econometrica, 67, 1057-1111.

[27] Quah, D., 1994, Exploiting cross-section variations for unit root inference in dynamic data, Economics Letters, 44, 9-19.

[28] van der Vaart and Wellner, 1996, Weak Convergence and Empirical Processes, Springer-Verlag, New York.
