51
http://www.econometricsociety.org/ Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA LIANGJUN SU School of Economics, Singapore Management University, Singapore 178903, Singapore ZHENTAO SHI The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China PETER C. B. PHILLIPS Cowles Foundation for Research in Economics, Yale University, New Haven, CT 06520, U.S.A., University of Auckland, University of Southampton, and Singapore Management University The copyright to this Article is held by the Econometric Society. It may be downloaded, printed and reproduced only for educational or research purposes, including use in course packs. No downloading or copying may be done for any commercial purpose without the explicit permission of the Econometric Society. For such commercial purposes contact the Office of the Econometric Society (contact information may be found at the website http://www.econometricsociety.org or in the back cover of Econometrica). This statement must be included on all copies of this Article that are made available electronically or in any other format.

mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

http://www.econometricsociety.org/

Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264

IDENTIFYING LATENT STRUCTURES IN PANEL DATA

LIANGJUN SUSchool of Economics, Singapore Management University, Singapore 178903,

Singapore

ZHENTAO SHIThe Chinese University of Hong Kong, Shatin, Hong Kong SAR, China

PETER C. B. PHILLIPSCowles Foundation for Research in Economics, Yale University, New Haven, CT

06520, U.S.A., University of Auckland, University of Southampton, andSingapore Management University

The copyright to this Article is held by the Econometric Society. It may be downloaded,printed and reproduced only for educational or research purposes, including use in coursepacks. No downloading or copying may be done for any commercial purpose without theexplicit permission of the Econometric Society. For such commercial purposes contactthe Office of the Econometric Society (contact information may be found at the websitehttp://www.econometricsociety.org or in the back cover of Econometrica). This statement mustbe included on all copies of this Article that are made available electronically or in any otherformat.

Page 2: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264

IDENTIFYING LATENT STRUCTURES IN PANEL DATA

BY LIANGJUN SU, ZHENTAO SHI, AND PETER C. B. PHILLIPS1

This paper provides a novel mechanism for identifying and estimating latent groupstructures in panel data using penalized techniques. We consider both linear and non-linear models where the regression coefficients are heterogeneous across groups buthomogeneous within a group and the group membership is unknown. Two approachesare considered—penalized profile likelihood (PPL) estimation for the general nonlin-ear models without endogenous regressors, and penalized GMM (PGMM) estima-tion for linear models with endogeneity. In both cases, we develop a new variant ofLasso called classifier-Lasso (C-Lasso) that serves to shrink individual coefficients tothe unknown group-specific coefficients. C-Lasso achieves simultaneous classificationand consistent estimation in a single step and the classification exhibits the desirableproperty of uniform consistency. For PPL estimation, C-Lasso also achieves the oracleproperty so that group-specific parameter estimators are asymptotically equivalent toinfeasible estimators that use individual group identity information. For PGMM esti-mation, the oracle property of C-Lasso is preserved in some special cases. Simulationsdemonstrate good finite-sample performance of the approach in both classification andestimation. Empirical applications to both linear and nonlinear models are presented.

KEYWORDS: Classification, cluster analysis, dynamic panel, group Lasso, high di-mensionality, nonlinear panel, oracle property, panel structure model, parameter het-erogeneity, penalized least squares, penalized GMM, penalized profile likelihood.

1. INTRODUCTION

PANEL DATA ARE WIDELY USED IN EMPIRICAL ANALYSIS in many disciplinesacross the social and medical sciences. Such data usually cover individual unitssampled from different backgrounds and with different individual characteris-tics so that an abiding feature of the data is its heterogeneity, much of whichis simply unobserved. Neglecting latent heterogeneity in the data can lead tomany difficulties, including inconsistent estimation and misleading inference,as is well explained in the literature (e.g., Hsiao (2014, Chapter 6)). It is there-fore widely acknowledged that an important feature of good empirical model-ing is to control for heterogeneity in the data as well as for potential hetero-geneity in the response mechanisms that figure within the model. Since het-erogeneity is a latent feature of the data and its extent is unknown a priori,respecting the potential influence of heterogeneity on model specification is a

1The authors thank a co-editor and three anonymous referees for many constructive commentson the previous version of the paper. They also thank Stéphane Bonhomme, Xiaohong Chen, HanHong, Cheng Hsiao, Arthur Lewbel, Joon Park, and Yixiao Sun for discussions on the subjectmatter and comments on the paper. Su acknowledges support from the Singapore Ministry ofEducation for Academic Research Fund (AcRF) under the Tier-2 Grant MOE2012-T2-2-021 andthe funding support provided by the Lee Kong Chian Fund for Excellence. Phillips acknowledgesNSF support under Grants SES-0956687 and SES-1285258. Address correspondence to: LiangjunSu.

© 2016 The Econometric Society DOI: 10.3982/ECTA12560

Page 3: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2216 L. SU, Z. SHI, AND P. C. B. PHILLIPS

serious challenge in empirical research. Even in the simplest linear panel datamodels, the challenge is manifest and clearly stated: do we allow for heteroge-neous slope coefficients in regression as well as heterogeneous error variances?

While it may be clearly stated, this challenge to the empirical researcher isby no means easily addressed. While allowing for cross-sectional slope hetero-geneity in regression may help to avert misspecification bias, it also sacrificesthe power of cross-section averaging in the estimation of response patternsthat may be common across individuals, or more subtly, certain groups of indi-viduals in the panel. In the absence of prior information on such grouping andwith data where every new individual to the panel may bring new idiosyncraticelements to be explained, the challenge is demanding and almost universallyrelevant.

Traditional panel data models frequently deal with this challenge by avoid-ance. Complete slope homogeneity is assumed for certain specified commonparameters in the panel. Under this assumption, the regression parameters arethe same across individuals and unobserved heterogeneity is modeled throughindividual-specific effects which typically enter the model additively. This ap-proach is an exemplar of a convenient assumption that facilitates estimationand inference. Nevertheless, this assumption has been frequently questionedand rejected in empirical studies; see Hsiao and Tahmiscioglu (1997), Lee, Pe-saran, and Smith (1997), Durlauf, Kourtellos, and Minkin (2001), Phillips andSul (2007a), Browning and Carro (2007, 2010, 2014), and Su and Chen (2013),among others.

Despite general agreement that slope heterogeneity is endemic in empiri-cal work with panels, few methods are available to allow for heterogeneity inthe slopes when the extent of the heterogeneity is unknown. Some researchersassume complete slope heterogeneity where regression coefficients are com-pletely different for different individuals; see the survey by Baltagi, Bresson,and Pirotte (2008) and Hsiao and Pesaran (2008). Others consider panel struc-ture models where individuals belong to a number of homogeneous groupswithin a broadly heterogeneous population, and the regression parameters arethe same within each group but differ across groups. Two essential questionsremain: how to determine the unknown number of groups (dubbed conver-gence clubs in the economic growth literature); and how to identify the mem-bership of each individual. These are longstanding questions of statistical clas-sification in panel data. No completely satisfactory solution has yet been found,although various approaches have been adopted in empirical research. For in-stance, Bester and Hansen (2016) considered a panel structure model whereindividuals are grouped according to some external classification, geographiclocation, or observable explanatory variables; Ando and Bai (2015) considereda multifactor asset-pricing model with group-specific pervasive factors wherethe group membership is known. Here the group structure is assumed to becompletely known to the researcher, an approach that is common in practicalwork because of its convenience. In spite of its convenience, this approach to

Page 4: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2217

panel inference is inevitably misleading when the number of groups and indi-vidual identities are incorrectly classified.

Several approaches have been proposed to determine an unknown groupstructure in modeling unobserved slope heterogeneity in panels. The first ap-proach applies finite mixture models. For example, Sun (2005) considered aparametric finite mixture linear panel data model, and Kasahara and Shimotsu(2009) and Browning and Carro (2014) studied identification in discrete choicepanel data models for a fixed number of groups using nonparametric discretemixture distributions. The second approach is based on the K-means algorithmin statistical cluster analysis. Lin and Ng (2012) and Sarafidis and Weber (2015)considered linear panel data models where the slope coefficients have latentgroup structure. They modified the K-means algorithm to estimate the mod-els but did not provide any inference theory. Bonhomme and Manresa (2015,BM hereafter) considered a linear panel data model where the additive fixedeffects have group structure and applied the K-means algorithm to estimatethe model and study its asymptotic properties. Ando and Bai (2016) extendedBM’s approach to allow for group structure among the interactive fixed ef-fects. In addition, Phillips and Sul (2007a) developed an algorithm for deter-mining group clusters that relies on the estimation of evaporating trend func-tions to determine convergence clusters. Hahn and Moon (2010) argued thatthe group structure has sound foundations in game theory or macroeconomicmodels where multiplicity of Nash equilibria is expected and they considerednonlinear panel data models where the parameter of interest is common toindividuals whereas the fixed effects have finite support.

The present paper proposes a new method for econometric estimation andinference in panel models when the regression parameters are heterogeneousacross groups, individual group membership is unknown, and classification isto be determined empirically. It is an automated data-determined procedureand does not require the specification of any modeling mechanism for the un-known group structure. The methods proposed here have several novel aspectsin relation to earlier research and they contribute to both the Lasso and econo-metric classification literatures in various ways, which we outline in the follow-ing paragraphs.

First, our approach is motivated by a key advantage of Lasso technologyin coping with parameter sparsity. This advantage is particularly useful whenthe set of unknown parameters is potentially large but may also embody cer-tain sparse features. In a typical panel structure model, the effective numberof unknown regression parameters {βi� i = 1� � � � �N} is not of order O(N) asit would be if these parameters were all incidental, but rather of some orderO(K0), where K0 denotes the number of unknown groups within which the pa-rameters are homogeneous. Hence, in many empirical applications, the set ofunknown parameters in a panel structure model surely exhibits the desirablesparsity feature, making the use of Lasso technology highly appealing.

Second, the procedures developed in the present paper contribute to thefused Lasso literature in which sparsity arises because some parameters take

Page 5: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2218 L. SU, Z. SHI, AND P. C. B. PHILLIPS

the same value. The fused Lasso was proposed by Tibshirani, Saunders, Ros-set, Zhu, and Knight (2005) and was designed for problems with features thatcan be ordered in some meaningful way. It has been used to detect multiplestructural changes in the time series setting; see, for example, Harchaoui andLévy-Leduc (2010), Chan, Yau, and Zhang (2014), and Qian and Su (2015).The method cannot be used to classify individuals into different groups be-cause there is no natural ordering across individuals and so a different algo-rithm to locate common individuals is required. The present paper develops anew variant of the Lasso method that does not rely on the order of individualsin the data and which therefore contributes to the fused Lasso technology.

Third, standard Lasso technology involves an additive penalty term to theleast squares, GMM, or log-likelihood objective function, and when multiplepenalty terms are used, they enter the objective function additively. To achievesimultaneous group classification and estimation in a single step, our variantof Lasso involves N additive penalty terms, each of which takes a multiplicativeform as a product of K0 penalty terms. To the best of our knowledge, this pa-per is the first to propose a mixed additive-multiplicative penalty form that canserve as an engine for simultaneous classification and estimation. The methodworks by using each of the K0 penalty terms in the multiplicative expressionto shrink the individual-level regression parameter vectors to a particular un-known group-level parameter vector, thereby producing a joint shrinkage pro-cess to unknown quantities. This process is distinct from the prototypical Lassomethod that shrinks an individual parameter to the known value zero and thegroup Lasso method that shrinks a parameter vector to a known vector of ze-ros (see Yuan and Lin (2006)). To emphasize its role as a classifier and forfuture reference, we describe our new Lasso method as the classifier-Lasso orC-Lasso.

Fourth, we develop a double asymptotic limit theory for the C-Lasso thatdemonstrates its capacity to achieve simultaneous classification and estimationin a single step. As mentioned in the Abstract, the paper develops two classes ofestimators for panel structure models—penalized profile likelihood (PPL) andpenalized GMM (PGMM). The former is applicable to both linear and non-linear panel models without endogeneity and with or without dynamic struc-tures, while the latter is applicable to linear panel models with endogeneityor dynamic structures. Both broaden the scope of applicability of our method,as early literature only considers linear panels without endogeneity. In eithercase, we show uniform classification consistency in the sense that all individualsbelonging to a certain group can be classified into the same group correctly uni-formly over both individuals and group identities with probability approaching1 (w.p.a.1). Conversely, all individuals that are classified into a certain groupbelong to the same group uniformly over both individuals and group identitiesw.p.a.1. Such a uniform result allows us to establish an oracle property of thePPL estimator that, like the BM K-means estimator, is asymptotically equiva-lent to the corresponding infeasible estimator of the group-specific parameter

Page 6: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2219

that is obtained by knowing all individual group identities. Unfortunately, ourPGMM estimator generally does not have the oracle property. But the uni-form classification consistency property allows us to develop a limit theory forpost-C-Lasso estimators that are obtained by pooling all individuals in an es-timated group to estimate the group-specific parameters, and these estimatorsare asymptotically as efficient as the oracle ones in both the PPL and PGMMcontexts.

Fifth, C-Lasso enables empirical researchers to study panel structures with-out a priori knowledge of the number of groups, without the need to specify anyancillary regression models to model individual group identities, and with noneed to make any distributional assumptions. When the number K0 of groupsis unknown, a BIC-type information criterion is proposed to determine thenumber of groups for both PPL and PGMM estimation and it is shown thatthis procedure selects the correct number of groups consistently.

The rest of the paper is organized as follows. We study C-Lasso PPL esti-mation and inference of panel structure models in Section 2. PGMM estima-tion and inference is addressed in Section 3. Section 4 reports Monte Carlosimulation findings. Section 5 contains two empirical applications. Section 6concludes. Proofs of the main results in the paper are given in Appendices Aand B. Additional materials may be found in the Supplemental Material (Su,Shi, and Phillips (2016)).

For any real matrix A, we write the transpose A′, the Frobenius norm ‖A‖,and the Moore–Penrose inverse as A+. When A is symmetric, we use μmax(A)and μmin(A) to denote the largest and smallest eigenvalues, respectively. Ipand 0p×1 denote the p×p identity matrix and p× 1 vector of zeros, and 1{·} is

the indicator function. The operatorP→ denotes convergence in probability,

D→convergence in distribution, and plim probability limit. We use (N�T)→ ∞ tosignify that N and T pass jointly to infinity.

2. PENALIZED PROFILE LIKELIHOOD ESTIMATION

This section considers panel structure models without endogeneity. It is con-venient to assume first that the number of groups is known and later considerthe determination of the number of unknown groups.

2.1. Panel Structure Models

Given a panel data set {(yit� xit)} for i= 1� � � � �N and t = 1� � � � �T , it is pro-posed to use fixed effects quasi-maximum likelihood to estimate the unknownparameters by solving the minimization problem

(2.1) min{βi�μi}

1NT

N∑i=1

T∑t=1

ψ(wit;βi�μi)�

Page 7: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2220 L. SU, Z. SHI, AND P. C. B. PHILLIPS

Here −ψ(wit;βi�μi) denotes the logarithm of the pseudo-true conditionaldensity function of yit given xit , the history of (yit� xit), and (μi�βi), where μiare scalar individual effects and βi are p× 1 vectors of parameters of interest.Traditionally, econometric work has assumed that the βi are common for allcross-sectional units, leading to a homogeneous panel with individual hetero-geneity modeled through μi alone. At the other extreme, the βi are assumedto differ across individuals and each is estimated at a slow rate without poolingacross section. The present paper allows the true values of βi, denoted β0

i , tofollow a group pattern of the general form

(2.2) β0i =

K0∑k=1

α0k1{i ∈G0

k

}�

Here α0j �= α0

k for any j �= k,⋃K0

k=1G0k = {1�2� � � � �N}, and G0

k ∩G0j = ∅ for any

j �= k. Let Nk = #G0k denote the cardinality of the set G0

k. In the economicgrowth literature (e.g., Phillips and Sul (2007a)), K0 corresponds to the num-ber of convergence clubs and countries (indexed by i) within the same kth thatclub share the same (slope) parameter vector α0

k. In the market entry-exit ex-ample (e.g., Hahn and Moon (2010)), K0 denotes the number of pure Nashequilibria and markets (indexed by i) selecting the same equilibrium over timethat exhibit the same parameter vector.

For now, we assume that the number of groups, K0, is known and fixed butthat each individual’s group membership is unknown. In addition, followingSun (2005), Lin and Ng (2012), and BM, we implicitly assume that individualgroup membership does not vary over time. Let α ≡ (α1� � � � �αK0) and β ≡(β1� � � � �βN). We denote the true values of μi, αk,βi, α, and β asμ0

i , α0k,β0

i , α0,

and β0, respectively. The econometric task is to infer each individual’s groupidentity and to estimate the group-specific parameters α0

k. Some examples ofmodels that fall within this framework and the scope of our methodology areas follows.

EXAMPLE 1—Linear Panel: The linear panel structure model is generatedaccording to

(2.3) yit = β0′i xit +μ0

i + εit�where xit is a p × 1 vector of exogenous or predetermined variables, μi isan individual fixed effect, βi is a p × 1 vector of slope parameters, and εitis the idiosyncratic error term with mean zero. Gaussian quasi-maximum like-lihood estimation (QMLE) of βi and μi is achieved by minimizing (2.1) withψ(wit;βi�μi)= 1

2(yit −β′ixit −μi)2 and wit = (yit� x′

it)′.

EXAMPLE 2—Linear Panel With Quantile Restrictions: Consider the modelin (2.3) with the quantile restriction: P(εit ≤ 0|xit�β0

i �μ0i ) = τ; see, for ex-

ample, Kato, Gavao, and Montes-Rojas (2012). We can estimate βi and μi

Page 8: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2221

by minimizing (2.1) with ψ(wit;βi�μi) = ρτ(yit − β′ixit − μi), where ρτ(u) =

{τ − K(−u/h)}u is a smoothed version of the usual check function with Kbeing a CDF-type kernel function and h a bandwidth parameter.

EXAMPLE 3—Binary Choice Panel: The dynamic binary choice panel datamodel is characterized by yit = 1{β0′

i xit + μ0i − εit ≥ 0}, where xit , μi, and εit

are as defined in Example 1. In this case, −ψ(wit;βi�μi)= yit lnF(yit −β′ixit −

μi) + (1 − yit) ln[1 − F(yit − β′ixit − μi)], where wit = (yit� x

′it)

′, and F(·) de-notes the conditional CDF (standard logistic or normal) of εit given xit and thehistory of (xit� yit).

EXAMPLE 4—Tobit Panel: The Tobit panel is characterized by yit = max(0�β0′i xit + μ0

i + εit), where xit , μi, and εit are defined as in the above examples.For clarity, assume that εit ’s are independent and identically distributed (i.i.d.)N(0�σ2

ε) given xit and the history of (xit� yit). In this case, −ψ(wit;βi�μi�σ2ε)=

1{yit = 0} lnF((yit −β′ixit −μi)/σ2

ε)+ 1{yit > 0} ln[f ((yit −β′ixit −μi)/σ2

ε)/σε],where wit = (yit� x′

it)′, and f and F denote the standard normal PDF and CDF,

respectively. The presence of the common parameter σ2ε can be addressed by

extending the asymptotic analysis below.

2.2. Penalized Profile Likelihood Estimation of α and β

Following Hahn and Newey (2004) and Hahn and Kuersteiner (2011), theprofile log-likelihood function is

(2.4) Q1�NT (β)= 1NT

N∑i=1

T∑t=1

ψ(wit;βi� μi(βi)

)�

where μi(βi)= arg minμi1T

∑T

t=1ψ(wit;βi�μi). Motivated by the literature ongroup Lasso (e.g., Yuan and Lin (2006)), we propose to estimate β and α byminimizing the following PPL criterion function:

(2.5) Q(K0)1NT�λ1

(β�α)=Q1�NT (β)+ λ1

N

N∑i=1

K0∏k=1

‖βi − αk‖�

where λ1 = λ1NT is a tuning parameter. Minimizing the above criterion func-tion produces classifier-Lasso (C-Lasso) estimates β and α of β and α, respec-tively. Let βi and αk denote the ith and kth columns of β and α, respectively,that is, α≡ (α1� � � � � αK) and β≡ (β1� � � � � βN).

The penalty term in (2.5) takes a novel mixed additive-multiplicative formthat does not appear in the literature. Traditional Lasso includes additivepenalty terms to an objective function by differentiating zeros from non-zero-valued parameters to select relevant regressors. In contrast, the C-Lasso hasN

Page 9: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2222 L. SU, Z. SHI, AND P. C. B. PHILLIPS

additive terms, each of which takes a multiplicative form as the product of K0

separate penalties. The multiplicative component is needed because for each i,β0i can take any one of the K0 unknown values, α0

1� � � � �α0K0

. We do not know apriori to which point βi should shrink, and all K0 possibilities must be allowed.Each of the K0 penalty terms in the multiplicative expression permits βi toshrink to a particular unknown group-level parameter vector αk. The summa-tion component is needed because we need to pull information from all Ncross-sectional units in order to identify {β0

i } and {α0k} jointly. Our approach

differs from the prototypical Lasso method of Tibshirani (1996) that shrinks aparameter to zero as well as the group Lasso method of Yuan and Lin (2006)that shrinks a parameter vector to a zero vector. The main purpose in the latterpapers is to select relevant variables, while C-Lasso is designed to determinegroup membership for each individual. As emphasized in the Introduction,both problems enjoy the same motivation of parameter sparsity despite theirdifferent nature. C-Lasso has the additional motivation of classification of un-known parameters into a priori unknown groups each with their own unknownparameters.

Note that the objective function in (2.5) is not convex in β even though itis (conditionally) convex in αk when one fixes αj for j �= k. The SupplementalMaterial provides an iterative algorithm to obtain the estimates α and β�

2.3. Assumptions

Let μi(βi) ≡ arg minμi Ψi(βi�μi), where Ψi(βi�μi) ≡ 1T

∑T

t=1 E[ψ(wit;βi�μi)]. Note that μ0

i = μi(β0i ). Let Ui(wit;βi�μi) ≡ ∂ψ(wit;βi�μi)/∂βi and

Vi(wit;βi�μi) ≡ ∂ψ(wit;βi�μi)/∂μi. Let Uμii and Uμiμi

i denote the first andsecond derivatives of Ui with respect to μi. Define V μi

i , V μiμii , Uβi

i , and V βii

similarly. For notational simplicity, denote Uit ≡ Ui(wit;β0i �μ

0i ), and similarly

for Uμiit , Uμiμi

it , Vit , Vμiit and Uμiμi

it . Define

miU ≡ 1T

T∑t=1

E(Uμiit

)� miV ≡ 1

T

T∑t=1

E(Vμiit

)�

miU2 ≡ 1T

T∑t=1

E(Uμiμiit

)� miV 2 ≡ 1

T

T∑t=1

E(Vμiμiit

)�

Uit ≡Uit − miU

miV

Vit� Uβiit ≡Uβi

it − miU

miV

Vβi ′it � and

Uμiit ≡Uμi

it − miU

miV

Vμiit �

Let ΩiT ≡ 1T

∑T

t=1

∑T

s=1 E(UitU′is), HiT ≡ 1

T

∑T

t=1 E[Uβiit ], and HkNT ≡ 1

Nk×∑

i∈G0kHiT . Define the two expected Hessian matrices for cross-sectional

Page 10: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2223

unit i:

Hiμμ(βi)≡ 1T

T∑t=1

E[Vμii

(wit;βi�μi(βi)

)]and

Hiββ(βi)≡ 1T

T∑t=1

E

[Uβiit (βi)+Uμi

it (βi)∂μi(βi)

∂β′i

]�

where Uβiit (βi) = U

βii (wit;βi�μi(βi)), and similarly for Uμi

it (βi). Let mini de-note min1≤i≤N , and similarly for maxi. We make the following assumptions:

ASSUMPTION A1: (i) For each i, {wit : t = 1�2� � � �} is stationary strong mixingwith mixing coefficients αi(·). α(·) ≡ maxi αi(·) satisfies α(τ) ≤ cαρ

τ for somecα > 0 and ρ ∈ (0�1). {wit : t = 1�2� � � �} are independent across i.

(ii) For each η > 0, mini[inf(βi�μi):‖(βi�μi)−(β0i �μ

0i )‖>η Ψi(βi�μi) − Ψi(β

0i �

μ0i )]> 0.(iii) Let Θ denote the parameter space for θi = (β′

i�μi)′. Θis a compact and

convex subset of Rp+1 such that θ0i = (β0′

i �μ0i )

′ lies in the interior of Θ for each i.(iv) Let |v| ≡ ∑p+1

j=1 vj and Dvψ(wit;θ) ≡ ∂|v|ψ(wit;θ)/(∂θ(1) · · ·∂θ(p+1)),where v = (v1� � � � � vp+1) is a vector of nonnegative integers and θ(j) denotes thejth element of θ. There exists a function M(·) such that supθ∈Θ ‖Dvψ(wit;θ)‖ ≤M(wit), ‖Dvψ(wit;θ) − Dvψ(wit;θ)‖ ≤ M(wit)‖θ − θ‖ for any θ�θ ∈ Θ and|v| ≤ 3, and maxiE|M(wit)|q < cM for some cM <∞ and q≥ 6.

(v) There exists a constant cH > 0 such that mini infβ∈BHiμμ(β) ≥ cH andmini μmin(Hiββ(β

0i ))≥ cH .

(vi) There exists a constant cα > 0 such that min1≤k<l≤K0 ‖α0k − α0

l ‖ ≥ cα.(vii) K0 is fixed and Nk/N → τk ∈ (0�1) for each k= 1� � � � �K0 as N → ∞.

ASSUMPTION A2: (i) Tλ21/(lnT)

6+2ν → ∞ and λ1(lnT)ν → 0 for some ν > 0as (N�T)→ ∞.

(ii) N1/2T−1(lnT)9 → 0 and N2T 1−q/2 → c ∈ [0�∞) as (N�T)→ ∞.

ASSUMPTION A3: (i) For each k= 1� � � � �K0, Ωk ≡ lim(Nk�T)→∞ 1Nk

∑i∈G0

kΩiT

exists and Ωk > 0.(ii) For each k= 1� � � � �K0, Hk ≡ lim(Nk�T)→∞ HkNT exists and Hk > 0.

Assumption A1(i) imposes conditions on {wit}, which are commonly as-sumed for dynamic nonlinear panel data models; see, for example, Hahnand Kuersteiner (2011) and Lee and Phillips (2015). With more compli-cated notation, we can relax the stationarity assumption along the time di-mension. Assumption A1(ii) imposes an identification condition for the jointidentification of (βi�μi) for each i. Assumption A1(iii) restricts the param-eter space and it is possible to allow Θ to be i-dependent. Assumption

Page 11: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2224 L. SU, Z. SHI, AND P. C. B. PHILLIPS

A1(iv) specifies the smoothness and moment conditions on ψ or objects as-sociated with it. Assumption A1(v), in conjunction with Assumptions A1(ii)and (iv), implies that mini[infμi :|μi−μi(βi)|>η Ψi(βi�μi)−Ψi(βi�μi(βi))]> 0 andmini[infβi :‖βi−β0

i ‖>η Ψi(βi�μi(βi)) − Ψi(β0i �μi(β

0i ))] > 0. Assumption A1(vi)

specifies that the group-specific parameters are separated from each other,similarly to the separation requirement in Hahn and Moon (2010). Assump-tion A1(vii) implies that each group has an asymptotically nonnegligible mem-bership number of individuals as N → ∞. This assumption can also be relaxedat the cost of more lengthy arguments. Assumption A2(i) imposes conditionson λ1, all of which hold if

(2.6) λ1 ∝ T−a for any a ∈ (0�1/2)�

Assumption A2(ii) is needed to ensure some higher-order terms are asymp-totically negligible. Assumption A3 is used to derive the asymptotic bias andvariance of the C-Lasso estimator. The theory developed below under theseconditions does not require correct specification of the likelihood function andthe C-Lasso asymptotics apply under the general QMLE setup.

2.4. Asymptotic Properties of the PPL C-Lasso Estimators

2.4.1. Preliminary Rates of Convergence for Coefficient Estimates

The following theorem establishes the consistency of the PPL estimates {βi}and {αk}.

THEOREM 2.1: Suppose that Assumption A1 holds and λ1 = o(1). Then(i) βi −β0

i =OP(T−1/2 + λ1) for i= 1�2� � � � �N ,(ii) 1

N

∑N

i=1 ‖βi −β0i ‖2 =OP(T−1), and

(iii) (α(1)� � � � � α(K0))− (α01� � � � �α

0K0)=OP(T−1/2), where (α(1)� � � � � α(K0)) is a

suitable permutation of (α1� � � � � αK0).

REMARK 1: Theorem 2.1(i)–(ii) establishes the pointwise and mean squareconvergence of βi. Theorem 2.1(iii) indicates that the group-specific parame-ters α0

1� � � � �α0K can be estimated consistently by α1� � � � � αK0 subject to permu-

tation. As expected and consonant with other Lasso limit theory, the pointwiseconvergence rate of βi depends on the rate at which the tuning parameter λ1

converges to zero. Somewhat unexpectedly, this requirement is not the caseeither for mean square convergence of βi or convergence of αk. For notationalsimplicity, hereafter we simply write αk for α(k) as the consistent estimator ofα0k, and define

(2.7) Gk = {i ∈ {1�2� � � � �N} : βi = αk

}for k= 1� � � � �K0�

Page 12: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2225

2.4.2. Classification Consistency

Roughly speaking, a classification method is consistent if it classifies each in-dividual to the correct group w.p.a.1. For a rigorous statement of this property,we define

(2.8) EkNT�i ≡{i /∈ Gk|i ∈G0

k

}and FkNT�i ≡

{i /∈G0

k|i ∈ Gk

}�

where i = 1� � � � �N and k = 1� � � � �K0. Let EkNT = ⋃i∈G0

kEkNT�i and FkNT =⋃

i∈Gk FkNT�i. EkNT and FkNT mimic Type I and II errors in statistical tests: EkNTdenotes the error event of not classifying an element of G0

k into the estimatedgroup Gk; and FkNT denotes the error event of classifying an element that doesnot belong to G0

k into the estimated group Gk. Both types of errors must becontrolled. We use the following definition.

DEFINITION 1—Consistent Classification: The classification is individuallyconsistent if P(EkNT�i)→ 0 as (N�T)→ ∞ ∀i ∈ G0

k and k ∈ {1� � � � �K0}, andP(FkNT�i)→ 0 as (N�T)→ ∞ ∀i ∈ Gk and k ∈ {1� � � � �K0}. It is uniformly con-sistent if P(

⋃K0k=1 EkNT )→ 0 and P(

⋃K0k=1 FkNT )→ 0 as (N�T)→ ∞.

The following theorem establishes uniform consistency for the PPL classi-fier.

THEOREM 2.2: Suppose that Assumptions A1–A2 hold. Then(i) P(

⋃K0k=1 EkNT )≤∑K0

k=1P(EkNT )→ 0 as (N�T)→ ∞, and(ii) P(

⋃K0k=1 FkNT )≤∑K0

k=1P(FkNT )→ 0 as (N�T)→ ∞.

REMARK 2: Theorem 2.2 implies that all individuals within a group, sayG0k, can be simultaneously correctly classified into the same group (denoted

Gk) w.p.a.1. Conversely, all individuals that are classified into the same group,say Gk, simultaneously correctly belong to the same group (G0

k) w.p.a.1. LetG0 ≡ {1�2� � � � �N} \ (⋃K0

k=1 Gk) and HiNT ≡ {i ∈ G0}. Theorem 2.2(i) impliesthat P(

⋃1≤i≤N HiNT )≤∑K0

k=1P(EkNT )→ 0. That is, all individuals can be clas-sified into one of the K0 groups w.p.a.1. Nevertheless, when T is not large,a small percentage of individuals could be left unclassified if we stick withthe classification rule in (2.7). To ensure that all individuals are classifiedinto one of the K0 groups in finite samples, we can modify the classifier. Inparticular, we classify i ∈ Gk if βi = αk for some k = 1� � � � �K0, and i ∈ Gl

for some l = 1� � � � �K0 if ‖βi − αl‖ = min{‖βi − α1‖� � � � �‖βi − αK0‖} and∑K0k=1 1{βi = αk} = 0. Since the event

∑K0k=1 1{βi = αk} = 0 occurs w.p.a.1 uni-

formly in i, we can ignore it in large samples in subsequent theoretical analysisand restrict our attention to the classification rule in (2.7) to avoid confusion.

Page 13: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2226 L. SU, Z. SHI, AND P. C. B. PHILLIPS

Let Nk ≡ ∑N

i=1 1{i ∈ Gk}. The following corollary studies the consistencyof Nk.

COROLLARY 2.3: Suppose that Assumptions A1–A2 hold. Then Nk − Nk =oP(1) for k= 1� � � � �K0.

2.4.3. The Oracle Property and Asymptotic Properties of Post-Lasso Estimators

The following theorem reports the oracle property of the Lasso estima-tor {αk}.

THEOREM 2.4: Suppose Assumptions A1–A3 hold. Then√NkT(αk − α0

k)−H

−1kNTBkNT

D→ N(0�H−1k Ωk(H

−1k )

′), where BkNT = B1kNT − B2kNT , B1kNT =1√NkT

3

∑i∈G0

km−1iV

∑T

s=1

∑T

t=1 VisUμiit , and B2kNT = 1

2√NkT

∑i∈G0

km−2iV (miU2 −

miV 2miVmiU)(

1√T

∑T

t=1 Vit)2 for k= 1� � � � �K0.

REMARK 3: BkNT is written as the difference between two terms that are de-rived from the first- and second-order Taylor expansions of the PPL estimatingequation, respectively. Comparing the above result with HK, we find that thequantities Ωk, Hk, and Bk coincide with the corresponding terms in HK; seethe remark after Lemma S1.12 for details. Then we can use the formula in HKto estimate the asymptotic bias and variance with obvious modifications. Alter-natively, we can use the jackknife to correct bias; see Hahn and Newey (2004)and Dhaene and Jachmans (2015) for static and dynamic models, respectively.

If group membership is known, the oracle estimator of αk is given byαG0

k≡ arg minαk

1NkT

∑i∈G0

k

∑T

t=1ψ(wit;αk� μi(αk)). Then following our asymp-totic analysis or that of HK, we can readily show that

√NkT(αG0

k− α0

k) −H

−1kNTBkNT

D→N(0�H−1k Ωk(H

−1k )

′) under Assumptions A1 and A3. Theorem 2.4indicates that the PPL estimator αk achieves the same limit distribution asthis oracle estimator. In this sense, we say that the PPL estimators {αk} en-joy the asymptotic oracle property. In addition, given the estimated groupsGk, we can obtain the post-Lasso estimator of αk by αGk ≡ arg minαk

1NkT

×∑i∈Gk

∑T

t=1ψ(wit;αk� μi(αk)). The following theorem reports the asymptoticdistribution of αGk .

THEOREM 2.5: Suppose Assumptions A1–A3 hold. Then√NkT(αGk −α0

k)−H

−1kNTBkNT

D→N(0�H−1k Ωk(H

−1k )

′) for k= 1� � � � �K0, where BkNT is as defined inTheorem 2.4.

Page 14: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2227

REMARK 4: Theorems 2.4 and 2.5 indicate that αk and αGk are asymptoti-cally equivalent. In a totally different framework, Belloni and Chernozhukov(2013) studied post-Lasso estimators which apply OLS to the model selectedby first-step penalized estimators and showed that the post-Lasso estimatorsperform at least as well as Lasso in terms of rate of convergence and have theadvantage of smaller bias. Correspondingly, it would be interesting to comparethe higher-order asymptotic properties of αk and αGk in future work.

REMARK 5: Note that our asymptotic results are “pointwise” in the sensethat the unknown parameters are treated as fixed. The implication is that infinite samples, the distributions of our estimators can be quite different fromnormal, as discussed in Leeb and Pötscher (2008, 2009). This is a well-knownchallenge for shrinkage estimators. Despite its importance, developing a thor-ough theory on uniform inference in this context is beyond the scope of thepresent work.

2.5. Determination of the Number of Groups

In practice, the exact number of groups is typically unknown. We assumethat K0 is bounded from above by a finite integer Kmax and study the de-termination of the number of groups via some information criterion (IC).By minimizing (2.5) with K0 replaced by K, we obtain the C-Lasso esti-mates {βi(K�λ1)� αk(K�λ1)} of {βi�αk}, where we make the dependenceof βi and αk on (K�λ1) explicit. As above, we classify individual i intogroup Gk(K�λ1) if and only if βi(K�λ1) = αk(K�λ1), that is, Gk(K�λ1) ≡{i ∈ {1�2� � � � �N} : βi(K�λ1) = αk(K�λ1)} for k = 1� � � � �K. Let G(K�λ1) ≡{G1(K�λ1)� � � � � GK(K�λ1)}. The post-Lasso estimator of α0

k is denoted asαGk(K�λ1)

. We propose to select K to minimize

IC1(K�λ1)≡ 2NT

K∑k=1

∑i∈Gk(K�λ1)

T∑t=1

ψ(wit; αGk(K�λ1)

� μi(αGk(K�λ1)))

(2.9)

+ ρ1NTpK�

where ρ1NT is a tuning parameter. Let K(λ1)≡ arg min1≤K≤Kmax IC1(K�λ1). SeeWang, Li, and Tsai (2007), Liao (2013), and Lu and Su (2016) for the use of asimilar IC in various contexts.

Let G(K) ≡ (GK�1� � � � �GK�K) be any K-partition of {1�2� � � � �N} and GK acollection of all such partitions. Let σ2

G(K)≡ 2

NT

∑K

k=1

∑i∈GK�k

∑T

t=1ψ(wit; αGK�k�μi(αGK�k)), where αGK�k ≡ arg minαk

1NkT

∑i∈GK�k

∑T

t=1ψ(wit;αk� μi(αk)). Weadd the following two assumptions.

Page 15: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2228 L. SU, Z. SHI, AND P. C. B. PHILLIPS

ASSUMPTION A4: As (N�T) → ∞, min1≤K<K0 infG(K)∈GK σ2G(K)

P→ σ2 > σ20 ,

where σ20 ≡ lim(N�T)→∞ 2

NT

∑K0k=1

∑i∈G0

k

∑T

t=1 E[ψ(wit;α0k�μ

0i )].

ASSUMPTION A5: As (N�T)→ ∞, ρ1NT → 0 and ρ1NTT → ∞.

Assumption A4 is intuitively clear and applies under primitive conditions ina variety of models, such as panel autoregressions. It requires that all under-fitted models yield asymptotic mean square errors that are larger than σ2

0 ,which is delivered by the true model. Assumption A5 reflects the usual condi-tions for the consistency of model selection: ρ1NT cannot shrink to zero eithertoo fast or too slowly.

The following theorem justifies the use of (2.9) as a selector criterion for K.

THEOREM 2.6: Suppose Assumptions A1–A5 hold. Then P(K(λ1)=K0)→ 1as (N�T)→ ∞.

REMARK 6: As Theorem 2.6 indicates, as long as λ1 satisfies Assump-tion A2(i), we can ensure that the correct number of groups is chosen w.p.a.1.In practice, we can fine-tune this parameter over a finite set, for example,Λ1 ≡ {λ1 = cjT

−1/3� cj = c0γj for j = 1� � � � � J} for some c0 > 0 and γ > 1. That

is, we pick up λ1 ∈Λ1 such that IC1(K(λ1)�λ1) is minimized. We can show thatwith such a choice of λ1, Theorem 2.6 continues to hold. Alternatively, we canconsider a data-driven cross-validation procedure.

2.6. The Special Case of Linear Models

For the linear model in (2.3) with E(εit |xit�μ0i )= 0, we have ψ(wit;βi�μi)=

12(yit − β′

ixit − μi)2, μi(βi) = yi· − β′

ixi·, and Q1�NT (β) = 1NT

∑N

i=1

∑T

t=1(yit −β′ixit)

2, where yi· = 1T

∑T

t=1 yit , yit = yit − yi·, and xi· and xit are analogouslydefined. So the PPL problem becomes the penalized least squares (PLS)problem considered in Su, Shi, and Phillips (2014, SSP hereafter). In addi-tion, we can verify that μi(βi) = E(yi·) − β′

iE(xi·), Ui(wit;βi�μi) = −(yit −β′ixit − μi)xit , Vi(wit;βi�μi) = −(yit − β′

ixit − μi), Uμii (wit;βi�μi) = xit =

Vβii (wit;βi�μi), V μi

i (wit;βi�μi) = 1, Uβii (wit;βi�μi) = xitx

′it , Uit = −εit[xit −

E(xi·)], Uβiit = [xit − E(xi·)]x′

it , Uμiit = xit − E(xi·), Hiμμ(βi) = 1, Hiββ(βi) =

1T

∑T

t=1 E{[(xit−E(xi·)][xit−E(xi·)]′},

ΩiT = 1T

T∑t=1

T∑s=1

E{εitεis

[xit −E(xi·)

][xis −E(xi·)

]′}� and

HiT = 1T

T∑t=1

E{[xit −E(xi·)

][xit −E(xi·)

]′}�

Page 16: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2229

With the above calculations, we can readily verify that Assumptions A1(ii),A1(iv)–(v), and A3 hold under weak conditions. In addition, we can showthat

B1kNT = −1√NkT 3

∑i∈G0

k

T∑t=1

T∑s=1

εit[xis −E(xi·)

]= B1k + oP(1) and

B2kNT = 0�

where B1k = −1√NkT

3

∑i∈G0

k

∑T

t=1

∑T

s=1 E(εitxis). When xit is strictly exogenous so

that E(εitxis)= 0 for all t, s, and i, B1kNT = oP(1) and there is no need to makebias correction. When xit is predetermined, various bias correction formulaehave been proposed; see Kiviet (1995), Hahn and Kuersteiner (2002), Phillipsand Sul (2007b), and Lee (2012), among others. Jackknife methods can alsobe applied to correct for bias.

The post-Lasso and oracle estimators of α0k become αGk =

(∑

i∈Gk∑T

t=1 xit x′it)

−1∑

i∈Gk∑T

t=1 xit yit and αG0k

= (∑

i∈G0k

∑T

t=1 xit x′it)

−1 ×∑i∈G0

k

∑T

t=1 xit yit . The IC formula in (2.9) now reduces to IC1(K�λ1) =σ2G(K�λ1)

+ρ1NTpK, where σ2G(K�λ1)

= 1NT

∑K

k=1

∑i∈Gk(K�λ1)

∑T

t=1(yit−α′Gk(K�λ1)

xit)2

with αGk(K�λ1)being analogously defined as αGk . In practice, σ2

G(K�λ1)is fre-

quently replaced by its natural logarithm as in standard BIC to obtain

(2.10) IC1(K�λ1)= ln[σ2G(K�λ1)

]+ ρ1NTpK�

which will be used in our simulations and applications. But because thefixed effects are eliminated in the within-group transformed model, the

√T -

convergence rates of their estimates will not play a role to ensure the selectionconsistency of IC1. SSP showed that the requirement on ρ1NT can be relaxedwith Assumption A5 replaced by the following:

ASSUMPTION A5∗: As (N�T) → ∞, ρ1NT → 0 and ρ1NTδ2NT → ∞ where

δNT =N1/2T 1/2 if xit is strictly exogenous and min(N1/2T 1/2�T ) otherwise.

2.7. Extension to the Mixed Panel Structure Models

In some applications, certain parameters of interest may be common acrossall individuals whereas others are group-specific. For instance, Pesaran, Shin,and Smith (1999) constrained the long-run coefficients to be identical acrossindividuals while assuming the short-run coefficients to be heterogeneous, orin our case, group-specific. Example 4 above is another instance. To keep

Page 17: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2230 L. SU, Z. SHI, AND P. C. B. PHILLIPS

up with the early notation, we write the negative log-likelihood function asψ(wit;βi�γ�μi), where γ is the common parameter and the βi have a groupstructure as before. The negative profile log-likelihood function now be-comes Q1�NT (β�γ) = 1

NT

∑N

i=1

∑T

t=1ψ(wit;βi�γ� μi(βi�γ)), where μi(βi�γ) =arg minμi

1T

∑T

t=1ψ(wit;βi�γ�μi). Then we can estimate β and α by minimiz-ing the following PPL criterion function:

(2.11) Q(K0)1NT�λ1

(β�α�γ)=Q1�NT (β�γ)+ λ1

N

N∑i=1

K0∏k=1

‖βi − αk‖�

Our previous analysis can be followed to establish uniform consistency for theclassifier and the oracle property for the resulting estimators of the group-specific parameters αk and the common parameter γ.

When we have time effects {γt}, we generally cannot eliminate them throughtransformation even in a linear panel structure model because of the slopeheterogeneity. In this case, we need to estimate γ = (γ1� � � � � γT )

′ jointly withβ and α in (2.11). A formal asymptotic analysis of this case is left for futurework.

3. PENALIZED GMM ESTIMATION OF PANEL STRUCTURE MODELS

This section considers penalized GMM estimation of linear panel struc-ture models when some regressors are lagged dependent variables or endoge-nous.

3.1. Penalized GMM Estimation of α and β

To stay focused, we restrict attention to the linear panel structure model in(2.3).2 We consider the first-differenced system

(3.1) �yit = β0′i �xit +�εit�

where, for example, �yit = yit − yi�t−1 for t = 1� � � � �T and i= 1� � � � �N , and weassume that yi0 and xi0 are observed. Let zit be a d × 1 vector of instrumentsfor �xit with d ≥ p. Define �yi = (�yi1� � � � ��yiT )

′, with similar definitions for�xi and �εi. We propose to estimate β and α by minimizing the following

2Extension to general nonlinear panel data models with endogeneity and nonadditive fixedeffects (e.g., Fernández-Val and Lee (2013)) is possible, but rigorous analysis raises additionalstatistical challenges and is left for future research.

Page 18: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2231

penalized GMM (PGMM) criterion function:3

(3.2) Q(K0)2NT�λ2

(β�α)=Q2�NT (β)+ λ2

N

N∑i=1

K0∏k=1

‖βi − αk‖�

where Q2�NT (β) = 1N

∑N

i=1[ 1T

∑T

t=1 zit(�yit − β′i�xit)]′WiNT [ 1

T

∑T

t=1 zit(�yit −β′i�xit)], WiNT is a d × d symmetric matrix that is asymptotically nonsingular,

and λ2 = λ2NT is a tuning parameter. Minimizing (3.2) produces the PGMMestimates α and β, where α≡ (α1� � � � � αK0) and β≡ (β1� � � � � βN).

3.2. Basic Assumptions

Let Qi�z�x ≡ 1T

∑T

t=1 zit(�xit)′ and Qi�z�x ≡ E[Qi�z�x]. Let ξit ≡ (�yit�

(�xit)′� z′

it)′, ρ(ξit�β) ≡ zit(�yit − β′�xit), and ρi�T (β) ≡ 1√

T

∑T

t=1{ρ(ξit�β) −E[ρ(ξit�β)]}. Let Bi denote the parameter space for βi. We make the follow-ing assumptions.

ASSUMPTION B1: (i) E[ρ(ξit�β0i )] = 0 for each i= 1� � � � �Nand t = 1� � � � �T .

(ii) supβ∈Bi ‖ρi�T (β)‖ = OP(1), 1N

∑N

i=1 ‖ρi�T (βi)‖2 = OP(1), where βi ∈ Bi,and P(maxi ‖ρi�T (βi)‖ ≥C(lnT)3+ν)= o(N−1) for any C > 0 and ν > 0.

(iii) P(maxi ‖Qi�z�x − Qi�z�x‖ ≥ η) = o(N−1) for any η > 0 andlim inf(N�T)→∞ mini μmin(Q

′i�z�xQi�z�x)= c2

Q> 0.

(iv) There exist nonrandom matricesWi such that P(maxi ‖WiNT −Wi‖ ≥ η)=o(N−1) for any η> 0 and lim infN→∞ mini μmin(Wi)= cW > 0.

(v) There exists a constant cα > 0 such that min1≤k<l≤K0 ‖α0k − α0

l ‖ ≥ cα.(vi) K0 is fixed and Nk/N → τk ∈ (0�1) for each k= 1� � � � �K0 as N → ∞.

ASSUMPTION B2: (i) Tλ22/(lnT)

6+2ν → ∞ and λ2(lnT)ν → 0 for some ν > 0as (N�T)→ ∞.

(ii) For any given c > 0, Nmaxi P(‖T−1∑T

t=1 zit�εit‖ ≥ cλ2) → 0 as(N�T)→ ∞.

ASSUMPTION B3: (i) For each k= 1� � � � �K0,Ak ≡ 1Nk

∑i∈G0

kQ

′i�z�xWiQi�z�x →

Ak > 0 as (N�T)→ ∞.

3We were unable to establish asymptotic theory for the case where the criterionQ2�NT (β) is replaced by the fully pooled criterion Q2�NT (β) = [ 1

NT

∑Ni=1

∑Tt=1 zit(�yit −

β′i�xit)]′WNT [ 1

NT

∑Ni=1

∑Tt=1 zit(�yit −β′

i�xit)], whereWNT is asymptotically nonsingular. We alsofound that Arellano and Bond’s (1991) GMM estimation is not applicable to handle unobservedslope heterogeneity. Noticing this, Fernández-Val and Lee (2013) used a criterion similar toQ2�NT (β) in the nonlinear panel setup. As we shall see, the use of Q2�NT (β) means that thePGMM estimator generally does not have the oracle property.

Page 19: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2232 L. SU, Z. SHI, AND P. C. B. PHILLIPS

(ii) For each k = 1� � � � �K0, 1√NkT

∑i∈G0

kQ′i�z�xWiNT

∑T

t=1 zit�εit − BkNTD→

N(0�Ck) as (N�T)→ ∞.

Assumption B1(i) specifies moment conditions to identify β0i . Assump-

tion B1(ii) is a high-level condition. Its first part can be verified by apply-ing Donsker’s theorem. For example, if there exists Fit , a σ-field, such that{ξit�Fit} is a stationary ergodic adapted mixingale with size −1 (e.g., White,(2001, pp. 124–125)), and Var(ω′ρi�T (βi)) → ω′Σiω ∈ (0�∞) as T → ∞ for

some Σi > 0 and any nonrandom ω ∈ Rd with ‖ω‖ = 1, then ρi�T (βi)

D→N(0�Σi) and the first part of Assumption B1(ii) follows. The second and thirdparts of Assumption B1(ii) can be verified by the Markov inequality and the ap-plication of Lemma S1.2(iii) in the Supplemental Material under strong mix-ing conditions. Assumption B1(iii) provides a rank condition to identify β0

i .Assumption B1(iv) is automatically satisfied for WiNT = Id , the d × d identitymatrix. Assumptions B1(v)–(vi) and B2(i) parallel Assumptions A1(vi)–(vii)and A2(i). Assumption B2(ii) holds true by Lemma S1.2 in the SupplementalMaterial if {(zit��εit)� t ≥ 1} is strong mixing with geometric decay rate andzit�εit has six plus moments.

Assumption B3(i)–(ii) can be verified under various primitive conditions.For example, if (a) E‖zit(�xit)′‖2+σ > 0 for some σ > 0, (b) {(�xit� zit��εit)�t ≥ 1} is strong mixing for each i with mixing coefficients αi(τ) that sat-isfy 1

Nk

∑i∈G0

k

∑∞τ=1 αi(τ)

(2+σ)/σ < ∞, (c) {(�xit� zit)} is stationary along thetime dimension and i.i.d. along the individual dimension for all i ∈ G0

k,and (d) Wi = W ∀i ∈ G0

k, then Assumption B3(i) is satisfied with Ak ={E[zit(�xit)′]}′W E[zit(�xit)′] ∀i ∈G0

k. To verify Assumption B3(ii), for simplic-ity we assume that WiNT = Id and make the following decomposition:

1√NkT

∑i∈G0

k

Q′i�z�x

T∑t=1

zit�εit(3.3)

= 1

N1/2k T 3/2

∑i∈G0

k

T∑s=1

T∑t=1

E(�xisz

′iszit�εit

)

+ 1

N1/2k T 3/2

∑i∈G0

k

T∑s=1

T∑t=1

E(�xisz

′is

)zit�εit

+ 1

N1/2k T 3/2

∑i∈G0

k

T∑s=1

T∑t=1

{[�xisz

′is −E

(�xisz

′is

)]zit�εit

−E(�xisz

′iszit�εit

)}≡ BkNT + VkNT +RkNT � say�

Page 20: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2233

where BkNT and VkNT contribute to the asymptotic bias and variance, re-spectively, and RkNT is a term that is asymptotically negligible under suit-able conditions. Then Assumption B3(ii) is satisfied with WiNT = Id if VkNT =

1N

1/2kT 1/2

∑i∈G0

k

∑T

t=1Q′i�z�xzit�εit

D→ N(0�Ck) and RkNT = oP(1), both of which

can be verified by strengthening the conditions given in (a)–(c) above. Notethat A

−1

k BkNT signifies the asymptotic bias of αk, which may not vanish asymp-totically but can be corrected; see Section S2.2 in the Supplemental Material.4

3.3. Asymptotic Properties of the PGMM Estimators

3.3.1. Preliminary Rates of Convergence

We first establish the preliminary consistency rate of (β, α).

THEOREM 3.1: Suppose Assumption B1 holds and λ2 = o(1). Then(i) βi −β0

i =OP(T−1/2 + λ2) for i= 1� � � � �N ,(ii) 1

N

∑N

i=1 ‖βi −β0i ‖2 =OP(T−1), and

(iii) (α(1)� � � � � α(K0))− (α01� � � � �α

0K0)=OP(T−1/2), where (α(1)� � � � � α(K0)) is a

suitable permutation of (α1� � � � � αK0).

REMARK 7: Remark 1 applies here with obvious modifications. As before,hereafter we simply write αk for α(k) as the consistent estimator of α0

k, anddefine Gk ≡ {i ∈ {1�2� � � � �N} : βi = αk} for k= 1� � � � �K0.

3.3.2. Classification Consistency

Let EkNT�i ≡ {i /∈ Gk|i ∈ G0k} and FkNT�i ≡ {i /∈ G0

k|i ∈ Gk} for i = 1� � � � �Nand k = 1� � � � �K0. Let EkNT ≡ ⋃

i∈G0kEkNT�i and FkNT ≡ ⋃

i∈Gk FkNT�i. We es-tablish uniform classification consistency in the next theorem.

THEOREM 3.2: Suppose that Assumptions B1–B2 hold. Then(i) P(

⋃K0k=1 EkNT )≤∑K0

k=1P(EkNT )→ 0 as (N�T)→ ∞, and(ii) P(

⋃K0k=1 FkNT )≤∑K0

k=1P(FkNT )→ 0 as (N�T)→ ∞.

REMARK 8: Remark 2 also holds for the above theorem with obvious mod-ifications. Let G0 ≡ {1�2� � � � �N} \ (⋃K0

k=1 Gk) and HiNT = {i ∈ G0}. Theo-rem 3.2(i) implies that P(

⋃1≤i≤N HiNT )≤∑K0

k=1P(EkNT )→ 0, meaning that allindividuals are classified into one of the K0 groups w.p.a.1.

4If conditions (a)–(b) are satisfied and E‖zit�εit‖2+σ > 0, by the Davydov inequality, we have‖BkNT‖ ≤ 1

T√NkT

∑i∈G0

k

∑Tt=1

∑Ts=1 ‖E[�xisz′

iszit�εit ]‖ = O((N/T)1/2), which is o(1) if T � N

and usually asymptotically nonnegligible otherwise.

Page 21: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2234 L. SU, Z. SHI, AND P. C. B. PHILLIPS

Let Nk ≡∑N

i=1 1{i ∈ Gk}. The following corollary parallels Corollary 2.3.

COROLLARY 3.3: Suppose that Assumptions B1–B2 hold. Then Nk − Nk =oP(1).

3.3.3. Improved Convergence and Asymptotic Properties of Post-Lasso

The following theorem establishes the asymptotic distribution of the C-Lasso estimators {αk}.

THEOREM 3.4: Suppose Assumptions B1–B3 hold. Then√NkT(αk − α0

k)−A

−1

k BkNTD→N(0, A−1

k CkA−1k ) for k= 1� � � � �K0.

REMARK 9: In contrast to the PPL case, the PGMM estimators {αk} mayfail to possess the oracle property. If the group identities were known in ad-vance, one could obtain the GMM estimate αG0

kof α0

k by minimizing the fol-lowing objective function:

QNT (αk)=[

1NkT

∑i∈G0

k

T∑t=1

zit(�yit − α′

k�xit)]′

(3.4)

×W (k)NT

[1

NkT

∑i∈G0

k

T∑t=1

zit(�yit − α′

k�xit)]�

where W (k)NT is a d × d symmetric positive definite matrix. Let Q(k)

z�x�NT =1

NkT

∑i∈G0

k

∑T

t=1 zit(�xit)′ and Q(k)

z�y�NT = 1NT

∑i∈G0

k

∑T

t=1 zit�yit . Then αG0k

=[Q(k)′

z�x�NTW(k)NT Q

(k)z�x�NT ]−1Q(k)′

z�x�NTW(k)NT Q

(k)z�y�NT . We can readily show that the

asymptotic distribution of αG0k

is typically different from that of αk. See alsothe remark after Theorem 3.5 below.

When the individuals have group identities that are unknown, we canreplace G0

k by its C-Lasso estimate Gk in the GMM objective function(3.4) and obtain the post-Lasso GMM estimator of α0

k given by αGk =[Q(k)′

z�xW(k)NT Q

(k)z�x]−1Q(k)′

z�xW(k)NT Q

(k)z�y , where Q(k)

z�x = 1NkT

∑i∈Gk

∑T

t=1 zit(�xit)′ and

Q(k)z�y = 1

NT

∑i∈Gk

∑T

t=1 zit�yit . To study the asymptotic normality of αGk , we addthe following assumption.

ASSUMPTION B4: (i) For each k = 1� � � � �K0, W (k)NT

P→ W (k) > 0 as(N�T)→ ∞.

Page 22: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2235

(ii) Q(k)z�x�NT

P→Q(k)z�x, where Q(k)

z�xhas rank p.

(iii) 1√NkT

∑i∈G0

k

∑T

t=1 zit�εitD→N(0� Vk).

Assumption B4 is standard in GMM estimation and it can be verified undervarious primitive conditions that allow for both conditional heteroscedastic-ity and serial correlation in {zit�εit}. The following theorem establishes theasymptotic normality of {αGk}.

THEOREM 3.5: Suppose Assumptions B1–B4 hold. Then√NkT(αGk −α0

k)D→

N(0�Ωk), where Ωk = [Q(k)′z�xW

(k)Q(k)z�x]−1Q(k)′

z�xW(k)VkW

(k)Q(k)z�x[Q(k)′

z�xW(k)Q(k)

z�x]−1

and k= 1� � � � �K0.

REMARK 10: To prove the above theorem, we first apply Theorem 3.2 andshow that

√NkT(αGk −α0

k)= √NkT(αG0

k−α0

k)+oP(1). That is, the post-LassoGMM estimator αGk is asymptotically equivalent to the oracle estimator αG0

k.

To obtain the most efficient estimator among the class of GMM estimatorsbased on the moment conditions specified in Assumption B1(i), one can setW (k)NT to be a consistent estimator of V −1

k . Alternatively, we can consider Arel-lano and Bond’s (1991) GMM estimation based on the estimated groups. Theprocedure is standard and details are omitted.

REMARK 11: If WiNT = W (k)NT , Qi�z�x = Q(k)

z�x for each i ∈ G0k in Assump-

tion B3(i) (which is unrealistic before knowing the group identity), and BkNT =0 in Assumption B3(ii), then Ak =Q(k)′

z�xW(k)Q(k)

z�x, Ck =Q(k)′z�xW

(k)ΩkW(k)Q(k)

z�x,

and√NkT(αk − α0

k)D→ N(0�Ωk). Thus, in this special case, the C-Lasso es-

timator αk has the oracle property. But BkNT = 0 typically requires T �N , acondition that we do not usually want to impose. For this reason, we recom-mend the post-Lasso estimator αGk in practice.

3.4. Determination of the Number of Groups

When K0 is unknown, we minimize the PGMM criterion function in (3.2)with K0 replaced by K to obtain the C-Lasso estimates {βi(K�λ2)� αk(K�λ2)}of {βi�αk}. As above, we classify individual i into group Gk(K�λ2) if and only ifβi(K�λ2)= αk(K�λ2). Let G(K�λ2)≡ {G1(K�λ2)� � � � � GK(K�λ2)}. The post-Lasso GMM estimate of α0

k is given by

αGk(K�λ2)≡ [Q(K�k)′z�x W (k)

NT Q(K�k)z�x

]+Q(K�k)′z�x W (k)

NT Q(K�k)z�y �

Page 23: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2236 L. SU, Z. SHI, AND P. C. B. PHILLIPS

where

Q(K�k)z�x = 1

NkT

∑i∈Gk(K�λ2)

T∑t=1

zit(�xit)′�

Q(K�k)z�y = 1

NkT

∑i∈Gk(K�λ2)

T∑t=1

zit�yit�

and W (k)NT is defined as before but with k= 1�2� � � � �K. Let

σ2G(K�λ2)

= 1NT

K∑k=1

∑i∈Gk(K�λ2)

T∑t=1

[�yit − α′

Gk(K�λ1)�xit

]2�

We propose to select K to minimize the following IC:

(3.5) IC2(K�λ2)= ln[σ2G(K�λ2)

]+ ρ2NTpK�

where ρ2NT is a tuning parameter. Let K(λ2)≡ arg min1≤K≤Kmax IC2(K�λ2). Asbefore, for any G(K) = (GK�1� � � � �GK�K) ∈ GK , define

σ2G(K)

= 1NT

K∑k=1

∑i∈GK�k

T∑t=1

[�yit − α′

GK�k�xit

]2�

where αGK�k is analogously defined as αGk(K�λ2)with Gk(K�λ2) being replaced

by GK�k.To proceed, we add the following two assumptions, which parallel earlier

Assumptions A4–A5.

ASSUMPTION B5: As (N�T)→ ∞, min1≤K<K0 infG(K)∈GK σ2G(K)

P→ σ2�ε > σ

2�ε,

where σ2�ε = plim(N�T)→∞

1NT

∑N

i=1

∑T

t=1(�εit)2.

ASSUMPTION B6: As (N�T)→ ∞, ρ2NT → 0 and ρ2NTNT → ∞.

The following theorem proves consistency of K(λ2).

THEOREM 3.6: Suppose Assumptions B1–B2 and B4–B6 hold. ThenP(K(λ2)=K0)→ 1 as (N�T)→ ∞.

Page 24: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2237

4. SIMULATIONS

To evaluate the finite-sample performance of the classification and estima-tion procedure, we consider three data generating processes (DGPs) that coverlinear and nonlinear panels of static and dynamic models. The observations ineach DGP are drawn from three groups with the proportion N1 : N2 : N3 =0�3 : 0�3 : 0�4. We use sample sizes N = 100�200 and time spans T = 15�25�50.Throughout these DGPs, the fixed effect μ0

i and the idiosyncratic error εit arestandard normal, independent across i and t, and mutually independent. εit isalso independent of all regressors.

DGP 1 (Linear static panel). The observations (yit� xit) are generated fromthe linear panel structure model as in Example 1. The exogenous regressorxit = (0�2μ0

i + eit1�0�2μ0i + eit2)

′, where eit1� eit2 ∼ i�i�d� N(0�1), are mutuallyindependent, and independent of μ0

i . The true coefficients are (0�4�1�6), (1�1),(1�6�0�4) for the three groups, respectively. PLS will be applied in this DGP.

DGP 2 (Linear panel AR(1)). The dependent variable is determined by itslag term and two exogenous regressors xit2 and xit3 in yit = β0

i1yi�t−1 +β0i2xit2 +

β0i3xit3 + μ0

i (1 − β0i1)+ εit . The exogenous variables are standard normal and

mutually independent. For each i, the initial value is specified to guaran-tee that the time series (yi0� � � � � yiT ) is strictly stationary. The true coef-ficients are (0�4�1�6�1�6), (0�6�1�1), and (0�8�0�4�0�4). The AR(1) coeffi-cients represent weak, moderate, and strong persistence, respectively. PGMMwill be used to estimate the first-differenced model with the instruments(yi�t−2� yi�t−3��xit2��xit3). In the Supplementary Material, the same DGP is alsoestimated by PLS.

DGP 3 (Probit panel AR(1)). As in Example 3, the binary dependent vari-able yit = 1{β0

i1yi�t−1 + β0i2xit + β0

i3 + μ0i − εit > 0}. The exogenous regressor

xit = 0�1μ0i + eit , where eit ∼ i�i�d� N(0�1) and is independent of all other

variables. The true coefficients are (1�−1�0�5), (0�5�0�−0�25), and (0�1�0).β0i1 and β0

i2 are identifiable in this model, whereas β0i3 is unidentifiable as it is

absorbed into the individual heterogeneity. PPL will be implemented in thisDGP.

Since both classification consistency and the oracle property hinge on thecorrect number of groups, our first simulation exercise is designed to assesshow well the proposed IC selects the number of groups. Asymptotically, allsequences ρ1NT work if they satisfy Assumption A5 or A5∗, and so do the se-quences ρ2NT if these satisfy Assumption B6. In practice, the choice of ρjNT ,j = 1�2, can be crucial. We experimented with many alternatives, and foundthat ρjNT = 2

3(NT)−1/2, j = 1�2, work fairly well in the linear models and so

does ρ1NT = 14(ln lnT)/T in the Probit model. They are used throughout the

simulations as well as the empirical applications.Regarding the C-Lasso tuning parameter, we specify λj = cλj s

2YT

−1/3 forj = 1�2 in the linear models, where s2

Y is the sample variance of yit for PLSor the sample variance of �yit for PGMM, and cλj ∈ {0�125�0�25�0�5�1�2}

Page 25: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2238 L. SU, Z. SHI, AND P. C. B. PHILLIPS

(j = 1�2), a geometrically increasing sequence. In the Probit model, we alsospecify λ1 = cλ1s

2YT

−1/3, but the constant cλ1 ∈ {0�0125�0�025�0�05�0�1�0�2}, asthis criterion function is at a different magnitude from the linear models. Fol-lowing Remark 6, we pick up from the set of candidate values the λ1 that min-imizes IC1(K(λ1)�λ1) and similarly the λ2 that minimizes IC2(K(λ2)�λ2). Werun 500 replications for each DGP. Table I displays the empirical probabilitythat a particular group size from 1 to 5 is selected according to the IC whenthe true number of groups is 3. In the linear models, the IC achieves almostperfect selection of the true group number when T = 25. In the Probit model,the correct determination rate is also close to 100% when T = 50, although therate is lower than that of the linear models when T = 25, due to the presenceof incidental parameters. These statistics demonstrate the usefulness of the IC.

Next, given the true number of groups, we focus on the classification of indi-vidual units and the point estimation of post-Lasso.5 Due to space limitation,all tabulated results are produced under cλj = 0�5, j = 1�2, for the linear mod-els, and cλ1 = 0�05 for the Probit model. The outcomes are found robust overthe specified range of constants. Column 4 of Table II shows the percentageof correct classification of the N units, calculated as 1

N

∑K0k=1

∑i∈Gk 1{β0

i = α0k},

averaged over the Monte Carlo replications. Columns 5–7 summarize the post-Lasso estimator’s root-mean-squared error (RMSE), bias, and the coverageprobability of the two-sided nominal 95% confidence interval. To save space,we report the results for the first coefficient α1 = (αk1)

K0k=1 in each model. As

α1 is a K0 × 1 vector, we averaged over the statistics by their weight Nk/N ,k= 1� � � � �K0. For example, in DGPs 1 and 3, the coverage probability is com-puted as

∑K0k=1

NkN

1{αGk�1 − 1�96σk1 ≤ α0k1 ≤ αGk�1 + 1�96σk1}, where σk1 is the

estimated standard deviation of αGk�1; in DGP 2, αGk�1 and σk1 replace theircounterparts αGk�1 and σk1, respectively. For comparison purposes, columns8–10 show the corresponding statistics of the oracle estimator αG0

k�1 or αG0

k�1.

The only difference between the oracle estimator and the post-Lasso estima-tor is that the former utilizes the true group identity G0

k, which is infeasible inpractice, while the latter is based on the data-determined group Gk or Gk.

As expected, the correct classification percentage approaches 100% as T in-creases, and the oracle estimator’s RMSE and bias are typically smaller thanthose of post-Lasso. When T = 50, post-Lasso and the oracle perform almostidentically in DGP 1. In DGP 2, the PGMM confidence interval covers the trueparameter slightly more often than the oracle, since the estimated standard de-viation is inflated by a few misclassified units, which hide as outliers against themajority of the group members. The same reason explains the mild discrep-ancy of RMSE in DGPs 2 and 3. However, the confidence interval in DGP 3

5Here we report the results for post-Lasso under K0. In Table SIII of the Supplemental Ma-terial, we also compare the RMSE and bias of post-Lasso and C-Lasso under K0 and the IC-determined group number K or K.

Page 26: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDE

NT

IFY

ING

LA

TE

NT

STR

UC

TU

RE

SIN

PAN

EL

DA

TA2239

TABLE I

FREQUENCY OF SELECTING K = 1� � � � �5 GROUPS WHEN K0 = 3

DGP 1 DGP 2 DGP 3

N T 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

100 15 0 0 0.994 0.004 0.002 0 0.232 0.762 0.004 0.002100 25 0 0 1 0 0 0 0.016 0.984 0 0 0 0.096 0.646 0.242 0.016100 50 0 0 1 0 0 0 0 1 0 0 0 0 0.986 0.014 0200 15 0 0 0.890 0.106 0.004 0 0.022 0.970 0.008 0200 25 0 0 1 0 0 0 0 1 0 0 0 0.106 0.668 0.226 0200 50 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0

Page 27: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2240 L. SU, Z. SHI, AND P. C. B. PHILLIPS

TABLE II

CLASSIFICATION AND POINT ESTIMATION OF α1

Post-Lasso Oracle

N T

% of CorrectClassification RMSE Bias Coverage RMSE Bias Coverage

DGP 1 100 15 0.8935 0.0594 0�0105 0.8758 0.0463 0�0012 0.9336100 25 0.9674 0.0384 0�0018 0.9344 0.0353 0�0001 0.9362100 50 0.9964 0.0249 0�0000 0.9528 0.0245 −0�0002 0.9348200 15 0.8987 0.0432 0�0077 0.8650 0.0324 −0�0013 0.9410200 25 0.9661 0.0272 0�0015 0.9228 0.0250 −0�0006 0.9394200 50 0.9966 0.0174 −0�0001 0.9496 0.0171 −0�0002 0.9424

DGP 2 100 15 0.8063 0.0711 −0�0123 0.9562 0.0502 −0�0037 0.9090100 25 0.8974 0.0461 −0�0060 0.9760 0.0351 0�0011 0.9336100 50 0.9689 0.0278 −0�0011 0.9860 0.0242 −0�0010 0.9320200 15 0.8151 0.0557 −0�0159 0.9436 0.0352 −0�0017 0.9308200 25 0.9037 0.0328 −0�0047 0.9664 0.0252 −0�0006 0.9442200 50 0.9711 0.0193 −0�0014 0.9842 0.0164 0�0000 0.9304

DGP 3 100 25 0.7941 0.1701 0�0805 0.7856 0.1077 0�0114 0.9376100 50 0.9456 0.0859 0�0231 0.8970 0.0752 0�0090 0.9504200 25 0.8277 0.1325 0�0777 0.7214 0.0821 0�0116 0.9104200 50 0.9527 0.0635 0�0223 0.8818 0.0573 0�0121 0.9280

undercovers due to the extra complexity of the bias caused by incidental pa-rameters. Here the bias in post-Lasso is effectively reduced by the half-paneljackknife (Dhaene and Jochmans (2015)), but it cannot be completely elimi-nated in finite samples.

5. EMPIRICAL APPLICATIONS

In this section, we illustrate the use of C-Lasso in two cross-country studies.We explore the determinants of savings rates via a linear dynamic panel modeland the relationship between civil war incidence and poverty via a dynamic Pro-bit model. Due to space limitation, we only report the estimated coefficients inthe main text. Summary statistics, group membership, and additional details ofimplementation can be found in the Supplemental Material.

5.1. Savings Rate Dynamic Panel Modeling and Classification

Understanding the disparate savings behavior across countries is a long-standing research interest in development economics. Theoretical advancesand empirical studies have accumulated over many years; see Feldstein (1980),Deaton (1990), Edwards (1996), Bosworth, Collins, and Reinhart (1999),Rodrik (2000), and Li, Zhang, and Zhang (2007), among many others. Em-pirical research in this area typically employs standard panel data methods

Page 28: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2241

to handle heterogeneity or relies on prior information to categorize countriesinto groups. Classification criteria vary from geographic locations to the notionof developed countries versus developing countries (Loayza, Schmidt-Hebbel,and Servén (2000)). This section applies the methodology developed in thepresent paper to revisit this empirical problem.

Following Edwards (1996), we consider the simple regression model

(5.1) Sit = β1iSi�t−1 +β2iIit +β3iRit +β4iGit +μi + εit�where Sit is the ratio of savings to GDP, Iit is the CPI-based inflation rate, Ritis the real interest rate, Git is the per capita GDP growth rate, μi is a fixed ef-fect, and εit is an idiosyncratic error term. Inflation characterizes the degree ofmacroeconomic stability and the real interest rate reflects the price of money.The relationship between the savings rate and GDP growth rate is well docu-mented, with the latter being found to Granger-cause the former (Carroll andWeil (1994)). The first-order lagged savings rate is added to the specificationto capture persistence of the savings rate.

Data are obtained from the widely used World Development Indicators, acomprehensive data set compiled by the World Bank. For many countries, thetime series of real interest rates are often short in comparison with the othervariables. Using the time span 1995–2010, we were able to construct a balancedpanel of 56 countries. Substantial heterogeneity across countries was observedin all the major macroeconomic indicators. Evidence of within-group homo-geneity is therefore particularly important in supporting panel data poolingtechniques.

This dynamic panel model can be estimated by either PLS or PGMM. Wefirst try PLS, which has higher correct classification ratio in our simulationwhen T = 15. Following the simulation, ρ1NT is set as 2

3(NT)−1/2, and the IC

picks two groups and the tuning parameter constant cλ1 = 1�55 over all com-binations of K = 1� � � � �5 and cλ1 in a geometrically increasing sequence of 10points in (0�2� � � � �2). Based on this choice of tuning parameter, the data de-termine the group identities. Interestingly, some geographic features remainsalient in the classification. For example, we observe a strong collection ofAsian countries in Group 1. In particular, except for South Korea and the citystate Singapore, Group 1 includes all Eastern Asian and Southeastern Asiancountries in our sample, namely, China, Japan, Indonesia, Malaysia, Philip-pines, and Thailand.

Columns 3–4 in Table III report the results for the PLS-based post-Lassoestimation, in comparison with those for the pooled FE estimation in col-umn 2. The estimates are bias-corrected by the half-panel jackknife (Dhaeneand Jochmans (2015)), and the standard errors (in parentheses) are clusteredat the country level. Compared with Edwards (1996), the FE results re-confirmthe significance of lagged savings and GDP growth rate as well as the insignif-

Page 29: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2242 L. SU, Z. SHI, AND P. C. B. PHILLIPS

TABLE III

PLS AND PGMM ESTIMATION RESULTSa

PLS PGMM

Variables Pooled FE Group1 Group2 Pooled GMM Group1 Group2

Lagged savings 0�7609∗∗∗ 0�6952∗∗∗ 0�6939∗∗∗ 0�5854 0�4026 0�6373∗∗

(0�0322) (0�0433) (0�0449) (0�4588) (0�3095) (0�3197)Inflation −0�0145 −0�1601∗∗∗ 0�1967∗∗∗ 0�0350 −0�1647∗∗ 0�4128∗∗∗

(0�0324) (0�0388) (0�0435) (0�0621) (0�0733) (0�0758)Interest rate −0�0346 −0�1490∗∗∗ 0�1226∗∗∗ −0�0333 −0�1580∗∗ 0�1395∗

(0�0313) (0�0397) (0�0408) (0�0598) (0�0729) (0�0775)GDP growth 0�2027∗∗∗ 0�2892∗∗∗ 0�1127∗∗ 0�2081∗∗∗ 0�1853∗∗∗ 0�2061∗∗

(0�0353) (0�0413) (0�0517) (0�0541) (0�0627) (0�0908)

aNote: ∗∗∗ 1% significant, ∗∗ 5% significant, ∗ 10% significant.

icance of inflation and interest rates in the determination of savings rate. Thisresult also lends support to the conventional wisdom that, across countries,higher savings rates tend to go hand in hand with higher income growth (e.g.,Loayza, Schmidt-Hebbel, and Servén (2000)). The post-Lasso estimates de-liver some interesting findings. First, the coefficients of the inflation rate andthe real interest rate become significant in both groups but have opposite signs,which lead to insignificant effects in pooled FE estimation. Second, the coeffi-cient of the GDP growth rate is significant at the 5% level, which suggests thatconventional wisdom is universally relevant and applies both within and acrossgroups.

To check the robustness of PLS, we compare it with PGMM on the samedata. The IC with ρ2NT = 2

3(NT)−1/2 is minimized at K = 2 and cλ2 = 0�72

among the same combinations of K and cλ2 for PLS. The estimated groupidentities reveal 84% overlap with the PLS classified membership, and the co-efficients in columns 6–7 of Table III are comparable to those from PLS.

5.2. Dynamic Probit Panel Modeling of Civil War Conflict

According to a conservative estimate, direct casualties from civil conflictswere at least 16.2 million in the second half of the twentieth century, a figurefive times as large as the inter-state toll (Fearon and Laitin (2003)). Civil wardamage to national development has attracted interest among economists andpolitical scientists, looking at both causes and consequences, and leading to anexplosion of research output (Miguel, Satyanath, and Sergenti (2004), Besleyand Persson (2010), Nunn and Qian (2014)). A comprehensive overview wasgiven in Blattman and Miguel (2010).

Page 30: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2243

This section revisits the connection between civil wars and poverty, a topicof enduring research interest.6 Cross-country empirical work mostly followsFearon and Laitin (2003) and Collier and Hoeffler (2004) in regressing waronset or incidence against posited causes of civil conflict. Country-specific het-erogeneity is handled either by control variables or fixed effects. In view of themeasurement error in many macro variables and the difficulty in exhausting allrelevant factors, Djankov and Reynal-Querol (2010) explored the fixed effectapproach in linear regressions. Group-specific heterogeneity was also investi-gated after identifying groups using observed information relating to formercolonial families (Djankov and Reynal-Querol (2010)) or continental regions(Esteban, Mayoral, and Ray (2012)). Without such information, PPL can dealwith unobservable country- and group-specific heterogeneity simultaneously ina nonlinear model.

We use the replication data in Fearon and Laitin (2003). Since most of thevariables in the data set are time-invariant country characteristics, we collectcivil war incidence, GDP per capita, and log population to generate a balancedpanel of 38 countries and 39 years spanning from 1960 to 1998.7 We specifya panel AR(1) Probit model as in DGP 3 to capture the high persistence ofcivil war incidence, and we transform GDP per capita and log population intogrowth rates to avoid nonstationarity.

The IC with ρ1NT = 14(ln lnT)/T selects two groups and the tuning parame-

ter constant cλ1 = 0�046 from all the combinations ofK = 1� � � � �5, and cλ1 from10 points in the geometrically increasing sequence (0�01� � � � �0�1). C-Lassoclassifies 23 countries into a “high-occurrence” group (with mean civil war inci-dence 0.4302), and the other 15 countries into a “low-occurrence” group (withmean incidence 0.2263). In terms of geographic features, Iran and Jordan areseparated from all the other 12 Asian countries, most of which are plaguedby civil wars; the four included European countries (Cyprus, Russia, UK, Yu-goslavia) all fall into the low-occurrence group.

Table IV displays the estimated PPL coefficients along with those forstandard Probit and FE Probit regressions. Again, the estimates are bias-corrected by the half-panel jackknife, and the standard errors are clus-tered at the country level. Obviously, civil war incidence is highly persis-tent, and its association with GDP per capita growth remains robust in Pro-bit and FE Probit regressions. However, the effects are distinguished in the

6It was taken as a stylized fact in pooled regressions that “civil wars are more likely to occur incountries that are poor” (Blattman and Miguel (2010)), but Djankov and Reynal-Querol (2010)found “the statistical association between poverty and civil wars disappears once we include coun-try fixed effects [into a linear panel model].”

7Between 1960–1998, there are 102 countries with data on all the three variables available.However, 61 of them had no civil war in the sample period and 3 had ongoing civil wars through-out the time. These countries are dropped from the data, since they are associated with infinite(−∞ or ∞) fixed effects in a FE Probit model.

Page 31: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2244L

.SU,Z

.SHI,A

ND

P.C.B

.PHIL

LIPS

TABLE IV

PROBIT, FE PROBIT, AND PPL ESTIMATION RESULTSa

Post-Lasso PPL

Probit FE Probit High-Occurrence Low-Occurrence

Variables Coef. S.E. Coef. S.E. coef. S.E. Coef. S.E.

Lagged civil war 3�1955∗∗∗ 0�1156 3�2649∗∗∗ 0�1140 3�3012∗∗∗ 0�1363 2�9630∗∗∗ 0�2707GDP per capita growth −0�4359∗∗∗ 0�1155 −0�3854∗∗∗ 0�1389 0�1591 0�1193 −1�2072∗∗∗ 0�2220Population growth −0�0125 0�1107 0�0162 0�1284 −0�0448 0�1429 0�2811 0�1736

aNote: ∗∗∗ 1% significant, ∗∗ 5% significant, ∗ 10% significant.

Page 32: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2245

two groups: the negative coefficient is statistically significant in the low-occurrence group, but no such relationship is found in the high-occurrencegroup.

6. CONCLUSION

We propose a novel and systematic approach to identify and estimate latentgroup structures in panel data, developing panel penalized profile likelihood(PPL) and panel GMM (PGMM) methods for classification and estimation,and providing asymptotic properties for use in inference. The PPL method en-joys the oracle property, but PGMM typically does not. Post-Lasso estimatesare also studied and a BIC-type information criterion is proposed to deter-mine the number of groups. These techniques combine to provide a general ap-proach to classifying and estimating panel models with unknown homogeneousgroups, heterogeneity across groups, and an unknown number of groups. Sim-ulations show that the approach has good finite-sample performance and canbe readily implemented in practical work. Two applications reveal the advan-tages of data-determined identification of latent group structures in empiricalpanel modeling.

The present work raises interesting issues for further research. First, it maybe appealing to consider a more general framework that allows the num-ber (K0) of groups to grow with the sample size. Close examination of thetheory provided in this paper suggests that it is possible to permit K0 to in-crease with N but at a very slow rate. Second, both the linear and nonlinearmodels may be extended to include time effects or interactive fixed effects(IFE). In linear models with IFE but without endogeneity, we remark that thepresent approach can be used in conjunction with principal component anal-ysis to address cross-sectional dependence modeled through IFE. Extensionto nonlinear models or to models with endogeneity will raise new statisticaland computational challenges. Third, our method can be extended to non-stationary panels where panel unit and cointegrating relationships may pos-sess latent group structures. Some of these topics will be explored in futurework.

APPENDIX A: PROOFS OF THE RESULTS IN SECTION 2

PROOF OF THEOREM 2.1: (i) Let Q1NT�i(βi)= 1T

∑T

t=1ψ(wit;βi� μi(βi)) andQ(K0)1iNT�λ1

(βi�α)=Q1NT�i(βi)+λ1∏K0

k=1 ‖βi−αk‖. Let bi = βi−β0i and bi = βi−

β0i . Since μi(βi) = arg minμi

1T

∑T

t=1ψ(wit;βi�μi), we have 1T

∑T

t=1 Vi(wit;βi�μi(βi))= 0 ∀βi. Then by second-order Taylor expansion and the envelope the-

Page 33: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2246 L. SU, Z. SHI, AND P. C. B. PHILLIPS

orem, we have

Q1NT�i(βi)−Q1NT�i

(β0i

)(A.1)

= 1T

T∑t=1

ψ(wit; βi� μi(βi)

)− 1T

T∑t=1

ψ(wit;β0

i � μi(β0i

))

= b′iSi +

12b′iHiββ(βi)bi�

where βi lies between βi and β0i elementwise, Si = 1

T

∑T

t=1Ui(wit;β0i � μi(β

0i )),

and

Hiββ(βi)(A.2)

= 1T

T∑t=1

[Uβii

(wit;βi� μi(βi)

)+Uμii

(wit;βi� μi(βi)

)∂μi(βi)∂β′

i

]�

By Lemmas S1.6 and S1.10 in the Supplemental Material, Si = OP(T−1/2),

1N

∑N

i=1 ‖Si‖2 = OP(T−1), and cH ≡ min1≤i≤N μmin(Hiββ(βi)) ≥ cH − oP(1). By

the triangle and reverse triangle inequalities,

∣∣∣∣∣K0∏k=1

‖βi − αk‖ −K0∏k=1

∥∥β0i − αk

∥∥∣∣∣∣∣(A.3)

≤∣∣∣∣∣K0−1∏k=1

‖βi − αk‖{‖βi − αK0‖ − ∥∥β0

i − αK0

∥∥}∣∣∣∣∣+∣∣∣∣∣K0−2∏k=1

‖βi − αk‖∥∥β0

i − αK0

∥∥{‖βi − αK0−1‖ − ∥∥β0i − αK0−1

∥∥}∣∣∣∣∣+ · · ·

+∣∣∣∣∣K0∏k=2

∥∥β0i − αk

∥∥{‖βi − α1‖ − ∥∥β0i − α1

∥∥}∣∣∣∣∣≤ ciNT (α)

∥∥βi −β0i

∥∥�where ciNT (α) = ∏K0−1

k=1 ‖βi − αk‖ + ∏K0−2k=1 ‖βi − αk‖‖β0

i − αK0‖ + · · · +∏K0k=2 ‖β0

i − αk‖ = OP(1). By (A.1), (A.3), and the fact that Q(K0)1iNT�λ1

(βi� α) −Q(K0)1iNT�λ1

(β0i � α) ≤ 0, we have cH‖bi‖2 ≤ 2(‖Si‖ + ciNT (α)λ1)‖bi‖. Then, by As-

Page 34: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2247

sumption A1(v),

(A.4) ‖bi‖ ≤ 2c−1H

(‖Si‖ + ciNT (α)λ1

)=OP(T−1/2 + λ1

)�

(ii) Let β = β0 + T−1/2v, where v = (v1� � � � � vN) is a p×N matrix. We wantto show that for any given ε∗ > 0, there exists a large constant L= L(ε∗) suchthat, for sufficiently large N and T , we have

(A.5) P{

infN−1 ∑N

i=1 ‖vi‖2=LQ(K0)1NT�λ1

(β0 + T−1/2v� α

)>Q

(K0)1NT�λ1

(β0�α0

)}≥ 1 − ε∗�

This implies that w.p.a.1 there is a local minimum {β� α} such thatN−1

∑N

i=1 ‖bi‖2 = OP(T−1) regardless of the property of α� By (A.1) and the

Cauchy–Schwarz inequality,

T[Q(K0)1NT�λ1

(β0 + T−1/2v� α

)−Q(K0)1NT�λ1

(β0�α0

)]= 1

2N

N∑i=1

v′iHiββ(βi)vi +

√T

N

N∑i=1

v′iSi +

λ1

N

N∑i=1

K0∏k=1

‖βi − αk‖

≥ cH2N

N∑i=1

‖vi‖2 −{

1N

N∑i=1

‖vi‖2

}1/2{T

N

N∑i=1

‖Si‖2

}1/2

≡D1NT −D2NT � say�

Noting that cH = cH − oP(1) and TN

∑N

i=1 ‖Si‖2 =OP(1), D1NT dominates D2NT

for sufficiently large L. That is, T [Q(K0)1NT�λ1

(β0 +T−1/2v� α)−Q(K0)1NT�λ1

(β0�α0)]>0 for sufficiently large L. Consequently, we must have N−1

∑N

i=1 ‖bi‖2 =OP(T

−1).(iii) Let PNT (β�α) = 1

N

∑N

i=1

∏K0k=1 ‖βi − αk‖. By the Minkowski inequality

and the result in (i), as (N�T)→ ∞,

ciNT (α) ≤K0−1∏k=1

{∥∥βi −β0i

∥∥+ ∥∥β0i − αk

∥∥}(A.6)

+K0−2∏k=1

{∥∥βi −β0i

∥∥+ ∥∥β0i − αk

∥∥}∥∥β0i − αK0

∥∥

+ · · · +K0∏k=2

∥∥β0i − αk

∥∥

Page 35: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2248 L. SU, Z. SHI, AND P. C. B. PHILLIPS

=K0−1∑s=0

∥∥βi −β0i

∥∥s s∏k=1

aks∥∥β0

i − αk∥∥K0−1−s

≤ CK0(α)

K0−1∑s=0

∥∥βi −β0i

∥∥s ≤ CK0(α)(1 + 2

∥∥βi −β0i

∥∥)�where the aks are finite integers and

CK0(α)≡ max1≤l≤K0

max1≤s≤K0−1

s∏k=1

aks∥∥α0

l − αk∥∥K0−1−s =O(1)

as K0 is finite. By (A.3) and (A.6), as (N�T)→ ∞,

∣∣PNT (β�α)− PNT(β0�α

)∣∣(A.7)

≤ CK0(α)1N

N∑i=1

‖bi‖ + 2CK0(α)1N

N∑i=1

‖bi‖2

≤ CK0(α)

{1N

N∑i=1

‖bi‖2

}1/2

+OP(T−1

)=OP(T−1/2

)�

By (A.7), and the fact that PNT (β0�α0)= 0 and that PNT (β� α)−PNT (β�α0)≤0, we have

0 ≥ PNT (β� α)− PNT(β�α0

)(A.8)

= PNT(β0� α

)− PNT(β0�α0

)+OP(T−1/2

)= 1N

N∑i=1

K0∏k=1

∥∥β0i − αk

∥∥+OP(T−1/2

)

= N1

N

K0∏k=1

∥∥αk − α01

∥∥+ · · · + NK0

N

K0∏k=1

∥∥αk − α0K0

∥∥+OP(T−1/2

)�

By Assumption A1(vii), Nk/N → τk ∈ (0�1) for each k = 1� � � � �K0. So (A.8)implies that

∏K0k=1 ‖αk − α0

l ‖ = OP(T−1/2) for l = 1� � � � �K0. It follows that

(α(1)� � � � � α(K0))− (α01� � � � �α

0K0)=OP(T−1/2). Q.E.D.

PROOF OF THEOREM 2.2: (i) First, we fix k ∈ {1� � � � �K0}. By the consistencyof αk and βi in Theorem 2.1 and Assumption A1(vi)–(vii), βi− αl P→ α0

k−α0l �=

Page 36: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2249

0 for all i ∈ G0k and l �= k and cki ≡ ∏K0

l=1�l �=k ‖βi − αl‖ P→ c0k ≡ ∏K0

l=1�l �=k ‖α0k −

α0l ‖ ≥ cK0−1

α > 0 for i ∈G0k. Now, suppose that ‖βi − αk‖ �= 0 for some i ∈G0

k.By the envelope theorem, the first-order condition (with respect to βi) for theminimization problem in (2.5) yields that

0p×1 = 1√T

T∑t=1

Ui

(wit; βi� μi(βi)

)+ √Tλ1

K0∑j=1

eij

K0∏l=1�l �=j

‖βi − αl‖(A.9)

= 1√T

T∑t=1

Uit +(

λ1cki

‖βi − αk‖Ip +Hiββ

)√T(βi − αk)

+ 1√T

T∑t=1

[Ui

(wit;β0

i � μi(β0i

))−Uit

]

+Hiββ

√T(αk − α0

k

)+ √Tλ1

K0∑j=1�j �=k

eij

K0∏l=1�l �=j

‖βi − αl‖

≡ Bi1 + Bi2 + Bi3 + Bi4 + Bi5�

where eij = βi−αj‖βi−αj‖ if ‖βi − αj‖ �= 0 and ‖eij‖ ≤ 1 otherwise, the second equal-

ity follows from the first-order Taylor expansion and rearrangement of terms,Hiββ ≡ Hiββ(βi), Hiββ(·) is defined in (A.2), βi lies between βi and β0

i elemen-twise.

Let κ1NT = (T−1/2(lnT)3 + λ1)(lnT)ν . Let C denote a generic constant thatmay vary across lines. By (A.4) and Lemmas S1.6–S1.10 in the SupplementalMaterial, we can readily show that

(A.10) P(

maxi

∥∥βi −β0i

∥∥≥ Cκ1NT

)= o(N−1

)for some C > 0�

which, in conjunction with the proof of Theorem 2.1(iii), implies that

P(√T∥∥αk − α0

k

∥∥≥ C(lnT)ν)= o(N−1)

and(A.11)

P(

maxi∈G0

k

∣∣cki − c0k

∣∣≥ c0k/2

)= o(N−1

)�

By (A.10)–(A.11), P(maxi∈G0k‖Bi5‖ ≥ C

√Tλ1κ1NT ) = o(N−1). Combining

these results with those in Lemmas S1.6(v) and S1.11(i), we have P(ΞkNT ) =

Page 37: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2250 L. SU, Z. SHI, AND P. C. B. PHILLIPS

1 − o(N−1), where

ΞkNT ≡{

maxi∈G0

k

∣∣cki − c0k

∣∣≤ c0k/2

}∩{

maxi∈G0

k

∥∥Hiββ −Hiββ

(β0i

)∥∥≤ cH/2}

∩{

maxi∈G0

k

‖Bi3‖ ≤ C(lnT)3+ν}

∩{

maxi∈G0

k

‖Bi4‖ ≤ C(lnT)ν}

∩{

maxi∈G0

k

‖Bi5‖ ≤ C√Tλ1κ1NT

}�

Then conditional on ΞkNT , we have, uniformly in i ∈G0k,∥∥(βi − αk)′(Bi2 + Bi3 + Bi4 + Bi5)

∥∥≥ ∥∥(βi − αk)′Bi2∥∥− ∥∥(βi − αk)′(Bi3 + Bi4 + Bi5)

∥∥≥ √

Tλ1cki‖βi − αk‖ −C‖βi − αk‖[2(lnT)3+ν + √

Tλ1κ1NT

]≥ √

Tλ1c0k‖βi − αk‖/4 for sufficiently large (N�T)�

where the last inequality follows because√Tλ1 � 2(lnT)3+ν + √

Tλ1κ1NT byAssumption A2(i). Then for all i ∈G0

k, we have

P(EkNT�i) = P(i /∈ Gk|i ∈G0

k

)= P(−Bi1 = Bi2 + Bi3 + Bi4 + Bi5)≤ P

(∣∣(βi − αk)′Bi1∣∣≥ ∣∣(βi − αk)′(Bi2 + Bi3 + Bi4 + Bi5)∣∣)

≤ P(‖Bi1‖ ≥ √

Tλ1c0k/4�ΞkNT

)+ P(ΞckNT

)→ 0 as (N�T)→ ∞�

where ΞckNT denotes the complement of ΞkNT and the convergence follows

by Lemma S1.6(iv) and Assumption A2. Consequently, we conclude that withprobability 1−o(N−1), the difference βi− αk must reach the point where ‖βi−αk‖ is not differentiable with respect to βi for any i ∈ G0

k. That is, P(‖βi −αk‖ = 0|i ∈G0

k)= 1 − o(N−1).For uniform consistency, we have P(

⋃K0k=1 EkNT ) ≤ ∑K0

k=1P(EkNT ) ≤∑K0k=1

∑i∈G0

kP(EkNT�i) and by Lemma S1.6(iv),

K0∑k=1

∑i∈G0

k

P(EkNT�i)(A.12)

≤K0∑k=1

∑i∈G0

k

[P(‖Bi1‖ ≥ √

Tλ1c0k/4�ΞkNT

)+ P(ΞckNT

)]

Page 38: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2251

≤N max1≤i≤N

P

(∥∥∥∥∥ 1T

T∑t=1

Uit

∥∥∥∥∥≥ λ1cK0−1α /4

)+ o(1)= o(1)�

This completes the proof of (i).(ii) Pretending each individual’s membership is random, we have P(i ∈

G0k)=Nk/N → τk ∈ (0�1) for k= 1� � � � �K0 and can interpret previous results

as conditional on the group membership assignment. By Bayes’s theorem,

P(FkNT�i)(A.13)

= 1 − P(i ∈G0k|i ∈ Gk

)

=

K0∑l=1�l �=k

P(i ∈ Gk|i ∈G0

l

)P(i ∈G0

l

)

P(i ∈ Gk|i ∈G0

k

)P(i ∈G0

k

)+K0∑

l=1�l �=kP(i ∈ Gk|i ∈G0

l

)P(i ∈G0

l

) �

For the numerator, we have, by (A.12),

K0∑l=1�l �=k

∑i∈Gk

P(i ∈ Gk|i ∈G0

l

)P(i ∈G0

l

)

≤ (K0 − 1)K0∑l=1

∑i∈G0

l

P(i /∈ Gl|i ∈G0

l

)= o(1)�

In addition, noting that P(i ∈ Gk|i ∈ G0k) = 1 − P(i /∈ Gk|i ∈ G0

k) = 1 − o(1)uniformly in i and k by (i), we have that P(i ∈ Gk|i ∈ G0

k)P(i ∈ G0k) +∑K0

l=1�l �=k P(i ∈ Gk|i ∈G0l )P(i ∈G0

l )≥ P(i ∈G0k)/2 w.p.a.1. It follows that

P

(K0⋃k=1

FkNT

)≤

K0∑k=1

∑i∈Gk

P(FkNT�i)

K0∑l=1�l �=k

∑i∈Gk

P(i ∈ Gk|i ∈G0

l

)P(i ∈G0

l

)min

1≤i≤Nmin

1≤k≤K0P(i ∈G0

k

)/2

= o(1)min

1≤k≤K0τk/2

= o(1)�Q.E.D.

Page 39: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2252 L. SU, Z. SHI, AND P. C. B. PHILLIPS

PROOF OF COROLLARY 2.3: Noting that Nk = ∑N

i=1 1{i ∈ Gk}, Nk =∑N

i=1 1{i ∈G0k}, and 1{i ∈ Gk}− 1{i ∈G0

k} = 1{i ∈ Gk \G0k}− 1{i ∈G0

k \ Gk}, wehave Nk−Nk =∑N

i=1[1{i ∈ Gk \G0k}−1{i ∈G0

k \ Gk}]. Then by the implicationrule and the Markov inequality, for any ε > 0,

P(|Nk −Nk| ≥ 2ε

) ≤ P(

N∑i=1

1{i ∈ Gk \G0

k

}≥ ε)

+ P(

N∑i=1

1{i ∈G0

k \ Gk

}≥ ε)

= 1ε

N∑i=1

P(FkNT�i)+ 1ε

N∑i=1

P(EkNT�i)�

By (A.12),∑N

i=1 P(EkNT�i) = ∑K0k=1

∑i∈G0

kP(EkNT�i) = o(1). By the proof of

Theorem 2.2(i),∑N

i=1P(FkNT�i)=∑K0k=1

∑i∈Gk P(FkNT�i)= o(1). Consequently,

P(|Nk −Nk| ≥ 2ε)= o(1) and the conclusion follows. Q.E.D.

PROOF OF THEOREM 2.4: To study the oracle property of the Lasso estima-tor, we utilize conditions from subdifferential calculus (e.g., Bersekas (1995,Appendix B.5)). In particular, necessary and sufficient Karush–Kuhn–Tucker(KKT) conditions for {βi} and {αk} to minimize the objective function in (2.5)are that for each i = 1� � � � �N (resp. k= 1� � � � �K0), 0p×1 belongs to the subd-ifferential of Q(K0)

1NT�λ1(β�α) with respect to βi (resp. αk) evaluated at {βi} and

{αk}. That is, for each i= 1� � � � �N and k= 1� � � � �K0, we have by the envelopetheorem,

0p×1 = 1T

T∑t=1

Ui

(wit; βi� μi(βi)

)+ λ1

K0∑j=1

eij

K0∏l=1�l �=j

‖βi − αl‖�(A.14)

0p×1 = λ1

N

N∑i=1

eik

K0∏l=1�l �=k

‖βi − αl‖�(A.15)

where eij is defined after (A.9). Fix k ∈ {1� � � � �K0}. Observe that (a) ‖βi −αk‖ = 0 for any i ∈ Gk by the definition of Gk, and (b) βi − αl P→ α0

k − α0l �= 0

for any i ∈ Gk and l �= k. It follows that ‖eik‖ ≤ 1 for any i ∈ Gk and eij =βi−αj

‖βi−αj‖ = αk−αj‖αk−αj‖ for any i ∈ Gk and j �= k. Then by the fact that βi = αk ∀i ∈ Gk

Page 40: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2253

and (A.15),

∑i∈Gk

K0∑j=1�j �=k

eij

K0∏l=1�l �=j

‖βi − αl‖(A.16)

=∑i∈Gk

K0∑j=1�j �=k

αk − αj‖αk − αj‖

K0∏l=1�l �=j

‖αk − αl‖ = 0p×1

and

0p×1 =N∑i=1

eik

K0∏l=1�l �=k

‖βi − αl‖(A.17)

=∑i∈Gk

eik

K0∏l=1�l �=k

‖αk − αl‖ +∑i∈G0

eik

K0∏l=1�l �=k

‖βi − αl‖

+K0∑

j=1�j �=k

∑i∈Gj

eik

K0∏l=1�l �=k

‖αj − αl‖

=∑i∈Gk

eik

K0∏l=1�l �=k

‖αk − αl‖ +∑i∈G0

eik

K0∏l=1�l �=k

‖βi − αl‖�

where the last equality follows from the fact that

K0∑j=1�j �=k

∑i∈Gj

eik

K0∏l=1�l �=k

‖αj − αl‖

=K0∑

j=1�j �=k

∑i∈Gj

αj − αk‖αj − αk‖

K0∏l=1�l �=k

‖αj − αl‖ = 0�

Then, averaging both sides of (A.14) over i ∈ Gk and using (A.16)–(A.17), wehave

(A.18) 0p×1 = 1NkT

∑i∈Gk

T∑t=1

Ui

(wit; αk� μi(αk)

)+ λ1

Nk

∑i∈G0

eik

K0∏l=1�l �=k

‖βi − αl‖�

Using Ui(wit; αk� μi(αk)) = Ui(wit;α0k� μi(α

0k)) + H(k)(αk − α0

k) with H(k) ≡1

NkT

∑i∈Gk

∑T

t=1[Uβii (wit; αk� μi(αk))+Uμi

i (wit; α0k� μi(αk))

∂μi(αk)

∂α′k

] and αk lying

Page 41: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2254 L. SU, Z. SHI, AND P. C. B. PHILLIPS

between αk and α0k elementwise, and rearranging terms in (A.18) yield

αk − α0k = −H−1

(k)

1NkT

∑i∈Gk

T∑t=1

Ui

(wit;α0

k� μi(α0k

))+ Rk�

where Rk = H−1(k)

λ1N

∑i∈G0

eik∏K0

l=1�l �=k ‖βi − αl‖. In view of the fact that

eik∏K0

l=1�l �=k ‖βi − αl‖ �= 0 only if i ∈ G0, we have, for any ε > 0,

P(√NT‖Rk‖ ≥ ε) ≤

K0∑k=1

∑i∈G0

k

P(i ∈ G0|i ∈G0

k

)

≤K0∑k=1

∑i∈G0

k

P(i /∈ Gk|i ∈G0

k

)= o(1) by (A.6).

So ‖Rk‖ = oP((NT)−1/2). By Lemmas S1.11–S1.12, 1√

NkT

∑i∈Gk

∑T

t=1Ui(wit;α0k� μi(α

0k)) + BkNT

D→ N(0�Ωk) and H(k) = HkNT + oP(νNT ), where νNT =min(1�

√T/Nk). These results in conjunction with the fact that BkNT =

OP(√Nk/T) and Assumption A3(ii) imply that

√NkT(αk−α0

k)−H−1kNTBkNT

D→N(0�H−1

k Ωk(H−1k )

′). Q.E.D.

PROOF OF THEOREM 2.5: For the post-Lasso estimator, we have the follow-ing first-order conditions: 1

NkT

∑i∈Gk

∑T

t=1Ui(wit; αGk� μi(αGk)) = 0p×1. Fol-

lowing the analyses of μi(βi), βi, and ∂μi(βi)/∂βi in Lemmas S1.5, S1.7, andS1.9, we can readily establish the consistency of μi(αk), αGk , and ∂μi(βi)/∂βiin the absence of the Lasso penalty term. By Taylor expansion, we have

αGk − α0k = −H−1

Gk

1NkT

∑i∈Gk

T∑t=1

Ui

(wit;α0

k� μi(α0k

))�

where HGk≡ 1

NkT

∑i∈Gk

∑T

t=1[Uαki (wit; αGk� μi(αGk))+Uμi

i (wit; α0k� μi(αGk))×

∂μi(αGk)/∂α′k] and αGk lies between αGk and α0

k elementwise. Following theanalysis of H(k) in Lemma S1.13, we can also show that HGk

= Hk + oP(1).This result, in conjunction with Lemma S1.12, implies that

√NkT(αGk −α0

k)−H

−1k BkNT

D→N(0�H−1k Ωk(H

−1k )

′). Q.E.D.

PROOF OF THEOREM 2.6: Let K = {1�2� � � � �Kmax}. We divide K into threesubsets: K0 ≡ {K0}, K− ≡ {K ∈ K : K < K0}, and K+ ≡ {K ∈ K : K > K0},

Page 42: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2255

in which the true, under-, and over-fitted models are produced, respec-tively. Let σ2

G(K�λ1)= 2

NT

∑K

k=1

∑i∈Gk(K�λ1)

∑T

t=1ψ(wit; αGk(K�λ1)� μi(αGk(K�λ1)

)).Using Theorems 2.2 and 2.5 and Assumption A5, we can readily show thatIC1(K0�λ1)= σ2

G(K0�λ1)+ρ1NTpK0 = 2

NT

∑K0k=1

∑i∈Gk(K0�λ1)

∑T

t=1ψ(wit;α0k�μ

0i )+

oP(1)P→ ln(σ2

0 ). We consider the cases of under- and over-fitted models sepa-rately.

Case 1: Under-fitted model. In this case, we have K <K0. Noting that

σ2G(K�λ1)

≥ min1≤K<K0

infG(K)∈GK

2NT

K∑k=1

∑i∈GK�k

T∑t=1

ψ(wit; αGK�k� μi(αGK�k)

)

= min1≤K<K0

infG(K)∈GK

σ2G(K)

we have by Assumptions A4–A5 that

min1≤K<K0

IC1(K�λ1)≥ min1≤K<K0

infG(K)∈GK

σ2G(K)

+ ρ1NTpKP→ ln(σ2) > ln(σ2

0 )�

It follows that P(minK∈Ω− IC1(K�λ1) > IC1(K0�λ1))→ 1.Case 2: Over-fitted model. LetK ∈Ω+. By Lemma S1.14 in the Supplemental

Material and the fact that Tρ1NT → ∞ under Assumption A5, we have

P(

minK∈Ω+

IC1(K�λ1) > IC1(K0�λ1))

= P(

minK∈Ω+

[T(σ2G(K�λ1)

− σ2G(K0�λ1)

)+ Tρ1NT (K −K0)]> 0

)→ 1 as (N�T)→ ∞�

It follows that P(K(λ1)=K0)→ 1 as (N�T)→ ∞. Q.E.D.

APPENDIX B: PROOFS OF THE RESULTS IN SECTION 3

We start by proving a useful technical result and then proceed to provethe main results. Let ViNT (βi) ≡ [ 1

T

∑T

t=1 ρ(ξit�βi)]′WiNT [ 1T

∑T

t=1 ρ(ξit�βi)],and Vi(βi) ≡ { 1

T

∑T

t=1 E[ρ(ξit�βi)]}′Wi1T

∑T

t=1 E[ρ(ξit�βi)]. Let Ri�T (βi) =[ 1T

∑T

t=1{ρ(ξi�βi)−E[ρ(ξi�βi)]}]′Wi[ 1T

∑T

t=1{ρ(ξi�βi)−E[ρ(ξi�βi)]}].

LEMMA B.1: Suppose Assumption B1(iv) holds. Then P(c[ 12Vi(βi) −

Ri�T (βi)] ≤ ViNT (βi) ≤ c[2V i(βi) + 2Ri�T (βi)]) = 1 − o(N−1) for all βi ∈ Bi,where c and c are some generic positive constants that do not depend on i with0< c < 1< c <∞.

Page 43: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2256 L. SU, Z. SHI, AND P. C. B. PHILLIPS

PROOF: Let ΛNT ≡ {maxi ‖WiNT − Wi‖ ≤ CcW } for some C ∈ (0�1). ThenP(ΛNT )= 1 − o(N−1) by Assumption B1(iv). On the set ΛNT , we have

c

[1T

T∑t=1

ρ(ξit�βi)

]′

Wi

[1T

T∑t=1

ρ(ξit�βi)

](B.1)

≤ ViNT (βi)≤ c[

1T

T∑t=1

ρ(ξit�βi)

]′

Wi

[1T

T∑t=1

ρ(ξit�βi)

]

for all βi ∈ Bi. By positive definiteness of Wi and simple manipulations, we canreadily show that

(a− b)′Wi(a− b)≥ 12a′Wia− b′Wib and

(a− b)′Wi(a− b)≤ 2a′Wia+ 2b′Wib

for any conformable vectors a and b. Taking a= 1T

∑T

t=1 E[ρ(ξit�βi)] and b=1T

∑T

t=1{ρ(ξit�βi)−E[ρ(ξit�βi)]}, we have

[1T

T∑t=1

ρ(ξit�βi)

]′

Wi

[1T

T∑t=1

ρ(ξit�βi)

]≥ 1

2Vi(βi)−Ri�T (βi)� and(B.2)

[1T

T∑t=1

ρ(ξit�βi)

]′

Wi

[1T

T∑t=1

ρ(ξit�βi)

]≤ 2Vi(βi)+ 2Ri�T (βi)�(B.3)

Combining (B.1)–(B.3) yields the desired results. Q.E.D.

PROOF OF THEOREM 3.1: (i) Let Q(K0)2iNT�λ2

(βi�α)= ViNT (βi)+λ2∏K0

k=1 ‖βi −αk‖. Then Q(K0)

2NT�λ2(β�α)= 1

N

∑N

i=1Q(K0)2iNT�λ2

(βi�α). By the definition of (β� α),we have

Q2iNT�λ2(βi� α)−Q2iNT�λ2

(β0i � α

)= ViNT (βi)− ViNT

(β0i

)+ λ2

{K∏k=1

‖βi − αk‖ −K∏k=1

∥∥β0i − αk

∥∥}≤ 0�

By Lemma B.1 and Assumption B1(i) and (iv), we have that ViNT (βi) ≥c[ 1

2Vi(βi) − Ri�T ] and ViNT (β0i ) ≤ c[2Vi(β0

i ) + 2R0i�T ] = 2cR0

i�T on the set ΛNT ,where Ri�T = Ri�T (βi) and R0

i�T = Ri�T (β0i ). It follows that c[ 1

2Vi(βi) − Ri�T ] −

Page 44: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2257

2cR0i�T + λ2{∏K

k=1 ‖βi − αk‖ −∏K

k=1 ‖β0i − αk‖} ≤ 0, which can be rewritten as

(B.4) Vi(βi)≤ 2c

[2cR0

i�T + cRi�T − λ2

(K∏k=1

‖βi − αk‖ −K∏k=1

∥∥β0i − αk

∥∥)]�By the arguments used to obtain (A.3) and (A.6), we have

(B.5)

∣∣∣∣∣K0∏k=1

‖βi − αk‖ −K0∏k=1

∥∥β0i − αk

∥∥∣∣∣∣∣≤ CK0(α)

(∥∥βi −β0i

∥∥+ 2∥∥βi −β0

i

∥∥2)�

Noting that 1T

∑T

t=1 E[ρ(ξi�βi)] = −Qi�z�x(βi −β0i ), we have

max1≤i≤N

Vi(βi)= max1≤i≤N

(βi −β0

i

)′Q

′i�z�xWiQi�z�x

(βi −β0

i

)(B.6)

≥ c1NT max1≤i≤N

∥∥βi −β0i

∥∥2�

where c1NT ≡ min1≤i≤N μmin(Q′i�z�xWiQi�z�x) satisfies that lim inf(N�T)→∞ c1NT ≥

cW c2Q> 0 by Assumption B1(iii)–(iv). Combining Assumptions (B.4)–(B.6)

yields

c1NT

∥∥βi−β0i

∥∥2 ≤ 2c

[2cR0

i�T +cRi�T +λ2CK0

(∥∥βi−β0i

∥∥+2∥∥βi−β0

i

∥∥2)]�

or (c1NT − 4cλ2CK0)‖βi − β0

i ‖2 ≤ 2c[2cR0

i�T + cRi�T + λ2CK0‖βi − β0i ‖], where

CK0 = CK0(α). Then

∥∥βi −β0i

∥∥ ≤(

2cλ2CK0 +

[(2cλ2CK0

)2

(B.7)

+ 8c

(c1NT − 4

cλ2CK0

)(2cR0

i�T + cRi�T)]1/2)

/(2(c1NT − 4

cλ2CK0

))

= OP(T−1/2 + λ2

)�

As in the proof of Theorem 2.1(ii), we can further demonstrate that1N

∑N

i=1 ‖βi −β0i ‖2 =OP(T−1).

The proof of (iii) is completely analogous to that of Theorem 2.1(iii),now using the facts that |PNT (β�α) − PNT (β

0�α)| = OP(T−1/2) and that 0 ≥

PNT (β� α)− PNT (β�α0). Q.E.D.

Page 45: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2258 L. SU, Z. SHI, AND P. C. B. PHILLIPS

PROOF OF THEOREM 3.2: (i) First, we fix k ∈ {1� � � � �K0}. By the consistencyof αk and βi in Theorem 3.1 and Assumption B1(v)–(vi), we have βi − αl

P→α0k − α0

l �= 0 for all i ∈ G0k and l �= k and cki ≡ ∏K0

l=1�l �=k ‖βi − αl‖ P→ c0k ≡∏K0

l=1�l �=k ‖α0k −α0

l ‖ ≥ cK0−1α > 0 for any i ∈G0

k. Now, suppose that ‖βi − αk‖ �= 0for some i ∈ G0

k. Then the first-order condition (with respect to βi) for theminimization problem in (3.2) implies that

0p×1 = −2Q′i�z�xWiNT

1√T

T∑t=1

zit(�yit − β′

i�xit)

(B.8)

+ √Tλ2

K0∑j=1

eij

K0∏l=1�l �=j

‖βi − αl‖

= −2Q′i�z�xWiNT

1√T

T∑t=1

zit�εit

+{

λ2cki

‖βi − αk‖Ip + 2Q′

i�z�xWiNT Qi�z�x

}√T(βi − αk)

+ 2Q′i�z�xWiNT Qi�z�x

√T(αk − α0

k

)+ √

Tλ2

K0∑j=1�j �=k

eij

K0∏l=1�l �=j

‖βi − αl‖

≡ −Bi1 + Bi2 + Bi3 + Bi4� say�

where eij = βi−αj‖βi−αj‖ if ‖βi − αj‖ �= 0 and ‖eij‖ ≤ 1 if ‖βi − αj‖ = 0. Following the

proof of Lemma S1.7, we can show that P(maxi ‖βi − β0i ‖ ≥ η)= o(N−1) for

any given η> 0. With this, by (B.7) and Assumption B2(ii)–(iv), we can readilyshow that

(B.9) P(

maxi

∥∥βi −β0i

∥∥≥ Cκ2NT

)= o(N−1

)for some C > 0�

where κ2NT = (T−1/2(lnT)3 +λ2)(lnT)ν . This, in conjunction with the proof ofTheorem 3.1(iii), implies that

P(√T∥∥αk − α0

k

∥∥≥ C(lnT)ν)= o(N−1)

and(B.10)

P(

maxi∈G0

k

∣∣cki − c0k

∣∣≥ c0k/2

)= o(N−1

)�

Page 46: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2259

By (B.9)–(B.10), P(maxi∈G0k‖Bi4‖ ≥ C

√Tλ2κ2NT ) = o(N−1). By Assump-

tion B1(iii)–(iv), we have P(maxi∈G0k‖Q′

i�z�xWiNT Qi�z�x − Q′i�z�xWiQi�z�x‖ ≥

η) = o(N−1) for any η > 0. This result, in conjunction with (B.10), impliesthat P(maxi∈G0

k‖Bi3‖ ≥ C(lnT)ν) = o(N−1) for some C > 0. It follows that

P(ΓkNT )= 1 − o(N−1), where

ΓkNT ≡{

maxi∈G0

k

∣∣cki − c0k

∣∣≤ c0k/2

}∩{

maxi∈G0

k

‖WiNT −Wi‖ ≤ cW /2}

∩{

maxi∈G0

k

‖Qi�z�x −Qi�z�x‖ ≤ cQ/2}

∩{

maxi∈G0

k

‖Bi3‖ ≤ C(lnT)ν}

∩{

maxi∈G0

k

‖Bi4‖ ≤ C√Tλ2κ2NT

}�

Then, conditional on ΓkNT , we have, uniformly in i ∈G0k,

(βi − αk)′(Bi2 + Bi3 + Bi4)≥ ∥∥(βi − αk)′Bi2∥∥− ∥∥(βi − αk)′(Bi3 + Bi4)

∥∥≥ √

Tλ2cki‖βi − αk‖ −C‖βi − αk‖[(lnT)ν + √

Tλ2κ2NT

]≥ √

Tλ2c0k‖βi − αk‖/4 for sufficiently large (N�T)�

because√Tλ2 � (lnT)ν +√

Tλ2κ2NT by Assumption B2(i). Then by Assump-tion B2(i)–(ii),

P(EkNT�i) = P(i /∈ Gk|i ∈G0

k

)= P(Bi1 = Bi2 + Bi3 + Bi4)≤ P

(∣∣(βi − αk)′Bi1∣∣≥ ∣∣(βi − αk)′(Bi2 + Bi3 + Bi4)∣∣)

≤ P(‖Bi1‖ ≥ √

Tλ2c0k/4� ΓkNT

)+ P(Γ ckNT

)→ 0 as (N�T)→ ∞�

It follows that P(‖βi − αk‖ = 0|i ∈ G0k) → 1 as (N�T) → ∞. Now, observe

that P(⋃K0

k=1 EkNT ) ≤ ∑K0k=1P(EkNT ) ≤ ∑K0

k=1

∑i∈G0

kP(EkNT�i) and by Assump-

tion B2(ii),

K0∑k=1

∑i∈G0

k

P(EkNT�i)

≤K0∑k=1

∑i∈G0

k

[P(‖Bi1‖ ≥ √

Tλ2c0k/4� ΓkNT

)+ P(Γ ckNT

)]

Page 47: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2260 L. SU, Z. SHI, AND P. C. B. PHILLIPS

≤N max1≤i≤N

P

(∥∥∥∥∥Q′i�z�xWiNT

1T

T∑t=1

zit�εit

∥∥∥∥∥≥ λ2c0k/4� ΓkNT

)+ o(1)

≤N max1≤i≤N

P

(∥∥∥∥∥ 1T

T∑t=1

zit�εit

∥∥∥∥∥≥ λ2cK0−1α /(16cQcW )

)+ o(1)= o(1)�

where we use the fact that ‖Qi�z�x‖‖WiNT‖ ≥ (‖Qi�z�x‖ − ‖Qi�z�x − Qi�z�x‖)×(‖Wi‖ − ‖WiNT − Wi‖) ≥ cQcW /4 on the set ΓkNT . Consequently, we haveshown (i).

(ii) The proof of (i) is almost identical to that of Theorem 2.2(ii) and isomitted. Q.E.D.

PROOF OF THEOREM 3.4: The proof follows closely from that of Theo-rem 2.4. Based on the subdifferential calculus, the KKT conditions for theminimization of (3.2) are that, for each i= 1� � � � �N and k= 1� � � � �K0,

0p×1 = −2Q′i�z�xWiNT

1NT

T∑t=1

zit(�yit − β′

i�xit)

+ λ2

N

K0∑j=1

eij

K0∏l=1�l �=j

‖βi − αl‖� and

0p×1 = λ1

N

N∑i=1

eik

K0∏l=1�l �=k

‖βi − αl‖�

where eij is defined after (B.8). Fix k ∈ {1� � � � �K0}. As in the proof of The-orem 2.4, we can show that 2

NT

∑i∈Gk Q

′i�z�xWiNT

∑T

t=1 zit(�yit − α′k�xit) +

λ2N

∑i∈G0

eik∏K0

l=1�l �=k ‖βi − αl‖ = 0p×1 w.p.a.1. It follows that αk = α1k + Rk,where α1k = ( 1

N

∑i∈Gk Q

′i�z�xWiNT Qi�z�x)

−1 × 1NT

∑i∈Gk Q

′i�z�xWiNT

∑T

t=1 zit�yit

and Rk = ( 1N

∑i∈Gk Q

′i�z�xWiNT Qi�z�x)

−1 λ22N

∑i∈G0

eik∏K0

l=1�l �=k ‖βi − αl‖. By The-orem 3.2, we can readily show that P(

√NT‖Rk‖ ≥ ε) = o(1) for any ε > 0,

and

√NkT

(α1k − α0

k

)=(

1Nk

∑i∈G0

k

Q′i�z�xWiNT Qi�z�x

)−1

× 1√NkT

∑i∈G0

k

Q′i�z�xWiNT

T∑t=1

zit�εit + oP(1)�

Page 48: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2261

Under Assumptions B1(iv) and B3(i)–(ii), we have 1Nk

∑i∈G0

kQ′i�z�xWiNT Qi�z�x =

1Nk

∑i∈G0

kQ

′i�z�xWiQi�z�x+oP(1)=Ak+oP(1). Then the result follows from As-

sumption B3(iii) and the Slutsky theorem. Q.E.D.

PROOF OF THEOREM 3.5: By Theorem 3.2, we can readily show that√NkT

(αGk − α0

k

)= [Q(k)′z�xW

(k)NT Q

(k)z�x

]−1Q(k)′z�xW

(k)NT

√NkTQ

(k)z�ε + oP(1)

= [Q(k)′z�x�NTW

(k)NT Q

(k)z�x�NT

]−1Q(k)′z�x�NTW

(k)NT

√NkTQ

(k)z�ε�NT + oP(1)�

where Q(k)z�ε = 1

NkT

∑i∈Gk

∑T

t=1 zit�εit and Q(k)z�ε�NT = 1

NkT

∑i∈G0

k

∑T

t=1 zit�εit . Theresults then follow by Assumption B3 and the Slutsky theorem. Q.E.D.

PROOF OF THEOREM 3.6: The proof is analogous to that of Theorem 2.6and is omitted. Q.E.D.

REFERENCES

ANDO, T., AND J. BAI (2015): “Asset Pricing With a General Multifactor Structure,” Journal ofFinancial Econometrics, 13, 556–604. [2216]

(2016): “Panel Data Models With Grouped Factor Structure Under Unknown GroupMembership,” Journal of Applied Econometrics, 31, 163–191. [2217]

ARELLANO, M., AND S. BOND (1991): “Some Tests of Specification for Panel Data: Monte CarloEvidence and an Application to Employment Equations,” Review of Economic Studies, 58, 277–297. [2231,2235]

BALTAGI, B. H., G. BRESSON, AND A. PIROTTE (2008): “To Pool or Not to Pool?” in The Econo-metrics of Panel Data: Fundamentals and Recent Developments in Theory and Practice (ThirdEd.), ed. by L. Mátyás and P. Sevestre. Berlin: Springer-Verlag, 517–546. [2216]

BELLONI, A., AND V. CHERNOZHUKOV (2013): “Least Squares After Model Selection in High-Dimensional Sparse Models,” Bernoulli, 19, 521–547. [2227]

BERTSEKAS, D. (1995): Nonlinear Programming. Belmont, MA: Athena Scientific. [2252]BESLEY, T., AND T. PERSSON (2010): “State Capacity, Conflict, and Development,” Econometrica,

78, 1–34. [2242]BESTER, C. A., AND C. B. HANSEN (2016): “Grouped Effects Estimators in Fixed Effects Mod-

els,” Journal of Econometrics, 190, 197–208. [2216]BLATTMAN, C., AND E. MIGUEL (2010): “Civil Wars,” Journal of Economic Literature, 48, 3–57.

[2242,2243]BONHOMME, S., AND E. MANRESA (2015): “Grouped Patterns of Heterogeneity in Panel Data,”

Econometrica, 83, 1147–1184. [2217]BOSWORTH, P., S. COLLINS, AND C. M. REINHART (1999): “Capital Flows to Developing

Economies: Implications for Saving and Investment,” Brookings Papers on Economic Activity,30, 143–180. [2240]

BROWNING, M., AND J. M. CARRO (2007): “Heterogeneity and Microeconometrics Modelling,”in Advances in Economics and Econometrics, Theory and Applications: Ninth World Congress ofthe Econometric Society, Vol. 3, ed. by R. Blundell, W. K. Newey, and T. Persson. New York:Cambridge University Press, 45–74. [2216]

Page 49: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2262 L. SU, Z. SHI, AND P. C. B. PHILLIPS

(2010): “Heterogeneity in Dynamic Discrete Choice Models,” Econometrics Journal, 13,1–39. [2216]

(2014): “Dynamic Binary Outcome Models With Maximal Heterogeneity,” Journal ofEconometrics, 178, 805–823. [2216,2217]

CARROLL, C., AND D. N. WEIL (1994): “Saving and Growth: A Reinterpretation,” Carnegie-Rochester Conference Series on Public Policy, 40, 133–192. [2241]

CHAN, N. H., C. Y. YAU, AND R.-M. ZHANG (2014): “Group Lasso for Structural Break TimeSeries,” Journal of the American Statistical Association, 109, 590–599. [2218]

COLLIER, P., AND A. HOEFFLER (2004): “Greed and Grievance in Civil Wars,” Oxford EconomicPapers, 56, 563–595. [2243]

DEATON, A. (1990): “Saving in Developing Countries: Theory and Review,” in Proceedings ofthe World Bank Annual Conference on Development Economics. Washington, D.C.: The WorldBank, 61–96. [2240]

DHAENE, G., AND K. JOCHMANS (2015): “Split-Panel Jackknife Estimation of Fixed-Effect Mod-els,” Review of Economic Studies, 82, 991–1030. [2240,2241]

DJANKOV, S., AND M. REYNAL-QUEROL (2010): “Poverty and Civil War: Revisiting the Evi-dence,” Review of Economics and Statistics, 92, 1035–1041. [2243]

DUALAUF, S. N., A. KOURTELLOS, AND A. MINKIN (2001): “The Local Solow Growth Model,”European Economic Review, 45, 928–940. [2216]

EDWARDS, S. (1996): “Why Are Latin America’s Savings Rates So Low? An International Com-parative Analysis,” Journal of Development Economics, 51, 5–44. [2240,2241]

ESTEBAN, J., L. MAYORAL, AND D. RAY (2012): “Ethnicity and Conflict: An Empirical Study,”American Economic Review, 102, 1310–1342. [2243]

FEARON, J. D., AND D. D. LAITIN (2003): “Ethnicity, Insurgency, and Civil War,” The AmericanPolitical Science Review, 97, 75–90. [2242,2243]

FELDSTEIN, M. (1980): “International Differences in Social Security and Saving,” Journal of Pub-lic Economics, 14, 225–244. [2240]

FERNÁNDEZ-VAL, I., AND L. LEE (2013): “Panel Data Models With Nonadditive UnobservableHeterogeneity: Estimation and Inference,” Quantitative Economics, 4, 453–481. [2230,2231]

HAHN, J., AND G. KUERSTEINER (2002): “Asymptotically Unbiased Inference for a DynamicPanel Model With Fixed Effects When Both n and T Are Large,” Econometrica, 70, 1639–1657. [2229]

(2011): “Bias Reduction for Dynamic Nonlinear Panel Models With Fixed Effects,”Econometric Theory, 27, 1152–1191. [2221,2223]

HAHN, J., AND H. R. MOON (2010): “Panel Data Models With Finite Number of Multiple Equi-libria,” Econometric Theory, 26, 863–881. [2217,2220,2224]

HAHN, J., AND W. NEWEY (2004): “Jackknife and Analytical Bias Reduction for Nonlinear PanelModels,” Econometrica, 72, 1295–1319. [2221,2226]

HARCHAOUI, Z., AND C. LÉVY-LEDUC (2010): “Multiple Change-Point Estimation With a TotalVariation Penalty,” Journal of the American Statistical Association, 105, 1481–1493. [2218]

HSIAO, C. (2014): Analysis of Panel Data (Third Ed.). New York: Cambridge University Press.[2215]

HSIAO, C., AND H. PESARAN (2008): “Random Coefficient Panel Data Models,” in The Econo-metrics of Panel Data: Fundamentals and Recent Developments in Theory and Practice (ThirdEd.), ed. by L. Mátyás and P. Sevestre. Berlin: Springer-Verlag, 187–216. [2216]

HSIAO, C., AND A. K. TAHMISCIOGLU (1997): “A Panel Analysis of Liquidity Constraints andFirm Investment,” Journal of the American Statistical Association, 92, 455–465. [2216]

KASAHARA, H., AND K. SHIMOTSU (2009): “Nonparametric Identification of Finite Mixture Mod-els of Dynamic Discrete Choices,” Econometrica, 77, 135–175. [2217]

KATO, K., A. F. GAVAO, AND G. V. MONTES-ROJAS (2012): “Asymptotics for Panel Quantile Re-gression Models With Individual Effects,” Journal of Econometrics, 170, 76–91. [2220]

KIVIET, J. F. (1995): “On Bias, Inconsistency, and Efficiency of Various Estimators in DynamicPanel Data Models,” Journal of Econometrics, 68, 53–78. [2229]

Page 50: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

IDENTIFYING LATENT STRUCTURES IN PANEL DATA 2263

LEE, K., M. H. PESARAN, AND R. SMITH (1997): “Growth and Convergence in a Multi-CountryEmpirical Stochastic Growth Model,” Journal of Applied Econometrics, 12, 357–392. [2216]

LEE, Y. (2012): “Bias in Dynamic Panel Models Under Time Series Misspecification,” Journal ofEconometrics, 169, 54–60. [2229]

LEE, Y., AND P. C. B. PHILLIPS (2015): “Model Selection in the Presence of Incidental Parame-ters,” Journal of Econometrics, 188, 474–489. [2223]

LEEB, H., AND P. M. PÖTSCHER (2008): “Sparse Estimators and the Oracle Property, or the Re-turn of Hodges’ Estimator,” Journal of Econometrics, 142, 201–211. [2227]

(2009): “On the Distribution of Penalized Maximum Likelihood Estimators: TheLASSO, SCAD, and Thresholding,” Journal of Multivariate Analysis, 100, 2065–2082. [2227]

LI, H., J. ZHANG, AND J. ZHANG (2007): “Effects of Longevity and Dependency Rates on Savingand Growth,” Journal of Development Economics, 84, 138–154. [2240]

LIAO, Z. (2013): “Adaptive GMM Shrinkage Estimation With Consistent Moment Selection,”Econometric Theory, 29, 857–904. [2227]

LIN, C.-C., AND S. NG (2012): “Estimation of Panel Data Models With Parameter HeterogeneityWhen Group Membership Is Unknown,” Journal of Econometric Methods, 1, 42–55. [2217,2220]

LOAYZA, N., K. SCHMIDT-HEBBEL, AND L. SERVÉN (2000): “Saving in Developing Countries: AnOverview,” The World Bank Economic Review, 14, 393–414. [2241,2242]

LU, X., AND L. SU (2016): “Shrinkage Estimation of Dynamic Panel Data Models With Interac-tive Fixed Effects,” Journal of Econometrics, 190, 148–175. [2227]

MIGUEL, E., S. SATYANATH, AND E. SERGENTI (2004): “Economic Shocks and Civil Conflict: AnInstrumental Variables Approach,” Journal of Political Economy, 112, 725–753. [2242]

NUNN, N., AND N. QIAN (2014): “US Food Aid and Civil Conflict,” American Economic Review,104, 1630–1666. [2242]

PESARAN, H., Y. SHIN, AND R. SMITH (1999): “Pooled Mean Group Estimation of DynamicHeterogeneous Panels,” Journal of the American Statistical Association, 94, 621–634. [2229]

PHILLIPS, P. C. B., AND D. SUL (2007a): “Transition Modeling and Econometric ConvergenceTests,” Econometrica, 75, 1771–1855. [2216,2217,2220]

(2007b): “Bias in Dynamic Panel Estimation With Fixed Effects, Incidental Trends andCross Section Dependence,” Journal of Econometrics, 137, 162–188. [2229]

QIAN, J., AND L. SU (2015): “Shrinkage Estimation of Regression Models With Multi-ple Structural Changes,” Econometric Theory, first published online 23 June 2015, 1–58,DOI:10.1017/S0266466615000237. [2218]

RODRIK, D. (2000): “Saving Transitions,” The World Bank Economic Review, 14, 481–507. [2240]SARAFIDIS, V., AND N. WEBER (2015): “A Partially Heterogenous Framework for Analyzing Panel

Data,” Oxford Bulletin of Economics and Statistics, 77, 274–296. [2217]SU, L., AND Q. CHEN (2013): “Testing Homogeneity in Panel Data Models With Interactive Fixed

Effects,” Econometric Theory, 29, 1079–1135. [2216]SU, L., Z. SHI, AND P. C. B. PHILLIPS (2014): “Identifying Latent Structures in Panel Data,”

Cowles Foundation Discussion Papers 1965, Yale University. [2228](2016): “Supplement to ‘Identifying Latent Structures in Panel Data’,” Econometrica

Supplemental Material, 84, http://dx.doi.org/10.3982/ECTA12560. [2219]SUN, Y. (2005): “Estimation and Inference in Panel Structure Models,” Working Paper, Dept. of

Economics, UCSD. [2217,2220]TIBSHIRANI, R., M. SAUNDERS, S. ROSSET, J. ZHU, AND K. KNIGHT (2005): “Sparsity and

Smoothness via the Fused Lasso,” Journal of the Royal Statistical Society, Series B, 67, 91–108.[2218]

TIBSHIRANI, R. J. (1996): “Regression Shrinkage and Selection via the LASSO,” Journal of theRoyal Statistical Society, Series B, 58, 267–288. [2222]

WANG, H., R. LI, AND C.-L. TSAI (2007): “Tuning Parameter Selectors for the Smoothly ClippedAbsolute Deviation Method,” Biometrika, 94, 553–568. [2227]

WHITE, H. (2001): Asymptotic Theory for Econometricians. London: Emerald. [2232]

Page 51: mysmu.edumysmu.edu/.../Ecta12560_2016_SU_Shi_Phillips.pdf · Econometrica, Vol. 84, No. 6 (November, 2016), 2215–2264 IDENTIFYING LATENT STRUCTURES IN PANEL DATA BY LIANGJUN SU,ZHENTAO

2264 L. SU, Z. SHI, AND P. C. B. PHILLIPS

YUAN, M., AND Y. LIN (2006): “Model Selection and Estimation in Regression With GroupedVariables,” Journal of the Royal Statistical Society, Series B, 68, 49–67. [2218,2221,2222]

School of Economics, Singapore Management University, 90 Stamford Road,Singapore 178903, Singapore; [email protected],

Dept. of Economics, The Chinese University of Hong Kong, Shatin, Hong KongSAR, China; [email protected],

andCowles Foundation for Research in Economics, Yale University, P.O. Box

208281, New Haven, CT 06520, U.S.A., University of Auckland, University ofSouthampton, and Singapore Management University; [email protected].

Co-editor Elie Tamer handled this manuscript.

Manuscript received June, 2014; final revision received December, 2015.