Raymond J. Carroll
Texas A&M University
http://stat.tamu.edu/~carroll
Non/Semiparametric Regression and Clustered/Longitudinal Data
Outline
• Series of Semiparametric Problems:
  • Panel data
  • Matched studies
  • Family studies
  • Finance applications
Outline
• General Framework:
  • Likelihood-criterion functions
• Algorithms: kernel-based
• General Results:
  • Semiparametric efficiency
  • Backfitting and profiling
• Splines and kernels: Summary and conjectures
Acknowledgments

Xihong Lin, Harvard University
Basic Problems
• Semiparametric problems
• Parameter of interest, called β
• Unknown function θ(•)
• The key is that the unknown function is evaluated multiple times in computing the likelihood for an individual
Example 1: Panel Data
• i = 1,…,n clusters/individuals
• j = 1,…,m observations per cluster
Subject | Wave 1 | Wave 2 | … | Wave m
1       | X      | X      | … | X
2       | X      | X      | … | X
…       | …      | …      | … | …
n       | X      | X      | … | X
Example 1: Marginal Parametric Model
• Y = Response
• X, Z = time-varying covariates

Y_ij = Z_ij^T β + X_ij^T θ + ε_ij,   cov(ε_i) = Σ

• General Result: We can improve efficiency for (β, θ) by accounting for correlation: Generalized Least Squares (GLS)
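For reference, the GLS estimator invoked here has the standard closed form; a sketch assuming Σ is known, writing W_ij = (Z_ij^T, X_ij^T)^T, γ = (β^T, θ^T)^T, and W_i for the m × dim(γ) matrix whose rows are the W_ij^T:

$$ \hat{\gamma}_{GLS} = \Big( \sum_{i=1}^n W_i^T \Sigma^{-1} W_i \Big)^{-1} \sum_{i=1}^n W_i^T \Sigma^{-1} Y_i $$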
Example 1: Marginal Semiparametric Model
• Y = Response
• X, Z = varying covariates

Y_ij = Z_ij^T β + θ(X_ij) + ε_ij,   cov(ε_i) = Σ

• Question: can we improve efficiency for β by accounting for correlation?
Example 1: Marginal Nonparametric Model
• Y = Response
• X = varying covariate
• θ(•) = unknown function

Y_ij = θ(X_ij) + ε_ij,   cov(ε_i) = Σ

• Question: can we improve efficiency by accounting for correlation? (GLS)
Example 2: Matched Studies
• Prospective logistic model: i = person, S = stratum

pr(Y_iS = 1) = H{δ_S + Z_iS^T β + θ(X_iS)}

• The usual idea is that the stratum-dependent random variables δ_S may have been chosen by an extremely weird process, hence impossible to model.
Example 2: Matched Studies
• The usual likelihood is determined by

pr(Y_0S = 0, Y_1S = 1 | Y_0S + Y_1S = 1) = H{(Z_1S − Z_0S)^T β + θ(X_1S) − θ(X_0S)}

• Note how the conditioning removes δ_S
• Also note: the function θ(•) is evaluated twice per stratum
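To see why the stratum effect drops out, here is the standard one-line conditioning argument, writing η_iS = δ_S + Z_iS^T β + θ(X_iS) and using the logistic form H(u) = e^u / (1 + e^u):

$$ pr(Y_{0S}=0, Y_{1S}=1 \mid Y_{0S}+Y_{1S}=1) = \frac{e^{\eta_{1S}}}{e^{\eta_{0S}} + e^{\eta_{1S}}} = H(\eta_{1S} - \eta_{0S}) $$

and δ_S cancels in the difference η_1S − η_0S.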
Example 3: Model in Finance
• Model in finance:

Y_i = Σ_{j=1}^m β^{j−1} θ(X_ij) + ε_i

• Note how the function θ(•) is evaluated m times for each subject
Example 3: Model in Finance
• Model in finance:

Y_i = Σ_{j=1}^m β^{j−1} θ(X_ij) + ε_i

• Previous literature used an integration estimator, namely first solved via backfitting:

Y_i = Σ_{j=1}^m θ_j(X_ij) + ε_i

• Computation was pretty horrible
• For us: exact computation, general theory
Example 4: Twin Studies
• Family consists of twins, followed longitudinally
• Baseline for each twin modeled nonparametrically via θ(X_i1), θ(X_i2)
• Longitudinal component modeled parametrically via Z_ij1, Z_ij2, β
General Formulation
• These examples all have common features:
• They have a parameter β
• They have an unknown function θ(•)
• The function is evaluated multiple times for each unit (individual, matched pair, family)
• This distinguishes them from standard semiparametric models
General Formulation
• Y_ij = Response
• X_ij, Z_ij = possibly varying covariates
• Loglikelihood (or criterion function):

L{Y_i, Z_i, β, θ(X_i1), …, θ(X_im)}

• All my examples have this criterion function
General Formulation: Examples
• Loglikelihood (or criterion function):

L{Y_i, Z_i, β, θ(X_i1), …, θ(X_im)}

• As stated previously, this is not a standard semiparametric problem, because of the multiple function evaluations
General Formulation: Overview
• Loglikelihood (or criterion function):

L{Y_i, Z_i, β, θ(X_i1), …, θ(X_im)}

• For these problems, I will give constructive methods of estimation with:
  • Asymptotic expansions and inference available
• If the criterion function is a likelihood function, then the methods are semiparametric efficient
• Methods avoid solving integral equations
The Semiparametric Model
• Y = Response
• X, Z = time-varying covariates

Y_ij = Z_ij^T β + θ(X_ij) + ε_ij,   cov(ε_i) = Σ

• Question: can we improve efficiency for β by accounting for correlation, i.e., what method is semiparametric efficient?
Semiparametric Efficiency
• The semiparametric efficient score is readily worked out.
• Involves a Fredholm equation of the 2nd kind
• Effectively impossible to solve directly: involves densities of each X conditional on the others
• The usual device of solving integral equations does not work here (or at least is not worth trying)
The Efficient Score (Yuck!)
X = (X_1, …, X_m),   Z = (Z_1, …, Z_m)

Efficient score for β:

{X − φ(Z)}^T Σ_eff^{−1} {Y − Xβ − θ(Z)}

where φ solves a Fredholm integral equation of the 2nd kind:

Σ_{j=1}^m Σ_{k=1}^m σ_eff^{jk} E[{X_k − φ_k(Z_k)} | Z_j = z] f_j(z) = 0
My Approach
• First pretend that if you knew β, then you could solve for θ(•, β)
• I am going to suggest an algorithm for then estimating θ(•, β)
• I am then going to turn to the question of estimating β
Profiling

• Profile methods work like this:
• Fix β
• Apply your smoother: S(Y − Zβ)
• Call the result θ̂(X, β)
• Maximize the Gaussian loglikelihood function in β:

Σ_{i=1}^n L{Y_i, Z_i, β, θ̂(X_i1, β), …, θ̂(X_im, β)}

• Explicit solution for most smoothers in Gaussian cases
Profiling in Gaussian Problems

• Profile methods maximize

Σ_{i=1}^n L{Y_i, Z_i, β, θ̂(X_i1, β), …, θ̂(X_im, β)}

• This can be difficult numerically in nonlinear problems
• A type of backfitting is often much easier numerically
Backfitting Methods
• Backfitting methods work like this:
• Fix β_old
• Apply your smoother: S(Y − Zβ_old)
• Call the result θ̂(X, β_old)
• Maximize the loglikelihood function in β_new:

Σ_{i=1}^n L{Y_i, Z_i, β_new, θ̂(X_i1, β_old), …, θ̂(X_im, β_old)}

• Iterate until convergence (explicit solution for most smoothers, but different from profiling); a code sketch of the loop follows below
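A minimal numerical sketch of this loop for the Gaussian partially linear model, assuming a Nadaraya-Watson smoother stands in for the generic smoother S; the function and variable names are illustrative, not from the talk:

```python
import numpy as np

def backfit_plm(y, Z, X, h, tol=1e-8, max_iter=200):
    """Backfitting for y = Z beta + theta(X) + eps in the Gaussian case.

    A Nadaraya-Watson smoother matrix S plays the role of 'your smoother'.
    """
    # Smoother matrix: S[i, k] proportional to K((X_i - X_k)/h)
    K = np.exp(-0.5 * ((X[:, None] - X[None, :]) / h) ** 2)
    S = K / K.sum(axis=1, keepdims=True)

    beta = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        theta_hat = S @ (y - Z @ beta)                 # smooth partial residuals
        beta_new = np.linalg.lstsq(Z, y - theta_hat, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:      # converged
            return beta_new, theta_hat
        beta = beta_new
    return beta, theta_hat
```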
Backfitting/Profiling Example
• Partially linear model, one function
Y = Z^T β + θ(X) + ε

• Define Ỹ = Y − E(Y | X) and Z̃ = Z − E(Z | X)
• Fit the expectations by local linear kernel regression (or whatever)
Backfitting/Profiling Example
• The Estimators are
• These are numerically different, but asymptotically equivalent
• The equivalence is a subtle calculation, even in this simple context
β̂_B = (Z^T Z̃)^{−1} Z^T Ỹ   (backfitting)
β̂_P = (Z̃^T Z̃)^{−1} Z̃^T Ỹ   (profiling)

with Z̃ and Ỹ the smoothed-out versions of Z and Y defined above; see the numerical sketch below
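A small simulation sketch of the two formulas, with hypothetical data and a Nadaraya-Watson smoother matrix S standing in for the kernel fit, shows they are numerically different but close:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(-2, 2, n)
Z = np.sin(X) + rng.normal(size=n)          # Z correlated with X
y = 1.5 * Z + np.cos(2 * X) + rng.normal(size=n)

h = 0.3
K = np.exp(-0.5 * ((X[:, None] - X[None, :]) / h) ** 2)
S = K / K.sum(axis=1, keepdims=True)        # linear smoother matrix

Z_t = Z - S @ Z                             # Z-tilde = Z - E-hat(Z | X)
y_t = y - S @ y                             # Y-tilde = Y - E-hat(Y | X)
beta_B = (Z @ y_t) / (Z @ Z_t)              # backfitting fixed point
beta_P = (Z_t @ y_t) / (Z_t @ Z_t)          # profiling
print(beta_B, beta_P)                       # numerically different, both near 1.5
```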
Backfitting/Profiling Example
• The asymptotic equivalence of profiling and backfitting in this partially linear model has one subtlety
• Profiling: off-the-shelf smoothers are OK
• Backfitting: off-the-shelf smoothers need to be undersmoothed to get rid of asymptotic bias
Backfitting/Profiling
• Hu et al. (2004, Biometrika) showed that in general problems:
  • Backfitting is generally more variable than profiling, for linear-type problems
  • Backfitting and profiling need not have the same limit distributions
General Formulation: Revisited
• Y_ij = Response
• X_ij, Z_ij = varying covariates
• Loglikelihood (or criterion function):

L{Y_i, Z_i, β, θ(X_i1), …, θ(X_im)}

• The key is that the function θ(•) is evaluated multiple times for each individual
• The goal is to estimate θ(•) and β efficiently
General Formulation: Revisited
• What I want to show you is a constructive solution, i.e., one that can be computed:
  • Different from solving integral equations
  • Completely general
  • Theoretically sound
• The methodology is based on kernel methods, i.e., local methods.
• First a little background
Simple Local Likelihood
• Consider a nonparametric regression with iid data
Y_i = θ(X_i) + ε_i

• The loglikelihood function is

L{Y_i, θ(X_i)} = −(1/2) {Y_i − θ(X_i)}²
Simple Local Likelihood
• Let K be a density function, and h a bandwidth
• Your target is the function at x
• The kernel weights for local likelihood are K{(X_i − x)/h}
• If K is the uniform density, only observations within h of x get any weight
Simple Local Likelihood
Only observations within h = 0.25 of x = -1.0 get any weight
Simple Local Likelihood
• Near x, the function should be nearly linear
• The idea then is to do a likelihood estimate local to x via weighting, i.e., maximize

Σ_{i=1}^n K{(X_i − x)/h} L{Y_i, α_0 + α_1 (X_i − x)}

• Then announce θ̂(x) = α̂_0
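In the Gaussian case this local maximization reduces to weighted least squares, so a minimal sketch of computing θ̂(x) is as follows (illustrative names; local linear fit with a Gaussian kernel):

```python
import numpy as np

def local_linear(x0, X, y, h):
    """Local likelihood at x0 in the Gaussian case: maximize the
    kernel-weighted loglikelihood in (alpha0, alpha1), announce alpha0-hat."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)       # kernel weights K((X_i - x0)/h)
    D = np.column_stack([np.ones_like(X), X - x0])
    WD = D * w[:, None]
    alpha = np.linalg.solve(D.T @ WD, WD.T @ y)  # weighted normal equations
    return alpha[0]                              # theta-hat(x0) = alpha0-hat
```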
Simple Local Likelihood
• In the linear model, local likelihood is local linear regression
• It is essentially equivalent to loess, splines, etc.
• I’ll now use local likelihood ideas to solve the general problem
General Formulation: Revisited
• Likelihood (or criterion function):

L{Y_i, Z_i, β, θ(X_i1), …, θ(X_im)}

• The goal is to estimate the function at a target value t
• Fix β. Pretend that the formulation involves m different functions:

L{Y_i, Z_i, β, θ_1(X_i1), …, θ_m(X_im)}
General Formulation: Revisited
• Pretend that the formulation involves m different functions:

L{Y_i, Z_i, β, θ_1(X_i1), …, θ_m(X_im)}

• Pretend that θ_2(X_i2), …, θ_m(X_im) are known
• Fit a local linear regression via local likelihood: θ_1(X_i1) ≈ α_0 + α_1 (X_i1 − t)
• Get the local score function A_i1(α_0, α_1) for (α_0, α_1)
General Formulation: Revisited
• Repeat: Pretend knowing θ_1(X_i1), …, θ_{j−1}(X_i,j−1), θ_{j+1}(X_i,j+1), …, θ_m(X_im)
• Fit a local linear regression: θ_j(X_ij) ≈ α_0 + α_1 (X_ij − t)
• Get the local score function A_ij(α_0, α_1)
• Finally, solve

0 = Σ_{i=1}^n Σ_{j=1}^m A_ij(α_0, α_1)

• Explicit solution in the Gaussian cases; a code sketch follows below
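A minimal iterated sketch of solving this stacked score equation in the Gaussian case, for the model Y_ij = θ(X_ij) + ε_ij with cov(ε_i) = Σ; it uses a local-constant fit rather than the local linear fit above for brevity, and the grid/interpolation device is an illustrative assumption:

```python
import numpy as np

def iterated_local_theta(t_grid, X, Y, Sig_inv, h, n_iter=20):
    """Iterated local likelihood for Y_ij = theta(X_ij) + eps_ij, cov(eps_i) = Sigma.

    X, Y are (n, m); Sig_inv is the m x m inverse covariance. At each target t,
    solve the stacked local score 0 = sum_ij A_ij pretending the other function
    values are known at their current estimates, then iterate.
    """
    n, m = X.shape
    theta_grid = np.full(len(t_grid), Y.mean())      # flat starting values
    for _ in range(n_iter):
        # current theta-hat at all design points, by interpolation on the grid
        theta_X = np.interp(X.ravel(), t_grid, theta_grid).reshape(n, m)
        new = np.empty(len(t_grid))
        for g, t in enumerate(t_grid):
            K = np.exp(-0.5 * ((X - t) / h) ** 2)    # n x m kernel weights
            num = den = 0.0
            for j in range(m):
                r = Y - theta_X
                r[:, j] = Y[:, j]                    # slot j is the unknown a0
                score = r @ Sig_inv[j]               # sum_k sigma^{jk} r_ik
                num += np.sum(K[:, j] * score)
                den += np.sum(K[:, j]) * Sig_inv[j, j]
            new[g] = num / den                       # closed-form Gaussian update
        theta_grid = new
    return theta_grid
```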
Main Results
• Semiparametric efficient for β
• Backfitting (under-smoothed) = profiling
• The equivalence of backfitting and profiling is not obvious in the general case
Main Results
• Explicit variance formulae
• High-order expansions for parameters and functions, used for estimating population quantities such as population means, etc.
Marginal Approaches
• The most standard approach is a marginal one
• Often, we can write, for known G,

E(Y_ij | Z_ij, X_ij) = G{Z_ij, β, θ(X_ij)}

• Similar would be to write the likelihood function for single observations:

L{Y_ij, Z_ij, β, θ(X_ij)}
Marginal Approaches
• The marginal approaches ignore the correlation structure
• Lots, and lots, and lots of papers
• Methods tend to be very inefficient if the correlation structure is important
Econometric Example
• In panel data, interest can be in random-fixed effects models
• Our usual variance components model, where δ_i is independent of everything:

Y_ij = Z_ij^T β + θ(X_ij) + δ_i + ε_ij

• If so, this is a version of our partially linear model, hence already solved by us
Econometric Example
• Econometricians, though, worry that δ_i is correlated with Z or X:

Y_ij = Z_ij^T β + θ(X_ij) + δ_i + ε_ij

• This says that δ_i represents unmeasured variables. This is the fixed-effects model
• They want to know the effects of (X, Z), controlling for individual factors
Econometric Example
• Starting model:

Y_ij = Z_ij^T β + θ(X_ij) + δ_i + ε_ij

• Get rid of the δ_i terms by differencing, e.g.,

Y_ij − Y_i1 = (Z_ij − Z_i1)^T β + θ(X_ij) − θ(X_i1) + ε_ij − ε_i1

• A special case of our model!
Econometric Example
• Model:

Y_ij − Y_i1 = (Z_ij − Z_i1)^T β + θ(X_ij) − θ(X_i1) + ε_ij − ε_i1

• The ε_ij − ε_i1 terms are correlated over j = 2,…,m
• The variance efficiency loss from ignoring these correlations is (2+m)/4
Econometric Example
• Example: China Health and Nutrition Survey
• No parametric part
• Response Y = caloric intake (log scale)
• Predictor X = income
• Initial random effects model result suggests that for very low incomes, an increase in income is NOT associated with an increase in calories
Econometric Example
• Random effects model suggests that for very low incomes, an increase in income is NOT associated with an increase in calories
• The fixed effects model fits with economic theory and common sense
• Specification test confirms this
Econometric Example
• The fixed effects cubic regression fit is far too steep at either end.
• The nonparametric fit makes much more sense
Remarks on Splines
• Splines are a practical alternative to kernels
• Penalized splines (smoothing splines, P-splines, etc.) with penalty parameter λ:
  • Easy to develop, very flexible
  • Computable, truly nonparametric
  • Difficult theory (Mammen & van de Geer; Mammen & Nielsen)
  • For smoothing splines in the partially linear model, for example, they are equivalent to kernel methods
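As a concrete illustration of the penalized-spline idea, here is a generic sketch (truncated-power basis with a ridge penalty λ on the knot coefficients; a common textbook formulation, not necessarily the variants compared in the talk):

```python
import numpy as np

def pspline_fit(x, y, n_knots=20, lam=1.0, degree=1):
    """P-spline: truncated power basis plus ridge penalty lam on knot coefficients."""
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])  # interior knots
    B = np.column_stack([x ** p for p in range(degree + 1)] +
                        [np.maximum(x - k, 0.0) ** degree for k in knots])
    D = np.diag([0.0] * (degree + 1) + [1.0] * n_knots)  # penalize knot terms only
    coef = np.linalg.solve(B.T @ B + lam * D, B.T @ y)   # penalized normal equations
    return B @ coef                                       # fitted values
```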
Remarks on Splines
• Unpenalized splines
• There are theoretical results for non-penalized splines
• These methods assume fixed, known knots
• Then slowly grow the number of knots
• Theoretically equivalent to our methods
• The theory, and the method, are irrelevant
Unpenalized Splines
No penalty and standard number of knots = crazy curves
Unpenalized Splines
• The theoretical results for unpenalized splines require that the relationship between the number of knots k and the sample size n be

k ~ n^{1/5}, i.e., n ~ k^5   (n = 16,807 corresponds to k = 7)

• Every paper in this area does data analysis with ≤ 5 knots. Why?
Splines With Knot Selection
• There is a nice literature on using fixed-knot splines but with the knots selected
• Basically, use model-selection techniques to zero out some of the coefficients
• This gets the smoothness back
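One hedged sketch of that idea, using the lasso to zero out truncated-power coefficients (the lasso is just one of several model-selection devices used in this literature; names are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

def knot_selected_spline(x, y, n_knots=30, alpha=0.01):
    """Fixed-knot linear spline with knots 'selected' by lasso-zeroed coefficients."""
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    B = np.column_stack([x] + [np.maximum(x - k, 0.0) for k in knots])
    model = Lasso(alpha=alpha, max_iter=50_000).fit(B, y)  # intercept fit by sklearn
    kept = np.flatnonzero(model.coef_[1:])                 # surviving knot terms
    return model, knots[kept]
```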
Conclusions
• General likelihood:

L{Y_i, Z_i, β, θ(X_i1), …, θ(X_im)}

• Distinguishing property: unknown function evaluated repeatedly for each individual
• Kernel method: iterated local likelihood calculations, explicit solution in Gaussian cases
Conclusions
• General results:
• Semiparametric efficient: construction, no integral equations need to be solved
• Backfitting and profiling: asymptotically equivalent
Conclusions
• Smoothing Splines and Kernels: Asymptotically the same in the Gaussian case
• Splines: generally easier to compute, although smoothing parameter selection can be intensive
• Unpenalized splines: irrelevant theory, need knot selection
Conclusions
• Splines and Kernels: One might conjecture that splines can be constructed for the general problem that are asymptotically efficient
• Open Problem: is this true, and how?
Thanks!
http://stat.tamu.edu/~carroll
Conjectured Approach
• Mammen and Nielsen worked in a nonlinear least squares context with multiple functions:

E(Y_ij | Z_ij, X_ij) = G{Z_ij, θ(X_ij)}

• Roughly, the obvious version of their method is

argmax_a Σ_{i=1}^n ∫ L{Y_i, Z_i, a(x_1), …, a(x_m)} Π_{j=1}^m K{(X_ij − x_j)/h} dx_1 ⋯ dx_m

• Both methods are semiparametric efficient when profiled
Conjectured Approach
• Roughly, the obvious version of the Mammen and Nielsen method is
argmax_a Σ_{i=1}^n ∫ L{Y_i, Z_i, a(x_1), …, a(x_m)} Π_{j=1}^m K{(X_ij − x_j)/h} dx_1 ⋯ dx_m

• This can be used for the model

Y_i = Σ_{j=1}^m β^{j−1} θ(X_ij) + ε_i