Covariance estimation with Cholesky decomposition and generalized linear model
Bo Chang
Graphical Models Reading Group
May 22, 2015
Bo Chang (UBC) Cholesky decomposition and GLM May 22, 2015 1 / 21
Modified Cholesky decomposition
Goal: Find a re-parameterization of a covariance matrix that is unconstrained and statistically interpretable.
Assume Y = (Y1, . . . , Yp)′ is an ordered (time-ordered) random vector with mean 0 and covariance matrix Σ.
Yt = ∑_{j=1}^{t−1} φt,j Yj + εt.
Let σt² = Var(εt) and Cov(ε) = diag(σ1², . . . , σp²) = D.
Modified Cholesky decomposition
Rearranging
Yt = ∑_{j=1}^{t−1} φt,j Yj + εt,
we have TY = ε, where
T =
[   1                                          ]
[ −φ2,1    1                                   ]
[ −φ3,1   −φ3,2    1                           ]
[   ⋮        ⋮               ⋱                 ]
[ −φp,1   −φp,2   · · ·   −φp,p−1    1         ].

Cov(TY) = Cov(ε) = TΣT′ = D.
Modified Cholesky decomposition
Definition: For a positive-definite covariance matrix Σ, its modified Cholesky decomposition is
TΣT′ = D,
where T is a unique unit lower-triangular matrix (ones on its diagonal) and D is a unique diagonal matrix.
The precision matrix can be written as
Σ−1 = T′D−1T.
T is unconstrained and statistically meaningful.
T and D can be fitted by regressing a variable Yt on its predecessors.
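To make the regression view concrete, here is a minimal numpy sketch (my own illustration; the function name modified_cholesky is an assumption, not from the slides) that computes T and D from Σ by solving the population regression of each Yt on its predecessors:

```python
import numpy as np

def modified_cholesky(Sigma):
    """T Sigma T' = D via successive regressions of Y_t on Y_1, ..., Y_{t-1}."""
    p = Sigma.shape[0]
    T = np.eye(p)
    d = np.zeros(p)
    d[0] = Sigma[0, 0]
    for t in range(1, p):
        # Population regression coefficients: phi_t = Sigma_{<t,<t}^{-1} Sigma_{<t,t}
        phi = np.linalg.solve(Sigma[:t, :t], Sigma[:t, t])
        T[t, :t] = -phi
        # Innovation variance sigma_t^2 (a Schur complement, hence positive)
        d[t] = Sigma[t, t] - Sigma[t, :t] @ phi
    return T, np.diag(d)

# AR(1)-type covariance: Sigma_ij = rho^|i-j|
rho = 0.6
Sigma = rho ** np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
T, D = modified_cholesky(Sigma)
assert np.allclose(T @ Sigma @ T.T, D, atol=1e-10)
```

With sample data, the same T and D are obtained by ordinary least-squares fits of each variable on its predecessors.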
Sparse estimation
k-banding: fit an AR(k) model,

Yt = ∑_{i=1}^{k} φt,t−i Yt−i + εt.
The resulting estimate of the precision matrix is also k-banded.
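A minimal sketch of k-banded estimation from data (my own illustration; banded_cholesky and the variable names are assumptions), which also checks the bandedness of the implied precision estimate:

```python
import numpy as np

def banded_cholesky(Y, k):
    """Regress each column Y_t on its k closest predecessors (an AR(k) fit),
    producing a k-banded unit lower-triangular T and diagonal D."""
    n, p = Y.shape
    T = np.eye(p)
    d = np.zeros(p)
    d[0] = Y[:, 0] @ Y[:, 0] / n
    for t in range(1, p):
        j0 = max(0, t - k)
        X = Y[:, j0:t]
        phi, *_ = np.linalg.lstsq(X, Y[:, t], rcond=None)
        T[t, j0:t] = -phi
        resid = Y[:, t] - X @ phi
        d[t] = resid @ resid / n
    return T, np.diag(d)

rng = np.random.default_rng(0)
Y = rng.standard_normal((100, 6))
T, D = banded_cholesky(Y, k=2)
# The implied precision estimate T' D^{-1} T is also k-banded
Prec = T.T @ np.linalg.inv(D) @ T
assert np.allclose(np.triu(Prec, 3), 0.0)
```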
Sparse estimation
k-banding, nonparametric estimation: Wu and Pourahmadi (2003) used local polynomial estimators to smooth the subdiagonals of T.
∑_{j=0}^{k} fj,p(t/p) Yt−j = σp(t/p) εt,
where f0,p(·) = 1, and fj,p(·) and σp(·) are continuous functions on [0, 1]. The εt are independent with mean 0 and variance 1.
φt,t−j = fj ,p(t/p), σt = σp(t/p).
Sparse estimation
Lasso penalty: Huang et al. (2006)
Minimize
n log |Σ| + n tr(D−1 T S T′) + λ ∑_{t=2}^{p} ∑_{j=1}^{t−1} |φt,j|.
Zeros are placed in T with no regular pattern.
Sparsity of the precision matrix is not guaranteed.
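For reference, the penalized objective above can be evaluated as follows (a sketch of the objective only, not of the minimization, which Huang et al. (2006) do iteratively; lasso_objective is my name):

```python
import numpy as np

def lasso_objective(T, D, S, n, lam):
    """n log|Sigma| + n tr(D^{-1} T S T') + lam * sum_{t>j} |phi_{t,j}|.
    Since T is unit triangular, |Sigma| = |D|, and phi_{t,j} = -T_{t,j}."""
    d = np.diag(D)
    fit = n * np.sum(np.log(d)) + n * np.trace((T @ S @ T.T) / d[:, None])
    penalty = lam * np.abs(np.tril(T, -1)).sum()
    return fit + penalty

# Sanity check: at T = I and D = diag(S), the trace term equals n * p
S = np.diag([1.0, 2.0])
val = lasso_objective(np.eye(2), np.diag([1.0, 2.0]), S, n=10, lam=1.0)
assert np.isclose(val, 10 * np.log(2.0) + 20.0)
```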
Sparse estimation
Nested lasso penalty / Adaptive banding: Levina et al. (2008)
Minimize
n log |Σ| + n tr(D−1 T S T′) + λ ∑_{t=2}^{p} P(φt),

P(φt) = |φt,t−1| + |φt,t−2| / |φt,t−1| + · · · + |φt,1| / |φt,2|,
where 0/0 is defined to be zero.
Select the best model that regresses the jth variable on its kj closest predecessors, where k = kj depends on j.
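The hierarchical effect of the penalty can be seen by evaluating P(φt) directly (a sketch; nested_penalty is my name, the coefficient vector is ordered from the nearest predecessor outward, and treating nonzero/0 as +∞ is my reading of the convention):

```python
import numpy as np

def nested_penalty(phi):
    """Nested lasso penalty P(phi_t) of Levina et al. (2008).
    phi = (phi_{t,t-1}, phi_{t,t-2}, ..., phi_{t,1}), nearest predecessor first.
    Each coefficient is penalized relative to the previous (closer) one;
    0/0 := 0, so once a coefficient is zero, all farther ones must be zero too."""
    total = abs(phi[0])
    for prev, cur in zip(phi, phi[1:]):
        if cur == 0:
            total += 0.0            # 0/0 defined as 0
        elif prev == 0:
            total += np.inf         # a nonzero beyond a zero breaks the band
        else:
            total += abs(cur) / abs(prev)
    return total

assert nested_penalty([0.5, 0.0, 0.0]) == 0.5   # zeros beyond the band cost nothing
assert np.isinf(nested_penalty([0.0, 0.3]))     # gaps in the band are ruled out
```

This is why the solution is a contiguous band in each row, unlike the plain lasso.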
Sparse estimation
Forward adaptive banding: Leng and Li (2011)
Minimize modified BIC:
n log |Σ| + n tr(D−1 T S T′) + Cn log(n) ∑_{j=1}^{p} kj,

s.t. kj ≤ min{n/(log n)², j − 1},
where kj is the band length.
Fit AR(kj) to obtain T and D.
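A rough per-row sketch of the selection (my own simplification: each kj is chosen by its row's contribution to the modified BIC, whereas the criterion above is written jointly; the function name is an assumption):

```python
import numpy as np

def forward_adaptive_band(Y, Cn=1.0):
    """For each variable j (0-based), pick k_j minimizing
    n log(sigma_j^2(k)) + Cn log(n) k, with k_j <= min{n/(log n)^2, j}."""
    n, p = Y.shape
    kmax = int(n / np.log(n) ** 2)
    ks = []
    for j in range(p):
        best_k, best = 0, n * np.log(np.mean(Y[:, j] ** 2))
        for k in range(1, min(kmax, j) + 1):
            X = Y[:, j - k:j]                      # k closest predecessors
            phi, *_ = np.linalg.lstsq(X, Y[:, j], rcond=None)
            s2 = np.mean((Y[:, j] - X @ phi) ** 2)  # residual variance of AR(k) fit
            crit = n * np.log(s2) + Cn * np.log(n) * k
            if crit < best:
                best_k, best = k, crit
        ks.append(best_k)
    return ks

rng = np.random.default_rng(1)
Y = rng.standard_normal((200, 8))
ks = forward_adaptive_band(Y)
assert len(ks) == 8 and all(0 <= k <= j for j, k in enumerate(ks))
```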
Cholesky decomposition: summary
Cholesky decomposition depends on the order in which the variables appear in the random vector Y.
It works when the variables have a natural ordering.
GLM for covariance matrices
Another way to reduce the number of covariance parameters is to use covariates, as in modeling the mean vector.
Path of development: linear → log-linear → GLM.
Linear covariance models
Linear covariance models (LCM):
Σ = α1 U1 + · · · + αq Uq,

where the Ui’s are some known symmetric basis matrices (covariates) and the αi’s are unknown parameters.
For q = p², any covariance matrix can be written as

Σ = (σij) = ∑_{i=1}^{p} ∑_{j=1}^{p} σij Uij,

where Uij is the matrix with 1 in the (i, j)th position and 0 elsewhere.
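A quick numerical check of this representation (my own sketch):

```python
import numpy as np

p = 3

def U(i, j, p):
    """Basis matrix with 1 in the (i, j)th position and 0 elsewhere."""
    M = np.zeros((p, p))
    M[i, j] = 1.0
    return M

Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
# With q = p^2 basis matrices, any covariance matrix is a linear combination
recon = sum(Sigma[i, j] * U(i, j, p) for i in range(p) for j in range(p))
assert np.allclose(recon, Sigma)
```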
Linear covariance models
MLE: the score equation of αi is
tr(Σ−1 Ui) − tr(S Σ−1 Ui Σ−1) = 0,
which can be solved by an iterative method.
Constraint: αi ’s are restricted so that the matrix is positive definite.
The parameters lack a statistical interpretation.
Log-linear covariance models
Log-linear covariance models:
log Σ = α1U1 + · · ·+ αqUq,
The αi’s are now unconstrained: the exponential of any symmetric matrix is positive definite.
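A numpy sketch of the map between Σ and log Σ via eigendecomposition (sym_expm and sym_logm are my helper names, not from the slides):

```python
import numpy as np

def sym_expm(A):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

def sym_logm(S):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

# Any symmetric A gives a valid covariance Sigma = exp(A): no constraints on A
A = np.array([[0.2, -1.0],
              [-1.0, 0.5]])
Sigma = sym_expm(A)
assert np.all(np.linalg.eigvalsh(Sigma) > 0)   # positive definite
assert np.allclose(sym_logm(Sigma), A)         # log recovers the coefficients
```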
GLM via Cholesky decomposition
Pourahmadi (1999):
Cholesky decomposition: Σ−1 = T′D−1T.
T and log D are unconstrained.
Parametric models for φt,j and log σ2t :
log σt² = z′t λ, φt,j = w′t,j γ,
where zt and wt,j are q × 1 and d × 1 vectors of covariates, and λ and γ are parameters.
Common covariates are powers of time and lag:
zt = (1, t, t², . . . , t^{q−1})′,
wt,j = (1, t − j, (t − j)², . . . , (t − j)^{d−1})′.
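Putting the pieces together, T and D can be built from the two coefficient vectors (a sketch; build_T_D and the example values are my assumptions):

```python
import numpy as np

def build_T_D(lam, gam, p):
    """Build T and D from the polynomial covariate models:
    log sigma_t^2 = z_t' lam,  phi_{t,j} = w_{t,j}' gam  (t = 1, ..., p)."""
    q, d = len(lam), len(gam)
    T = np.eye(p)
    log_s2 = np.empty(p)
    for t in range(1, p + 1):                     # 1-based time index
        z = np.array([float(t) ** m for m in range(q)])
        log_s2[t - 1] = z @ lam
        for j in range(1, t):
            w = np.array([float(t - j) ** m for m in range(d)])
            T[t - 1, j - 1] = -(w @ gam)          # T_{t,j} = -phi_{t,j}
    return T, np.diag(np.exp(log_s2))

T, D = build_T_D(lam=np.array([0.1, 0.02]), gam=np.array([0.4, -0.1]), p=5)
assert np.all(np.diag(D) > 0)            # variances positive by construction
assert np.allclose(np.diag(T), 1.0)      # unit lower-triangular
```

Here q + d = 4 parameters determine the full 5 × 5 covariance, recovered as the inverse of T′D⁻¹T.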
GLM via Cholesky decomposition
Number of parameters: q + d .
Computing MLE is relatively simple:
−2l(λ, γ) = n log |D| + n tr(D−1 T S T′).
Given D, the MLE of T has a closed form. Similarly, given T, the MLE of D has a closed form.
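For intuition on the D-step: in the unstructured case, minimizing n log|D| + n tr(D⁻¹TST′) over diagonal D with T fixed gives dt = (TST′)tt in closed form (my sketch of the idea, not the slides' parametric update for λ and γ):

```python
import numpy as np

rng = np.random.default_rng(2)
Y = rng.standard_normal((200, 4))
S = Y.T @ Y / len(Y)                         # sample covariance
T = np.eye(4)                                # stand-in for the current iterate
D_hat = np.diag(np.diag(T @ S @ T.T))        # closed-form update d_t = (T S T')_{tt}

def neg2l(D):
    """-2 log-likelihood as a function of D, with T held fixed."""
    return len(Y) * (np.log(np.diag(D)).sum()
                     + np.trace(np.linalg.inv(D) @ T @ S @ T.T))

# D_hat beats nearby rescalings of itself
assert neg2l(D_hat) <= neg2l(1.1 * D_hat)
assert neg2l(D_hat) <= neg2l(0.9 * D_hat)
```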
References
Pourahmadi, M. (2011). Covariance estimation: The GLM and regularization perspectives. Statistical Science, 26(3), 369-387.
Pourahmadi, M. (2013). High-Dimensional Covariance Estimation: With High-Dimensional Data. John Wiley & Sons.
Pourahmadi, M. (1999). Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation. Biometrika, 86(3), 677-690.
Huang, J. Z., Liu, N., Pourahmadi, M., & Liu, L. (2006). Covariance matrix selection and estimation via penalised normal likelihood. Biometrika, 93(1), 85-98.
Leng, C., & Li, B. (2011). Forward adaptive banding for estimating large covariance matrices. Biometrika, 98(4), 821-830.
Levina, E., Rothman, A., & Zhu, J. (2008). Sparse estimation of large covariance matrices via a nested Lasso penalty. The Annals of Applied Statistics, 2(1), 245-263.
Wu, W. B., & Pourahmadi, M. (2003). Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika, 90(4), 831-844.
The End