High-Dimensional Multivariate Bayesian Linear Regression with Shrinkage Priors
Ray Bai
Department of Statistics, University of Florida
Joint work with Dr. Malay Ghosh
March 20, 2018
Ray Bai (University of Florida) MBSP March 20, 2018 1 / 48
Overview
1 Overview of High-Dimensional Multivariate Linear Regression
2 Multivariate Bayesian Model with Shrinkage Priors (MBSP)
3 Posterior Consistency of MBSP: Low-Dimensional Case and Ultrahigh-Dimensional Case
4 Implementation of the MBSP Model
5 Simulation Study
6 Yeast Cell Cycle Data Analysis
Simultaneous Prediction and Estimation
There are many scenarios where we would want to simultaneously predict q continuous response variables y1, ..., yq:
Longitudinal data: The q response variables represent measurements at q consecutive time points.
- mRNA levels at different time points
- children’s heights at different ages of development
- CD4 cell counts over time for HIV/AIDS patients
The data have a group structure: The q response variables represent a “group.”
- In genomics, genes within the same pathway often act together in regulating a biological system.
Multivariate Linear Regression
Consider the multivariate linear regression model,
Y = XB + E,
where Y = (y1, ..., yq) is an n × q response matrix of n samples and q response variables, X is an n × p matrix of n samples and p covariates, B ∈ R^{p×q} is the coefficient matrix, and E = (ε1, ..., εn)^T is an n × q noise matrix, with εi i.i.d.∼ Nq(0, Σ), i = 1, ..., n.

Throughout, we assume that X is centered, so there is no intercept term.
Multivariate Linear Regression
For the multivariate linear regression model,
Y_{n×q} = X_{n×p} B_{p×q} + E_{n×q},

where E = (ε1, ..., εn)^T, εi i.i.d.∼ Nq(0, Σ), i = 1, ..., n,
Σ represents the covariance structure of the q response variables.
We wish to estimate the coefficient matrix B.
Model selection from the p covariates is also often desired. This can be done using multivariate generalizations of AIC, BIC, or Mallows' Cp.
Multivariate Linear Regression
For the multivariate linear regression model, the usual maximum likelihoodestimator (MLE) is the ordinary least squares estimator,
B̂ = (XTX)−1XTY.
The MLE exists uniquely only if X has full column rank, which requires p ≤ n.

It is well known that the MLE is an inconsistent estimator of B if p/n → c, c > 0.

Variable selection using AIC, BIC, and Mallows' Cp is infeasible for large p, since it requires searching over a model space of 2^p models.
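As a concrete illustration, the OLS estimator above can be computed with a least-squares solve rather than an explicit inverse; this is a minimal sketch on simulated data (the dimensions and noise scale are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 100, 5, 3  # illustrative sizes with p < n, so X has full column rank

B_true = rng.normal(size=(p, q))
X = rng.normal(size=(n, p))
Y = X @ B_true + 0.1 * rng.normal(size=(n, q))

# MLE / OLS: B_hat = (X^T X)^{-1} X^T Y, computed via a numerically stabler
# least-squares solve instead of forming the inverse explicitly
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

With n much larger than p and modest noise, B_hat recovers B_true closely.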
High-Dimensional Multivariate Linear Regression
To handle cases where p is large (including the p > n regime), frequentists typically use penalized regression (e.g. Li et al. (2015), Vincent and Hansen (2014), Wilms and Croux (2017)):
min_B ||Y − XB||_2^2 + λ ∑_{i=1}^p ||b_i||_2,

where b_i represents the ith row of B and λ > 0 is a tuning parameter.

The group lasso penalty, || · ||_2, shrinks entire rows of B to exactly 0, leading to a sparse estimate of B and facilitating variable selection from the p predictors.

We can use an adaptive group lasso penalty to avoid overshrinkage of b_i, i = 1, ..., p.
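The row-wise group lasso objective above can be sketched with proximal gradient descent; this is an illustrative implementation under assumed toy data (fixed λ, Frobenius loss), not the code from any of the cited papers:

```python
import numpy as np

def group_lasso_mv(X, Y, lam, n_iter=500):
    """Proximal-gradient sketch of the row-wise group lasso for Y ~ XB.

    Minimizes 0.5 * ||Y - XB||_F^2 + lam * sum_i ||b_i||_2,
    where b_i is row i of B.
    """
    p, q = X.shape[1], Y.shape[1]
    B = np.zeros((p, q))
    step = 1.0 / np.linalg.eigvalsh(X.T @ X).max()   # 1/L for the smooth part
    for _ in range(n_iter):
        G = B - step * (X.T @ (X @ B - Y))           # gradient step
        norms = np.linalg.norm(G, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        B = shrink * G                               # row-wise soft threshold
    return B

rng = np.random.default_rng(1)
n, p, q = 50, 20, 3
B_true = np.zeros((p, q))
B_true[:4] = rng.normal(size=(4, q))                 # only the first 4 rows active
X = rng.normal(size=(n, p))
Y = X @ B_true + 0.1 * rng.normal(size=(n, q))

B_hat = group_lasso_mv(X, Y, lam=5.0)
active = np.flatnonzero(np.linalg.norm(B_hat, axis=1) > 1e-8)
```

The row-wise soft threshold is exactly what makes entire rows of B hit zero, mirroring the variable-selection behavior described on the slide.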
Bayesian High-Dimensional Multivariate Linear Regression
The Bayesian approach is to put a prior distribution π(B) on B. That is, given the model Y = XB + E and data (X, Y), we have

π(B|Y) ∝ f(Y|X, B) π(B).

Inference can be conducted through the posterior, π(B|Y).
Bayesian High-Dimensional Multivariate Linear Regression
To achieve sparsity and variable selection, a common approach is to place spike-and-slab priors on the rows of B (e.g. Brown et al. (1998), Liquet et al. (2017)):

b_i^T i.i.d.∼ (1 − p)δ{0} + p Nq(0, τ²V), i = 1, ..., p.

δ{0} represents a point mass at 0 ∈ R^q, and V is a q × q symmetric positive definite matrix.

τ² can be treated as a tuning parameter, or a prior can be placed on τ².

A prior can also be placed on the mixing weight p so that the model adapts to the underlying sparsity. Usually, we put a Beta prior on p.
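Drawing from this spike-and-slab prior is straightforward; a sketch (the mixing weight is renamed w here to avoid clashing with the dimension p, and the values of w, τ², and V are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 200, 5
w, tau2 = 0.1, 1.0     # slab probability and slab scale (illustrative values)
V = np.eye(q)          # slab covariance; identity is an assumption for the sketch

# Each row of B is drawn from (1 - w) * delta_0 + w * N_q(0, tau2 * V)
is_slab = rng.random(p) < w
B = np.zeros((p, q))
B[is_slab] = rng.multivariate_normal(np.zeros(q), tau2 * V, size=int(is_slab.sum()))
```

Rows assigned to the spike are exactly zero, which is the source of the sparsity (and of the computational cost when exploring the 2^p spike/slab configurations).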
Bayesian High-Dimensional Multivariate Linear Regression
For the spike-and-slab approach,

b_i^T i.i.d.∼ (1 − p)δ{0} + p Nq(0, τ²V), i = 1, ..., p,
τ² ∼ µ(τ²),
p ∼ B(a, b),

taking the posterior median gives a point estimate of B with some rows equal to 0^T, thus recovering a sparse estimate of B and facilitating variable selection.

Due to the point mass at 0, posterior computation under this model can be very slow for large p.
Bayesian High-Dimensional Multivariate Linear Regression
Due to the computational inefficiency of discontinuous priors, it is often desirable to put a continuous prior on the parameters of interest.
For the multivariate linear regression model,
Y = XB + E,
our aim is to estimate B.
This requires putting a prior density on a p × q matrix.
A popular continuous prior to place on B is the matrix-normal prior.
The Matrix-Normal Prior
Definition
A random matrix X is said to have the matrix-normal density if X has the density function (on the space R^{a×b}):

f(X) = |U|^{−b/2} |V|^{−a/2} (2π)^{−ab/2} exp{ −(1/2) tr[U^{−1}(X − M)V^{−1}(X − M)^T] },

where M ∈ R^{a×b}, and U and V are positive definite matrices of dimension a × a and b × b, respectively. If X is distributed as a matrix-normal distribution with the pdf above, we write X ∼ MN_{a×b}(M, U, V).
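Sampling from this distribution uses the standard factorization X = M + A Z B^T with U = A A^T, V = B B^T, and Z a matrix of i.i.d. standard normals; a sketch with assumed toy parameters, checking one implied marginal variance Var(X_{ij}) = U_{ii} V_{jj}:

```python
import numpy as np

def rmatnorm(rng, M, U, V):
    """Draw X ~ MN(M, U, V) via X = M + A Z B^T with U = A A^T, V = B B^T."""
    A = np.linalg.cholesky(U)
    Bc = np.linalg.cholesky(V)
    Z = rng.standard_normal(M.shape)
    return M + A @ Z @ Bc.T

rng = np.random.default_rng(3)
a, b = 4, 3
M = np.zeros((a, b))
U = np.diag([1.0, 2.0, 3.0, 4.0])   # row covariance (illustrative)
V = 0.5 * np.eye(b) + 0.5           # column covariance: 1 on diag, 0.5 off diag

draws = np.stack([rmatnorm(rng, M, U, V) for _ in range(20000)])
# Cov(vec(X)) = V kron U, so Var(X[3, 0]) should be U[3,3] * V[0,0] = 4
emp_var = float(draws[:, 3, 0].var())
```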
Multivariate Bayesian Model with Shrinkage Priors (MBSP)

By adding an additional layer to the Bayesian hierarchy, we can obtain a row-sparse estimate of B. This row-sparse estimate also facilitates variable selection from the p variables. Our model is specified as follows:

Y | X, B, Σ ∼ MN_{n×q}(XB, I_n, Σ),
B | ξ1, ..., ξp, Σ ∼ MN_{p×q}(O, τ diag(ξ1, ..., ξp), Σ),
ξi ind.∼ π(ξi), i = 1, ..., p,

where τ > 0 is a tuning parameter and π(ξi) is a polynomial-tailed prior density of the form

π(ξi) = K (ξi)^{−a−1} L(ξi),

where K > 0 is the constant of proportionality, a is a positive real number, and L is a positive, measurable, non-constant, slowly varying function over (0, ∞).
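A prior draw of B under this hierarchy can be sketched for the horseshoe choice ξ_i = λ_i² with λ_i ∼ half-Cauchy(0, 1); Σ = I_q and the value of τ are illustrative assumptions here:

```python
import numpy as np

rng = np.random.default_rng(4)
p, q, tau = 100, 3, 0.05
# Sigma = I_q is assumed in this sketch, so rows of B are independent normals

# Horseshoe-type local scales: xi_i = lambda_i^2, lambda_i ~ half-Cauchy(0, 1)
xi = np.abs(rng.standard_cauchy(p)) ** 2

# B | xi, Sigma ~ MN(O, tau * diag(xi), Sigma); with Sigma = I this is row-wise
row_sd = np.sqrt(tau * xi)
B = row_sd[:, None] * rng.standard_normal((p, q))
```

Because the ξ_i are heavy-tailed, most rows of B are pulled near zero while a few rows escape the shrinkage, which is the row-sparsity the model is designed to induce.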
Examples of Polynomial-Tailed Priors
Prior        π(ξi)/C                                                                  L(ξi)
Student's t  ξi^{−a−1} exp(−a/ξi)                                                     exp(−a/ξi)
Horseshoe    ξi^{−1/2}(1 + ξi)^{−1}                                                   ξi^{a+1/2}/(1 + ξi)
Horseshoe+   ξi^{−1/2}(ξi − 1)^{−1} log(ξi)                                           ξi^{a+1/2}(ξi − 1)^{−1} log(ξi)
NEG          (1 + ξi)^{−1−a}                                                          {ξi/(1 + ξi)}^{a+1}
TPBN         ξi^{u−1}(1 + ξi)^{−a−u}                                                  {ξi/(1 + ξi)}^{a+u}
GDP          ∫₀^∞ (λ²/2) exp(−λ²ξi/2) λ^{2a−1} exp(−ηλ) dλ                            ∫₀^∞ t^a exp(−t − η√(2t/ξi)) dt
HIB          ξi^{u−1}(1 + ξi)^{−(a+u)} exp{−s/(1 + ξi)} {φ² + (1 − φ²)/(1 + ξi)}^{−1}  {ξi/(1 + ξi)}^{a+u} exp{−s/(1 + ξi)} {φ² + (1 − φ²)/(1 + ξi)}^{−1}

Table: Polynomial-tailed priors, their respective prior densities π(ξi) up to a normalizing constant C, and the slowly varying component L(ξi).
Sparse Estimation of B: Examples
If ξj ind.∼ Inverse-Gamma(αj, γj/2), then the marginal density for B, π(B), under the MBSP model is proportional to

∏_{j=1}^p ( ||bj(τΣ)^{−1/2}||_2² + γj )^{−(αj + q/2)},

which corresponds to a multivariate t-distribution. Here bj denotes the jth row of B.
Sparse Estimation of B: Examples
If π(ξj) ∝ ξj^{q/2−1}(1 + ξj)^{−1}, then the joint density π(B, ξ1, ..., ξp) under the MBSP model is proportional to

∏_{j=1}^p ξj^{−1}(1 + ξj)^{−1} exp{ −(1/(2ξj)) ||bj(τΣ)^{−1/2}||_2² },

and integrating out the ξj's gives a multivariate horseshoe density function.
Notation
For any two sequences of positive real numbers {an} and {bn} with bn ≠ 0:

an = O(bn) if |an/bn| ≤ M for all n, for some positive real number M independent of n.
an = o(bn) if lim_{n→∞} an/bn = 0. Therefore, an = o(1) if lim_{n→∞} an = 0.

For a vector v ∈ R^n, ||v||2 := √(∑_{i=1}^n vi²) denotes the ℓ2 norm.

For a matrix A ∈ R^{a×b} with entries aij, ||A||F := √(tr(A^T A)) = √(∑_{i=1}^a ∑_{j=1}^b aij²) denotes the Frobenius norm of A.

For a symmetric matrix A, we denote its minimum and maximum eigenvalues by λmin(A) and λmax(A), respectively.
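A quick numerical check of these identities (purely illustrative):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Frobenius norm three ways: entrywise sum of squares, trace formula, numpy
fro_direct = float(np.sqrt((A ** 2).sum()))
fro_trace = float(np.sqrt(np.trace(A.T @ A)))
fro_numpy = float(np.linalg.norm(A, "fro"))

# Extreme eigenvalues of the symmetric matrix A^T A
S = A.T @ A
lam_min, lam_max = np.linalg.eigvalsh(S)[[0, -1]]
```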
Posterior Consistency
Suppose that the data are generated from a true model,

Yn = XnB0 + En,

where Yn := (Yn,1, ..., Yn,q) and En ∼ MN_{n×q}(O, In, Σ).

Letting P0 denote the probability measure underlying the true model above, we define the following notion of posterior consistency:

Definition (strong posterior consistency). Let Bn = {Bn : ||Bn − B0||F > ε}, where ε > 0. The sequence of posterior distributions of Bn under the prior πn(Bn) is said to be strongly consistent under the true model if, for any ε > 0,

Πn(Bn|Yn) = Πn(||Bn − B0||F > ε | Yn) → 0 a.s. P0 as n → ∞.
Sufficient Conditions for Posterior Consistency
For our theoretical analysis, we assume that q < n is fixed and Σ is known.

In practice, Σ is often unknown; it can be estimated from the data by placing an Inverse-Wishart prior on Σ, or by obtaining a separate estimate Σ̂ (e.g. the MLE) and plugging Σ̂ into our model as an empirical Bayes estimate.

Theory is developed separately for:

pn = o(n) (low-dimensional setting)
pn ≥ O(n) (ultrahigh-dimensional setting)
Regularity Conditions for the Low-Dimensional Case
(A1) pn = o(n) and pn ≤ n for all n ≥ 1.
(A2) There exist constants c1, c2 so that

0 < c1 < lim inf_{n→∞} λmin(Xn^T Xn / n) ≤ lim sup_{n→∞} λmax(Xn^T Xn / n) < c2 < ∞.

(A3) There exist constants d1 and d2 so that

0 < d1 < λmin(Σ) ≤ λmax(Σ) < d2 < ∞.
Sufficient Conditions for Posterior Consistency When p = o(n)
Theorem
Assume that conditions (A1)-(A3) hold. Then the posterior of Bn under any prior πn(Bn) is strongly consistent; that is, for any ε > 0,

Πn(Bn|Yn) = Πn(Bn : ||Bn − B0||F > ε | Yn) → 0 P0 a.s. as n → ∞

provided that

Πn( Bn : ||Bn − B0||F < ∆/n^{ρ/2} ) > exp(−kn)

for all 0 < ∆ < ε²c1d1^{1/2}/(48 c2^{1/2} d2) and 0 < k < ε²c1/(32d2) − 3∆c2^{1/2}/(2d1^{1/2}), where ρ > 0.

This theorem applies to any prior on Bn. Provided the prior satisfies the above condition and p = o(n), the posterior is strongly consistent.
The MBSP Model
Recall the MBSP model:
Y | X, B, Σ ∼ MN_{n×q}(XB, In, Σ),
B | ξ1, ..., ξpn, Σ ∼ MN_{pn×q}(O, τn diag(ξ1, ..., ξpn), Σ),
ξi ind.∼ π(ξi), i = 1, ..., pn,

where τn > 0 and π(ξi) is a polynomial-tailed density of the form

π(ξi) = K (ξi)^{−a−1} L(ξi).

To achieve posterior consistency, we require mild conditions on the slowly varying component L(·), on τn, and on the true unknown coefficient matrix B0.
Additional Assumptions under the MBSP Model
(i) For the slowly varying function L(t) in the priors for ξi, 1 ≤ i ≤ pn, lim_{t→∞} L(t) ∈ (0, ∞). That is, there exists c0 (> 0) such that L(t) ≥ c0 for all t ≥ t0, for some t0 which depends on both L and c0.

(ii) There exists M > 0 so that sup_{j,k} |b0_{jk}| ≤ M < ∞ for all n, i.e. the maximum entry of B0 is uniformly bounded in absolute value.

(iii) 0 < τn < 1 for all n, and τn = o(1/(pn n^ρ)) for some ρ > 0.
Posterior Consistency of MBSP (low-dimensional case)
Theorem
Suppose that we have the MBSP model with polynomial-tailed priors for ξ1, ..., ξpn. Provided that Assumptions (A1)-(A3) and (i)-(iii) hold, our model achieves strong posterior consistency. That is, for any ε > 0,

Πn(Bn : ||Bn − B0||F > ε | Yn) → 0 P0 a.s. as n → ∞.
Ultrahigh-Dimensional Case
We have shown that the MBSP model achieves posterior consistency under mild conditions if pn = o(n).
What if pn > n and pn ≥ O(n)?
It turns out that with some additional regularity conditions on the model size and the design matrix, we can achieve posterior consistency in this ultrahigh-dimensional setting!
Regularity Conditions for the Ultrahigh-dimensional Case
(B1) pn > n for all n ≥ 1, and log(pn) = O(n^d) for some 0 < d < 1.
(B2) The rank of Xn is n.
(B3) Let J denote a set of indices, J ⊂ {1, ..., pn}, with |J| ≤ n, and let XJ denote the submatrix of Xn containing the columns with indices in J. For any such set J, there exists a finite constant c̃1 (> 0) so that lim inf_{n→∞} λmin(XJ^T XJ / n) ≥ c̃1.

(B4) There is a finite constant c̃2 (> 0) so that lim sup_{n→∞} λmax(Xn^T Xn / n) ≤ c̃2 < ∞.
(B5) There exist constants d1 and d2 so that
0 < d1 < λmin(Σ) ≤ λmax(Σ) < d2 < ∞.
(B6) The true model S∗ ⊂ {1, ..., pn} is nonempty for all n, and s∗ = |S∗| = o(n/log(pn)).
Sufficient Conditions for Posterior Consistency When log p = o(n)
Theorem
Assume that conditions (B1)-(B6) hold. Then the posterior of Bn under any prior πn(Bn) is strongly consistent; that is, for any ε > 0,

Πn(Bn|Yn) = Πn(Bn : ||Bn − B0||F > ε | Yn) → 0 P0 a.s. as n → ∞

provided that

Πn( Bn : ||Bn − B0||F < ∆̃/n^{ρ/2} ) > exp(−kn)

for all 0 < ∆̃ < ε²c̃1d1^{1/2}/(48 c̃2^{1/2} d2) and 0 < k < ε²c̃1/(32d2) − 3∆̃c̃2^{1/2}/(2d1^{1/2}), where ρ > 0.

This theorem applies to any prior on Bn. Provided the prior satisfies the above condition and log p = o(n), the posterior is strongly consistent.
The MBSP Model
Recall the MBSP model:
Y | X, B, Σ ∼ MN_{n×q}(XB, In, Σ),
B | ξ1, ..., ξpn, Σ ∼ MN_{pn×q}(O, τn diag(ξ1, ..., ξpn), Σ),
ξi ind.∼ π(ξi), i = 1, ..., pn,

where τn > 0 and π(ξi) is a polynomial-tailed density of the form

π(ξi) = K (ξi)^{−a−1} L(ξi).

To achieve posterior consistency, we require mild conditions on the slowly varying component L(·), on τn, and on the true unknown coefficient matrix B0.
Additional Assumptions under the MBSP Model
(i) For the slowly varying function L(t) in the priors for ξi, 1 ≤ i ≤ pn, lim_{t→∞} L(t) ∈ (0, ∞). That is, there exists c0 (> 0) such that L(t) ≥ c0 for all t ≥ t0, for some t0 which depends on both L and c0.

(ii) There exists M > 0 so that sup_{j,k} |b0_{jk}| ≤ M < ∞ for all n, i.e. the maximum entry of B0 is uniformly bounded in absolute value.

(iii) 0 < τn < 1 for all n, and τn = o(1/(pn n^ρ)) for some ρ > 0.

Note that these are the same conditions as in the low-dimensional setting!

The same rate for τn works for both the low-dimensional and ultrahigh-dimensional cases.
Posterior Consistency of MBSP (ultrahigh-dimensional case)
Theorem
Suppose that we have the MBSP model with polynomial-tailed priors for ξ1, ..., ξpn. Provided that Assumptions (B1)-(B6) and (i)-(iii) hold, our model achieves strong posterior consistency. That is, for any ε > 0,

Πn(Bn : ||Bn − B0||F > ε | Yn) → 0 P0 a.s. as n → ∞.
Three Parameter Beta Normal (TPBN) Family
A random variable y is said to follow the three parameter beta density, denoted TPB(u, a, τ), if

π(y) = [Γ(u + a)/(Γ(u)Γ(a))] τ^a y^{a−1}(1 − y)^{u−1} {1 − (1 − τ)y}^{−(u+a)}, 0 < y < 1.

In univariate regression, a global-local shrinkage prior of the form

βi | τ, ξi ind.∼ N(0, τξi), i = 1, ..., n,
ξi ind.∼ [Γ(u + a)/(Γ(u)Γ(a))] ξi^{u−1}(1 + ξi)^{−(u+a)}, i = 1, ..., n,

may therefore be represented alternatively as

βi | νi ind.∼ N(0, νi^{−1} − 1),
νi ind.∼ TPB(u, a, τ).
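A sketch of the TPB density as displayed above, with a numerical sanity check: for τ = 1 it reduces to a Beta(a, u) density and so should integrate to (approximately) 1; the grid avoids the endpoint singularities:

```python
import numpy as np
from math import gamma

def tpb_pdf(y, u, a, tau):
    """TPB(u, a, tau) density on (0, 1), as displayed on the slide."""
    c = gamma(u + a) / (gamma(u) * gamma(a))
    return (c * tau**a * y**(a - 1) * (1 - y)**(u - 1)
            * (1 - (1 - tau) * y)**(-(u + a)))

# Horseshoe case u = a = 0.5 with tau = 1: this is the arcsine Beta(0.5, 0.5)
y = np.linspace(1e-6, 1 - 1e-6, 400001)
f = tpb_pdf(y, u=0.5, a=0.5, tau=1.0)
total = float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(y)))  # trapezoid rule
```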
Three Parameter Beta Normal (TPBN) Family
After integrating out νi in

βi | νi ind.∼ N(0, νi^{−1} − 1),
νi ind.∼ TPB(u, a, τ),

the marginal prior for βi is said to belong to the three parameter beta normal (TPBN) family.
Special cases of the TPBN family include:
the horseshoe prior (u = 0.5, a = 0.5),
the Strawderman-Berger prior (u = 1, a = 0.5),
the normal-exponential-gamma (NEG) prior (u = 1, a > 0).
Three Parameter Beta Normal (TPBN) Model
By Proposition 1 of Armagan et al. (2011), the TPBN prior can also be written as a hierarchical mixture of two Gamma distributions,

βi | ψi ∼ N(0, ψi), ψi | ζi ∼ G(u, ζi), ζi ∼ G(a, τ),

where ψi = τξi.
Using the TPBN family as our chosen prior and placing a conjugate prioron Σ, we can construct a specific variant of the MBSP model which wecall the MBSP-TPBN model.
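The Gamma-Gamma representation can be checked by simulation: for u = a = 0.5 and τ = 1, the implied local scales ξi = ψi/τ should match the horseshoe's ξ = λ² with λ ∼ half-Cauchy(0, 1), whose median is 1. A sketch (numpy's `gamma` takes a scale argument, so scale = 1/rate):

```python
import numpy as np

rng = np.random.default_rng(7)
u, a, tau = 0.5, 0.5, 1.0   # horseshoe case of the TPBN family
n_draws = 200_000

# psi_i | zeta_i ~ Gamma(u, rate=zeta_i), zeta_i ~ Gamma(a, rate=tau)
zeta = rng.gamma(shape=a, scale=1.0 / tau, size=n_draws)
psi = rng.gamma(shape=u, scale=1.0 / zeta)

# Implied local scales; for the horseshoe, median(xi) = median(lambda^2) = 1
xi = psi / tau
med = float(np.median(xi))
```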
MBSP-TPBN Model
Reparametrizing ψi = τξi, i = 1, ..., p, we have:

Y | X, B, Σ ∼ MN_{n×q}(XB, In, Σ),
B | ψ1, ..., ψp, Σ ∼ MN_{p×q}(O, diag(ψ1, ..., ψp), Σ),
ψi | ζi ind.∼ G(u, ζi), i = 1, ..., p,
ζi i.i.d.∼ G(a, τ), i = 1, ..., p,
Σ ∼ IW(d, kIq).

The MBSP-TPBN model admits a Gibbs sampler.
Variable Selection
Although the MBSP model and the MBSP-TPBN model produce robust estimates of B, they do not produce exact zeros.

In order to use the MBSP model for variable selection, we recommend examining the 95% credible interval for each entry bij in row i and column j:

If the credible intervals for every entry in row i, 1 ≤ i ≤ p, contain zero, then we classify predictor i as an irrelevant predictor.

If at least one credible interval in row i, 1 ≤ i ≤ p, does not contain zero, then we classify predictor i as an active predictor.
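This row-wise decision rule is easy to apply to posterior draws; a sketch (the array shapes and the toy "posterior" are assumptions for illustration):

```python
import numpy as np

def select_rows(B_samples, level=0.95):
    """Classify predictors from posterior draws B_samples of shape (S, p, q).

    A predictor (row) is 'active' if at least one of its q entrywise
    equal-tailed credible intervals excludes zero.
    """
    alpha = 1.0 - level
    lo = np.quantile(B_samples, alpha / 2, axis=0)        # (p, q) lower bounds
    hi = np.quantile(B_samples, 1 - alpha / 2, axis=0)    # (p, q) upper bounds
    excludes_zero = (lo > 0) | (hi < 0)
    return excludes_zero.any(axis=1)                      # boolean of length p

# Toy check with fake 'posterior' draws: row 0 centered at 3, row 1 at 0
rng = np.random.default_rng(5)
S, p, q = 4000, 2, 3
draws = rng.normal(size=(S, p, q))
draws[:, 0, :] += 3.0
active = select_rows(draws)
```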
Simulation Study
For our simulation study, we implement the MBSP-TPBN model with the horseshoe prior (a = u = 0.5), one of the most popular polynomial-tailed priors.

We also set:

τ = 1/(p√n log n),
d = 3,
k = the variance of the residuals Y − XB^{(0)}, where B^{(0)} is the initial guess in the Gibbs sampler (taken as a ridge estimator).
Simulation Study
Our primary interest is in the p > n case. We consider three different simulation settings with varying levels of sparsity:

Experiment 1 (p > n): n = 50, p = 200, q = 5. 20 of the predictors are randomly picked as active (sparse model).
Experiment 2 (p > n): n = 60, p = 100, q = 6. 40 of the predictors are randomly picked as active (dense model).
Experiment 3 (p ≫ n): n = 100, p = 500, q = 3. 10 of the predictors are randomly picked as active (ultra-sparse model).
Simulation Study Metrics
As our point estimate of B, we take the posterior median B̂ = (B̂ij)_{p×q}. We also perform variable selection by inspecting the 95% credible intervals.

We compute the following metrics, averaged across 100 replications:

MSEest = 100 × ||B̂ − B||F² / (pq),
MSEpred = 100 × ||XB̂ − XB||F² / (nq),
FDR = FP / (TP + FP),
FNR = FN / (TN + FN),
MP = (FP + FN) / (pq),

where FP, TP, FN, and TN denote the number of false positives, true positives, false negatives, and true negatives, respectively.
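These metrics can be computed per replication as follows; a sketch in which `selected` and `truth` are entrywise boolean masks of shape (p, q) (an assumption on my part, matching the pq denominator in MP):

```python
import numpy as np

def replication_metrics(B_hat, B_true, X, selected, truth):
    """Compute the slide's metrics for one replication (illustrative sketch)."""
    p, q = B_true.shape
    n = X.shape[0]
    mse_est = 100 * np.linalg.norm(B_hat - B_true, "fro") ** 2 / (p * q)
    mse_pred = 100 * np.linalg.norm(X @ (B_hat - B_true), "fro") ** 2 / (n * q)
    tp = int(np.sum(selected & truth))
    fp = int(np.sum(selected & ~truth))
    fn = int(np.sum(~selected & truth))
    tn = int(np.sum(~selected & ~truth))
    fdr = fp / max(tp + fp, 1)      # FP / (TP + FP)
    fnr = fn / max(tn + fn, 1)      # FN / (TN + FN), as defined on the slide
    mp = (fp + fn) / (p * q)        # misclassification probability
    return mse_est, mse_pred, fdr, fnr, mp

# Tiny worked example (X = identity, so MSEest and MSEpred coincide here)
B_true = np.array([[1.0, 0.0], [0.0, 0.0]])
B_hat = np.array([[0.9, 0.0], [0.1, 0.0]])
X = np.eye(2)
out = replication_metrics(B_hat, B_true, X, B_hat != 0, B_true != 0)
```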
Simulation Study
Experiment 1: n = 50, p = 200, q = 5, 20 active predictors

Method    MSEest   MSEpred   FDR     FNR     MP
MBSP      1.36     117.52    0.0117  0       0.0013
MBGL-SS   57.25    694.81    0.858   0.02    0.619
LSGL      8.65     169.30    0.788   0       0.374
SRRR      17.46    161.70    0.698   0       0.307

Experiment 2: n = 60, p = 100, q = 6, 40 active predictors

Method    MSEest   MSEpred   FDR     FNR     MP
MBSP      10.969   172.84    0.0249  0       0.0107
MBGL-SS   204.33   318.80    0.505   0.1265  0.415
LSGL      44.635   188.81    0.544   0       0.479
SRRR      242.67   193.64    0.594   0       0.587

Experiment 3: n = 100, p = 500, q = 3, 10 active predictors

Method    MSEest   MSEpred   FDR     FNR     MP
MBSP      0.185    64.14     0.048   0       0.0011
MBGL-SS   1.327    155.51    0.483   0.0005  0.092
LSGL      0.2305   72.894    0.849   0       0.117
SRRR      0.9841   49.428    0.688   0       0.104

Table: Simulation results for MBSP-TPBN, compared with three other methods, averaged across 100 replications.
Yeast Cell Cycle Data Analysis
Transcription factors (TFs) are sequence-specific DNA binding proteins which regulate the transcription of genes from DNA to mRNA by binding specific DNA sequences. We want to know which TFs are significant.

In this yeast cell cycle data set (first studied by Chun and Keles (2010)):

mRNA levels are measured at 18 time points, seven minutes apart (every 7 minutes for a duration of 119 minutes).

The 542 × 18 response matrix Y consists of 542 cell-cycle-regulated genes from an α-factor arrested method, with columns corresponding to the mRNA levels at the 18 distinct time points. The 542 × 106 design matrix X consists of the binding information of a total of 106 TFs.

We fit the MBSP model to this data set. We assess its predictive performance using 5-fold cross validation and perform variable selection from the 106 TFs.
Yeast Cell Cycle Data Analysis
Method    Number of Proteins Selected   MSPE
MBSP      10                            18.491
MBGL-SS   7                             20.093
LSGL      4                             22.819
SRRR      44                            18.204

Table: Results for analysis of the yeast cell cycle data set. The MSPE has been scaled by a factor of 100. In particular, all four models selected the three TFs ACE2, SWI5, and SWI6 as significant.

The SRRR method has the lowest MSPE, but it recovers a non-parsimonious model. In contrast, MBSP has good predictive performance and recovers a parsimonious model.
Yeast Cell Cycle Data Analysis
[Figure: Plots of the estimates and 95% credible bands over time (0 to 120 minutes) for four of the 10 TFs deemed significant by the MBSP-TPBN model: ACE2, HIR1, NDD1, and SWI6. The x-axis indicates time (minutes) and the y-axis indicates the estimated coefficients.]
Summary of MBSP Model
We have introduced a new Bayesian approach, known as the Multivariate Bayesian model with Shrinkage Priors (MBSP), for the multivariate linear regression model, Y = XB + E.

Our model produces a row-sparse estimate of the p × q matrix B, allowing for sparse estimation and variable selection from the p variables.

Our model can consistently estimate B even when p ≫ n and p grows at a nearly exponential rate with n (i.e. p = O(e^{n^d}), 0 < d < 1).

A wide variety of polynomial-tailed shrinkage priors may be used, so our model and our theoretical results are quite general.

We illustrated practical application of our model with the three parameter beta normal family (MBSP-TPBN), using the horseshoe prior as a special case.
Future Work
Open problems:
Theoretical investigation of MBSP (and Bayesian multivariate regression models in general) when q → ∞ and when Σ is treated as unknown.

Moving beyond consistency, deriving a particular contraction rate of the MBSP's posterior around B0.

Applying polynomial-tailed priors to reduced rank regression and partial least squares regression.
Pre-print of Paper
A pre-print of the paper for this presentation is available at: https://arxiv.org/abs/1711.07635
Accepted pending minor revision at Journal of Multivariate Analysis.
References
Armagan, A., Clyde, M., and Dunson, D.B. (2011) “Generalized Beta Mixtures of Gaussians.” Advances in Neural Information Processing Systems 24, 523-531.

Armagan, A., Dunson, D.B., Lee, J., Bajwa, W., and Strawn, N. (2013) “Posterior Consistency in Linear Models Under Shrinkage Priors.” Biometrika, 100(4): 1011-1018.

Brown, P.J., Vannucci, M., and Fearn, T. (1998) “Multivariate Bayesian Variable Selection and Prediction.” Journal of the Royal Statistical Society: Series B, 60(3): 627-641.

Carvalho, C.M., Polson, N.G., and Scott, J.G. (2010) “The Horseshoe Estimator for Sparse Signals.” Biometrika, 97(2): 465-480.
References
Chen, L. and Huang, J.Z. (2012) “Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection.” Journal of the American Statistical Association, 107(500): 1533-1545.

Li, Y., Nan, B., and Zhu, J. (2015) “Multivariate Sparse Group Lasso for the Multivariate Multiple Linear Regression with an Arbitrary Group Structure.” Biometrics, 71(2): 354-363.

Liquet, B., Mengersen, K., Pettitt, A.N., and Sutton, M. (2017) “Bayesian Variable Selection Regression of Multivariate Responses for Group Data.” Bayesian Analysis, 12(4): 1039-1067.

Tang, X., Xu, X., Ghosh, M., and Ghosh, P. (2017) “Bayesian Variable Selection and Estimation Based on Global-Local Shrinkage Priors.” Sankhya A.
Questions?