High-Dimensional Multivariate Bayesian Linear Regression with Shrinkage Priors
Ray Bai
Department of Statistics, University of Florida
Joint work with Dr. Malay Ghosh
March 20, 2018
Ray Bai (University of Florida) MBSP March 20, 2018 1 / 48
Overview
1 Overview of High-Dimensional Multivariate Linear Regression
2 Multivariate Bayesian Model with Shrinkage Priors (MBSP)
3 Posterior Consistency of MBSP: Low-Dimensional Case and Ultrahigh-Dimensional Case
4 Implementation of the MBSP Model
5 Simulation Study
6 Yeast Cell Cycle Data Analysis
Simultaneous Prediction and Estimation
There are many scenarios where we would want to simultaneously predict q continuous response variables y1, ..., yq:
Longitudinal data: The q response variables represent measurements at q consecutive time points.
- mRNA levels at different time points
- children’s heights at different ages of development
- CD4 cell counts over time for HIV/AIDS patients
The data have a group structure: The q response variables represent a “group.”
- In genomics, genes within the same pathway often act together in regulating a biological system.
Multivariate Linear Regression
Consider the multivariate linear regression model,
Y = XB + E,
where Y = (y1, ..., yq) is an n × q response matrix of n samples and q response variables, X is an n × p matrix of n samples and p covariates, B ∈ R^{p×q} is the coefficient matrix, and E = (ε1, ..., εn)^T is an n × q noise matrix, with εi i.i.d.∼ Nq(0, Σ), i = 1, ..., n.

Throughout, we assume that X is centered, so there is no intercept term.
Multivariate Linear Regression
For the multivariate linear regression model,
Y_{n×q} = X_{n×p} B_{p×q} + E_{n×q},

where E = (ε1, ..., εn)^T, εi i.i.d.∼ Nq(0, Σ), i = 1, ..., n,
Σ represents the covariance structure of the q response variables.
We wish to estimate the coefficient matrix B.
Model selection from the p covariates is also often desired. This can be done using multivariate generalizations of AIC, BIC, or Mallows' Cp.
Multivariate Linear Regression
For the multivariate linear regression model, the usual maximum likelihoodestimator (MLE) is the ordinary least squares estimator,
B̂ = (XTX)−1XTY.
The MLE exists uniquely only if X has full column rank, which requires p ≤ n.

It is well known that the MLE is an inconsistent estimator of B if p/n → c, c > 0.

Variable selection using AIC, BIC, and Mallows' Cp is infeasible for large p, since it requires searching over a model space of 2^p models.
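As a concrete illustration, the OLS estimator above can be computed with a least-squares solve rather than an explicit inverse; this is a minimal sketch on simulated data (the dimensions and noise scale are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 100, 5, 3  # illustrative sizes with p < n, so X has full column rank

B_true = rng.normal(size=(p, q))
X = rng.normal(size=(n, p))
Y = X @ B_true + 0.1 * rng.normal(size=(n, q))

# MLE / OLS: B_hat = (X^T X)^{-1} X^T Y, computed via a numerically stabler
# least-squares solve instead of forming the inverse explicitly
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

With n much larger than p and modest noise, B_hat recovers B_true closely.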
High-Dimensional Multivariate Linear Regression
To handle cases where p is large (including the p > n regime), frequentists typically use penalized regression (e.g. Li et al. (2015), Vincent and Hansen (2014), Wilms and Croux (2017)):
min_B ||Y − XB||_2^2 + λ ∑_{i=1}^p ||b_i||_2,

where b_i represents the ith row of B and λ > 0 is a tuning parameter.

The group lasso penalty, || · ||_2, shrinks entire rows of B to exactly 0, leading to a sparse estimate of B and facilitating variable selection from the p predictors.

We can use an adaptive group lasso penalty to avoid overshrinkage of b_i, i = 1, ..., p.
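The row-wise group lasso objective above can be sketched with proximal gradient descent; this is an illustrative implementation under assumed toy data (fixed λ, Frobenius loss), not the code from any of the cited papers:

```python
import numpy as np

def group_lasso_mv(X, Y, lam, n_iter=500):
    """Proximal-gradient sketch of the row-wise group lasso for Y ~ XB.

    Minimizes 0.5 * ||Y - XB||_F^2 + lam * sum_i ||b_i||_2,
    where b_i is row i of B.
    """
    p, q = X.shape[1], Y.shape[1]
    B = np.zeros((p, q))
    step = 1.0 / np.linalg.eigvalsh(X.T @ X).max()   # 1/L for the smooth part
    for _ in range(n_iter):
        G = B - step * (X.T @ (X @ B - Y))           # gradient step
        norms = np.linalg.norm(G, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        B = shrink * G                               # row-wise soft threshold
    return B

rng = np.random.default_rng(1)
n, p, q = 50, 20, 3
B_true = np.zeros((p, q))
B_true[:4] = rng.normal(size=(4, q))                 # only the first 4 rows active
X = rng.normal(size=(n, p))
Y = X @ B_true + 0.1 * rng.normal(size=(n, q))

B_hat = group_lasso_mv(X, Y, lam=5.0)
active = np.flatnonzero(np.linalg.norm(B_hat, axis=1) > 1e-8)
```

The row-wise soft threshold is exactly what makes entire rows of B hit zero, mirroring the variable-selection behavior described on the slide.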
Bayesian High-Dimensional Multivariate Linear Regression
The Bayesian approach is to put a prior distribution π(B) on B. That is, given the model Y = XB + E and data (X, Y), we have

π(B|Y) ∝ f(Y|X, B) π(B).

Inference can be conducted through the posterior, π(B|Y).
Bayesian High-Dimensional Multivariate Linear Regression
To achieve sparsity and variable selection, a common approach is to place spike-and-slab priors on the rows of B (e.g. Brown et al. (1998), Liquet et al. (2017)):

b_i^T i.i.d.∼ (1 − p)δ{0} + p Nq(0, τ²V), i = 1, ..., p.

δ{0} represents a point mass at 0 ∈ R^q, and V is a q × q symmetric positive definite matrix.

τ² can be treated as a tuning parameter, or a prior can be placed on τ².

A prior can also be placed on the mixing weight p so that the model adapts to the underlying sparsity. Usually, we put a Beta prior on p.
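Drawing from this spike-and-slab prior is straightforward; a sketch (the mixing weight is renamed w here to avoid clashing with the dimension p, and the values of w, τ², and V are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 200, 5
w, tau2 = 0.1, 1.0     # slab probability and slab scale (illustrative values)
V = np.eye(q)          # slab covariance; identity is an assumption for the sketch

# Each row of B is drawn from (1 - w) * delta_0 + w * N_q(0, tau2 * V)
is_slab = rng.random(p) < w
B = np.zeros((p, q))
B[is_slab] = rng.multivariate_normal(np.zeros(q), tau2 * V, size=int(is_slab.sum()))
```

Rows assigned to the spike are exactly zero, which is the source of the sparsity (and of the computational cost when exploring the 2^p spike/slab configurations).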
Bayesian High-Dimensional Multivariate Linear Regression
For the spike-and-slab approach,

b_i^T i.i.d.∼ (1 − p)δ{0} + p Nq(0, τ²V), i = 1, ..., p,
τ² ∼ µ(τ²),
p ∼ B(a, b),

taking the posterior median gives a point estimate of B with some rows equal to 0^T, thus recovering a sparse estimate of B and facilitating variable selection.

Due to the point mass at 0, posterior computation under this model can be very slow for large p.
Bayesian High-Dimensional Multivariate Linear Regression
Due to the computational inefficiency of discontinuous priors, it is often desirable to put a continuous prior on the parameters of interest.
For the multivariate linear regression model,
Y = XB + E,
our aim is to estimate B.
This requires putting a prior density on a p × q matrix.
A popular continuous prior to place on B is the matrix-normal prior.
The Matrix-Normal Prior
Definition
A random matrix X is said to have the matrix-normal density if X has the density function (on the space R^{a×b}):

f(X) = |U|^{−b/2} |V|^{−a/2} (2π)^{−ab/2} exp{ −(1/2) tr[U^{−1}(X − M)V^{−1}(X − M)^T] },

where M ∈ R^{a×b}, and U and V are positive definite matrices of dimension a × a and b × b, respectively. If X is distributed as a matrix-normal distribution with the pdf above, we write X ∼ MN_{a×b}(M, U, V).
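Sampling from this distribution uses the standard factorization X = M + A Z B^T with U = A A^T, V = B B^T, and Z a matrix of i.i.d. standard normals; a sketch with assumed toy parameters, checking one implied marginal variance Var(X_{ij}) = U_{ii} V_{jj}:

```python
import numpy as np

def rmatnorm(rng, M, U, V):
    """Draw X ~ MN(M, U, V) via X = M + A Z B^T with U = A A^T, V = B B^T."""
    A = np.linalg.cholesky(U)
    Bc = np.linalg.cholesky(V)
    Z = rng.standard_normal(M.shape)
    return M + A @ Z @ Bc.T

rng = np.random.default_rng(3)
a, b = 4, 3
M = np.zeros((a, b))
U = np.diag([1.0, 2.0, 3.0, 4.0])   # row covariance (illustrative)
V = 0.5 * np.eye(b) + 0.5           # column covariance: 1 on diag, 0.5 off diag

draws = np.stack([rmatnorm(rng, M, U, V) for _ in range(20000)])
# Cov(vec(X)) = V kron U, so Var(X[3, 0]) should be U[3,3] * V[0,0] = 4
emp_var = float(draws[:, 3, 0].var())
```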
Multivariate Bayesian Model with Shrinkage Priors (MBSP)

By adding an additional layer to the Bayesian hierarchy, we can obtain a row-sparse estimate of B. This row-sparse estimate also facilitates variable selection from the p variables. Our model is specified as follows:

Y | X, B, Σ ∼ MN_{n×q}(XB, I_n, Σ),
B | ξ1, ..., ξp, Σ ∼ MN_{p×q}(O, τ diag(ξ1, ..., ξp), Σ),
ξi ind.∼ π(ξi), i = 1, ..., p,

where τ > 0 is a tuning parameter and π(ξi) is a polynomial-tailed prior density of the form

π(ξi) = K (ξi)^{−a−1} L(ξi),

where K > 0 is the constant of proportionality, a is a positive real number, and L is a positive, measurable, non-constant, slowly varying function over (0, ∞).
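A prior draw of B under this hierarchy can be sketched for the horseshoe choice ξ_i = λ_i² with λ_i ∼ half-Cauchy(0, 1); Σ = I_q and the value of τ are illustrative assumptions here:

```python
import numpy as np

rng = np.random.default_rng(4)
p, q, tau = 100, 3, 0.05
# Sigma = I_q is assumed in this sketch, so rows of B are independent normals

# Horseshoe-type local scales: xi_i = lambda_i^2, lambda_i ~ half-Cauchy(0, 1)
xi = np.abs(rng.standard_cauchy(p)) ** 2

# B | xi, Sigma ~ MN(O, tau * diag(xi), Sigma); with Sigma = I this is row-wise
row_sd = np.sqrt(tau * xi)
B = row_sd[:, None] * rng.standard_normal((p, q))
```

Because the ξ_i are heavy-tailed, most rows of B are pulled near zero while a few rows escape the shrinkage, which is the row-sparsity the model is designed to induce.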
Examples of Polynomial-Tailed Priors
Prior        π(ξi)/C                                                                  L(ξi)
Student's t  ξi^{−a−1} exp(−a/ξi)                                                     exp(−a/ξi)
Horseshoe    ξi^{−1/2}(1 + ξi)^{−1}                                                   ξi^{a+1/2}/(1 + ξi)
Horseshoe+   ξi^{−1/2}(ξi − 1)^{−1} log(ξi)                                           ξi^{a+1/2}(ξi − 1)^{−1} log(ξi)
NEG          (1 + ξi)^{−1−a}                                                          {ξi/(1 + ξi)}^{a+1}
TPBN         ξi^{u−1}(1 + ξi)^{−a−u}                                                  {ξi/(1 + ξi)}^{a+u}
GDP          ∫₀^∞ (λ²/2) exp(−λ²ξi/2) λ^{2a−1} exp(−ηλ) dλ                            ∫₀^∞ t^a exp(−t − η√(2t/ξi)) dt
HIB          ξi^{u−1}(1 + ξi)^{−(a+u)} exp{−s/(1 + ξi)} {φ² + (1 − φ²)/(1 + ξi)}^{−1}  {ξi/(1 + ξi)}^{a+u} exp{−s/(1 + ξi)} {φ² + (1 − φ²)/(1 + ξi)}^{−1}

Table: Polynomial-tailed priors, their respective prior densities π(ξi) up to a normalizing constant C, and the slowly varying component L(ξi).
Sparse Estimation of B: Examples
If ξj ind.∼ Inverse-Gamma(αj, γj/2), then the marginal density for B, π(B), under the MBSP model is proportional to

∏_{j=1}^p ( ||bj(τΣ)^{−1/2}||_2² + γj )^{−(αj + q/2)},

which corresponds to a multivariate t-distribution. Here bj denotes the jth row of B.
Sparse Estimation of B: Examples
If π(ξj) ∝ ξj^{q/2−1}(1 + ξj)^{−1}, then the joint density π(B, ξ1, ..., ξp) under the MBSP model is proportional to

∏_{j=1}^p ξj^{−1}(1 + ξj)^{−1} exp{ −(1/(2ξj)) ||bj(τΣ)^{−1/2}||_2² },

and integrating out the ξj's gives a multivariate horseshoe density function.
Notation
For any two sequences of positive real numbers {an} and {bn} with bn ≠ 0:

an = O(bn) if |an/bn| ≤ M for all n, for some positive real number M independent of n.
an = o(bn) if lim_{n→∞} an/bn = 0. Therefore, an = o(1) if lim_{n→∞} an = 0.

For a vector v ∈ R^n, ||v||2 := √(∑_{i=1}^n vi²) denotes the ℓ2 norm.

For a matrix A ∈ R^{a×b} with entries aij, ||A||F := √(tr(A^T A)) = √(∑_{i=1}^a ∑_{j=1}^b aij²) denotes the Frobenius norm of A.

For a symmetric matrix A, we denote its minimum and maximum eigenvalues by λmin(A) and λmax(A), respectively.
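A quick numerical check of these identities (purely illustrative):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Frobenius norm three ways: entrywise sum of squares, trace formula, numpy
fro_direct = float(np.sqrt((A ** 2).sum()))
fro_trace = float(np.sqrt(np.trace(A.T @ A)))
fro_numpy = float(np.linalg.norm(A, "fro"))

# Extreme eigenvalues of the symmetric matrix A^T A
S = A.T @ A
lam_min, lam_max = np.linalg.eigvalsh(S)[[0, -1]]
```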
Posterior Consistency
Suppose that the data are generated from a true model,

Yn = XnB0 + En,

where Yn := (Yn,1, ..., Yn,q) and En ∼ MN_{n×q}(O, In, Σ).

Letting P0 denote the probability measure underlying the true model above, we define the following notion of posterior consistency:

Definition (strong posterior consistency). Let Bn = {Bn : ||Bn − B0||F > ε}, where ε > 0. The sequence of posterior distributions of Bn under the prior πn(Bn) is said to be strongly consistent under the true model if, for any ε > 0,

Πn(Bn|Yn) = Πn(||Bn − B0||F > ε | Yn) → 0 a.s. P0 as n → ∞.
Sufficient Conditions for Posterior Consistency
For our theoretical analysis, we assume that q < n is fixed and Σ is known.

In practice, Σ is often unknown; it can be estimated from the data by placing an Inverse-Wishart prior on Σ, or by obtaining a separate estimate Σ̂ (e.g. the MLE) and plugging Σ̂ into our model as an empirical Bayes estimate.

Theory is developed separately for:

pn = o(n) (low-dimensional setting)
pn ≥ O(n) (ultrahigh-dimensional setting)
Regularity Conditions for the Low-Dimensional Case
(A1) pn = o(n) and pn ≤ n for all n ≥ 1.
(A2) There exist constants c1, c2 so that

0 < c1 < lim inf_{n→∞} λmin(Xn^T Xn / n) ≤ lim sup_{n→∞} λmax(Xn^T Xn / n) < c2 < ∞.

(A3) There exist constants d1 and d2 so that

0 < d1 < λmin(Σ) ≤ λmax(Σ) < d2 < ∞.
Sufficient Conditions for Posterior Consistency When p = o(n)
Theorem
Assume that conditions (A1)-(A3) hold. Then the posterior of Bn under any prior πn(Bn) is strongly consistent; that is, for any ε > 0,

Πn(Bn|Yn) = Πn(Bn : ||Bn − B0||F > ε | Yn) → 0 P0 a.s. as n → ∞

provided that

Πn( Bn : ||Bn − B0||F < ∆/n^{ρ/2} ) > exp(−kn)

for all 0 < ∆ < ε²c1d1^{1/2}/(48 c2^{1/2} d2) and 0 < k < ε²c1/(32d2) − 3∆c2^{1/2}/(2d1^{1/2}), where ρ > 0.

This theorem applies to any prior on Bn. Provided the prior satisfies the above condition and p = o(n), the posterior is strongly consistent.
The MBSP Model
Recall the MBSP model:
Y | X, B, Σ ∼ MN_{n×q}(XB, In, Σ),
B | ξ1, ..., ξpn, Σ ∼ MN_{pn×q}(O, τn diag(ξ1, ..., ξpn), Σ),
ξi ind.∼ π(ξi), i = 1, ..., pn,

where τn > 0 and π(ξi) is a polynomial-tailed density of the form

π(ξi) = K (ξi)^{−a−1} L(ξi).

To achieve posterior consistency, we require mild conditions on the slowly varying component L(·), on τn, and on the true unknown coefficient matrix B0.
Additional Assumptions under the MBSP Model
(i) For the slowly varying function L(t) in the priors for ξi, 1 ≤ i ≤ pn, lim_{t→∞} L(t) ∈ (0, ∞). That is, there exists c0 (> 0) such that L(t) ≥ c0 for all t ≥ t0, for some t0 which depends on both L and c0.

(ii) There exists M > 0 so that sup_{j,k} |b0_{jk}| ≤ M < ∞ for all n, i.e. the maximum entry of B0 is uniformly bounded in absolute value.

(iii) 0 < τn < 1 for all n, and τn = o(1/(pn n^ρ)) for some ρ > 0.
Posterior Consistency of MBSP (low-dimensional case)
Theorem
Suppose that we have the MBSP model with polynomial-tailed priors for ξ1, ..., ξpn. Provided that Assumptions (A1)-(A3) and (i)-(iii) hold, our model achieves strong posterior consistency. That is, for any ε > 0,

Πn(Bn : ||Bn − B0||F > ε | Yn) → 0 P0 a.s. as n → ∞.
Ultrahigh-Dimensional Case
We have shown that the MBSP model achieves posterior consistency under mild conditions if pn = o(n).
What if pn > n and pn ≥ O(n)?
It turns out that with some additional regularity conditions on the model size and the design matrix, we can achieve posterior consistency in this ultrahigh-dimensional setting!
Regularity Conditions for the Ultrahigh-dimensional Case
(B1) pn > n for all n ≥ 1, and log(pn) = O(n^d) for some 0 < d < 1.
(B2) The rank of Xn is n.
(B3) Let J denote a set of indices, J ⊂ {1, ..., pn}, with |J| ≤ n, and let XJ denote the submatrix of Xn containing the columns with indices in J. For any such set J, there exists a finite constant c̃1 (> 0) so that lim inf_{n→∞} λmin(XJ^T XJ / n) ≥ c̃1.

(B4) There is a finite constant c̃2 (> 0) so that lim sup_{n→∞} λmax(Xn^T Xn / n) ≤ c̃2 < ∞.
(B5) There exist constants d1 and d2 so that
0 < d1 < λmin(Σ) ≤ λmax(Σ) < d2 < ∞.
(B6) The true model S∗ ⊂ {1, ..., pn} is nonempty for all n, and s∗ = |S∗| = o(n/log(pn)).
Sufficient Conditions for Posterior Consistency When log p = o(n)
Theorem
Assume that conditions (B1)-(B6) hold. Then the posterior of Bn under any prior πn(Bn) is strongly consistent; that is, for any ε > 0,

Πn(Bn|Yn) = Πn(Bn : ||Bn − B0||F > ε | Yn) → 0 P0 a.s. as n → ∞

provided that

Πn( Bn : ||Bn − B0||F < ∆̃/n^{ρ/2} ) > exp(−kn)

for all 0 < ∆̃ < ε²c̃1d1^{1/2}/(48 c̃2^{1/2} d2) and 0 < k < ε²c̃1/(32d2) − 3∆̃c̃2^{1/2}/(2d1^{1/2}), where ρ > 0.

This theorem applies to any prior on Bn. Provided the prior satisfies the above condition and log p = o(n), the posterior is strongly consistent.
The MBSP Model
Recall the MBSP model:
Y | X, B, Σ ∼ MN_{n×q}(XB, In, Σ),
B | ξ1, ..., ξpn, Σ ∼ MN_{pn×q}(O, τn diag(ξ1, ..., ξpn), Σ),
ξi ind.∼ π(ξi), i = 1, ..., pn,

where τn > 0 and π(ξi) is a polynomial-tailed density of the form

π(ξi) = K (ξi)^{−a−1} L(ξi).

To achieve posterior consistency, we require mild conditions on the slowly varying component L(·), on τn, and on the true unknown coefficient matrix B0.
Additional Assumptions under the MBSP Model
(i) For the slowly varying function L(t) in the priors for ξi, 1 ≤ i ≤ pn, lim_{t→∞} L(t) ∈ (0, ∞). That is, there exists c0 (> 0) such that L(t) ≥ c0 for all t ≥ t0, for some t0 which depends on both L and c0.

(ii) There exists M > 0 so that sup_{j,k} |b0_{jk}| ≤ M < ∞ for all n, i.e. the maximum entry of B0 is uniformly bounded in absolute value.

(iii) 0 < τn < 1 for all n, and τn = o(1/(pn n^ρ)) for some ρ > 0.

Note that these are the same conditions as in the low-dimensional setting!

The same rate for τn works for both the low-dimensional and ultrahigh-dimensional cases.
Posterior Consistency of MBSP (ultrahigh-dimensional case)
Theorem
Suppose that we have the MBSP model with polynomial-tailed priors for ξ1, ..., ξpn. Provided that Assumptions (B1)-(B6) and (i)-(iii) hold, our model achieves strong posterior consistency. That is, for any ε > 0,

Πn(Bn : ||Bn − B0||F > ε | Yn) → 0 P0 a.s. as n → ∞.
Three Parameter Beta Normal (TPBN) Family
A random variable y is said to follow the three parameter beta density, denoted TPB(u, a, τ), if

π(y) = [Γ(u + a)/(Γ(u)Γ(a))] τ^a y^{a−1}(1 − y)^{u−1} {1 − (1 − τ)y}^{−(u+a)}, 0 < y < 1.

In univariate regression, a global-local shrinkage prior of the form

βi | τ, ξi ind.∼ N(0, τξi), i = 1, ..., n,
ξi ind.∼ [Γ(u + a)/(Γ(u)Γ(a))] ξi^{u−1}(1 + ξi)^{−(u+a)}, i = 1, ..., n,

may therefore be represented alternatively as

βi | νi ind.∼ N(0, νi^{−1} − 1),
νi ind.∼ TPB(u, a, τ).
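A sketch of the TPB density as displayed above, with a numerical sanity check: for τ = 1 it reduces to a Beta(a, u) density and so should integrate to (approximately) 1; the grid avoids the endpoint singularities:

```python
import numpy as np
from math import gamma

def tpb_pdf(y, u, a, tau):
    """TPB(u, a, tau) density on (0, 1), as displayed on the slide."""
    c = gamma(u + a) / (gamma(u) * gamma(a))
    return (c * tau**a * y**(a - 1) * (1 - y)**(u - 1)
            * (1 - (1 - tau) * y)**(-(u + a)))

# Horseshoe case u = a = 0.5 with tau = 1: this is the arcsine Beta(0.5, 0.5)
y = np.linspace(1e-6, 1 - 1e-6, 400001)
f = tpb_pdf(y, u=0.5, a=0.5, tau=1.0)
total = float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(y)))  # trapezoid rule
```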
Three Parameter Beta Normal (TPBN) Family
After integrating out νi in

βi | νi ind.∼ N(0, νi^{−1} − 1),
νi ind.∼ TPB(u, a, τ),

the marginal prior for βi is said to belong to the three parameter beta normal (TPBN) family.
Special cases of the TPBN family include:
the horseshoe prior (u = 0.5, a = 0.5),
the Strawderman-Berger prior (u = 1, a = 0.5),
the normal-exponential-gamma (NEG) prior (u = 1, a > 0).
Three Parameter Beta Normal (TPBN) Model
By Proposition 1 of Armagan et al. (2011), the TPBN prior can also be written as a hierarchical mixture of two Gamma distributions,

βi | ψi ∼ N(0, ψi), ψi | ζi ∼ G(u, ζi), ζi ∼ G(a, τ),

where ψi = τξi.
Using the TPBN family as our chosen prior and placing a conjugate prioron Σ, we can construct a specific variant of the MBSP model which wecall the MBSP-TPBN model.
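The Gamma-Gamma representation can be checked by simulation: for u = a = 0.5 and τ = 1, the implied local scales ξi = ψi/τ should match the horseshoe's ξ = λ² with λ ∼ half-Cauchy(0, 1), whose median is 1. A sketch (numpy's `gamma` takes a scale argument, so scale = 1/rate):

```python
import numpy as np

rng = np.random.default_rng(7)
u, a, tau = 0.5, 0.5, 1.0   # horseshoe case of the TPBN family
n_draws = 200_000

# psi_i | zeta_i ~ Gamma(u, rate=zeta_i), zeta_i ~ Gamma(a, rate=tau)
zeta = rng.gamma(shape=a, scale=1.0 / tau, size=n_draws)
psi = rng.gamma(shape=u, scale=1.0 / zeta)

# Implied local scales; for the horseshoe, median(xi) = median(lambda^2) = 1
xi = psi / tau
med = float(np.median(xi))
```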
MBSP-TPBN Model
Reparametrizing ψi = τξi, i = 1, ..., p, we have:

Y | X, B, Σ ∼ MN_{n×q}(XB, In, Σ),
B | ψ1, ..., ψp, Σ ∼ MN_{p×q}(O, diag(ψ1, ..., ψp), Σ),
ψi | ζi ind.∼ G(u, ζi), i = 1, ..., p,
ζi i.i.d.∼ G(a, τ), i = 1, ..., p,
Σ ∼ IW(d, kIq).

The MBSP-TPBN model admits a Gibbs sampler.
Variable Selection
Although the MBSP model and the MBSP-TPBN model produce robust estimates of B, they do not produce exact zeros.

In order to use the MBSP model for variable selection, we recommend examining the 95% credible interval for each entry bij in row i and column j:

If the credible intervals for every entry in row i, 1 ≤ i ≤ p, contain zero, then we classify predictor i as an irrelevant predictor.

If at least one credible interval in row i, 1 ≤ i ≤ p, does not contain zero, then we classify predictor i as an active predictor.
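This row-wise decision rule is easy to apply to posterior draws; a sketch (the array shapes and the toy "posterior" are assumptions for illustration):

```python
import numpy as np

def select_rows(B_samples, level=0.95):
    """Classify predictors from posterior draws B_samples of shape (S, p, q).

    A predictor (row) is 'active' if at least one of its q entrywise
    equal-tailed credible intervals excludes zero.
    """
    alpha = 1.0 - level
    lo = np.quantile(B_samples, alpha / 2, axis=0)        # (p, q) lower bounds
    hi = np.quantile(B_samples, 1 - alpha / 2, axis=0)    # (p, q) upper bounds
    excludes_zero = (lo > 0) | (hi < 0)
    return excludes_zero.any(axis=1)                      # boolean of length p

# Toy check with fake 'posterior' draws: row 0 centered at 3, row 1 at 0
rng = np.random.default_rng(5)
S, p, q = 4000, 2, 3
draws = rng.normal(size=(S, p, q))
draws[:, 0, :] += 3.0
active = select_rows(draws)
```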
Simulation Study
For our simulation study, we implement the MBSP-TPBN model with the horseshoe prior (a = u = 0.5), one of the most popular polynomial-tailed priors.

We also set:

τ = 1/(p√n log n),
d = 3,
k = the variance of the residuals Y − XB^{(0)}, where B^{(0)} is the initial guess in the Gibbs sampler (taken as a ridge estimator).
Simulation Study
Our primary interest is in the p > n case. We consider three different simulation settings with varying levels of sparsity:

Experiment 1 (p > n): n = 50, p = 200, q = 5. 20 of the predictors are randomly picked as active (sparse model).
Experiment 2 (p > n): n = 60, p = 100, q = 6. 40 of the predictors are randomly picked as active (dense model).
Experiment 3 (p ≫ n): n = 100, p = 500, q = 3. 10 of the predictors are randomly picked as active (ultra-sparse model).
Simulation Study Metrics
As our point estimate of B, we take the posterior median B̂ = (B̂ij)_{p×q}. We also perform variable selection by inspecting the 95% credible intervals.

We compute the following metrics, averaged across 100 replications:

MSEest = 100 × ||B̂ − B||F² / (pq),
MSEpred = 100 × ||XB̂ − XB||F² / (nq),
FDR = FP / (TP + FP),
FNR = FN / (TN + FN),
MP = (FP + FN) / (pq),

where FP, TP, FN, and TN denote the number of false positives, true positives, false negatives, and true negatives, respectively.
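These metrics can be computed per replication as follows; a sketch in which `selected` and `truth` are entrywise boolean masks of shape (p, q) (an assumption on my part, matching the pq denominator in MP):

```python
import numpy as np

def replication_metrics(B_hat, B_true, X, selected, truth):
    """Compute the slide's metrics for one replication (illustrative sketch)."""
    p, q = B_true.shape
    n = X.shape[0]
    mse_est = 100 * np.linalg.norm(B_hat - B_true, "fro") ** 2 / (p * q)
    mse_pred = 100 * np.linalg.norm(X @ (B_hat - B_true), "fro") ** 2 / (n * q)
    tp = int(np.sum(selected & truth))
    fp = int(np.sum(selected & ~truth))
    fn = int(np.sum(~selected & truth))
    tn = int(np.sum(~selected & ~truth))
    fdr = fp / max(tp + fp, 1)      # FP / (TP + FP)
    fnr = fn / max(tn + fn, 1)      # FN / (TN + FN), as defined on the slide
    mp = (fp + fn) / (p * q)        # misclassification probability
    return mse_est, mse_pred, fdr, fnr, mp

# Tiny worked example (X = identity, so MSEest and MSEpred coincide here)
B_true = np.array([[1.0, 0.0], [0.0, 0.0]])
B_hat = np.array([[0.9, 0.0], [0.1, 0.0]])
X = np.eye(2)
out = replication_metrics(B_hat, B_true, X, B_hat != 0, B_true != 0)
```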
Simulation Study
Experiment 1: n = 50, p = 200, q = 5, 20 active predictors

Method    MSEest   MSEpred   FDR     FNR     MP
MBSP      1.36     117.52    0.0117  0       0.0013
MBGL-SS   57.25    694.81    0.858   0.02    0.619
LSGL      8.65     169.30    0.788   0       0.374
SRRR      17.46    161.70    0.698   0       0.307

Experiment 2: n = 60, p = 100, q = 6, 40 active predictors

Method    MSEest   MSEpred   FDR     FNR     MP
MBSP      10.969   172.84    0.0249  0       0.0107
MBGL-SS   204.33   318.80    0.505   0.1265  0.415
LSGL      44.635   188.81    0.544   0       0.479
SRRR      242.67   193.64    0.594   0       0.587

Experiment 3: n = 100, p = 500, q = 3, 10 active predictors

Method    MSEest   MSEpred   FDR     FNR     MP
MBSP      0.185    64.14     0.048   0       0.0011
MBGL-SS   1.327    155.51    0.483   0.0005  0.092
LSGL      0.2305   72.894    0.849   0       0.117
SRRR      0.9841   49.428    0.688   0       0.104

Table: Simulation results for MBSP-TPBN, compared with three other methods, averaged across 100 replications.
Yeast Cell Cycle Data Analysis
Transcription factors (TFs) are sequence-specific DNA binding proteins which regulate the transcription of genes from DNA to mRNA by binding specific DNA sequences. We want to know which TFs are significant.

In this yeast cell cycle data set (first studied by Chun and Keles (2010)):

mRNA levels are measured at 18 time points, seven minutes apart (every 7 minutes for a duration of 119 minutes).

The 542 × 18 response matrix Y consists of 542 cell-cycle-regulated genes from an α-factor arrested method, with columns corresponding to the mRNA levels at the 18 distinct time points. The 542 × 106 design matrix X consists of the binding information of a total of 106 TFs.

We fit the MBSP model to this data set. We assess its predictive performance using 5-fold cross validation and perform variable selection from the 106 TFs.
Yeast Cell Cycle Data Analysis
Method    Number of Proteins Selected   MSPE
MBSP      10                            18.491
MBGL-SS   7                             20.093
LSGL      4                             22.819
SRRR      44                            18.204

Table: Results for analysis of the yeast cell cycle data set. The MSPE has been scaled by a factor of 100. In particular, all four models selected the three TFs ACE2, SWI5, and SWI6 as significant.

The SRRR method has the lowest MSPE, but it recovers a non-parsimonious model. In contrast, MBSP has good predictive performance and recovers a parsimonious model.
Yeast Cell Cycle Data Analysis
[Figure: Plots of the estimates and 95% credible bands over time (0 to 120 minutes) for four of the 10 TFs deemed significant by the MBSP-TPBN model: ACE2, HIR1, NDD1, and SWI6. The x-axis indicates time (minutes) and the y-axis indicates the estimated coefficients.]
Summary of MBSP Model
We have introduced a new Bayesian approach, known as the Multivariate Bayesian model with Shrinkage Priors (MBSP), for the multivariate linear regression model, Y = XB + E.

Our model produces a row-sparse estimate of the p × q matrix B, allowing for sparse estimation and variable selection from the p variables.

Our model can consistently estimate B even when p ≫ n and p grows at a nearly exponential rate with n (i.e. p = O(e^{n^d}), 0 < d < 1).

A wide variety of polynomial-tailed shrinkage priors may be used, so our model and our theoretical results are quite general.

We illustrated practical application of our model with the three parameter beta normal family (MBSP-TPBN), using the horseshoe prior as a special case.
Future Work
Open problems:
Theoretical investigation of MBSP (and Bayesian multivariate regression models in general) when q → ∞ and when Σ is treated as unknown.

Moving beyond consistency, deriving a particular contraction rate of the MBSP's posterior around B0.

Applying polynomial-tailed priors to reduced rank regression and partial least squares regression.
Pre-print of Paper
A pre-print of the paper for this presentation is available at: https://arxiv.org/abs/1711.07635
Accepted pending minor revision at Journal of Multivariate Analysis.
References
Armagan, A., Clyde, M., and Dunson, D.B. (2011) “Generalized Beta Mixtures of Gaussians.” Advances in Neural Information Processing Systems 24, 523-531.

Armagan, A., Dunson, D.B., Lee, J., Bajwa, W., and Strawn, N. (2013) “Posterior Consistency in Linear Models Under Shrinkage Priors.” Biometrika, 100(4): 1011-1018.

Brown, P.J., Vannucci, M., and Fearn, T. (1998) “Multivariate Bayesian Variable Selection and Prediction.” Journal of the Royal Statistical Society: Series B, 60(3): 627-641.

Carvalho, C.M., Polson, N.G., and Scott, J.G. (2010) “The Horseshoe Estimator for Sparse Signals.” Biometrika, 97(2): 465-480.
References
Chen, L. and Huang, J.Z. (2012) “Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection.” Journal of the American Statistical Association, 107(500): 1533-1545.

Li, Y., Nan, B., and Zhu, J. (2015) “Multivariate Sparse Group Lasso for the Multivariate Multiple Linear Regression with an Arbitrary Group Structure.” Biometrics, 71(2): 354-363.

Liquet, B., Mengersen, K., Pettitt, A.N., and Sutton, M. (2017) “Bayesian Variable Selection Regression of Multivariate Responses for Group Data.” Bayesian Analysis, 12(4): 1039-1067.

Tang, X., Xu, X., Ghosh, M., and Ghosh, P. (2017) “Bayesian Variable Selection and Estimation Based on Global-Local Shrinkage Priors.” Sankhya A.
Questions?