
High-Dimensional Multivariate Bayesian Linear Regression with Shrinkage Priors

Ray Bai

Department of Statistics, University of Florida

Joint work with Dr. Malay Ghosh

March 20, 2018


Overview

1 Overview of High-Dimensional Multivariate Linear Regression

2 Multivariate Bayesian Model with Shrinkage Priors (MBSP)

3 Posterior Consistency of MBSP
    Low-Dimensional Case
    Ultrahigh-Dimensional Case

4 Implementation of the MBSP Model

5 Simulation Study

6 Yeast Cell Cycle Data Analysis


Simultaneous Prediction and Estimation

There are many scenarios where we would want to simultaneously predict q continuous response variables y1, ..., yq:

Longitudinal data: The q response variables represent measurements at q consecutive time points.

mRNA levels at different time points
children's heights at different ages of development
CD4 cell counts over time for HIV/AIDS patients

The data have a group structure: The q response variables represent a "group."

In genomics, genes within the same pathway often act together in regulating a biological system.


Multivariate Linear Regression

Consider the multivariate linear regression model,

Y = XB + E,

where Y = (y1, ..., yq) is an n × q response matrix of n samples and q response variables, X is an n × p matrix of n samples and p covariates, B ∈ R^{p×q} is the coefficient matrix, and E = (ε1, ..., εn)^T is an n × q noise matrix, where εi i.i.d.∼ Nq(0, Σ), i = 1, ..., n.

Throughout, we assume that X is centered, so there is no intercept term.


Multivariate Linear Regression

For the multivariate linear regression model,

Y_{n×q} = X_{n×p} B_{p×q} + E_{n×q},

where E = (ε1, ..., εn)^T, εi i.i.d.∼ Nq(0, Σ), i = 1, ..., n,

Σ represents the covariance structure of the q response variables.

We wish to estimate the coefficient matrix B.

Model selection from the p covariates is also often desired. This can be done using multivariate generalizations of AIC, BIC, or Mallows' Cp.


Multivariate Linear Regression

For the multivariate linear regression model, the usual maximum likelihood estimator (MLE) is the ordinary least squares estimator,

B̂ = (X^T X)^{-1} X^T Y.

The MLE exists and is unique only if p ≤ n (i.e. X has full column rank).

It is well known that the MLE is an inconsistent estimator of B if p/n → c for some c > 0.

Variable selection using AIC, BIC, and Mallows' Cp is infeasible for large p, since it requires searching over a model space of 2^p models.
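As a quick aside (not on the original slides), a minimal numpy sketch of the OLS/MLE formula above; the variable names are hypothetical and it assumes p ≤ n so that X^T X is invertible:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 100, 5, 3                     # small example with p <= n
X = rng.standard_normal((n, p))
B_true = rng.standard_normal((p, q))
Y = X @ B_true + 0.1 * rng.standard_normal((n, q))

# OLS / MLE: B_hat = (X^T X)^{-1} X^T Y, computed via a linear solve
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(B_hat.shape)                      # (p, q)
```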


High-Dimensional Multivariate Linear Regression

To handle cases where p is large (including the p > n regime), frequentists typically use penalized regression (e.g. Li et al. (2015), Vincent and Hansen (2014), Wilms and Croux (2017)):

min_B ||Y − XB||_2^2 + λ ∑_{i=1}^{p} ||bi||_2,

where bi represents the ith row of B and λ > 0 is a tuning parameter.

The group lasso penalty, || · ||_2, shrinks entire rows of B to exactly 0, leading to a sparse estimate of B and facilitating variable selection from the p covariates (see the sketch below).

An adaptive group lasso penalty can be used to avoid overshrinking bi, i = 1, ..., p.
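As an illustration of fitting this row-wise group lasso penalty in practice (a tooling assumption, not something used in these slides), a minimal sketch with scikit-learn's MultiTaskLasso, whose alpha plays the role of λ above:

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(1)
n, p, q = 50, 100, 5
X = rng.standard_normal((n, p))
B = np.zeros((p, q))
B[:10] = rng.standard_normal((10, q))           # only the first 10 rows are active
Y = X @ B + 0.5 * rng.standard_normal((n, q))

# MultiTaskLasso penalizes the row norms ||b_i||_2, zeroing out entire rows
fit = MultiTaskLasso(alpha=0.5, max_iter=5000).fit(X, Y)
B_hat = fit.coef_.T                             # sklearn stores coef_ as (q, p)
print(np.where(np.linalg.norm(B_hat, axis=1) > 0)[0])   # indices of selected rows
```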


Bayesian High-Dimensional Multivariate Linear Regression

The Bayesian approach is to put a prior distribution on B, π(B). That is, given the model Y = XB + E and data (X, Y), we have

π(B|Y) ∝ f (Y|X, B)π(B).

Inference can be conducted through the posterior, π(B|Y).


Bayesian High-Dimensional Multivariate Linear Regression

To achieve sparsity and variable selection, a common approach is to place spike-and-slab priors on the rows of B (e.g. Brown et al. (1998), Liquet et al. (2017)):

bi^T i.i.d.∼ (1 − p)δ{0} + p Nq(0, τ²V), i = 1, ..., p.

δ{0} represents a point mass at 0 ∈ R^q, and V is a q × q symmetric positive definite matrix.

τ² can be treated as a tuning parameter, or a prior can be placed on τ².

A prior can also be placed on p so that the model adapts to the underlying sparsity. Usually, we put a Beta prior on p.
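A minimal sketch (hypothetical names, with the mixing weight written as w to avoid clashing with the dimension p) of drawing B row by row from the spike-and-slab prior above:

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 100, 4
tau2, w = 1.0, 0.1                      # slab variance scale and prior inclusion probability
V = np.eye(q)                           # q x q slab covariance

# Each row b_i^T is 0 with probability (1 - w), or drawn from N_q(0, tau2 * V) with probability w
include = rng.random(p) < w
slab = rng.multivariate_normal(np.zeros(q), tau2 * V, size=p)
B = np.where(include[:, None], slab, 0.0)
print(int(include.sum()), "nonzero rows out of", p)
```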


Bayesian High-Dimensional Multivariate Linear Regression

For the spike-and-slab approach,

bi^T i.i.d.∼ (1 − p)δ{0} + p Nq(0, τ²V), i = 1, ..., p,
τ² ∼ µ(τ²),
p ∼ Beta(a, b).

Taking the posterior median will give a point estimate of B with rows equal to 0^T, thus recovering a sparse estimate of B and facilitating variable selection.

Due to the point mass at 0, this model can be computationally very slow for large p.


Bayesian High-Dimensional Multivariate Linear Regression

Due to the computational inefficiency of discontinuous priors, it is often desirable to put a continuous prior on the parameters of interest.

For the multivariate linear regression model,

Y = XB + E,

our aim is to estimate B.

This requires putting a prior density on a p × q matrix.

A popular continuous prior to place on B is the matrix-normal prior.


The Matrix-Normal Prior

Definition

A random matrix X is said to have the matrix-normal density if X has the density function (on the space R^{a×b}):

f(X) = [ |U|^{-b/2} |V|^{-a/2} / (2π)^{ab/2} ] exp( −(1/2) tr[ U^{-1}(X − M) V^{-1}(X − M)^T ] ),

where M ∈ R^{a×b}, and U and V are positive definite matrices of dimension a × a and b × b respectively. If X is distributed as a matrix-normal distribution with the pdf above, we write X ∼ MN_{a×b}(M, U, V).
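As a small illustration (an assumed helper, not taken from the slides), a draw X ∼ MN_{a×b}(M, U, V) can be generated with the standard identity X = M + L_U Z L_V^T, where Z has i.i.d. N(0, 1) entries and L_U, L_V are Cholesky factors of U and V:

```python
import numpy as np

def rmatrix_normal(M, U, V, rng):
    """Draw one sample from MN(M, U, V): U is the a x a row covariance,
    V is the b x b column covariance (both assumed positive definite)."""
    a, b = M.shape
    LU = np.linalg.cholesky(U)
    LV = np.linalg.cholesky(V)
    Z = rng.standard_normal((a, b))
    return M + LU @ Z @ LV.T

rng = np.random.default_rng(3)
a, b = 4, 3
M = np.zeros((a, b))
U = np.eye(a)
V = 0.5 * np.eye(b) + 0.5 * np.ones((b, b))   # exchangeable column covariance
X = rmatrix_normal(M, U, V, rng)
print(X.shape)                                # (4, 3)
```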


Multivariate Bayesian Model with Shrinkage Priors (MBSP)

By adding an additional layer in the Bayesian hierarchy, we can obtain a row-sparse estimate of B. This row-sparse estimate also facilitates variable selection from the p variables. Our model is specified as follows:

Y | X, B, Σ ∼ MN_{n×q}(XB, In, Σ),
B | ξ1, ..., ξp, Σ ∼ MN_{p×q}(O, τ diag(ξ1, ..., ξp), Σ),
ξi ind∼ π(ξi), i = 1, ..., p,

where τ > 0 is a tuning parameter and π(ξi) is a polynomial-tailed prior density of the form

π(ξi) = K (ξi)^{−a−1} L(ξi),

where K > 0 is the constant of proportionality, a is a positive real number, and L is a positive, measurable, non-constant, slowly varying function over (0, ∞).


Examples of Polynomial-Tailed Priors

Prior | π(ξi)/C | L(ξi)
Student's t | ξi^(−a−1) exp(−a/ξi) | exp(−a/ξi)
Horseshoe | ξi^(−1/2) (1 + ξi)^(−1) | ξi^a / (1 + ξi)
Horseshoe+ | ξi^(−1/2) (ξi − 1)^(−1) log(ξi) | ξi^a (ξi − 1)^(−1) log(ξi)
NEG | (1 + ξi)^(−1−a) | {ξi / (1 + ξi)}^(a+1)
TPBN | ξi^(u−1) (1 + ξi)^(−a−u) | {ξi / (1 + ξi)}^(a+u)
GDP | ∫0^∞ (λ²/2) exp(−λ²ξi/2) λ^(2a−1) exp(−ηλ) dλ | ∫0^∞ t^a exp(−t − η√(2t/ξi)) dt
HIB | ξi^(u−1) (1 + ξi)^(−(a+u)) exp{−s/(1+ξi)} {φ² + (1−φ²)/(1+ξi)}^(−1) | {ξi / (1 + ξi)}^(a+u) exp{−s/(1+ξi)} {φ² + (1−φ²)/(1+ξi)}^(−1)

Table: Polynomial-tailed priors, their respective prior densities π(ξi) up to a normalizing constant C, and the slowly varying component L(ξi).


Sparse Estimation of B: Examples

If ξj ind∼ Inverse-Gamma(αj, γj/2), then the marginal density for B, π(B), under the MBSP model is proportional to

∏_{j=1}^{p} ( ||bj (τΣ)^{−1/2}||_2^2 + γj )^{−(αj + q/2)},

which corresponds to a multivariate t-distribution. Here bj denotes the jth row of B.


Sparse Estimation of B: Examples

If π(ξj) ∝ ξj^{q/2−1} (1 + ξj)^{−1}, then the joint density π(B, ξ1, ..., ξp) under the MBSP model is proportional to

∏_{j=1}^{p} ξj^{−1} (1 + ξj)^{−1} exp( −||bj (τΣ)^{−1/2}||_2^2 / (2ξj) ),

and integrating out the ξj's gives a multivariate horseshoe density function.


Notation

For any two sequences of positive real numbers {an} and {bn} with bn ≠ 0:

an = O(bn) if |an/bn| ≤ M for all n, for some positive real number M independent of n.

an = o(bn) if lim_{n→∞} an/bn = 0. Therefore, an = o(1) if lim_{n→∞} an = 0.

For a vector v ∈ R^n, ||v||_2 := √(∑_{i=1}^{n} vi²) denotes the ℓ2 norm.

For a matrix A ∈ R^{a×b} with entries aij, ||A||_F := √(tr(A^T A)) = √(∑_{i=1}^{a} ∑_{j=1}^{b} aij²) denotes the Frobenius norm of A.

For a symmetric matrix A, we denote its minimum and maximum eigenvalues by λmin(A) and λmax(A) respectively.


Posterior Consistency

Suppose that the data is generated from a true model,

Yn = XB0 + En,

where Yn := (Yn,1, ..., Yn,q) and En ∼ MNn×q(O, In, Σ).

Letting P0 denote the probability measure underlying the true model above, we define the following notion of posterior consistency:

Definition

(Strong posterior consistency) Let Bn = {Bn : ||Bn − B0||F > ε}, where ε > 0. The sequence of posterior distributions of Bn under the prior πn(Bn) is said to be strongly consistent under the true model if, for any ε > 0,

Πn(Bn|Yn) = Πn(||Bn − B0||F > ε | Yn) → 0 a.s. P0 as n → ∞.


Sufficient Conditions for Posterior Consistency

For our theoretical analysis, we assume that q < n is fixed and Σ is known.

In practice, Σ is often unknown and can be estimated from the data using an Inverse-Wishart prior on Σ, or by obtaining a separate estimate Σ̂ (e.g. the MLE) and plugging Σ̂ into our model as an empirical Bayes estimate (see the sketch after this list).

Theory is developed separately for:

pn = o(n) (low-dimensional setting)

pn ≥ O(n) (ultrahigh-dimensional setting)
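A minimal sketch of the plug-in idea mentioned above (hypothetical helper names; a ridge fit stands in for the MLE here, since OLS residuals are unavailable when p > n): estimate B crudely, then take the empirical covariance of the residuals as Σ̂.

```python
import numpy as np

def sigma_hat_plugin(X, Y, ridge=1.0):
    """Plug-in (empirical Bayes style) estimate of Sigma from the residuals of a ridge fit."""
    n, p = X.shape
    B_ridge = np.linalg.solve(X.T @ X + ridge * np.eye(p), X.T @ Y)
    R = Y - X @ B_ridge                 # n x q residual matrix
    return R.T @ R / n                  # q x q plug-in estimate of Sigma

rng = np.random.default_rng(4)
n, p, q = 60, 100, 5
X = rng.standard_normal((n, p))
Y = X[:, :5] @ rng.standard_normal((5, q)) + rng.standard_normal((n, q))
print(sigma_hat_plugin(X, Y).shape)     # (q, q)
```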


Regularity Conditions for the Low-Dimensional Case

(A1) pn = o(n) and pn ≤ n for all n ≥ 1.

(A2) There exist constants c1, c2 so that

0 < c1 < lim sup_{n→∞} λmin(Xn^T Xn / n) ≤ lim sup_{n→∞} λmax(Xn^T Xn / n) < c2 < ∞.

(A3) There exist constants d1 and d2 so that

0 < d1 < λmin(Σ) ≤ λmax(Σ) < d2 < ∞.


Sufficient Conditions for Posterior Consistency When p = o(n)

Theorem

Assume that conditions (A1)-(A3) hold. Then the posterior of Bn under any prior πn(Bn) is strongly consistent. That is, for any ε > 0,

Πn(Bn|Yn) = Πn(Bn : ||Bn − B0||F > ε | Yn) → 0 P0 a.s. as n → ∞

if

Πn( Bn : ||Bn − B0||F < ∆ / n^{ρ/2} ) > exp(−kn)

for all 0 < ∆ < ε² c1 d1^{1/2} / (48 c2^{1/2} d2) and 0 < k < ε² c1 / (32 d2) − 3∆ c2^{1/2} / (2 d1^{1/2}), where ρ > 0.

This theorem applies to any prior on Bn. Provided the prior satisfies the above condition and p = o(n), the posterior is strongly consistent.


The MBSP Model

Recall the MBSP model:

Y | X, B, Σ ∼ MN_{n×q}(XB, In, Σ),
B | ξ1, ..., ξpn, Σ ∼ MN_{pn×q}(O, τn diag(ξ1, ..., ξpn), Σ),
ξi ind∼ π(ξi), i = 1, ..., pn,

where τn > 0 and π(ξi) is a polynomial-tailed density of the form

π(ξi) = K (ξi)^{−a−1} L(ξi).

To achieve posterior consistency, we require mild conditions on the slowly varying component L(·), on τn, and on the true unknown coefficient matrix B0.


Additional Assumptions under the MBSP Model

(i) For the slowly varying function L(t) in the priors for ξi, 1 ≤ i ≤ pn, lim_{t→∞} L(t) ∈ (0, ∞). That is, there exists c0 (> 0) such that L(t) ≥ c0 for all t ≥ t0, for some t0 which depends on both L and c0.

(ii) There exists M > 0 so that sup_{j,k} |b0jk| ≤ M < ∞ for all n, i.e. the maximum entry of B0 is uniformly bounded above in absolute value.

(iii) 0 < τn < 1 for all n, and τn = o( 1 / (pn n^ρ) ) for some ρ > 0.


Posterior Consistency of MBSP (low-dimensional case)

Theorem

Suppose that we have the MBSP model with polynomial-tailed priors for ξ1, ..., ξpn. Provided that Assumptions (A1)-(A3) and (i)-(iii) hold, our model achieves strong posterior consistency. That is, for any ε > 0,

Πn(Bn : ||Bn − B0||F > ε | Yn) → 0 P0 a.s. as n → ∞.


Ultrahigh-Dimensional Case

We have shown that the MBSP model achieves posterior consistency under mild conditions if pn = o(n).

What if pn > n and pn ≥ O(n)?

It turns out that with some additional regularity conditions on the model size and the design matrix, we can achieve posterior consistency in this ultrahigh-dimensional setting!


Regularity Conditions for the Ultrahigh-dimensional Case

(B1) pn > n for all n ≥ 1, and log(pn) = O(n^d) for some 0 < d < 1.

(B2) The rank of Xn is n.

(B3) Let J denote a set of indices, where J ⊂ {1, ..., pn} such that |J| ≤ n. Let XJ denote the submatrix of X that contains the columns with indices in J. For any such set J, there exists a finite constant c̃1 (> 0) so that lim inf_{n→∞} λmin(XJ^T XJ / n) ≥ c̃1.

(B4) There is a finite constant c̃2 (> 0) so that

lim sup_{n→∞} λmax(Xn^T Xn / n) ≤ c̃2 < ∞.

(B5) There exist constants d1 and d2 so that

0 < d1 < λmin(Σ) ≤ λmax(Σ) < d2 < ∞.

(B6) The true model S∗ ⊂ {1, ..., pn} is nonempty for all n and s∗ = |S∗| = o(n / log(pn)).


Sufficient Conditions for Posterior Consistency When log p = o(n)

Theorem

Assume that conditions (B1)-(B6) hold. Then the posterior of Bn under any prior πn(Bn) is strongly consistent. That is, for any ε > 0,

Πn(Bn|Yn) = Πn(Bn : ||Bn − B0||F > ε | Yn) → 0 P0 a.s. as n → ∞

if

Πn( Bn : ||Bn − B0||F < ∆̃ / n^{ρ/2} ) > exp(−kn)

for all 0 < ∆̃ < ε² c̃1 d1^{1/2} / (48 c̃2^{1/2} d2) and 0 < k < ε² c̃1 / (32 d2) − 3∆̃ c̃2^{1/2} / (2 d1^{1/2}), where ρ > 0.

This theorem applies to any prior on Bn. Provided the prior satisfies the above condition and log p = o(n), the posterior is strongly consistent.


The MBSP Model

Recall the MBSP model:

Y | X, B, Σ ∼ MN_{n×q}(XB, In, Σ),
B | ξ1, ..., ξpn, Σ ∼ MN_{pn×q}(O, τn diag(ξ1, ..., ξpn), Σ),
ξi ind∼ π(ξi), i = 1, ..., pn,

where τn > 0 and π(ξi) is a polynomial-tailed density of the form

π(ξi) = K (ξi)^{−a−1} L(ξi).

To achieve posterior consistency, we require mild conditions on the slowly varying component L(·), on τn, and on the true unknown coefficient matrix B0.


Additional Assumptions under the MBSP Model

(i) For the slowly varying function L(t) in the priors for ξi, 1 ≤ i ≤ pn, lim_{t→∞} L(t) ∈ (0, ∞). That is, there exists c0 (> 0) such that L(t) ≥ c0 for all t ≥ t0, for some t0 which depends on both L and c0.

(ii) There exists M > 0 so that sup_{j,k} |b0jk| ≤ M < ∞ for all n, i.e. the maximum entry of B0 is uniformly bounded above in absolute value.

(iii) 0 < τn < 1 for all n, and τn = o( 1 / (pn n^ρ) ) for some ρ > 0.

Note that these are the same conditions as in the low-dimensional setting!

The same rate for τn works for both the low-dimensional and high-dimensional cases.


Posterior Consistency of MBSP (ultrahigh-dimensional case)

Theorem

Suppose that we have the MBSP model with polynomial-tailed priors for ξ1, ..., ξpn. Provided that Assumptions (B1)-(B6) and (i)-(iii) hold, our model achieves strong posterior consistency. That is, for any ε > 0,

Πn(Bn : ||Bn − B0||F > ε | Yn) → 0 P0 a.s. as n → ∞.


Three Parameter Beta Normal (TPBN) Family

A random variable y is said to follow the three parameter beta density, denoted TPB(u, a, τ), if it has density

π(y) = [Γ(u + a) / (Γ(u)Γ(a))] τ^a y^{a−1} (1 − y)^{u−1} {1 − (1 − τ)y}^{−(u+a)}.

In univariate regression, a global-local shrinkage prior of the form

βi | τ, ξi ind∼ N(0, τξi), i = 1, ..., n,
ξi ind∼ [Γ(u + a) / (Γ(u)Γ(a))] ξi^{u−1} (1 + ξi)^{−(u+a)}, i = 1, ..., n,

may therefore be represented alternatively as

βi | νi ind∼ N(0, νi^{−1} − 1),
νi ind∼ TPB(u, a, τ).


Three Parameter Beta Normal (TPBN) Family

After integrating out νi in

βi | νi ind∼ N(0, νi^{−1} − 1),
νi ind∼ TPB(u, a, τ),

the marginal prior for βi is said to belong to the three parameter beta normal (TPBN) family.

Special cases of the TPBN family include:

the horseshoe prior (u = 0.5, a = 0.5),

the Strawderman-Berger prior (u = 1, a = 0.5),

the normal-exponential-gamma (NEG) prior (u = 1, a > 0).


Three Parameter Beta Normal (TPBN) Model

By Proposition 1 of Armagan et al. (2011), the TPBN prior can also be written as a hierarchical mixture of two Gamma distributions,

βi | ψi ∼ N(0, ψi), ψi | ζi ∼ G(u, ζi), ζi ∼ G(a, τ),

where ψi = τξi.
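A minimal sketch of drawing from the TPBN prior via the Gamma-mixture representation above (the rate parametrization of the Gamma distribution is assumed here, with numpy's scale = 1/rate):

```python
import numpy as np

rng = np.random.default_rng(5)
p = 10000
u, a, tau = 0.5, 0.5, 1.0               # u = a = 0.5 corresponds to the horseshoe case

# zeta_i ~ G(a, rate=tau), psi_i | zeta_i ~ G(u, rate=zeta_i), beta_i | psi_i ~ N(0, psi_i)
zeta = rng.gamma(shape=a, scale=1.0 / tau, size=p)
psi = rng.gamma(shape=u, scale=1.0 / zeta)
beta = rng.standard_normal(p) * np.sqrt(psi)
print(np.median(np.abs(beta)))          # heavy tails with many draws near zero
```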

Using the TPBN family as our chosen prior and placing a conjugate prior on Σ, we can construct a specific variant of the MBSP model which we call the MBSP-TPBN model.


MBSP-TPBN Model

Reparametrizing ψi = τξi , i = 1, ..., p, we have:

Y | X, B, Σ ∼ MN_{n×q}(XB, In, Σ),
B | ψ1, ..., ψp, Σ ∼ MN_{p×q}(O, diag(ψ1, ..., ψp), Σ),
ψi | ζi ind∼ G(u, ζi), i = 1, ..., p,
ζi i.i.d.∼ G(a, τ), i = 1, ..., p,
Σ ∼ IW(d, kIq).

The MBSP-TPBN model admits a Gibbs sampler.


Variable Selection

Although the MBSP model and the MBSP-TPBN model produce robust estimates of B, they do not produce exact zeros.

In order to use the MBSP model for variable selection, we recommend looking at the 95% credible intervals for each entry bij in row i and column j (see the sketch below).

If the credible intervals for every entry in row i, 1 ≤ i ≤ p, contain zero, then we classify predictor i as an irrelevant predictor.

If at least one credible interval in row i, 1 ≤ i ≤ p, does not contain zero, then we classify predictor i as an active predictor.
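A minimal sketch of this selection rule, assuming posterior draws of B are stored in an array B_draws of shape (n_draws, p, q) (a hypothetical name, not part of the slides):

```python
import numpy as np

def select_active(B_draws, level=0.95):
    """Flag predictor i as active if any entry in row i has a credible interval excluding zero."""
    alpha = 100 * (1 - level) / 2
    lo, hi = np.percentile(B_draws, [alpha, 100 - alpha], axis=0)   # each is p x q
    excludes_zero = (lo > 0) | (hi < 0)
    return np.where(excludes_zero.any(axis=1))[0]

# toy posterior draws: 1000 draws, p = 6 predictors, q = 2 responses
rng = np.random.default_rng(6)
B_draws = 0.1 * rng.standard_normal((1000, 6, 2))
B_draws[:, 0, 0] += 2.0                 # predictor 0 is clearly active
print(select_active(B_draws))           # -> [0]
```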


Simulation Study

For our simulation study, we implement the MBSP-TPBN model with the horseshoe prior (a = u = 0.5), one of the most popular polynomial-tailed priors.

We also set:

τ = 1 / (p √(n log n))

d = 3

k = variance of the residuals Y − XB^(0), where B^(0) is the initial guess in the Gibbs sampler (taken as a ridge estimator).


Simulation Study

Our primary interest is in the p > n case. We consider three different simulation settings with varying levels of sparsity:

Experiment 1 (p > n): n = 50, p = 200, q = 5. 20 of the predictors are randomly picked as active (sparse model).

Experiment 2 (p > n): n = 60, p = 100, q = 6. 40 of the predictors are randomly picked as active (dense model).

Experiment 3 (p ≫ n): n = 100, p = 500, q = 3. 10 of the predictors are randomly picked as active (ultra-sparse model).


Simulation Study Metrics

As our point estimate for B, we take the posterior median B̂ = (B̂ij)_{p×q}. We also perform variable selection by inspecting the 95% credible intervals.

We compute the following metrics, averaged across 100 replications:

MSEest = 100 × ||B̂ − B||_F^2 / (pq),
MSEpred = 100 × ||XB̂ − XB||_F^2 / (nq),
FDR = FP / (TP + FP),
FNR = FN / (TN + FN),
MP = (FP + FN) / (pq),

where FP, TP, FN, and TN denote the number of false positives, true positives, false negatives, and true negatives respectively.
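A minimal sketch computing these metrics exactly as defined above, given a point estimate B_hat, the truth B_true, and the selected and truly active predictor index sets (hypothetical names):

```python
import numpy as np

def mbsp_metrics(B_hat, B_true, X, selected, truly_active):
    """Estimation/prediction error and selection error rates as defined on this slide."""
    p, q = B_true.shape
    n = X.shape[0]
    mse_est = 100 * np.linalg.norm(B_hat - B_true, 'fro') ** 2 / (p * q)
    mse_pred = 100 * np.linalg.norm(X @ (B_hat - B_true), 'fro') ** 2 / (n * q)
    tp = len(selected & truly_active)
    fp = len(selected - truly_active)
    fn = len(truly_active - selected)
    tn = p - tp - fp - fn
    fdr = fp / (tp + fp) if (tp + fp) > 0 else 0.0
    fnr = fn / (tn + fn) if (tn + fn) > 0 else 0.0
    mp = (fp + fn) / (p * q)
    return mse_est, mse_pred, fdr, fnr, mp

# example call with sets of selected / truly active row indices (hypothetical data)
# mbsp_metrics(B_hat, B_true, X, selected={0, 3}, truly_active={0, 1})
```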


Simulation Study

Experiment 1: n = 50, p = 200, q = 5. 20 active predictors

Method     MSEest    MSEpred   FDR      FNR      MP
MBSP       1.36      117.52    0.0117   0        0.0013
MBGL-SS    57.25     694.81    0.858    0.02     0.619
LSGL       8.65      169.30    0.788    0        0.374
SRRR       17.46     161.70    0.698    0        0.307

Experiment 2: n = 60, p = 100, q = 6. 40 active predictors

Method     MSEest    MSEpred   FDR      FNR      MP
MBSP       10.969    172.84    0.0249   0        0.0107
MBGL-SS    204.33    318.80    0.505    0.1265   0.415
LSGL       44.635    188.81    0.544    0        0.479
SRRR       242.67    193.64    0.594    0        0.587

Experiment 3: n = 100, p = 500, q = 3. 10 active predictors

Method     MSEest    MSEpred   FDR      FNR      MP
MBSP       0.185     64.14     0.048    0        0.0011
MBGL-SS    1.327     155.51    0.483    0.0005   0.092
LSGL       0.2305    72.894    0.849    0        0.117
SRRR       0.9841    49.428    0.688    0        0.104

Table: Simulation results for MBSP-TPBN, compared with three other methods, averaged across 100 replications.


Yeast Cell Cycle Data Analysis

Transcription factors (TFs) are sequence-specific DNA-binding proteins which regulate the transcription of genes from DNA to mRNA by binding specific DNA sequences. We want to know which TFs are significant.

In this yeast cell cycle data set (first studied by Chun and Keles (2010)):

mRNA levels are measured at 18 time points, 7 minutes apart, over a duration of 119 minutes.

The 542 × 18 response matrix Y consists of 542 cell-cycle-regulated genes from an α-factor arrest experiment, with columns corresponding to the mRNA levels at the 18 distinct time points. The 542 × 106 design matrix X consists of the binding information for a total of 106 TFs.

We fit the MBSP model to this data set. We assess its predictive performance using 5-fold cross-validation and perform variable selection from the 106 TFs.


Yeast Cell Cycle Data Analysis

Method     Number of Proteins Selected    MSPE
MBSP       10                             18.491
MBGL-SS    7                              20.093
LSGL       4                              22.819
SRRR       44                             18.204

Table: Results for analysis of the yeast cell cycle data set. The MSPE has been scaled by a factor of 100. In particular, all four models selected the three TFs ACE2, SWI5, and SWI6 as significant.

The SRRR method has the lowest MSPE, but it recovers a non-parsimonious model. In contrast, MBSP has good predictive performance and recovers a parsimonious model.


Yeast Cell Cycle Data Analysis

[Figure omitted: four panels (ACE2, HIR1, NDD1, SWI6), each plotting estimated coefficients against time from 0 to 120 minutes.]

Figure: Plots of the estimates and 95% credible bands for four of the 10 TFs that were deemed significant by the MBSP-TPBN model. The x-axis indicates time (minutes) and the y-axis indicates the estimated coefficients.


Summary of MBSP Model

We have introduced a new Bayesian approach known as the Multivariate Bayesian model with Shrinkage Priors (MBSP) for the multivariate linear regression model, Y = XB + E.

Our model produces a row-sparse estimate of the p × q matrix B, allowing for sparse estimation and variable selection from the p variables.

Our model can consistently estimate B even when p ≫ n and p grows at a nearly exponential rate with n (i.e. p = O(e^{n^d}), 0 < d < 1).

A wide variety of polynomial-tailed shrinkage priors may be used, so our model and our theoretical results are quite general.

We illustrated practical application of our model with the three parameter beta normal family (MBSP-TPBN), using the horseshoe prior as a special case.


Future Work

Open problems:

Theoretical investigation of MBSP (and Bayesian multivariate regression models in general) when q → ∞ and when Σ is treated as unknown.

Moving beyond consistency, deriving a contraction rate for the MBSP posterior around B0.

Applying polynomial-tailed priors to reduced rank regression and partial least squares regression.


Pre-print of Paper

A pre-print of the paper for this presentation is available at: https://arxiv.org/abs/1711.07635

Accepted pending minor revision at Journal of Multivariate Analysis.


References

Armagan, A., Clyde, M., and Dunson, D.B. (2011) "Generalized Beta Mixtures of Gaussians." Advances in Neural Information Processing Systems 24, 523-531.

Armagan, A., Dunson, D.B., Lee, J., Bajwa, W., and Strawn, N. (2013) "Posterior Consistency in Linear Models Under Shrinkage Priors." Biometrika, 100(4): 1011-1018.

Brown, P.J., Vannucci, M., and Fearn, T. (1998) "Multivariate Bayesian Variable Selection and Prediction." Journal of the Royal Statistical Society: Series B, 60(3): 627-641.

Carvalho, C.M., Polson, N.G., and Scott, J.G. (2010) "The Horseshoe Estimator for Sparse Signals." Biometrika, 97(2): 465-480.


References

Chen, L. and Huang, J.Z. (2012) "Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection." Journal of the American Statistical Association, 107(500): 1533-1545.

Li, Y., Nan, B., and Zhu, J. (2015) "Multivariate Sparse Group Lasso for the Multivariate Multiple Linear Regression with an Arbitrary Group Structure." Biometrics, 71(2): 354-363.

Liquet, B., Mengersen, K., Pettitt, A.N., and Sutton, M. (2017) "Bayesian Variable Selection Regression of Multivariate Responses for Group Data." Bayesian Analysis, 12(4): 1039-1067.

Tang, X., Xu, X., Ghosh, M., and Ghosh, P. (2017) "Bayesian Variable Selection and Estimation Based on Global-Local Shrinkage Priors." Sankhya A.


Questions?
