Bayesian Component Selection in Multi Response Hierarchical Structured Additive Models With an Application to Clinical Workload Prediction in Patient


    IIE Transactions

    ISSN: 0740-817X (Print) 1545-8830 (Online) Journal homepage: http://www.tandfonline.com/loi/uiie20

    Bayesian component selection in multi-response hierarchical structured additive models with an application to clinical workload prediction in patient-centered medical homes

    Issac Shams, Saeede Ajorlou & Kai Yang

    To cite this article: Issac Shams, Saeede Ajorlou & Kai Yang (2015) Bayesian component selection in multi-response hierarchical structured additive models with an application to clinical workload prediction in patient-centered medical homes, IIE Transactions, 47:9, 943-960, DOI: 10.1080/0740817X.2014.982840

    To link to this article: http://dx.doi.org/10.1080/0740817X.2014.982840

    Accepted author version posted online: 23 Dec 2014. Published online: 23 Dec 2014.


    IIE Transactions (2015) 47, 943–960

    Copyright © “IIE”

    ISSN: 0740-817X print / 1545-8830 online

    DOI: 10.1080/0740817X.2014.982840

    Bayesian component selection in multi-response hierarchical structured additive models with an application to clinical workload prediction in patient-centered medical homes

    ISSAC SHAMS1,∗, SAEEDE AJORLOU1 and KAI YANG2

    1Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109, USA
    E-mail: [email protected]
    2Department of Industrial and Systems Engineering, Healthcare Systems Engineering Group, Wayne State University, Detroit, MI 48202, USA

    Received July 2013 and accepted October 2014

    Motivated by a large health care data set obtained from the U.S. Veterans Health Administration (VHA), we develop a multivariate version of hierarchical structured additive regression (STAR) models that involves a set of health care responses defined at the lowest level of the hierarchy, a set of patient factors to account for individual heterogeneity, and a set of higher level effects to capture dependence between patients within the same medical home team and facility. We show how a special class of such models can equivalently be represented and estimated in a structural equation modeling framework. We then propose a Bayesian component selection with a spike and slab prior structure that allows inclusion or exclusion of single effects as well as grouped coefficients representing particular model terms. A simple parameter expansion is used to improve mixing and convergence properties of Markov chain Monte Carlo simulation. The proposed methods are applied to a real-world application of the VHA patient-centered medical home (PCMH) data and help to provide a good prediction of clinical workload portfolio for a certain mix of health care professionals based on key patient demographic, diagnostic, and medical attributes.

    Keywords: Health care, statistical modeling, variable selection, multilevel model, Bayesian, multivariate, Markov chain Monte Carlo

    1. Introduction

    Recent years have seen enormous changes in U.S. health care systems due to advancements in database systems that rapidly collect and organize electronic health records. Standard operations management based medical practices have moved from relatively ad hoc and subjective to data-driven decision making and evidence-based health care. Traditional fee-for-service payment methods have transitioned into models that tie reimbursement to outcomes, such as pay-for-performance and bundled payment. The complexity and abundance of health care data have grown thanks to the development of new data-gathering techniques such as capturing devices, sensors, and mobile applications. Subsequently, there have been more incentives for hospitals to reduce costs and promote quality by using advanced data analytics tools that help find insights from large, noisy, heterogeneous, longitudinal, and hierarchical health care data.

    ∗Corresponding author
    Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/uiie.

    Many kinds of health care data, including clinical data, billing/claims data, and patient-specific data, involve hierarchical (nested) or clustered structures. For example, in a study assessing differences in mortality rates across hospitals, data are randomly collected on samples of patients nested within each hospital. In this application, there are two levels of hierarchy (level-1 for patients and level-2 for hospitals), and for each level, a set of specific covariates exists (such as age, gender, and severity of illness at the first level and hospital size and hospital teaching status at the second level) that might have a relationship with the outcome. To handle these hierarchically structured data, multilevel models (also known as hierarchical linear models, variance component models, random effect models, or split plot designs) have been proposed and applied in different fields, including psychometrics, biostatistics, and econometrics (Goldstein, 2011). The basic idea is to link covariates at higher levels to the predictor variables at lower levels by imposing another set of regressions in which the lower level (regression) coefficients are explained by higher level predictors.
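The "regressions on regression coefficients" idea can be illustrated with a small simulation. This is only a sketch under assumed values, not the model fitted in the article: the function name `simulate_two_level`, the teaching-status covariate, and every numerical coefficient are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_two_level(n_hospitals=30, n_patients=50, seed=0):
    """Simulate patients (level 1) nested within hospitals (level 2)."""
    rng = np.random.default_rng(seed)
    # Level-2 covariate: a hypothetical 0/1 teaching-status indicator per hospital.
    teaching = rng.integers(0, 2, size=n_hospitals)
    # Level-2 regressions: each hospital's intercept and age slope are explained
    # by the hospital covariate plus a hospital-specific random effect.
    intercept = 1.0 + 0.5 * teaching + rng.normal(0.0, 0.3, n_hospitals)
    slope = 0.2 - 0.1 * teaching + rng.normal(0.0, 0.05, n_hospitals)
    # Level-1 regression: the patient outcome uses its own hospital's coefficients.
    hospital = np.repeat(np.arange(n_hospitals), n_patients)
    age = rng.normal(0.0, 1.0, size=hospital.size)  # standardized patient age
    noise = rng.normal(0.0, 1.0, size=hospital.size)
    y = intercept[hospital] + slope[hospital] * age + noise
    return hospital, teaching, age, y

hospital, teaching, age, y = simulate_two_level()
```

Substituting the level-2 equations into the level-1 equation yields a single compound model with a cross-level interaction (teaching status times age) and hospital-specific random intercepts and slopes.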


    The assumption of a parametric form for the covariates in the hierarchical linear model makes it rather restrictive. For example, in longitudinal growth studies where repeated measures of the response variable (e.g., height) are clustered within individuals, the relation between age and height is often found to be exponential. To relax the linearity constraints, covariates with nonparametric structure (such as local regression or smoothing splines) or semiparametric structure (such as partially linear models or varying coefficient models) can be incorporated in the multilevel framework at each level of the hierarchy (Goldstein, 2011). One such extension is generalized additive mixed models, which enjoy the nonparametric properties of additive models and the distributional flexibility of generalized linear mixed models. Another more recent class of this type is the hierarchical version of STructured Additive Regression (STAR) models (Lang et al., 2013), which offers a broad and rich class of complex regressions containing several important subclasses as special cases, such as generalized additive mixed models, state space models for longitudinal studies, geoadditive models (Kammann and Wand, 2003), and varying coefficient models (Hastie and Tibshirani, 1993).

    As in many areas of statistical modeling and machine learning, the problem of variable selection (also known as feature selection, attribute selection, model selection, and variable subset selection) has become an important issue in multilevel models. Variable selection often aims to choose a subset of relevant covariates from a possibly large set of candidates that might include many redundant or irrelevant features. Due to its practical importance, this problem has attracted many researchers from diverse fields, leading to a vast amount of literature on selecting predictors of regression models. Classic methods in this area basically rely on (i) p-values, such as stepwise deletion; or (ii) information criteria such as the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and, more recently, the focused information criterion (Claeskens and Hjort, 2003), among others. However, such approaches usually suffer from lack of stability and perform poorly in selecting random-effect components (Breiman, 1996). In addition, they involve a combinatorial optimization comparing $2^{p+q}$ different models ($p$ and $q$ are the numbers of fixed and covariance parameters, respectively) that is NP-hard and might be infeasible to solve even when $p + q$ is fixed and smaller than the sample size (Pu and Niu, 2006). To address such drawbacks, regularization (or shrinkage) methods have been introduced that focus on selecting variables simultaneously with model estimation using some data-oriented penalty functions. Popular examples include the Least Absolute Shrinkage and Selection Operator (LASSO; Tibshirani (1996)) or Smoothly Clipped Absolute Deviation (SCAD; Fan and Li (2001)) and modifications such as hierarchical or random LASSO. To get an overview of variable selection in linear models, see the review paper by Chen et al. (2013). Variable selection is also of great importance in high dimensional data such as DNA microarrays or functional Magnetic Resonance Imaging (MRI) data (see Fan and Lv (2010) for a review). Likewise, various studies have been devoted to variable selection in nonparametric additive models and semiparametric linear models (see, for example, Huang et al. (2010) and Kundu and Dunson (2013)). Multivariate variable selection has also been investigated in a number of studies such as Brown et al. (1998) and Cai et al. (2005).

    Compared with classic methods that are primarily based on Bayes factors, approaches for Bayesian variable selection are mostly built around spike and slab priors. The basic idea is to introduce a binary latent variable $I_j$ associated with each regression coefficient so that the coefficient is forced to be zero when $I_j$ is in the spike part or remains unchanged if $I_j$ is in the slab part. The posterior distribution of $I_j$ is then interpreted as the marginal posterior probability of inclusion or exclusion of the respective covariate. See stochastic search variable selection (George and McCulloch, 1993) and mixtures of Zellner’s $g$-priors (Liang et al., 2008) as popular examples, and a recent review paper by O’Hara and Sillanpää (2009).

    In multilevel models, however, the problem of selecting the random effects is more complicated since it involves boundary problems that can arise from either non-negative constraints on fixed effect parameters or positive semi-definite constraints on covariance matrices. To date, approaches for variable selection in this class mainly pertain to linear (or generalized linear) mixed models such as the generalized information criterion of Pu and Niu (2006) and Bayesian methods of Spiegelhalter et al. (2002), among others (see Müller et al. (2013) for a review).

    In contrast with variable selection, component (or function) selection deals with selecting an appropriate subset of covariates and, at the same time, determining whether linear or more flexible functional forms of covariates have to be chosen. Research in this area was started by Antoniadis and Fan (2001), who proposed a group SCAD penalty for regularization in wavelet approximation. Lin and Zhang (2006) developed the COmponent Selection and Smoothing Operator (COSSO) estimator in additive smoothing spline analysis of variance models with a fixed number of covariates. Recently, by extending the non-negative garrote estimator of Breiman (1995), Marra and Wood (2011) developed a single-step shrinkage approach for function selection in generalized additive models.

    In this article, consistent with the idea of modeling multivariate outcomes in multilevel data structures (Goldstein, 2011, Chapter 6), we first extend the hierarchical STAR models introduced in Lang et al. (2013) to include multivariate response variables from the exponential family distributions. This way, we are able to simultaneously model the relationship of several responses with a set of structured additive predictors, accounting for possible correlation among the dependent variables. Then, we propose spike and slab priors for automatic variable selection and model choice within a Bayesian hierarchical framework similar to the work of Scheipl et al. (2012). We apply our model to real-world health care data obtained from the Department of Veterans Affairs (VA). The application analyzes Patient-Centered Medical Home (PCMH) project data gathered from a large number of medical facilities during fiscal year 2011–2012. Separate data tables from (i) patient health conditions and care utilization and (ii) patient demographic information are first combined to form patient-level data. The patient-level data are further aggregated to the provider and station levels to help predict a patient’s total care demands on primary and nonprimary care on a yearly basis. By combining these multilevel data sources, our proposal can assist health professionals in making operational decisions such as determining the number of primary care physicians based on expected clinic visits or expected clinical workloads for those visits.

    The main contributions we make in this article are (i) formulating a multivariate version of the hierarchical STAR model; (ii) bridging the connection between multivariate hierarchical STAR models and generalized latent variable models; (iii) proposing a Bayesian function selection routine for the multivariate hierarchical STAR model based on spike and slab priors; and (iv) applying our proposal to real-world data from the Veterans Health Administration (VHA) PCMH project to demonstrate its performance and applicability and to produce findings that convey key public and medical implications.

    The rest of the article is organized as follows. Section 2 reviews some literature on the PCMH and outlines the problem statement. Section 3 introduces some background on structured additive regression based on Bayesian P-splines and its hierarchical version. Section 4 describes the proposed multivariate extension to hierarchical STAR models followed by a Bayesian variable selection procedure. Section 5 provides an illustrative application of our methods on the VHA PCMH data. Section 6 includes concluding remarks and directions for future research.

    2. Overview of PCMH

    The PCMH has emerged as a new model for delivery system reform that has the potential to improve primary care quality with better outcomes and at lower costs. The model is a patient-oriented, team-based approach consisting of different providers, such as a physician, registered nurse, nutritionist, and clerk, that delivers accessible, coordinated, and comprehensive care in the context of the patient’s family and community (Stange et al., 2010). The medical home concept originated during the 1960s in pediatrics but did not find its way to adult general practice until 2004. Theoretically, the PCMH model entails a broad set of fundamental principles such as having a physician-directed medical practice taking responsibility for all of the continuing care, enhanced access to care through open scheduling systems with expanded hours and personalized communications, and an appropriate payment system that recognizes the added value provided to patients beyond the traditional fee-for-service encounters (Rittenhouse and Shortell, 2009).

    Since 2007, several studies have examined the prevalence and effectiveness of medical homes. For instance, Fisher (2008) outlined some recommendations for the success of medical homes, such as sharing information across health care providers, extending the performance measures to include patients’ experiences with care and assessments of outcomes, and establishing a PCMH-based payment system that shares savings among all providers involved. Another study within the Group Health system in Seattle showed that a medical home prototype led to 29% fewer emergency visits, 6% fewer hospitalizations, and total savings of $10.30 per patient per month over a 21-month period (Reid et al., 2010). Practically, as of December 2009, there were about 26 pilot projects involving the medical home concept being directed in 18 states. They involve over 14,000 physicians and approximately 5,000,000 patients (Bitton et al., 2010). Of interest, the VHA launched a nationwide 3-year program in April 2010 to create PCMHs in more than 900 primary care clinics. Early results indicated dramatic improvements such as reducing the appointment waiting time from as long as 90 days down to a day and decreasing the percentage of inappropriate emergency department visits from 52% to 12% (Klein and Fund, 2011).

    However, there are difficulties in fully achieving the benefits of the PCMH model in practice. It has been found that more effort is required in the PCMH model to fully leverage electronic health record technology and to develop new business rules and staffing structures than initially envisioned (Rittenhouse and Shortell, 2009; Ajorlou et al., 2014). From an operations management point of view, a key success factor in designing any health care delivery system is to achieve a balance between supply and demand of care services. This issue is even more critical for the PCMH model since the clinical supply and demand are portfolio in nature. Unlike health demands, the supply of health care services can be treated as deterministic and be calculated based on head counts and available service hours from all professional lines within a PCMH team on an annual basis. However, the estimation of the clinical workload portfolio based on key patient factors is a challenging task and, to the best of our knowledge, our work is the first attempt to tackle this problem within the OR/MS and IE community.

    3. Background

    3.1. STAR models based on Bayesian P-splines

    Let $(y_i, x_i, \nu_i)$, $i = 1, \ldots, n$, denote the $i$th sampled vector of data, where $y_i$ is the response variable, $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})'$ is a vector of continuous covariates, and $\nu_i = (\nu_{i1}, \nu_{i2}, \ldots, \nu_{is})'$ is a vector of further (mostly categorical) predictors. STAR models (Fahrmeir et al., 2004) assume that, given $x_i$ and $\nu_i$, the distribution of $y_i$ belongs to an exponential family

    $$\pi(y_i \mid x_i, \nu_i, \phi) = c(y_i, \phi) \exp\left( \frac{y_i \theta_i - b(\theta_i)}{\phi} \right),$$

    where $b(\cdot)$, $c(\cdot)$, $\theta_i$, and $\phi$ are determined by the type of distribution. The conditional expected value $\mu_i = E(y_i \mid x_i, \nu_i)$ is related to a semiparametric additive predictor $\eta_i$ by $\mu_i = g(\eta_i)$ via a fixed (known) link function $g(\cdot)$ as in generalized linear models. The additive predictor $\eta_i$ has the form

    $$\eta_i = f_1(x_{i1}) + \cdots + f_p(x_{ip}) + \nu_i' \gamma, \tag{1}$$

    in which $f_1, \ldots, f_p$ are unknown nonlinear (possibly smooth) functions of the continuous covariates, and $\nu_i' \gamma$ represents the usual linear part of the model. Following the Bayesian version of P(enalized)-splines (Lang and Brezger, 2004), the unknown function $f_j$ is approximated by a polynomial spline of degree $r$ defined over a set of (not necessarily equally spaced) knots $x_j^{\min} = \zeta_{0j}$ …


    … function evaluated at the observation $x_j^{(1)}$. If both covariates are continuous, a more flexible approach can be based on a two-dimensional P-spline, in which the unknown interaction surface can be approximated by the tensor product of the corresponding one-dimensional B-splines as

    $$f_j(x_j^{(1)}, x_j^{(2)}) = \sum_{m_1=1}^{M_{1j}} \sum_{m_2=1}^{M_{2j}} \beta_{j,m_1 m_2} B_{j,m_1}(x_j^{(1)}) B_{j,m_2}(x_j^{(2)}).$$

    The related design matrix $X_j$ is then $n \times (M_{1j} \times M_{2j})$ and it consists of products of basis functions. The appropriate priors for $\beta_j = (\beta_{j,11}, \ldots, \beta_{j,M_{1j} M_{2j}})'$ are commonly found in spatial statistics.
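To make the tensor-product construction concrete, the following sketch builds two univariate B-spline design matrices with the Cox-de Boor recursion and combines them into the $n \times (M_{1j} \times M_{2j})$ matrix of basis-function products. It assumes equally spaced knots and cubic splines; the helper names (`uniform_knots`, `bspline_basis`, `tensor_design`) and the numbers of intervals are illustrative choices, not taken from the article.

```python
import numpy as np

def uniform_knots(lo, hi, n_intervals, degree):
    """Equally spaced knots on [lo, hi], extended by `degree` knots on each side."""
    h = (hi - lo) / n_intervals
    return lo + h * np.arange(-degree, n_intervals + degree + 1)

def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x (Cox-de Boor recursion).

    Returns an array with len(knots) - degree - 1 columns, i.e. the number of
    intervals plus the degree when the knots come from `uniform_knots`.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator functions of the half-open knot intervals.
    B = np.array([(t[i] <= x) & (x < t[i + 1]) for i in range(len(t) - 1)],
                 dtype=float).T
    for d in range(1, degree + 1):
        B_new = np.zeros((x.size, len(t) - d - 1))
        for i in range(len(t) - d - 1):
            left_den = t[i + d] - t[i]
            right_den = t[i + d + 1] - t[i + 1]
            if left_den > 0:
                B_new[:, i] += (x - t[i]) / left_den * B[:, i]
            if right_den > 0:
                B_new[:, i] += (t[i + d + 1] - x) / right_den * B[:, i + 1]
        B = B_new
    return B

def tensor_design(B1, B2):
    """Row-wise Kronecker product: one column per pair of basis functions."""
    n = B1.shape[0]
    return (B1[:, :, None] * B2[:, None, :]).reshape(n, -1)

rng = np.random.default_rng(1)
x1 = rng.uniform(0.0, 1.0, 200)   # first continuous covariate (simulated)
x2 = rng.uniform(0.0, 1.0, 200)   # second continuous covariate (simulated)
B1 = bspline_basis(x1, uniform_knots(0.0, 1.0, 8, 3), 3)   # 8 intervals, cubic
B2 = bspline_basis(x2, uniform_knots(0.0, 1.0, 5, 3), 3)   # 5 intervals, cubic
X = tensor_design(B1, B2)
```

Each univariate basis sums to one at every observation, so the rows of the tensor-product design matrix also sum to one; this is a quick sanity check on any implementation.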

    Another common application in multilevel analysis is related to random slopes that appear when combining regression equations of higher levels with the lower levels to form a compound representation (Goldstein, 2011, Chapter 2). For example, in our case study of the VA PCMH, we would like to model the heterogeneity in the slope of the relationship between health care demand and patient age among all PCMH teams. Then, a random slope with regard to index variable $x_j^{(1)}$, which indicates the teams here, can be incorporated as $f_j(x_j) = h(x_j^{(1)}) x_j^{(2)}$ with $h(x_j^{(1)}) = \beta_{jc} \sim N(0, \tau_j^2)$. Following this, the design matrix $X_j$ is given by $\mathrm{diag}(x_1^{(2)}, \ldots, x_n^{(2)}) X_j^{(1)}$, where $X_j^{(1)}$ is a 0/1 incidence matrix.
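A minimal sketch of this random-slope design matrix follows. The team labels, ages, and slope values are made-up illustrative numbers, and `random_slope_design` is a hypothetical helper name.

```python
import numpy as np

def random_slope_design(team, x, n_teams):
    """Return diag(x) @ Z, where Z is the n x n_teams 0/1 incidence matrix."""
    team = np.asarray(team)
    x = np.asarray(x, dtype=float)
    Z = np.zeros((team.size, n_teams))
    Z[np.arange(team.size), team] = 1.0   # one indicator per observation
    # Scaling row i of Z by x[i] equals np.diag(x) @ Z without forming the
    # n x n diagonal matrix.
    return x[:, None] * Z

team = np.array([0, 0, 1, 2, 2])                  # hypothetical team of each patient
age = np.array([30.0, 45.0, 50.0, 62.0, 71.0])    # hypothetical patient ages
X = random_slope_design(team, age, n_teams=3)
slopes = np.array([0.1, -0.2, 0.05])              # hypothetical team-specific slopes
contribution = X @ slopes                         # equals slopes[team] * age
```

Multiplying the design matrix by the vector of team-specific slopes gives each patient the product of their own age and their own team's slope, which is the term $h(x_j^{(1)}) x_j^{(2)}$ evaluated at every observation.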

    4. Proposed methods

    4.1. Multiresponse hierarchical STAR model 

    When we want to simultaneously study multiple response variables, a multivariate model should be developed to capture additional correlations among different measurements. One key advantage of such modeling lies in its ability to control the type I error rate better compared with carrying out a series of univariate tests. In the context of multilevel analysis, different responses can be incorporated by placing them in a separate “response” level at the lowest level of the hierarchy. A series of $d$ dummy variables, one for each response, is then defined and entered into regression equations of higher levels. For simplicity, we first focus on the three-level structure, response ($h$) within patient $i$ within (medical home) team $j$, with regular predictors, and then show how this can be extended to the STAR context. A model with more than three levels is just a straightforward extension of what we propose here.

    Suppose there are $H$ response variables in the lowest level. We define $d^{(h)}_{hij} = 1$ if the $h$th response is modeled and zero otherwise (Goldstein, 2011, Chap. 6). Let $x_{p,ij}$ and $z_{q,j}$ denote the $p$th and $q$th covariate in the patient level and team level, respectively. Let $u^{(h)}_{0,j}$ and $u^{(h)}_{p,j}$ represent the $h$th random intercept and the $h$th random slope of the $p$th predictor in the patient level, respectively. Then we model the outcome as

    $$\begin{aligned}
    y_{hij} = {} & \sum_h d^{(h)}_{hij} \beta^{(h)}_0 + \sum_h d^{(h)}_{hij} \sum_{p=1}^{P} \bar{\beta}^{(h)}_p x_{p,ij} + \sum_h d^{(h)}_{hij} \sum_{q=1}^{Q} \beta^{(h)}_q z_{q,j} + \sum_h d^{(h)}_{hij} \sum_{p=1}^{P} \sum_{q=1}^{Q} \beta^{(h)}_{p,q} x_{p,ij} z_{q,j} \\
    & + \sum_h d^{(h)}_{hij} \sum_{p=1}^{P} u^{(h)}_{p,j} x_{p,ij} + \sum_h d^{(h)}_{hij} u^{(h)}_{0,j} + \sum_h d^{(h)}_{hij} \varepsilon^{(h)}_{ij},
    \end{aligned} \tag{6}$$

    $$\begin{pmatrix} u^{(1)}_{0,j} \\ \vdots \\ u^{(1)}_{P,j} \\ \vdots \\ u^{(H)}_{P,j} \end{pmatrix} \sim N(0, \Sigma_u), \qquad
    \Sigma_u = \begin{pmatrix}
    \tau^{2(1)}_{u_0} & \cdots & \cdots & \cdots & \tau^{(1)(H)}_{u_0,P} \\
    \vdots & \ddots & & & \vdots \\
    \tau^{(1)}_{u_0,P} & \cdots & \tau^{2(1)}_{u_P} & \cdots & \tau^{(1)(H)}_{u_P,P} \\
    \vdots & & & \ddots & \vdots \\
    \tau^{(1)(H)}_{u_0,P} & \cdots & \cdots & \cdots & \tau^{2(H)}_{u_P}
    \end{pmatrix}, \tag{7}$$

    $$\begin{pmatrix} \varepsilon^{(1)}_{ij} \\ \vdots \\ \varepsilon^{(H)}_{ij} \end{pmatrix} \sim N(0, \Sigma_\varepsilon), \qquad
    \Sigma_\varepsilon = \begin{pmatrix}
    \sigma^{2(1)}_{\varepsilon} & \cdots & \sigma^{(1)(H)}_{\varepsilon} \\
    \vdots & \ddots & \vdots \\
    \sigma^{(1)(H)}_{\varepsilon} & \cdots & \sigma^{2(H)}_{\varepsilon}
    \end{pmatrix}. \tag{8}$$

    The first term in Equation (6) shows the grand mean for each of the response variables, followed by patient-level predictors and team-level predictors; cross-level interactions (effect modifiers) come next, followed by random slopes and random intercept terms, and finally patient-level residuals. Note that there is no level-1 residual specified since level-1 exists only to define the multivariate structure. The random effects are defined in Equation (7) with a general unstructured covariance $\Sigma_u$ that contains the pairwise covariances between each set of these random effects for the intercept and slopes within each of the responses and between the response variables. The patient-level residuals are defined in Equation (8) with covariance structure $\Sigma_\varepsilon$ that includes all variances and covariances between patient-level residuals. Taking a matrix form, we can rewrite (6) as

    $$y_{hij} = \sum_h d^{(h)}_{hij} Z_j^T B^{(h)} X_{ij} + \sum_h d^{(h)}_{hij} U^{(h)T}_j X_{ij} + \sum_h d^{(h)}_{hij} \varepsilon^{(h)}_{ij}, \tag{9}$$

    where we have

    $$Z_j = [1, z_{1,j}, \ldots, z_{Q,j}]^T, \quad X_{ij} = [1, x_{1,ij}, \ldots, x_{P,ij}]^T, \quad U^{(h)}_j = \left[ u^{(h)}_{0,j}, u^{(h)}_{1,j}, \ldots, u^{(h)}_{P,j} \right]^T, \tag{10}$$

    $$B^{(h)} = \begin{pmatrix}
    \beta^{(h)}_0 & \bar{\beta}^{(h)}_1 & \cdots & \bar{\beta}^{(h)}_P \\
    \beta^{(h)}_1 & \beta^{(h)}_{1,1} & \cdots & \beta^{(h)}_{P,1} \\
    \vdots & \vdots & & \vdots \\
    \beta^{(h)}_Q & \beta^{(h)}_{1,Q} & \cdots & \beta^{(h)}_{P,Q}
    \end{pmatrix}. \tag{11}$$
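As a quick numerical check of the matrix form for a single response $h$, the sketch below assembles $B^{(h)}$ from randomly drawn coefficients and compares $Z_j^T B^{(h)} X_{ij} + U_j^{(h)T} X_{ij}$ with the term-by-term sum of Equation (6) without the residual. The dimensions and all numerical values are arbitrary assumptions used only for the comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
P, Q = 3, 2                          # hypothetical numbers of patient/team covariates
x = rng.normal(size=P)               # patient-level covariates x_{p,ij}
z = rng.normal(size=Q)               # team-level covariates z_{q,j}
beta0 = 0.7                          # grand mean for this response
beta_bar = rng.normal(size=P)        # patient-level coefficients (first row of B)
beta_team = rng.normal(size=Q)       # team-level coefficients (first column of B)
beta_int = rng.normal(size=(Q, P))   # beta_int[q, p] is the interaction beta_{p,q}
u = rng.normal(size=P + 1)           # random effects (u_0, u_1, ..., u_P)

# Assemble the (Q + 1) x (P + 1) coefficient matrix of Equation (11).
B = np.empty((Q + 1, P + 1))
B[0, 0] = beta0
B[0, 1:] = beta_bar
B[1:, 0] = beta_team
B[1:, 1:] = beta_int

X = np.concatenate(([1.0], x))       # X_ij = [1, x_1, ..., x_P]^T
Z = np.concatenate(([1.0], z))       # Z_j  = [1, z_1, ..., z_Q]^T

matrix_form = Z @ B @ X + u @ X
sum_form = (beta0 + beta_bar @ x + beta_team @ z
            + sum(beta_int[q, p] * x[p] * z[q]
                  for p in range(P) for q in range(Q))
            + u[1:] @ x + u[0])
```

The two quantities agree up to floating-point error, which confirms that the leading 1 in $Z_j$ and $X_{ij}$ is what lets a single matrix carry the intercept, both sets of main effects, and the cross-level interactions.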

    Note that $\bar{\beta}^{(h)}_1, \ldots, \bar{\beta}^{(h)}_P$ in the first row of Equation (11) are the regression coefficients for patient-level predictors, whereas $\beta^{(h)}_1, \ldots, \beta^{(h)}_Q$ placed in the first column of Equation (11) are the coefficients for team-level variables.

    To extend this within the STAR framework where the covariates are represented by a linear combination of B-spline basis functions, we simplify Equation (6) for a particular outcome $h$ as

    $$y^{(h)}_{hij} = \beta^{(h)}_0 + \sum_{p=1}^{P} \left( \bar{\beta}^{(h)}_p + u^{(h)}_{p,j} \right) x_{p,ij} + \sum_{q=1}^{Q} \beta^{(h)}_q z_{q,j} + \sum_{p=1}^{P} \sum_{q=1}^{Q} \beta^{(h)}_{p,q} x_{p,ij} z_{q,j} + u^{(h)}_{0,j} + \varepsilon^{(h)}_{ij} \quad \text{for } h = 1, \ldots, H. \tag{12}$$

    We assume that, for response $h$, patient-level covariate $x_p$, $p = 1, \ldots, P$, is represented by a set of $\bar{M}^{(h)}_p = \bar{k}_p + \bar{r}$ polynomial splines of degree $\bar{r}$ over $\bar{k}_p + 1$ knots $\zeta^{0(h)}_p < \zeta^{1(h)}_p < \cdots < \zeta^{\bar{k}_p(h)}_p$. Similarly, team-level predictor $z_q$, $q = 1, \ldots, Q$, is represented by $M^{(h)}_q = k_q + r$ polynomial splines of degree $r$ over a domain defined by $\zeta^{0(h)}_q < \zeta^{1(h)}_q < \cdots < \zeta^{k_q(h)}_q$. Hence, a hierarchical STAR model with a multivariate response has the form

    $$\begin{aligned}
    y^{(h)}_{hij} = {} & \beta^{(h)}_0 + \sum_{p=1}^{P} \sum_{m_1=1}^{\bar{M}^{(h)}_p} \left( \bar{\beta}^{(h)}_{m_1 p} + u^{(h)}_{m_1,p,j} \right) B^{(h)}_{m_1 p}(x_{p,ij}) + \sum_{q=1}^{Q} \sum_{m_2=1}^{M^{(h)}_q} \beta^{(h)}_{m_2 q} B^{(h)}_{m_2 q}(z_{q,j}) \\
    & + \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{m_1=1}^{\bar{M}^{(h)}_p} \sum_{m_2=1}^{M^{(h)}_q} \beta^{(h)}_{m_1 m_2 pq} B^{(h)}_{m_1 p}(x_{p,ij}) B^{(h)}_{m_2 q}(z_{q,j}) + u^{(h)}_{0,j} + \varepsilon^{(h)}_{ij}, \quad h = 1, \ldots, H.
    \end{aligned} \tag{13}$$

    $$\begin{pmatrix} u^{(1)}_{0,j} \\ \vdots \\ u^{(H)}_{\bar{M}^{(H)}_P, P, j} \end{pmatrix} \sim N(0, \Sigma_u), \qquad
    \Sigma_u = \begin{pmatrix}
    \tau^{2(1)}_{u_0} & \cdots & \tau^{(1)(H)}_{u_0, \bar{M}^{(H)}_P, P} \\
    \vdots & \ddots & \vdots \\
    \tau^{(1)(H)}_{u_0, \bar{M}^{(H)}_P, P} & \cdots & \tau^{2(H)}_{u_{\bar{M}^{(H)}_P, P}}
    \end{pmatrix}, \tag{14}$$

    $$\begin{pmatrix} \varepsilon^{(1)}_{ij} \\ \vdots \\ \varepsilon^{(H)}_{ij} \end{pmatrix} \sim N(0, \Sigma_\varepsilon), \qquad
    \Sigma_\varepsilon = \begin{pmatrix}
    \sigma^{2(1)}_{\varepsilon} & \cdots & \sigma^{(1)(H)}_{\varepsilon} \\
    \vdots & \ddots & \vdots \\
    \sigma^{(1)(H)}_{\varepsilon} & \cdots & \sigma^{2(H)}_{\varepsilon}
    \end{pmatrix}. \tag{15}$$

    The $B(\cdot)$ and $\beta(\cdot)$ in Equation (13) represent basis functions and B-spline coefficients, respectively. Random-effect splines are defined in Equation (14). For a particular outcome, the patient-level random effects represent each patient’s deviation from the average intercept ($u_{0,j}$) and from the average slope of each spline ($u_{1,j}, \ldots, u_{m_1,p,j}$). The patient-level covariance matrix includes the pairwise covariances between each set of spline random effects for the intercept and slopes within each of the response variables as well as between the response variables. The patient-level residuals are defined in Equation (15) with covariance structure $\Sigma_\varepsilon$. Although the covariances described in Equations (14) and (15) are in a general unstructured format, special forms such as a Toeplitz- or Kronecker-type structure can be adopted depending on the application.

    Following Section 3.2, the interaction effect between patient-level and team-level covariates is modeled with a varying coefficient $h(x_{p,ij}) z_{q,j}$ if $z$ is categorical, or through nonparametric two-dimensional surface fitting of $f(x_p, z_q)$ by the tensor product of two univariate B-splines as in Equation (13) if $z$ is continuous. If variable selection is not considered, the most commonly used priors for the latter case are based on the four nearest neighbors on a regular lattice:

    $$\beta^{(h)}_{m_1 m_2 pq} \mid \cdot \sim N\left( \frac{1}{4} \left( \beta^{(h)}_{(m_1-1) m_2 pq} + \beta^{(h)}_{(m_1+1) m_2 pq} + \beta^{(h)}_{m_1 (m_2-1) pq} + \beta^{(h)}_{m_1 (m_2+1) pq} \right), \frac{\tau^{2(h)}_{pq}}{4} \right), \tag{16}$$

    for $m_1 = 2, \ldots, \bar{M}^{(h)}_p - 1$, $m_2 = 2, \ldots, M^{(h)}_q - 1$. This can be seen as a direct generalization of a first-order random walk in one dimension. Other types of priors, such as the Kronecker product of penalty matrices of the main effects, $K_{pq,j} = K_{p,j} \otimes K_{q,j}$, can also be applied (see Lang and Brezger (2004)).
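The conditional prior in Equation (16) can be sketched for the interior of a coefficient lattice as follows. The function name `interior_conditional_moments`, the 4 x 5 grid, and the value of `tau2` are illustrative assumptions; boundary coefficients, which need separate treatment, are not covered here.

```python
import numpy as np

def interior_conditional_moments(beta, tau2):
    """Conditional prior mean and variance for interior lattice coefficients.

    `beta` is the M1 x M2 grid of tensor-product coefficients for one surface.
    The mean is the average of the four nearest neighbors and the variance is
    tau2 / 4, as in Equation (16).
    """
    beta = np.asarray(beta, dtype=float)
    mean = 0.25 * (beta[:-2, 1:-1] + beta[2:, 1:-1]      # neighbors in m1
                   + beta[1:-1, :-2] + beta[1:-1, 2:])   # neighbors in m2
    var = np.full(mean.shape, tau2 / 4.0)
    return mean, var

# A coefficient grid that is linear in both indices: beta[i, j] = 5 * i + j.
beta = np.arange(20, dtype=float).reshape(4, 5)
mean, var = interior_conditional_moments(beta, tau2=2.0)
```

For a grid that is linear in both indices, the neighbor average reproduces each interior coefficient exactly, so the prior leaves planar surfaces unpenalized in the interior, which mirrors how a first-order random walk penalizes only differences between adjacent coefficients.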

    4.2. Relationship with a structural equation model 

    Here we show how the multilevel spline model with a multivariate response can equivalently be represented and estimated in the structural equation modeling framework. For simplicity we choose a model with only level-2 predictors, but this can be extended to more general cases with higher-level predictors and possible interactions such as the one we developed in Section 4.1. In addition, we pick the linear spline model as a special case to help better understand the approach, but this can easily be generalized to other types of splines, such as the one we exploit in this article.

    Generally, Structural Equation Models (SEMs) involve two specific parts with distinct objectives: a measurement equation and a structural equation (Kline, 2011). In the measurement equation, each of the responses $y^{(h)}_j$ loads on the latent variables $f^{(h)}_m$, $m = 0, 1, \ldots, M^{(h)}$. The intercept term for response $h$ is $f^{(h)}_0$, and the loadings for any of the measurements $y^{(h)}_j$ on this latent variable have a value of one. The other $M^{(h)}$ factors serve as the slopes for each piece on domain $x_p$ defined by the linear splines

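    $$
    s^{(h)}_{m,pj} =
    \begin{cases}
    0 & \text{if } s_{pj} \le s^{(h)}_{(m-1),p}, \\
    s_{pj} - s^{(h)}_{(m-1),p} & \text{if } s^{(h)}_{(m-1),p} < s_{pj} \le s^{(h)}_{m,p}, \\
    s^{(h)}_{m,p} - s^{(h)}_{(m-1),p} & \text{if } s_{pj} > s^{(h)}_{m,p}.
    \end{cases} \tag{17}
    $$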
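    A short sketch may help to read the linear spline loadings. It assumes the usual piecewise-linear convention (zero before a piece starts, the distance travelled inside the piece, and the full piece width once the piece has been passed); the names `spline_loading`, `loadings`, and the knot values are hypothetical.

```python
def spline_loading(s, knot_lo, knot_hi):
    """Loading of one linear-spline piece [knot_lo, knot_hi] at covariate value s."""
    if s <= knot_lo:
        return 0.0               # piece not reached yet
    if s <= knot_hi:
        return s - knot_lo       # inside the piece: distance travelled so far
    return knot_hi - knot_lo     # piece fully passed: its whole width


def loadings(s, knots):
    """Loadings on the slope factors f_1, ..., f_M for consecutive knots."""
    return [spline_loading(s, lo, hi) for lo, hi in zip(knots[:-1], knots[1:])]


# Hypothetical knots on an age-like scale.
knots = [0.0, 10.0, 20.0, 30.0]
row = loadings(14.0, knots)
```

    With these loadings, the contribution of the slope factors at a covariate value is the sum of complete pieces plus the partial piece, which is the structure of the measurement equation.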

    Applying the same $M^{(h)} + 1$ pieces, $m = 0, 1, \ldots, M^{(h)}$, as above, the measurement equation can be written as

    $$
    y^{(h)}_j = f^{(h)}_0 + \sum_{m:\, s^{(h)}_{m,p} \le s_{pj}} \left( s^{(h)}_{m,p} - s^{(h)}_{(m-1),p} \right) f^{(h)}_m + \sum_{m:\, s^{(h)}_{(m-1),p}\, \ldots} \cdots
    $$


    around the origin that imposes very strong shrinkage on the coefficients and the other part being a wide slab that forces very little shrinkage on the coefficients (Ishwaran and Rao, 2005). The posterior mixture weights for the spike (or slab) component of a specific coefficient or coefficient batch can be interpreted as the posterior probability of its exclusion from (or inclusion in) the model.

    According to Section 4.1, we note that any multiresponse hierarchical STAR model of form (13) can be written in a unifying form $y = \eta + \varepsilon$, where $\eta = \eta_0 + X_1\beta_1 + \cdots + X_p\beta_p$, with $\eta_0$ containing offset terms (e.g., grand means of multivariate responses) and effects that are not under the selection procedure. Then the conventional spike and slab prior structure is given by the following hierarchical Bayesian model

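    $$
    \begin{aligned}
    \beta_j \mid \delta_j, \rho_j^2 &\overset{\text{prior}}{\sim} N(0, \upsilon_j^2) \quad \text{with } \upsilon_j^2 = \rho_j^2 \delta_j, \\
    \delta_j \mid \omega &\overset{\text{prior}}{\sim} \omega I_1(\delta_j) + (1 - \omega) I_{v_0}(\delta_j), \\
    \rho_j^2 &\overset{\text{prior}}{\sim} \Gamma^{-1}(a_\rho, b_\rho), \\
    \text{and } \omega &\overset{\text{prior}}{\sim} \text{Beta}(a_\omega, b_\omega).
    \end{aligned} \tag{21}
    $$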

    This structure is called a Normal Mixture of Inverse Gammas (NMIG) prior. It places a bimodal prior on the hypervariance $\upsilon_j^2$ of the coefficients, which leads to a spike and slab-type prior on the STAR coefficients themselves. $I_z(\cdot)$ is an indicator function that takes one in $z$ and zero otherwise, and $v_0$ is a very small positive constant. This way, $\delta_j$ will be one with probability $\omega$ and close to zero with probability $(1 - \omega)$. Hence, the implied prior for the (hyper)variance $\upsilon_j^2$ is a bimodal mixture of inverse gamma distributions, with one part focused on very small values (the spike with $\delta_j = v_0$) and a second diffuse part with more mass on larger values (the slab with $\delta_j = 1$). The mixture weights $\omega$, in addition, follow a beta prior that captures any prior knowledge about the sparsity of coefficient $\beta_j$ (Scheipl et al., 2012).
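    As a rough illustration of the NMIG hierarchy, the following sketch draws coefficients from the prior using only the Python standard library. The function name `sample_nmig_prior` is hypothetical, and the inverse gamma is assumed to be parameterized by shape $a_\rho$ and scale $b_\rho$ (drawn as the reciprocal of a gamma variate with rate $b_\rho$); the default hyperparameter values are the ones reported later in the article.

```python
import random


def sample_nmig_prior(rng, p, v0=0.00025, a_rho=5.0, b_rho=25.0,
                      a_omega=1.0, b_omega=1.0):
    """Draw p coefficients from the NMIG prior with a shared mixture weight omega."""
    omega = rng.betavariate(a_omega, b_omega)      # prior inclusion weight
    draws = []
    for _ in range(p):
        # delta_j is 1 (slab) with probability omega, else v0 (spike).
        delta = 1.0 if rng.random() < omega else v0
        # Inverse gamma draw: reciprocal of Gamma(shape=a_rho, scale=1/b_rho).
        rho2 = 1.0 / rng.gammavariate(a_rho, 1.0 / b_rho)
        upsilon2 = rho2 * delta                    # hypervariance of beta_j
        beta = rng.gauss(0.0, upsilon2 ** 0.5)
        draws.append({"beta": beta, "delta": delta, "rho2": rho2,
                      "upsilon2": upsilon2})
    return omega, draws


rng = random.Random(0)
omega, draws = sample_nmig_prior(rng, p=1000)
```

    Coefficients drawn with $\delta_j = v_0$ have a hypervariance that is smaller by the factor $v_0$, so they concentrate tightly around zero, while those with $\delta_j = 1$ remain diffuse.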

    It is found that prior structure (21) does not work well for coefficient batches in the STAR models that are associated with spline basis functions or random effects. Briefly, the problem is that a small hypervariance for a batch of coefficients entails small coefficient values and vice versa. This problematic dependence between a vector of coefficients and their associated hypervariances makes the MCMC sampler unlikely to switch between basins of attraction around the two spike and slab modes. To reduce the dependence, a multiplicative parameter expansion for $\beta_j$ is recommended that improves the mixing properties of $\delta_j$ and boosts the shrinkage characteristics of the resulting prior compared with Equation (21). The idea is to expand $\beta_j$ as $\beta_j = \alpha_j \xi_j$, where the scalar $\alpha_j \overset{\text{prior}}{\sim} \text{NMIG}(v_0, \omega, a_\rho, b_\rho)$ is given as in Equation (21) and is independent of $\xi_j$. Elements of the $M_j$-dimensional vector $\xi_j$ are then assigned as

    $$
    \xi_{jm} \mid r_{jm} \sim N(r_{jm}, 1), \qquad r_{jm} \sim \tfrac{1}{2} I_1(r_{jm}) + \tfrac{1}{2} I_{-1}(r_{jm}), \qquad j = 1, \ldots, p;\; m = 1, \ldots, M_j, \tag{22}
    $$

    which corresponds to a mixture of two i.i.d. Gaussian densities with means $\pm 1$ and equal mixture weights. The current approach resolves the mixing problems of $\delta_j$ since the Markov blankets of both $\delta_j$ and $\rho_j$ now include only $\alpha_j$, with dimension one, instead of the vector $\beta_j$.
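    The effect of the multiplicative expansion can be sketched in a few lines. The helper names `sample_xi` and `expand` are hypothetical; the point of the sketch is only that a single scalar $\alpha_j$ rescales the whole coefficient block, so switching a term in or out of the model involves a one-dimensional parameter.

```python
import random


def sample_xi(rng, m_j):
    """Draw the M_j-vector xi_j: each element is N(r, 1) with r = +1 or -1, equally likely."""
    xi = []
    for _ in range(m_j):
        r = 1.0 if rng.random() < 0.5 else -1.0
        xi.append(rng.gauss(r, 1.0))
    return xi


def expand(alpha_j, xi_j):
    """Recover the coefficient block beta_j = alpha_j * xi_j."""
    return [alpha_j * x for x in xi_j]


rng = random.Random(1)
xi = sample_xi(rng, m_j=5)
beta_spike = expand(0.0, xi)   # alpha_j at zero removes the whole block
beta_slab = expand(2.0, xi)    # a nonzero alpha_j scales every element together
```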

    The MCMC posterior inference and component selection are performed by a blockwise Metropolis-within-Gibbs sampler that reduces to a standard Gibbs scheme when responses are Gaussian (see Appendix A). The full conditional densities (FCD) for parameters $\omega$, $\rho_j^2$, $\delta_j$, and conditional means $r = (r_l,\ l: 1, \ldots, L)$ of normal variables $\xi_l \mid r_l \sim N(r_l, 1)$, $r_l = \pm 1$, are given in closed form regardless of the choice of exponential family for the responses (Appendix A). The full conditionals of $\alpha$ and $\xi$ are based on the conditional design matrices $X_\alpha = X\,\text{blockdiag}(\xi_1, \ldots, \xi_p)$ and $X_\xi = X\,\text{blockdiag}(1_{e_1}, \ldots, 1_{e_p})\,\alpha$, where $1_e$ is an $e \times 1$ vector of ones and $X = [X_1, \ldots, X_p]$ is the concatenation of the designs for the model terms as in Equation (2). Under the Gaussian assumption of the responses, these are given as follows:

    $$
    \alpha \mid \cdot \sim N(\mu_\alpha, \Sigma_\alpha) \quad \text{where} \quad \Sigma_\alpha = \left( \frac{1}{\phi} X_\alpha^T X_\alpha + \text{diag}\left(\delta \rho^2\right)^{-1} \right)^{-1}, \quad \mu_\alpha = \frac{1}{\phi} \Sigma_\alpha X_\alpha^T y \tag{23}
    $$

    and

    $$
    \xi \mid \cdot \sim N(\mu_\xi, \Sigma_\xi) \quad \text{where} \quad \Sigma_\xi = \left( \frac{1}{\phi} X_\xi^T X_\xi + I \right)^{-1}, \quad \mu_\xi = \Sigma_\xi \left( \frac{1}{\phi} X_\xi^T y + r \right). \tag{24}
    $$
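    For Gaussian responses, the two conditional draws can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions rather than the sampler used in the article: the function names are hypothetical, the data are simulated, and $X_\xi$ is interpreted as $X$ with each column scaled by the $\alpha_j$ of the term it belongs to.

```python
import numpy as np


def _gaussian_draw(rng, precision, linear_term):
    """Draw from N(Sigma @ linear_term, Sigma) with Sigma = precision^{-1}."""
    sigma = np.linalg.inv(precision)
    sigma = (sigma + sigma.T) / 2.0          # remove round-off asymmetry
    mu = sigma @ linear_term
    return rng.multivariate_normal(mu, sigma), mu, sigma


def draw_alpha(rng, X, xi_blocks, y, phi, delta, rho2):
    """Full conditional of alpha: one scalar per model term."""
    B = np.zeros((X.shape[1], len(xi_blocks)))
    start = 0
    for j, xi_j in enumerate(xi_blocks):     # build blockdiag(xi_1, ..., xi_p)
        B[start:start + len(xi_j), j] = xi_j
        start += len(xi_j)
    X_alpha = X @ B
    precision = X_alpha.T @ X_alpha / phi + np.diag(1.0 / (delta * rho2))
    return _gaussian_draw(rng, precision, X_alpha.T @ y / phi)


def draw_xi(rng, X, alpha, sizes, y, phi, r):
    """Full conditional of xi; columns of X are scaled by their term's alpha (assumption)."""
    X_xi = X * np.repeat(alpha, sizes)
    precision = X_xi.T @ X_xi / phi + np.eye(X.shape[1])
    return _gaussian_draw(rng, precision, X_xi.T @ y / phi + r)


# Simulated toy problem: two model terms with 2 and 3 coefficients.
rng = np.random.default_rng(0)
sizes = [2, 3]
X = rng.normal(size=(50, sum(sizes)))
y = rng.normal(size=50)
delta = np.array([1.0, 0.00025])             # term 1 in the slab, term 2 in the spike
rho2 = np.array([5.0, 5.0])
alpha, mu_a, sigma_a = draw_alpha(rng, X, [np.ones(2), np.ones(3)], y, 1.0, delta, rho2)
xi, mu_x, sigma_x = draw_xi(rng, X, alpha, sizes, y, 1.0, np.ones(sum(sizes)))
```

    A full sampler would alternate these draws with the closed-form updates of $\omega$, $\rho_j^2$, $\delta_j$, and $r$ described in Appendix A, which are not reproduced here.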

    If the response variables are not Gaussian, penalized iteratively reweighted least squares (P-IWLS) is used within a Metropolis–Hastings iteration to sample from $\alpha$ and $\xi$ (Scheipl et al., 2012). The posterior inclusion probability $P(\delta_j = 1 \mid y)$ can then be employed to decide upon insignificant, intermediate, and important model terms.
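    Given saved MCMC draws of the indicators, the posterior inclusion probability is simply the share of draws in which a term sits in the slab. The sketch below assumes hypothetical cut-offs of 0.1 and 0.5 for the three categories, since the text does not state the thresholds; the draws are made up for illustration.

```python
V0 = 0.00025  # spike value of delta_j


def inclusion_probability(delta_draws):
    """Estimate P(delta_j = 1 | y) as the fraction of saved draws equal to one."""
    return sum(1 for d in delta_draws if d == 1.0) / len(delta_draws)


def classify(prob, low=0.1, high=0.5):
    """Label a term; the thresholds are illustrative assumptions."""
    if prob < low:
        return "insignificant"
    if prob < high:
        return "intermediate"
    return "important"


# Hypothetical saved indicator draws for three model terms.
draws = {
    "term_a": [1.0] * 8,
    "term_b": [1.0, V0, V0, V0, V0, V0, V0, 1.0],
    "term_c": [V0] * 8,
}
summary = {}
for name, d in draws.items():
    prob = inclusion_probability(d)
    summary[name] = (prob, classify(prob))
```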

    5. Application to the PCMH data

    5.1. Description of data set 

    The data we used in this study were gathered from a large number of VA medical facilities across the nation that undertook a PCMH project as a way to reform their health care delivery system. The goal of our study here is to predict patients’ annual care demands on primary care and nonprimary care with the help of patient-level and provider-level attributes. Since patient data files are recorded separately, we first combined data tables belonging to patient health conditions (such as comorbidities) and patient care utilization (such as health care workload) with those tables associated with patient demographic and socioeconomic information in order to form the patient-level data. These patient-level data were further aggregated to provider-level and station-level data to create a three-level hierarchical structure. At each level of the hierarchy we had a set of risk factors that were selected based on relevant medical literature and confirmed by a group of VA health professionals.

    We collected a random sample of 10 000 outpatients from 260 VA medical facilities across the nation during fiscal year 2011–2012. All patient visits to primary care and women’s health were assembled for a total capture period of a year. Visits to other primary care–related clinics, such as internal medicine or geriatric primary care, were excluded from the analysis because health services requested by such visits are generally not rendered through medical homes; instead they are fulfilled by a specific physician, a licensed practical nurse, or a registered nurse.

    5.1.1. Study variables

    We identified and calculated two response variables for health care workloads generated by each unique patient during the fiscal year 2011–2012. Particularly, we used the Relative Value Unit (RVU) to measure the primary care and nonprimary care workloads (Dummit, 2009). The RVU schema has been widely used for reimbursement, and each value is assigned to a particular service (as defined by a coding system called Current Procedural Terminology (CPT)) rendered by a provider. The values were adjusted by geographic regions so that, for example, a 99213 CPT code (which refers to office/other outpatient services) performed in Manhattan was worth more than when performed in Dallas. Simply put, the primary care RVU represents the resources needed to provide all primary care services of a patient during a year, and the nonprimary care RVU refers to all of the nonprimary care workload during the year, which could be from one or many visits to outpatient care units. One advantage of using RVUs in our approach as opposed to simple face-to-face visit counts lies in its ability to further accommodate workloads that are generated by telephone encounters.

    The predictor variables were organized in three levels: level-1 was the patient level, on which patient’s demographic and socioeconomic attributes are included; level-2 was the PCMH team level, on which covariates such as assigned provider’s experience and the number of times that the patient has changed his/her assigned provider were collected; and level-3 was the VA facility level, on which only one continuous covariate, ZIP code–based distance between patient’s home and his/her assigned facility, was collected. The detailed descriptions of the variables and data types along with their summary statistics are shown in Table 1. In the table, enrollment priority is assigned based on the veteran’s severity of service-related disabilities and the VA income means test: groups 1, 2, 3 are generally veterans with service-related disabilities of > 50%, between 30% and 50%, and between 20% and 30%, respectively; 4, catastrophically disabled veterans; 5, low income or Medicaid; 6, Agent Orange or Gulf War veterans; 7, non-service connected with income being below HUD (The U.S. Department of Housing and Urban Development); and 8, non-service connected with income being above HUD. Care assessment need score is a general illness severity measure ranging from 0 (lowest risk) to 99 (highest risk) that reflects the likelihood of hospitalization or death. Accxx indicators are aggregated condition categories determined based on the various ICD-9-CM (International Classification of Disease, ninth version, Clinical Modification) codes that are assigned to a patient at each visit during the fiscal year 2011–2012. Note that acc codes are not mutually exclusive as most patients have more than one acc assigned during a year. Acc 28 is related to neonatal diseases and is absent in the studied population.

    5.1.2. Descriptive statistics

    Descriptive statistics are also summarized in Table 1. In brief, the vast majority of the outpatients were male and elderly, living near their assigned VA medical facility. The most commonly occurring condition was screening (about 92%), followed by nutritional diseases (about 70%) and heart diseases (about 66%). In order to find the distribution of the response variables, we built Quantile–Quantile (QQ) plots of primary care and nonprimary care RVUs against candidate parametric densities (Gaussian, lognormal, chi-squared, gamma, and Weibull). In addition to QQ plots, we checked the approximate fit by the maximum likelihood method. Based on both criteria, the lognormal distribution was found to be the most appropriate choice for both responses. The QQ plots for primary care and nonprimary care RVUs along with bootstrapped point-wise confidence envelopes at the 0.95 confidence level are displayed in Fig. 1 and Fig. 2, respectively.
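    The maximum likelihood comparison can be illustrated on simulated data. The sketch below compares only two of the candidates (Gaussian and lognormal), for which the maximized log-likelihood has a closed form; the simulated sample, its parameters, and the function names are assumptions made for illustration and do not reproduce the study data.

```python
import math
import random


def gaussian_max_loglik(x):
    """Maximized Gaussian log-likelihood, using the ML mean and variance."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def lognormal_max_loglik(x):
    """Maximized lognormal log-likelihood: Gaussian fit on log(x) plus the Jacobian term."""
    logs = [math.log(v) for v in x]
    return gaussian_max_loglik(logs) - sum(logs)


# Simulated right-skewed, strictly positive workload-like sample.
rng = random.Random(42)
sample = [rng.lognormvariate(1.0, 0.6) for _ in range(2000)]
ll_gauss = gaussian_max_loglik(sample)
ll_lognorm = lognormal_max_loglik(sample)
```

    A larger maximized log-likelihood indicates the better-fitting candidate; gamma and Weibull fits would need numerical optimization and are left out of this sketch.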

    5.1.3. Preprocessing

    To preempt numerical problems in model fitting, the data were preprocessed as follows.

    1. Missing values in CAN score, provider’s experience, patient marital status, and facility distance were imputed with the hot-deck method (Andridge and Little, 2010).

    2. Error values in age (e.g., greater than 130) and distance (e.g., greater than 1500 miles) were identified and removed.

    3. The scale of distance and length of stay were changed to a natural logarithm since their distributions are strongly positively skewed (skewness greater than two), which can lead to volatile estimation results on a standard scale.

    Following these steps, the number of records was reduced to 9935.
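    The three preprocessing steps can be sketched as a small pipeline. Several choices here are assumptions: the hot deck is a simple random draw from observed donors (the article does not state which hot-deck variant was used), the record layout and field names are hypothetical, and only distance is log-transformed in the sketch because the handling of zero lengths of stay is not described in the text.

```python
import math
import random


def hot_deck_impute(values, rng):
    """Replace missing entries (None) with a randomly chosen observed donor value."""
    donors = [v for v in values if v is not None]
    return [v if v is not None else rng.choice(donors) for v in values]


def preprocess(records, rng, max_age=130, max_dist=1500.0):
    rows = [dict(r) for r in records]                  # work on copies
    # Step 1: impute missing CAN scores.
    filled = hot_deck_impute([r["can"] for r in rows], rng)
    for r, v in zip(rows, filled):
        r["can"] = v
    # Step 2: drop records with implausible age or distance.
    rows = [r for r in rows if r["age"] <= max_age and r["dist"] <= max_dist]
    # Step 3: move the skewed distance to the natural-log scale.
    for r in rows:
        r["log_dist"] = math.log(r["dist"])
    return rows


# Hypothetical toy records (age in years, distance in miles).
records = [
    {"age": 61, "can": 55, "dist": 12.0},
    {"age": 45, "can": None, "dist": 80.0},
    {"age": 140, "can": 70, "dist": 5.0},      # age error
    {"age": 70, "can": 30, "dist": 2000.0},    # distance error
]
clean = preprocess(records, random.Random(0))
```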


    Table 1. Description of the predictors and response variables included in the study (n = 10 000)

    | Variable | Description | Summary statistics |
    |---|---|---|
    | **Patient-level predictors** | | |
    | gen | Gender | Male (93.55%), Female (6.45%) |
    | age | Age (years) | Mean: 61.74, StdDev: 15.19, Min: 20, Max: 98 |
    | mar | Marital status | Married (55.41%), Not married (15.43%), Previously married (28.62%), Unknown (0.54%) |
    | ins | Insurance status | Insured (58.46%), Not insured (41.54%) |
    | emp | Employment status | Active military service (0.15%), Employed full time (21.03%), Employed part time (5.07%), Not employed (37.18%), Retired (33.19%), Self-employed (2.41%), Unknown (0.97%) |
    | prio | Enrollment priority | Group 1 (24.86%), Group 2 (8.23%), Group 3 (12.35%), Group 4 (2.78%), Group 5 (27.58%), Group 6 (4.44%), Group 7 (2.62%), Group 8 (17.14%) |
    | los | Length of stay (days) | Mean: 0.88, StdDev: 6.18, Min: 0, Max: 210 |
    | can | Care assessment need score | Mean: 55, StdDev: 28.01, Min: 0, Max: 99 |
    | team-ind | Index of PCMH team | 1301 categories; 800 000 054 (0.06%), 1000 003 172 (0.05%), ... |
    | acc1-ind | Has been diagnosed with infectious or parasitic condition? | Yes (12.45%), No (87.55%) |
    | acc2-ind | Has been diagnosed with malignant neoplasm? | Yes (10.28%), No (89.72%) |
    | acc3-ind | Has been diagnosed with benign/in situ/uncertain neoplasm? | Yes (10.79%), No (89.21%) |
    | acc4-ind | Has been diagnosed with diabetes? | Yes (28.7%), No (71.3%) |
    | acc5-ind | Has been diagnosed with nutritional or metabolic disease? | Yes (70.01%), No (29.99%) |
    | acc6-ind | Has been diagnosed with liver disease? | Yes (5.07%), No (94.93%) |
    | acc7-ind | Has been diagnosed with gastrointestinal condition? | Yes (33.84%), No (66.16%) |
    | acc8-ind | Has been diagnosed with musculoskeletal or connective tissue condition? | Yes (59.93%), No (40.07%) |
    | acc9-ind | Has been diagnosed with hematological condition? | Yes (10.22%), No (89.78%) |
    | acc10-ind | Has been diagnosed with cognitive disorders? | Yes (5.25%), No (94.75%) |
    | acc11-ind | Has been diagnosed with substance abuse? | Yes (23.25%), No (76.75%) |
    | acc12-ind | Has been diagnosed with mental condition? | Yes (37.38%), No (62.62%) |
    | acc13-ind | Has been diagnosed with developmental disability? | Yes (0.89%), No (99.11%) |
    | acc14-ind | Has been diagnosed with neurological condition? | Yes (16.73%), No (83.27%) |
    | acc15-ind | Has been diagnosed with cardio-respiratory arrest? | Yes (1.4%), No (98.6%) |
    | acc16-ind | Has been diagnosed with heart disease? | Yes (66.36%), No (33.64%) |
    | acc17-ind | Has been diagnosed with cerebrovascular condition? | Yes (6.25%), No (93.75%) |
    | acc18-ind | Has been diagnosed with vascular condition? | Yes (11.69%), No (88.31%) |
    | acc19-ind | Has been diagnosed with lung disease? | Yes (18.12%), No (81.88%) |
    | acc20-ind | Has been diagnosed with eye condition? | Yes (38.66%), No (61.34%) |
    | acc21-ind | Has been diagnosed with ear, nose, and throat condition? | Yes (37.82%), No (62.18%) |
    | acc22-ind | Has been diagnosed with urinary system disease? | Yes (15.89%), No (84.11%) |
    | acc23-ind | Has been diagnosed with genital system disease? | Yes (21.9%), No (78.1%) |
    | acc24-ind | Has been diagnosed with pregnancy-related condition? | Yes (0.19%), No (99.81%) |
    | acc25-ind | Has been diagnosed with skin or subcutaneous condition? | Yes (23.69%), No (76.31%) |
    | acc26-ind | Has been diagnosed with injury, poisoning, or complications? | Yes (15.09%), No (84.91%) |
    | acc27-ind | Has been diagnosed with symptoms, signs, or ill-defined conditions? | Yes (59.95%), No (40.05%) |
    | acc29-ind | Has been diagnosed with transplants, openings, or amputations condition? | Yes (1.44%), No (98.56%) |
    | acc30-ind | Has been diagnosed with screening/history? | Yes (92.08%), No (7.92%) |
    | **Team-level predictors** | | |
    | fac-ind | Index of PCMH facility | 260 categories; Dallas VA Medical Center (0.84%), San Diego Community-based Outpatient Clinic (0.64%), ... |
    | prov.pos | Assigned provider position | Primary care physician (68.74%), Nurse practitioner (15.90%), Attending physician (8.87%), Assistant physician (6.49%) |
    | prov.exp | Assigned provider experience (years) | Mean: 8.55, StdDev: 7.79, Min: 0, Max: 41 |
    | prov.chng | # times the patient has changed his/her assigned provider | Mean: 0.75, StdDev: 0.90, Min: 0, Max: 9 |
    | prov.fte | Provider full time equivalent | Mean: 0.85, StdDev: 0.24, Min: 0, Max: 1 |
    | **Facility-level predictors** | | |
    | fac.dist | Distance between patient’s home and his/her assigned facility (miles) | Mean: 79.910, StdDev: 744.3, Min: 0.018, Max: 12430 |
    | **Patient-level responses** | | |
    | pcrvu | Primary care relative value unit | Mean: 3.96, StdDev: 2.82, Min: 0.17, Max: 36.74 |
    | npcrvu | Nonprimary care relative value unit | Mean: 14.93, StdDev: 22.83, Min: 0.06, Max: 371.6 |

    Fig. 1. QQ plot of primary care RVU with 95% confidence bands.

    Fig. 2. QQ plot of nonprimary care RVU with 95% confidence bands.


    5.2. Modeling 

    We used a natural logarithm transformation for both response variables (primary care relative value unit or “pcrvu” and nonprimary care relative value unit or “npcrvu”) in order to bring them to an approximately Gaussian form. We distinguished four levels of hierarchy: responses (level-1) were nested in patient $i$ (level-2), patients were nested in PCMH team $j$ (level-3), and PCMH teams were nested in VA medical facility $k$ (level-4). The following four-level hierarchical STAR model was formulated:

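    $$
    \begin{aligned}
    \text{Level-1:}\quad & y^{(h)}_{ijk} = d^{(1)}_{ijk} \ln(\text{pcrvu}) + d^{(2)}_{ijk} \ln(\text{npcrvu}) \\
    \text{Level-2:}\quad & \ln(\text{pcrvu}) = \mathbf{1}\eta^{(1)}_0 + f^{(1)}_1(\text{age}) + f^{(1)}_2(\text{los}) + f^{(1)}_3(\text{can}) + f^{(1)}_4(\text{age})\,\text{acc1} + \cdots + f^{(1)}_{32}(\text{age})\,\text{acc30} \\
    & \qquad\qquad + f^{(1)}_{33}(\text{can}, \text{los}) + \cdots + V^{(1)}\gamma^{(1)} + \varepsilon^{(1)} \\
    & \qquad = \mathbf{1}\eta^{(1)}_0 + X^{(1)}_1\beta^{(1)}_1 + \cdots + V^{(1)}\gamma^{(1)} + \varepsilon^{(1)} \\
    \text{Level-2:}\quad & \ln(\text{npcrvu}) = \mathbf{1}\eta^{(2)}_0 + X^{(2)}_1\beta^{(2)}_1 + \cdots + V^{(2)}\gamma^{(2)} + \varepsilon^{(2)} \\
    \text{Level-3:}\quad & \eta^{(1)}_0 = \mathbf{1}\eta^{(1)}_{0,0} + f^{(1)}_{0,1}(\text{prov.exp}) + f^{(1)}_{0,2}(\text{prov.fte}) + f^{(1)}_{0,3}(\text{prov.chng}) + f^{(1)}_{0,4}(\text{prov.pos}) \\
    & \qquad\qquad + f^{(1)}_{0,5}(\text{prov.exp})\,\text{prov.pos} + \cdots + V^{(1)}_0\gamma^{(1)}_0 + \varepsilon^{(1)}_0 \\
    & \qquad = \mathbf{1}\eta^{(1)}_{0,0} + X^{(1)}_{0,1}\beta^{(1)}_{0,1} + \cdots + V^{(1)}_0\gamma^{(1)}_0 + \varepsilon^{(1)}_0 \\
    \text{Level-3:}\quad & \eta^{(2)}_0 = \mathbf{1}\eta^{(2)}_{0,0} + X^{(2)}_{0,1}\beta^{(2)}_{0,1} + \cdots + V^{(2)}_0\gamma^{(2)}_0 + \varepsilon^{(2)}_0 \\
    \text{Level-3:}\quad & \beta^{(1)}_1 = f^{(1)}_{3,1}(\text{prov.exp}) + V^{(1)}_3\gamma^{(1)}_3 + \varepsilon^{(1)}_3 = X^{(1)}_{3,1}\beta^{(1)}_{3,1} + V^{(1)}_3\gamma^{(1)}_3 + \varepsilon^{(1)}_3 \\
    \text{Level-3:}\quad & \beta^{(2)}_1 = f^{(2)}_{3,1}(\text{prov.exp}) + V^{(2)}_3\gamma^{(2)}_3 + \varepsilon^{(2)}_3 = X^{(2)}_{3,1}\beta^{(2)}_{3,1} + V^{(2)}_3\gamma^{(2)}_3 + \varepsilon^{(2)}_3 \\
    \text{Level-4:}\quad & \eta^{(1)}_{0,0} = V^{(1)}_{0,0}\gamma^{(1)}_{0,0} + \varepsilon^{(1)}_{0,0} \\
    \text{Level-4:}\quad & \eta^{(2)}_{0,0} = V^{(2)}_{0,0}\gamma^{(2)}_{0,0} + \varepsilon^{(2)}_{0,0}.
    \end{aligned} \tag{25}
    $$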

    The top-level equation contains the two responses. The level-2 equations are STAR models for logged primary and nonprimary care workloads that are regressed on possibly nonlinear effects of patient age, care assessment need score, and length of stay using P-splines. We also included interaction effects between age, CAN score, priority, and all disease types and between CAN score and length of stay with a two-dimensional surface. The categorical covariates on the patient level along with their possible interactions were encoded as dummy variables and subsumed in $V^{(\cdot)}$ with parameters $\gamma^{(\cdot)}$. Note that here we used the same set of effects for both response regressions, but this may change in other applications with a bivariate response. The first and second level-3 equations model patient specific variables offset by the team-level covariates such as provider experience and its interaction with provider position plus random intercepts $\varepsilon^{(\cdot)}_0$. In addition, the linear or index terms on this level, such as provider position, are included in $V^{(\cdot)}_0$. The third and the fourth level-3 equations model slope specific heterogeneity of age plus additional linear terms $V^{(\cdot)}_3$ and random slopes $\varepsilon^{(\cdot)}_3$. Finally, team-specific intercepts were modeled through level-4 equations containing the logarithm of average facility distance $V^{(\cdot)}_{0,0}$ and facility random intercepts $\varepsilon^{(\cdot)}_{0,0}$.

    5.3. Analyses

    We performed sensitivity analyses for component selection with regard to different hyperparameter settings; i.e., $v_0 = 0.00025, 0.005, 0.01$ and $(a_\rho, b_\rho) = (5, 25), (5, 50), (10, 35)$. We also evaluated the prediction performance of models with and without higher-level hierarchies based on deviance values obtained for a test subset containing 1000 observations.
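    The sensitivity settings can be written down as a small grid. Crossing every $v_0$ value with every $(a_\rho, b_\rho)$ pair is an assumption of this sketch; the text lists the values but does not say whether all nine combinations were run.

```python
from itertools import product

V0_VALUES = (0.00025, 0.005, 0.01)
AB_RHO_VALUES = ((5, 25), (5, 50), (10, 35))

# One dictionary per hyperparameter setting (full crossing assumed).
grid = [
    {"v0": v0, "a_rho": a_rho, "b_rho": b_rho}
    for v0, (a_rho, b_rho) in product(V0_VALUES, AB_RHO_VALUES)
]
```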

    5.4. Results

    The maximal model contained approximately 121 model terms with 640 coefficients in total. The hyperparameters were set to $(a_\omega, b_\omega) = (1, 1)$, $(a_\rho, b_\rho) = (5, 25)$, and $v_0 = 0.00025$. Since we convert our responses to Gaussian, a very flat hyperprior $\phi \sim \Gamma^{-1}(10^{-4}, 10^{-4})$ was chosen for the error variance. The estimates were constructed on MCMC samples from 10 parallel chains with a burn-in run of 1000 iterations each, followed by a sampling phase of 15 000 iterations, with every 10th iteration used. For modeling smooth terms we used cubic P-spline basis functions with 20 equidistant inner knots over the range of the covariates plus second-order difference penalties penalizing deviations from linearity. For linear/polynomial terms we used orthogonal basis functions of the associated degree without an intercept. For modeling index effects we employed dummy variables with sum-to-zero contrasts. The correlation structures of the random effects (“team-ind” and “fac-ind”) were set to identity here, but more complex classes such as autoregressive or spatial correlations can also be applied.
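    The role of the second-order difference penalty can be checked numerically: it is zero for any coefficient sequence that is linear in its index and grows with curvature. The function names and the two toy coefficient vectors are illustrative assumptions.

```python
def second_differences(coef):
    """Second-order differences coef[i+2] - 2*coef[i+1] + coef[i]."""
    return [coef[i + 2] - 2.0 * coef[i + 1] + coef[i] for i in range(len(coef) - 2)]


def difference_penalty(coef):
    """Sum of squared second differences (the quadratic P-spline penalty without its weight)."""
    return sum(d * d for d in second_differences(coef))


linear = [3.0 + 0.5 * i for i in range(8)]            # linear in the index: unpenalized
wiggly = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]     # oscillating: heavily penalized
pen_linear = difference_penalty(linear)
pen_wiggly = difference_penalty(wiggly)
```

    This is why the penalty is described as penalizing deviations from linearity: the linear part of a smooth term lies in its null space and has to be selected separately.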

    The model terms with posterior inclusion probability $P(\delta_j = 1 \mid y)$ greater than 0.10 are listed in Table 2 for the primary care RVU and in Table 3 for the nonprimary care RVU. Compared with the nonprimary care RVU, the model for the primary care RVU is rather sparse, with only 10 terms with inclusion probability larger than 0.10. In both models, including the team and facility random intercepts accounting for hierarchical heterogeneity turns out to be imperative. Four other terms are also common in the two models; that is, linear part of CAN score, marital status, whether the patient has been diagnosed with a musculoskeletal or connective tissue condition, and whether the patient has had a screening or history of disease. In terms of disease variables, the nonprimary care additive predictor is almost entirely dominated by cancer, eye, mental, skin, ear/nose/throat, and injury/poisoning, whereas nutrition/metabolic and heart diseases are more prominent in the primary care additive predictor. The posterior mean of the nonparametric additive predictor $\eta$ associated with a number of selected effects, along with 90% credible intervals, is illustrated in Figs. 3 to 5 for the primary care RVU and in Figs. 6 to 9 for the nonprimary care RVU. As shown in Fig. 3, the care assessment need score effect on the primary care RVU is increasing from about −0.2 to +0.2 with a zero effect around 50. However, on the nonprimary care RVU, the CAN score has a greater effect changing from −1 to +1 (Fig. 6). The effects of comorbidities are shown in Figs. 4 and 8. As expected, having a comorbid condition is always associated with greater clinical workload in both primary and nonprimary care settings. The interaction …

    Table 2. Posterior means of marginal inclusion probabilities for primary care relative value unit

    | Term | $P(\delta_j = 1 \mid y)$ |
    |---|---|
    | Team, random intercept | 1.000 |
    | Facility, random intercept | 0.762 |
    | Care assessment need score, linear | 0.614 |
    | Has been diagnosed with screening/history, factor | 0.583 |
    | Has been diagnosed with symptoms, signs, or ill-defined conditions, factor | 0.335 |
    | Has been diagnosed with nutritional or metabolic disease, factor | 0.309 |
    | Has been diagnosed with musculoskeletal or connective tissue condition, factor | 0.282 |
    | Has been diagnosed with heart disease, factor | 0.247 |
    | Marital status | 0.105 |
    | Linear (age) : smooth (prov.exp) | 0.100 |

    Table 3. Posterior means of marginal inclusion probabilities for nonprimary care RVU

    | Term | $P(\delta_j = 1 \mid y)$ |
    |---|---|
    | Team, random intercept | 1.000 |
    | Care assessment need score, linear | 1.000 |
    | Age, linear | 1.000 |
    | Age, smooth | 1.000 |
    | Has been diagnosed with benign/in situ/uncertain neoplasm, factor | 1.000 |
    | Has been diagnosed with eye condition, factor | 1.000 |
    | Has been diagnosed with mental condition, factor | 0.999 |
    | Has been diagnosed with skin or subcutaneous condition, factor | 0.998 |
    | Facility, random intercept | 0.998 |
    | Care assessment need score, smooth | 0.997 |
    | Has been diagnosed with symptoms, signs, or ill-defined conditions, factor | 0.993 |
    | Has been diagnosed with screening/history, factor | 0.978 |
    | Has been diagnosed with malignant neoplasm, factor | 0.976 |
    | Enrollment priority, factor | 0.953 |
    | Has been diagnosed with ear, nose, and throat condition, factor | 0.946 |
    | Has been diagnosed with injury, poisoning, or complications, factor | 0.851 |
    | Has been diagnosed with musculoskeletal or connective tissue condition, factor | 0.805 |
    | Linear (care assessment need score) : factor (enrollment priority) | 0.537 |
    | Marital status | 0.394 |

    Fig. 3. Linear (top) and nonlinear (bottom) effects of care assessment need score on the primary care predictor with 90% credible intervals.

    Fig. 4. Effects of different comorbid conditions on the primary care predictor with 90% credible intervals.


    Fig. 9. Interaction effects of care assessment need score and enrollment priority (top), age and assigned provider position (bottom) on the nonprimary care predictor with 90% credible intervals.

    In the test set containing 1000 independent patients, the selected covariate set was the same as in Table 2 for the primary care RVU, except that there was no interaction effect identified; for the nonprimary care workload prediction, the model included exactly the same terms as shown in Table 3. This finding ensured the stability of our approach and reinforced its internal validity (or reproducibility) with related samples underlying the same population.

    We then performed predictive performance evaluation with different hyperparameter settings. To this end, the mean posterior deviance $(1/T)\sum_{t=1}^{T} -2\,l(y \mid \eta^{(t)}, \phi^{(t)})$, the average of twice the negative log-likelihood of the observations over the saved MCMC iterations, was calculated and saved. The obtained results confirm that the prediction accuracy is very robust across all the parameter combinations for both primary care and nonprimary care workloads. However, variable selection is sensitive to varying hyperparameters, especially to the choice of $v_0$. Generally, we observe that very small values of $v_0$ allow small effects to be included in the model, whereas larger values of $v_0$ perform more conservatively. The model sparsity is found to be more sensitive with regard to $v_0$ than toward $(a_\rho, b_\rho)$.
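    For Gaussian responses the mean posterior deviance is straightforward to compute from saved draws. The sketch below assumes that each saved iteration provides a vector of fitted predictors $\eta^{(t)}$ and an error variance $\phi^{(t)}$; the function names and the toy draws are hypothetical.

```python
import math


def gaussian_loglik(y, eta, phi):
    """Log-likelihood of observations y under N(eta_i, phi)."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * phi) - (yi - ei) ** 2 / (2.0 * phi)
        for yi, ei in zip(y, eta)
    )


def mean_posterior_deviance(y, eta_draws, phi_draws):
    """Average of -2 * log-likelihood over the saved MCMC iterations."""
    deviances = [
        -2.0 * gaussian_loglik(y, eta, phi)
        for eta, phi in zip(eta_draws, phi_draws)
    ]
    return sum(deviances) / len(deviances)


# Toy example: two observations and two saved iterations per model.
y = [0.0, 1.0]
dev_good = mean_posterior_deviance(y, [[0.0, 1.0], [0.0, 1.0]], [1.0, 1.0])
dev_bad = mean_posterior_deviance(y, [[1.0, 0.0], [1.0, 0.0]], [1.0, 1.0])
```

    Smaller values indicate better predictive fit, which is how the models with and without the higher-level random intercepts are compared on the test subset.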

    Examining hierarchical versus nonhierarchical modeling, we notice that the mean posterior deviance is much smaller when we include random intercepts from level-3 and level-4 hierarchies. Specifically, for the primary care RVU in the test set the reductions in deviance are 186 and 53 units with regard to the team and facility intercepts, respectively, with the null deviance equal to 1932. For the nonprimary care workload these reductions are found to be 197 and 64 units. Ignoring the hierarchical structure of data, which introduces nested correlations among observations, can result in a biased prediction of both outcomes. Finally, in terms of prediction quality, the reduced models consisting of the selected covariates produce about 68% and 73% predictive $R^2$ (see Appendix B) for the primary care workload and the nonprimary care workload, respectively, showing a practically good fit.

    Fig. 10. Interaction effects of length of stay and enrollment priority (top) and effect of marital status (bottom) on the nonprimary care predictor with 90% credible intervals (‘nmarried’ and ‘pmarried’ stand for not married and previously married).
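    To put the reported reductions in perspective, they can be expressed as shares of the null deviance for the primary care RVU. This is simple arithmetic on the figures quoted in the text, not an additional result; the function name is hypothetical, and the calculation is limited to the primary care response because the null deviance of 1932 is stated for that response.

```python
def deviance_share(reduction, null_deviance):
    """Reduction in deviance as a fraction of the null deviance."""
    return reduction / null_deviance


NULL_DEVIANCE_PC = 1932.0                                   # primary care RVU, test set
team_share = deviance_share(186.0, NULL_DEVIANCE_PC)        # team random intercept
facility_share = deviance_share(53.0, NULL_DEVIANCE_PC)     # facility random intercept
print(f"team: {team_share:.1%}, facility: {facility_share:.1%}")
```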

    6. Conclusions

    In this article, we proposed a Bayesian function selection approach based on spike and slab priors for hierarchical structured additive models with a multivariate response. The prior setting adopted in our work is a Bayesian hierarchical structure with a bimodal density on the hypervariance of the coefficient blocks, with one part being a narrow spike around the origin and the other part being a wide slab. We demonstrated how one can parameterize a special class of multiresponse hierarchical structured additive model (that is, a multivariate linear multilevel spline model) within a standard structural equation modeling framework and thus bridge the connection between multivariate multilevel STAR models and generalized latent variable models.

    We then applied our methods to PCMH data obtained from a large number of VA medical facilities during fiscal year 2011–2012. Our work is the first attempt to develop a portfolio-based demand prediction model for PCMH within the OR/MS or IE community. We aggregated three levels of hierarchical data including information from outpatients, the medical team responsible for providing care to the patients, and the VA facilities. We found that the sets of chosen predictors introduced by the model are different for the primary care and the nonprimary care workloads. Our findings also confirmed that taking hierarchical heterogeneity into account is associated with better prediction accuracy, especially when the data have more than two levels.

    Some methodological directions based on our approach can be investigated in future research. One challenging extension would be to develop Bayesian model choice and component selection for multiresponse hierarchical autologistic or auto-Poisson regressions particularly used in ecology, or hierarchical seemingly unrelated regressions in economics. Another direction would be Bayesian function selection in semiparametric quantile regressions with nonnormal random effects modeled by a Dirichlet process mixture.

    Acknowledgement

    The authors thank the editor and three anonymous referees for their constructive comments.

    Funding

    This research is supported by the National Science Foundation, Division of Civil, Mechanical, and Manufacturing Innovation (CMMI) under grant number 1233504.

    References

    Ajorlou, S., Shams, I. and Yang, K. (2014) An analytics approach todesigning patientcentered medicalhomes. Health Care Management

    Science, 18(1), 3–18.

    Andridge, R.R.and Little,R.J. (2010) A reviewof hotdeck imputationforsurvey non-response. International Statistical Review, 78(1), 40–64.

    Antoniadis, A. and Fan, J. (2001) Regularization of wavelet approxi-mations.   Journal of the American Statistical Association,  96(455),

    939–967.Bitton, A., Martin, C. and Landon, B.E. (2010) A nationwide survey of 

    patient centered medical home demonstration projects.  Journal of General Internal Medicine, 25(6), 584–592.

    Breiman, L. (1995) Better subset regression using the nonnegative gar-

    rote. Technometrics, 37(4), 373–384.Breiman, L. (1996) Heuristics of instability and stabilization in model

    selection. Annals of Statistics, 24(6), 2350–2383.

    Brown, P.J., Vannucci, M. and Fearn, T. (1998) Multivariate Bayesianvariable selection and prediction.   Journal of the Royal Statistical 

    Society: Series B (Statistical Methodology), 60(3), 627–641.Cai, J., Fan, J., Li, R. and Zhou, H. (2005) Variable selection for multi-

    variate failure time data. Biometrika, 92(2), 303–316.

    Chen, Y., Du, P. and Yuedong, W. (2013) Variable selection in linearmodels. Computational Statistics, 6(1), 1–9.

    Claeskens, G. and Hjort, N.L. (2003) The focused information criterion.Journal of the American Statistical Association, 98(464), 900–916.

    Dummit, L. (2009) Relative value units (RVUs). Available at http://www.nhpf.org/library/the-basics/Basics RVUs 02-12-09.pdf. Ac-cessed on March 18, 2014.

    Fahrmeir, L., Kneib, T. and Lang, S. (2004) Penalized structured additiveregression for space-time data: a Bayesian perspective.  Statistica

    Sinica, 14(3), 731–762.

    Fan, J. and Li, R. (2001) Variable selection via nonconcave penalizedlike-lihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348–1360.

    Fan, J. and Lv, J. (2010) A selective overview of variable selection in high

    dimensional feature space. Statistica Sinica, 20(1), 101–148.Fisher, E.S. (2008) Building a medical neighborhood for the medical

    home. New England Journal of Medicine, 359(12), 1202–1205.

    George, E.I. and McCulloch, R.E. (1993) Variable selection via Gibbssampling. Journal of the American Statistical Association , 88(423),

    881–889.Goldstein, H. (2011) Multilevel Statistical Models, 4th edition, John Wi-

    ley & Sons, Sussex, UK.Hastie, T. and Tibshirani, R. (1993) Varying-coefficient models. Journal 

    of the Royal Statistical Society : Series B (Statistical Methodology),

    55(4), 757–796.Huang, J., Horowitz, J.L. and Wei, F. (2010) Variable selection in

    nonparametric additive models.   Annals of Statistics,  38(4), 2282–2313.

    Ishwaran, H. and Rao, J.S. (2005) Spike and slab variable selection:frequentist and Bayesian strategies. Annals of Statistics, 33(2), 730–773.

    Kammann, E. and Wand, M.P. (2003) Geoadditivemodels. Journal of theRoyal Statistical Society: Series C  (Applied Statistics), 52(1), 1–18.

    Klein, S. and Fund, C. (2011)  The Veterans Health Administration: Im- plementing Patient-Centered Medical Homes in the Nation’s Largest

    Integrated Delivery System, Commonwealth Fund.

    Kline, R.B. (2011) Principles and Practice of Structural Equation Model-ing, 3rd edition, Guilford Press, New York.

    Kundu, S. and Dunson, D.B. (2014) Bayes variable selection in semipara-metric linear models. Journal of the AmericanStatistical Association,

    109(505), 437–447.Lang, S. and Brezger, A. (2004) Bayesian P-splines. Journal of Computa-

    tional and Graphical Statistics, 13(1), 183–212.

    Lang, S., Umlauf, N., Wechselberger, P., Harttgen, K. and Kneib, T.(2013) Multilevel structured additive regression. Statistics and Com-

     puting, 24(2), 223–238.Liang, F., Paulo, R., Molina, G., Clyde, M.A. and Berger, J.O. (2008)

    Mixtures of  g  priors for Bayesian variable selection.  Journal of the

    American Statistical Association, 103(481), 410–423.Lin, Y. and Zhang, H.H. (2006) Component selection and smoothing in

    multivariate nonparametric regression.  Annals of Statistics,  34(5),2272–2297.

    Marra, G. and Wood, S.N. (2011) Practical variable selection for gener-alized additive models.  Computational Statistics & Data Analysis,55(7), 2372–2387.

    Müller, S., Scealy, J. and Welsh,A. (2013) Model selection in linear mixedmodels. Statistical Science, 28(2), 135–167.

    O’Hara, R.B. and Sillanpää, M.J. (2009) A review of Bayesian variableselection methods: what, how and which.  Bayesian Analysis,  4(1),85–117.

    Pu, W. and Niu, X.-F. (2006) Selecting mixed-effects models based on ageneralized information criterion. Journal of Multivariate Analysis,

    97(3), 733–758.

    Rabe-Hesketh, S., Skrondal, A. and Pickles, A. (2004) Generalized multi-level structural equation modeling. Psychometrika, 69(2), 167–190.Reid, R.J., Coleman, K., Johnson, E.A., Fishman, P.A., Hsu, C., So-

    man, M.P., Trescott, C.E., Erikson, M. and Larson, E.B. (2010)

    The group health medical home at year two: cost savings, higherpatient satisfaction, and less burnout for providers.  Health Affairs,

    29(5), 835–843.Rittenhouse, D.R. and Shortell, S.M. (2009) The patient-centered med-

    ical home: will it stand the test of health reform?   Journal of theAmerican Medical Association, 301(19), 2038–2040.

    Scheipl, F., Fahrmeir, L. and Kneib, T. (2012) Spike-and-slab priors for

    function selection in structured additive regression models. Journal of the American Statistical Association, 107(500), 1518–1532.

    Spiegelhalter, D.J., Best, N.G., Carlin, B.P. and van der Linde, A. (2002)Bayesian measures of model complexity and fit.   Journal of the

  • 8/19/2019 Bayesian Component Selection in Multi Response Hierarchical Structured Additive Models With an Application to Cli…

    18/19

    Bayesian modeling of patient-centered medical homes   959

    Royal Statistical Society : Series B (Statistical Methodology), 64(4),583–639.

    Stange, K.C., Nutting, P.A., Miller, W.L., Jaén, C.R., Crabtree, B.F.,Flocke, S.A. and Gill, J.M. (2010) Defining and measuring

    the patient-centered medical home.   Journal of General Internal Medicine, 25(6), 601–612.

    Steele, F. (2008) Multilevel models for longitudinal data.  Journal of the

    Royal Statistical Society: Series A   (Statistics in Society),   171(1),

    5–19.Tibshirani, R. (1996) Regression shrinkage and selection via the lasso.

    Journal of theRoyalStatisticalSociety: Series B (StatisticalMethod-ology), 58(1), 267–288.

    Appendices

    Appendix A

    The MCMC algorithm in Section 4.3 is described as follows.

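    Initialize $\rho^{2(0)}$, $\delta^{(0)}$, $\phi^{(0)}$, $\omega^{(0)}$, and $\beta^{(0)}$ (via IWLS if the response is non-Gaussian)
    Calculate $\alpha^{(0)}$, $\xi^{(0)}$, $X_\alpha^{(0)}$
    for iterations $t = 1, \ldots, T$ do
        for blocks $b = 1, \ldots, b_\alpha$ do
            update $\alpha_b^{(t)}$ by its FCD (formula (23) if Gaussian or IWLS if non-Gaussian)
        set $X_\xi^{(t)} = X\,\mathrm{blockdiag}(1_{e_1}, \ldots, 1_{e_p})\,\alpha^{(t)}$
        update $r_1^{(t)}, \ldots, r_L^{(t)}$ via their FCD: $P(r_l^{(t)} = 1 \mid \cdot) = \left(1 + \exp(-2\xi_l^{(t)})\right)^{-1}$, $l = 1, \ldots, L$
        for blocks $b = 1, \ldots, b_\xi$ do
            update $\xi_b^{(t)}$ by its FCD (Equation (24) if Gaussian or IWLS if non-Gaussian)
        for model terms $j = 1, \ldots, p$ do
            rescale $\xi_j^{(t)}$ and $\alpha_j^{(t)}$ by $\xi_j \to \dfrac{M_j}{\sum_{i}^{M_j} |\xi_{ji}|}\,\xi_j$ and $\alpha_j \to \dfrac{\sum_{i}^{M_j} |\xi_{ji}|}{M_j}\,\alpha_j$
        set $X_\alpha^{(t)} = X\,\mathrm{blockdiag}(\xi_1^{(t)}, \ldots, \xi_p^{(t)})$
        update $\rho_1^{2(t)}, \ldots, \rho_p^{2(t)}$ from their FCD: $\rho_j^{2(t)} \mid \cdot \sim \Gamma^{-1}\!\left(a_\rho + \tfrac{1}{2},\; b_\rho + \dfrac{\alpha_j^{2(t)}}{2\delta_j^{(t)}}\right)$
        update $\delta_1^{(t)}, \ldots, \delta_p^{(t)}$ from their FCD: $\dfrac{P(\delta_j^{(t)} = 1 \mid \cdot)}{P(\delta_j^{(t)} = v_0 \mid \cdot)} = v_0^{1/2} \exp\!\left(\dfrac{(1 - v_0)}{2 v_0}\,\dfrac{\alpha_j^{2(t)}}{\rho_j^{2(t)}}\right)$
        update $\omega^{(t)}$ from its FCD: $\omega^{(t)} \mid \cdot \sim \mathrm{Beta}\!\left(a_\omega + \sum_{j}^{p} I_1(\delta_j^{(t)}),\; b_\omega + \sum_{j}^{p} I_{v_0}(\delta_j^{(t)})\right)$
        if $y$ is Gaussian then
            update $\phi^{(t)}$ from its FCD: $\phi^{(t)} \mid \cdot \sim \Gamma^{-1}\!\left(a_\phi + \dfrac{n}{2},\; b_\phi + \dfrac{\sum_{i}^{n} (y_i - \eta_i^{(t)})^2}{2}\right)$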
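    The conjugate hyperparameter updates at the end of each iteration can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `update_hyperparameters`, the `hyper` dictionary and all numerical values are hypothetical. The block updates of the coefficient vectors (formulas (23) and (24), or IWLS) are not shown. One further assumption is marked in the code: the odds of the indicator are multiplied by the prior odds ω/(1 − ω) implied by a two-point mixture prior, whereas the ratio as printed above lists only the likelihood factor.

```python
import numpy as np

def update_hyperparameters(alpha, delta, omega, resid, v0, hyper, rng):
    """One sweep over rho^2, delta, omega and phi given the current alpha.

    alpha : (p,) term-level coefficients; delta : (p,) indicators in {v0, 1};
    omega : current mixing weight; resid : (n,) residuals y - eta;
    hyper : dict of prior constants (hypothetical container).
    """
    p = alpha.shape[0]

    # rho2_j | . ~ InvGamma(a_rho + 1/2, b_rho + alpha_j^2 / (2 delta_j)),
    # drawn as the reciprocal of a Gamma(shape, scale = 1 / rate) variate.
    rate = hyper["b_rho"] + alpha**2 / (2.0 * delta)
    rho2 = 1.0 / rng.gamma(hyper["a_rho"] + 0.5, 1.0 / rate)

    # Log odds of slab (delta_j = 1) versus spike (delta_j = v0).
    log_odds = 0.5 * np.log(v0) + (1.0 - v0) / (2.0 * v0) * alpha**2 / rho2
    # Assumption: include the prior odds omega / (1 - omega).
    log_odds = log_odds + np.log(omega) - np.log1p(-omega)
    prob_slab = 1.0 / (1.0 + np.exp(-np.clip(log_odds, -700.0, 700.0)))
    delta_new = np.where(rng.random(p) < prob_slab, 1.0, v0)

    # omega | . ~ Beta(a_omega + #slab, b_omega + #spike).
    n_slab = int(np.sum(delta_new == 1.0))
    omega_new = rng.beta(hyper["a_omega"] + n_slab,
                         hyper["b_omega"] + (p - n_slab))

    # Gaussian response only: phi | . ~ InvGamma(a_phi + n/2, b_phi + RSS/2).
    n = resid.shape[0]
    rss = float(np.sum(resid**2))
    phi_new = 1.0 / rng.gamma(hyper["a_phi"] + n / 2.0,
                              1.0 / (hyper["b_phi"] + rss / 2.0))
    return rho2, delta_new, omega_new, phi_new

# Illustrative call with made-up values (not taken from the article).
rng = np.random.default_rng(1)
hyper = {"a_rho": 5.0, "b_rho": 25.0, "a_omega": 1.0, "b_omega": 1.0,
         "a_phi": 1e-4, "b_phi": 1e-4}
alpha = np.array([0.01, 3.0, -0.02])
delta = np.array([0.005, 1.0, 0.005])
resid = rng.normal(size=50)
rho2, delta, omega, phi = update_hyperparameters(
    alpha, delta, 0.5, resid, 0.005, hyper, rng)
```

    Working on the log-odds scale avoids overflow when $\alpha_j^2/\rho_j^2$ is large and $v_0$ is small, which is the typical situation for terms that belong in the slab.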


    Appendix B

    The predictive $R^2$ can be defined similarly to the model-based $R^2$ in order to assess the linear correlation between the outcome $y_o$ and its prediction $y_p$. It is bounded to the interval [0, 1]. If we denote the arithmetic means of the observed and predicted outcomes as $\bar{y}_o$ and $\bar{y}_p$, respectively, the predictive $R^2$ is given as

    $$R_p^2 = \frac{\left[\sum_{i=1}^{n} (y_{p,i} - \bar{y}_p)(y_{o,i} - \bar{y}_o)\right]^2}{\sum_{i=1}^{n} (y_{p,i} - \bar{y}_p)^2 \, \sum_{i=1}^{n} (y_{o,i} - \bar{y}_o)^2}$$

    Note that, unlike the model-based $R^2$, the $R_p^2$ cannot be interpreted as the percentage of variance explained because the decomposition of variance holding for estimated values $\hat{y}$ does not apply for predicted values $y_p$.
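    As a small illustration, the predictive $R^2$ can be computed as the squared sample correlation between observed and predicted outcomes. The sketch below assumes NumPy; the function name `predictive_r2` and the toy numbers are hypothetical and are not taken from the PCMH data.

```python
import numpy as np

def predictive_r2(y_obs, y_pred):
    """Squared linear correlation between observed and predicted outcomes."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    d_obs = y_obs - y_obs.mean()      # deviations from the observed mean
    d_pred = y_pred - y_pred.mean()   # deviations from the predicted mean
    numerator = np.sum(d_pred * d_obs) ** 2
    denominator = np.sum(d_pred**2) * np.sum(d_obs**2)
    return numerator / denominator

# Predictions that are a linear transform of the observations give 1.
perfect = predictive_r2([1, 2, 3, 4], [2, 4, 6, 8])
# A partly scrambled prediction gives a value strictly between 0 and 1.
partial = predictive_r2([1, 2, 3], [1, 3, 2])
```

    In the first call the predictions are twice the observed values, yet the measure still equals 1. This shows that $R_p^2$ captures linear association rather than agreement, which is consistent with the caveat that it is not a share of explained variance.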

    Biographies

    Issac Shams is a postdoctoral research fellow in the Department of Industrial and Operations Engineering at the University of Michigan. He received his B.Sc. and M.Sc. in Industrial Engineering from Iran University of Science and Technology in 2008 and 2011, respectively, and his Ph.D. in Industrial and Systems Engineering from Wayne State University in 2014. His research interests include healthcare-driven statistical modeling, statistical network analysis, and statistical learning for knowledge discovery and process improvement. He is a member of ASQ, IIE, INFORMS, ASA, and IMS.

    Saeede Ajorlou is a postdoctoral visiting scholar in the Department of Industrial and Operations Engineering at the University of Michigan. She received her B.Sc. in Computer Engineering from Mazandaran University of Science and Technology in 2007, her M.Sc. in Industrial Engineering from Iran University of Science and Technology in 2008, and her Ph.D. in Industrial and Systems Engineering from Wayne State University in 2014. Her research focus is on the application and development of operations research methods in modeling and control of stochastic systems in healthcare operations and production and operations management. She is a member of IIE and INFORMS.

    Kai Yang is a Professor in the Department of Industrial and System Engineering and the Director of the Healthcare Systems Engineering Group at Wayne State University. His areas of research include statistical methods in quality and reliability, healthcare systems engineering, and engineering design methodologies. His research has been funded by such organizations as NSF, VA, GM, Ford, and Siemens. He is currently a leading faculty member in the U.S. Veteran Administration's (VA) Center for Applied System Engineering, a nationwide VA initiative that has used industrial engineering to improve the healthcare industry since 2009. In that project he is leading many studies involving healthcare access improvement, healthcare data analytics, readmission reduction, real-time location systems in healthcare, and patient-centered medical homes. He obtained both his M.S. and Ph.D. degrees from the University of Michigan.