Multivariate dynamical systems models for estimating causal interactions in fMRI

Srikanth Ryali a,⁎, Kaustubh Supekar b,c, Tianwen Chen a, Vinod Menon a,d,e,⁎

a Department of Psychiatry & Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
b Graduate Program in Biomedical Informatics, Stanford University School of Medicine, Stanford, CA 94305, USA
c Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA 94305, USA
d Program in Neuroscience, Stanford University School of Medicine, Stanford, CA 94305, USA
e Department of Neurology & Neurological Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA

⁎ Corresponding authors. Department of Psychiatry & Behavioral Sciences, 780 Welch Rd, Room 201, Stanford University School of Medicine, Stanford, CA 94305-5778, USA. Fax: +1 650 736 7200.

E-mail addresses: [email protected] (S. Ryali), [email protected] (V. Menon).

1053-8119/$ – see front matter © 2010 Published by Elsevier Inc. doi:10.1016/j.neuroimage.2010.09.052


Article info

Article history:
Received 5 June 2010
Revised 15 September 2010
Accepted 21 September 2010
Available online xxxx

Keywords: Causality; Dynamical systems; Variational Bayes; Bilinear; Expectation maximization; Kalman smoother; Deconvolution

Abstract

Analysis of dynamical interactions between distributed brain areas is of fundamental importance for understanding cognitive information processing. However, estimating dynamic causal interactions between brain regions using functional magnetic resonance imaging (fMRI) poses several unique challenges. For one, fMRI measures Blood Oxygenation Level Dependent (BOLD) signals, rather than the underlying latent neuronal activity. Second, regional variations in the hemodynamic response function (HRF) can significantly influence the estimation of causal interactions between brain regions. Third, causal interactions between brain regions can change with experimental context over time. To overcome these problems, we developed a novel state-space Multivariate Dynamical Systems (MDS) model to estimate intrinsic and experimentally induced modulatory causal interactions between multiple brain regions. A probabilistic graphical framework is then used to estimate the parameters of MDS as applied to fMRI data. We show that MDS accurately takes into account regional variations in the HRF and estimates dynamic causal interactions at the level of latent signals. We develop and compare two estimation procedures using maximum likelihood estimation (MLE) and variational Bayesian (VB) approaches for inferring model parameters. Using extensive computer simulations, we demonstrate that, compared to Granger causal analysis (GCA), MDS exhibits superior performance for a wide range of signal to noise ratios (SNRs), sample lengths and network sizes. Our simulations also suggest that GCA fails to uncover causal interactions when there is a conflict between the directions of intrinsic and modulatory influences. Furthermore, we show that MDS estimation using VB methods is more robust and performs significantly better at low SNRs and for shorter time series than MDS with MLE. Our study suggests that VB estimation of MDS provides a robust method for estimating and interpreting causal network interactions in fMRI data.

© 2010 Published by Elsevier Inc.

Introduction

Functional magnetic resonance imaging (fMRI) has emerged as a powerful tool for investigating human brain function and dysfunction. fMRI studies of brain function have primarily focused on identifying brain regions that are activated during performance of perceptual or cognitive tasks. There is growing consensus, however, that localization of activations provides a limited view of how the brain processes information, and that it is important to understand functional interactions between brain regions that form part of a neurocognitive network involved in information processing (Bressler and Menon, 2010; Friston, 2009c; Fuster, 2006). Furthermore, evidence is now accumulating that the key to understanding the functions of any specific brain region lies in disentangling how its connectivity differs from the pattern of connections of other functionally related brain areas (Passingham et al., 2002). A critical aspect of this effort is to better understand how causal interactions between specific brain areas and networks change dynamically with cognitive demands (Abler et al., 2006; Deshpande et al., 2008; Friston, 2009b; Goebel et al., 2003; Mechelli et al., 2003; Roebroeck et al., 2005; Sridharan et al., 2008). These and other related studies in the literature highlight the importance of dynamic causal interactions for understanding brain function at the systems level.

In recent years, several methods have been developed to estimate causal interactions in fMRI data (Deshpande et al., 2008; Friston et al., 2003; Goebel et al., 2003; Guo et al., 2008; Rajapakse and Zhou, 2007; Ramsey et al., 2009; Roebroeck et al., 2005; Seth, 2005; Smith et al., 2009; Valdes-Sosa et al., 2005). Of these, Granger causal analysis (GCA) (Roebroeck et al., 2005; Seth, 2005) and dynamic causal modeling (DCM) (Friston et al., 2003) are among the more commonly used approaches thus far. There is a growing debate about the relative merits and demerits of these approaches for estimating causal interactions using fMRI data (Friston, 2009a,b; Roebroeck et al., 2009). The main limitations of GCA highlighted by this debate are that: (1) GCA estimates causal interactions in the observed Blood-Oxygenation-Level-Dependent (BOLD) signals, rather than in the underlying neuronal responses; (2) GCA may not be able to accurately recover causal interactions because of regional variations in the hemodynamic response; and (3) GCA does not take into account experimentally induced modulatory effects while estimating causal interactions (Friston, 2009a,b). The main limitations of DCM highlighted in this debate are that: (1) DCM is a confirmatory method wherein several causal models are tested and the model with the highest evidence is chosen; this is problematic when the number of regions under investigation is large, since the number of models that need to be tested increases exponentially with the number of regions; (2) conventional DCM uses a deterministic model to describe the dynamics of the latent neuronal signals, which may not be adequate to capture the dynamics of the underlying neuronal processes (a stochastic version was recently proposed (Daunizeau et al., 2009)); and (3) the assumptions used by DCM for deconvolution of the hemodynamic response have not yet been adequately verified (Roebroeck et al., 2009). Here, we develop a new method that incorporates the relative merits of both GCA and DCM while attempting to overcome their limitations.

Fig. 1. Probabilistic graphical model for the multivariate dynamical system (MDS). All conditional interdependencies in MDS can be inferred from this model. The state variables s(t) are modeled as a linear dynamical system. The non-diagonal elements of matrices A and C represent the intrinsic and modulatory connection strengths, respectively. The diagonal elements of D represent the weight of the external stimulus at the i-th node. Q(m,m) is the state noise variance at the m-th node. Each element of A, C and D has precision α, and each element of α follows a Gamma distribution with parameters c0 and d0. The prior for 1/Q(m,m) follows a Gamma distribution with parameters a0 and b0. y(t) represents the observed BOLD signal, the elements of B represent the weights corresponding to the basis functions for the HRFs, and R(m,m) is the observation noise variance at the m-th node. Each element of B has precision α; each element of α follows a Gamma distribution with parameters c0 and d0, and the prior for 1/R(m,m) follows a Gamma distribution with parameters a0 and b0. Random variables are indicated as open circles and deterministic quantities as rectangles.

We propose a novel multivariate dynamical systems (MDS) approach (Bishop, 2006) for modeling causal interactions in fMRI data. MDS is based on a state-space approach which can be used to overcome many of the aforementioned problems associated with estimating causal interactions in fMRI data. State-space models have been successfully used in engineering applications of control systems and machine learning (Bishop, 2006), but their use in neuroscience has been limited. Notable examples of state-space models include Hidden Markov models (HMMs), which are widely used in speech recognition applications (Rabiner, 1989), and Kalman filters for object tracking (Koller and Friedman, 2009). Critically, state-space models can be represented as probabilistic graphical models (Koller and Friedman, 2009), which, as we show below (Fig. 1), greatly facilitates representation and inference for causal modeling of fMRI data.

Critically, MDS estimates causal interactions in the underlying latent signals, rather than the observed BOLD-fMRI signals. In order to estimate causal interactions from the observed fMRI data, it is important to take into account variations in the hemodynamic response function (HRF) across different brain regions (David et al., 2008). MDS is a state-space model in which a "state equation" is used to model the unobserved states of the system and an "observation equation" is used to model the observed data as a function of the latent state signals (Fig. 1). The state equation is a vector autoregressive model incorporating both intrinsic and modulatory causal interactions. Intrinsic interactions reflect causal influences independent of external stimuli and task conditions, while modulatory interactions reflect context-dependent influences. The observation model produces BOLD-fMRI signals as a linear convolution of latent signals and basis functions spanning the space of variations in the HRF.

The latent signals and unknown parameters that characterize causal interactions between brain regions are estimated using two different approaches. In the first approach, we use expectation maximization (EM) to obtain maximum likelihood estimates (MLE) of the parameters and test the statistical significance of the estimated causal relationships between brain regions using a nonparametric approach. We refer to this approach as MDS-MLE. In the second approach, we use a variational Bayes (VB) approach to compute the posterior distribution of the latent variables and parameters, which cannot be computed analytically using a fully Bayesian approach. We refer to this approach as MDS-VB. By representing MDS as a probabilistic graphical network (Fig. 1), we show that MDS-VB provides an elegant analytical solution for computing the posterior distributions and for deriving causal connectivity estimates that are sparse and more readily interpretable.

We first describe our MDS model and discuss MLE and VB approaches for estimating intrinsic and modulatory causal interactions between multiple brain regions. We test the performance of MDS using computer-simulated data sets as a function of network size, fMRI time points and signal to noise ratio (SNR). We evaluate the performance of our MDS models with extensive computer simulations and examine several metrics, including sensitivity, false positive rate and accuracy, in terms of correctly identifying both intrinsic and modulatory causal interactions. Finally, we contrast our results with those obtained with GCA.

Methods

Notation: In the following sections, we represent matrices by upper-case letters and scalars and vectors by lower-case letters. Random matrices are represented by bold face letters, whereas random vectors and scalars are represented by bold face lower-case letters.

MDS Model

Consider the following state-space model to represent the multivariate fMRI time series:

s(t) = A s(t-1) + \sum_{j=1}^{J} v_j(t) C_j s(t-1) + D u(t) + w(t)        (1)

x_m(t) = [s_m(t), s_m(t-1), ..., s_m(t-L+1)]'        (2)

y_m(t) = b_m \Phi x_m(t) + e_m(t)        (3)

In Eq. (1), s(t) is an M×1 vector of latent signals at time t of M regions, and A is an M×M connection matrix wherein A(m,n) denotes the strength of the intrinsic causal connection (which is independent of
external stimuli or task conditions) from the n-th region to the m-th region. C_j is an M×M connection matrix induced by the modulatory input v_j(t), and J is the number of modulatory inputs. The non-diagonal elements of C_j represent the coupling of brain regions in the presence of the modulatory input v_j(t). Therefore, the latent signal s(t) in the M regions at time t is a bilinear function of the modulatory inputs v_j(t) and its previous state s(t-1). D is an M×M diagonal matrix wherein D(i,i) denotes the external stimulus strength to the i-th region. u(t) is an M×1 binary vector whose elements represent the external stimuli to the regions under investigation. w(t) is an M×1 state noise vector whose distribution is assumed to be Gaussian with covariance matrix Q (w(t) ~ N(0, Q)). Additionally, the state noise vectors at time instances 1, 2, ..., T (w(1), w(2), ..., w(T)) are assumed to be independent and identically distributed (iid). Eq. (1) represents the time evolution of the latent signals in the M brain regions. More specifically, the latent signal at time t, s(t), is expressed as a linear combination of the latent signals at time t-1, the external stimulus at time t (u(t)), a bilinear combination of the modulatory inputs v_j(t), j = 1, 2, ..., J, and its previous state, and the state noise w(t). The latent dynamics modeled in Eq. (1) give rise to the observed fMRI time series represented by Eqs. (2) and (3).

We model the fMRI time series in region m as a linear convolution of the HRF and the latent signal s_m(t) in that region. To represent this linear convolution model as an inner product of two vectors, the past L values of s_m(t) are stored as a vector. x_m(t) in Eq. (2) represents an L×1 vector with the L past values of the latent signal at the m-th region.

In Eq. (3), y_m(t) is the observed BOLD signal at time t of the m-th region. \Phi is a p×L matrix whose rows contain the bases for the HRF. b_m is a 1×p coefficient vector representing the weights for each basis function in explaining the observed BOLD signal y_m(t). Therefore, the HRF in the m-th region is represented by the product b_m \Phi. The BOLD response in this region is obtained by convolving the HRF (b_m \Phi) with the L past values of the region's latent signal (x_m(t)) and is represented mathematically by the vector inner product b_m \Phi x_m(t). Uncorrelated observation noise e_m(t) with zero mean and variance \sigma_m^2 is then added to generate the observed signal y_m(t). e_m(t) is also assumed to be uncorrelated with w(\tau) at all t and \tau. Eq. (3) represents the linear convolution between the embedded latent signal x_m(t) and the basis vectors for the HRF. Here, we use the canonical HRF and its time derivative as bases, as is common in most fMRI studies (Penny et al., 2005; Smith et al., 2009).

Eqs. (1)–(3) together represent a state-space model for estimating the causal interactions in latent signals based on the observed multivariate fMRI time series. This model can be seen both as a multivariate extension of univariate time series models (Makni et al., 2008; Penny et al., 2005), and as an extension of GCA wherein a vector autoregressive model for the latent, rather than the BOLD-fMRI, signals is used to model the causal interactions among brain regions. Furthermore, our MDS model also takes into account variations in the HRF as well as the influences of modulatory and external stimuli in estimating causal interactions between the brain regions.

Estimating the causal interactions between the M regions specified in the model is equivalent to estimating the unknown parameters A and C_j, j = 1, 2, ..., J. In order to estimate A and the C_j's, the other unknown parameters D, Q, {b_m}_{m=1}^{M} and {\sigma_m^2}_{m=1}^{M}, and the latent signals {s(t)}_{t=1}^{T}, based on the observations {y_m(t)}_{m=1}^{M}, t = 1, 2, ..., T, where T is the total number of time samples, also need to be estimated. We use the following MLE and VB methods for estimating the parameters of the MDS model.
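As an illustration of the generative model in Eqs. (1)–(3), a minimal simulation sketch for a hypothetical two-region network is given below. All numerical values (connection weights, input timing, HRF shape, noise levels) are illustrative assumptions and are not the settings used in this paper's simulations.

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(0)
M, T, L, TR = 2, 200, 16, 2.0            # regions, time points, HRF length, sampling interval (s)

# --- state equation (Eq. 1): illustrative parameter values ---
A = np.array([[0.7, 0.0], [-0.3, 0.7]])          # intrinsic connections (node 1 -> node 2)
C = np.array([[0.0, 0.0], [0.5, 0.0]])           # modulatory connections, single input (J = 1)
D = np.diag([1.0, 0.0])                          # external stimulus enters node 1 only
u = (rng.random((T, M)) < 0.1).astype(float)     # sparse event-related external input
v = np.zeros(T); v[T // 2:] = 1.0                # boxcar modulatory input
Q = 0.1 * np.eye(M)                              # state noise covariance

s = np.zeros((T, M))
for t in range(1, T):
    s[t] = (A @ s[t - 1] + v[t] * (C @ s[t - 1]) + D @ u[t]
            + rng.multivariate_normal(np.zeros(M), Q))

# --- observation equations (Eqs. 2-3): convolve latent signals with an HRF ---
tax = np.arange(L) * TR
hrf = gamma.pdf(tax, 6) - 0.35 * gamma.pdf(tax, 12)   # crude double-gamma HRF (placeholder)
hrf /= np.abs(hrf).sum()
y = np.zeros((T, M))
for m in range(M):
    y[:, m] = np.convolve(s[:, m], hrf)[:T] + 0.05 * rng.standard_normal(T)
```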

Maximum Likelihood Estimation (MLE)

Estimation

Maximum likelihood estimates of the MDS model parameters and latent signals are obtained by maximizing the log-likelihood of the observed fMRI data. We use the EM algorithm to estimate the unknown parameters and latent variables of the model. The EM algorithm is an iterative method consisting of two steps, the E-step and the M-step. In the E-step, the posterior distribution of the latent variables is computed given the current estimates of the parameters. In the M-step, given the current posterior distribution of the latent variables, the parameters of the model are estimated by maximizing the conditional expectation of the log of the complete likelihood given the data. The E and M steps are repeated until convergence. The log-likelihood of the data is guaranteed to increase or remain the same for every iteration of the E and M steps, and the EM algorithm asymptotically gives maximum likelihood estimates of the parameters. In the E-step, the posterior distributions are obtained by using Kalman filtering and smoothing algorithms (Bishop, 2006). The detailed equations for the E and M steps are given in Appendix A. We refer to this MLE-based solution as MDS-MLE.
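The E-step relies on standard Kalman filtering and smoothing. A minimal sketch of a Kalman filter with a Rauch–Tung–Striebel (RTS) smoother is shown below for a simplified linear-Gaussian observation model y(t) = C s(t) + e(t); in the full MDS-MLE algorithm the observation involves the HRF convolution and lag embedding of Eqs. (2)–(3), so the actual E-step in Appendix A differs in how the observation matrix is formed.

```python
import numpy as np

def kalman_rts_smoother(y, A, C, Q, R, mu0, V0):
    """Kalman filter + RTS smoother for s(t) = A s(t-1) + w(t), y(t) = C s(t) + e(t),
    with w ~ N(0, Q), e ~ N(0, R) and prior s(0) ~ N(mu0, V0).
    Returns smoothed state means and covariances."""
    T, d = y.shape[0], A.shape[0]
    mu_f = np.zeros((T, d)); V_f = np.zeros((T, d, d))   # filtered estimates
    mu_p = np.zeros((T, d)); V_p = np.zeros((T, d, d))   # one-step predictions
    for t in range(T):
        if t == 0:
            m_pred, V_pred = mu0, V0
        else:
            m_pred = A @ mu_f[t - 1]
            V_pred = A @ V_f[t - 1] @ A.T + Q
        mu_p[t], V_p[t] = m_pred, V_pred
        # measurement update with y[t]
        S = C @ V_pred @ C.T + R
        K = V_pred @ C.T @ np.linalg.inv(S)
        mu_f[t] = m_pred + K @ (y[t] - C @ m_pred)
        V_f[t] = (np.eye(d) - K @ C) @ V_pred
    # backward (RTS) smoothing pass
    mu_s = mu_f.copy(); V_s = V_f.copy()
    for t in range(T - 2, -1, -1):
        J = V_f[t] @ A.T @ np.linalg.inv(V_p[t + 1])
        mu_s[t] = mu_f[t] + J @ (mu_s[t + 1] - mu_p[t + 1])
        V_s[t] = V_f[t] + J @ (V_s[t + 1] - V_p[t + 1]) @ J.T
    return mu_s, V_s
```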

Inference

The statistical significance of the intrinsic (A(m,n)) and modulatory (C_j(m,n), j = 1, 2, ..., J) causal connections estimated using the EM approach was tested using a bootstrap method. In this approach, the distribution of connection strengths under the null hypothesis that there are no connections between the regions was generated by estimating A and C from 100 surrogate data sets constructed from the observed data. A surrogate data set was obtained by applying a Fourier transform to the observed signal at the m-th region and then randomizing its phase response by adding a random phase shift at every frequency. The phase shifts were obtained by randomly sampling in the interval [0, 2π]. An inverse Fourier transform was then applied to generate one instance of surrogate data (Prichard and Theiler, 1994). Randomization of the phase response destroys the causal interactions between the brain regions while preserving their power spectra. The EM algorithm was then run on this surrogate data to obtain A and C under the null hypothesis. This procedure was repeated on 100 surrogate data sets, and the empirical distributions under the null hypothesis were obtained for the elements of A and C. The statistical significance of each connection was then estimated using these distributions at a p value of 0.01, with Bonferroni correction to account for multiple comparisons.
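A sketch of the phase-randomization step used to build one surrogate series is shown below (assuming a real-valued one-dimensional signal); it mirrors the Prichard and Theiler (1994) procedure described above but is not the authors' implementation.

```python
import numpy as np

def phase_randomize(x, rng=None):
    """Generate one surrogate time series by randomizing the phase spectrum of x
    while preserving its power spectrum (and hence its autocorrelation)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    X = np.fft.rfft(x)
    # random phases in [0, 2*pi); keep DC (and Nyquist, if present) unrotated
    phases = rng.uniform(0.0, 2.0 * np.pi, size=X.shape)
    phases[0] = 0.0
    if n % 2 == 0:
        phases[-1] = 0.0
    X_surr = np.abs(X) * np.exp(1j * phases)
    return np.fft.irfft(X_surr, n=n)
```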

Variational Bayes (VB)

Estimation of posterior distributions

In this approach, we use a VB framework to obtain the posterior distributions of the unknown parameters and latent variables. Let \Theta = {A, C_1, ..., C_J, D, Q, R, B} represent the unknown parameters and S = {s(t), t = 1, 2, ..., T} the latent variables of the model. Given the observations Y = {y(t), t = 1, 2, ..., T} and the probabilistic model, the Bayesian approach aims to find the joint posterior p(S, \Theta|Y). However, obtaining this posterior distribution using a fully Bayesian approach is analytically not possible for most models, including MDS. In the VB approach, we make an analytical approximation to p(S, \Theta|Y). Let q(S, \Theta|Y) be any arbitrary probability distribution; then the log of the marginal distribution of the observations Y can be written as (Bishop, 2006)

log P(Y) = L(q) + KL(q||p)        (4)

where

L(q) = \int dS d\Theta \, q(S,\Theta|Y) \log [ p(Y,S,\Theta) / q(S,\Theta|Y) ]        (5)

KL(q||p) = -\int dS d\Theta \, q(S,\Theta|Y) \log [ p(S,\Theta|Y) / q(S,\Theta|Y) ]        (6)

KL(q||p) is the Kullback–Leibler divergence between q(S, \Theta|Y) and p(S, \Theta|Y). KL(q||p) ≥ 0, with equality if and only if q(S, \Theta|Y) = p(S, \Theta|Y). Therefore, L(q) serves as a lower bound on the log of the evidence (log P(Y)). The maximum of this lower bound occurs when the KL divergence is zero, for which the optimal choice of q(S, \Theta|Y) is
p(S, \Theta|Y). Since p(S, \Theta|Y) is not tractable, certain assumptions on the form of q(S, \Theta|Y) are made and the optimal distribution is then found by maximizing the lower bound L(q). In this work, we assume that the posterior distribution q(S, \Theta|Y) factorizes over S and \Theta, i.e.,

q(S, \Theta|Y) = q_S(S|Y) q_\Theta(\Theta|Y)        (7)

We note that no further assumptions are made on the functional form of the distributions q_S(S|Y) and q_\Theta(\Theta|Y). These quantities are obtained by taking functional derivatives of L(q) with respect to q_S(S|Y) and q_\Theta(\Theta|Y). It can be shown that

\log q_S(S|Y) \propto E_\Theta[ \log p(Y, S, \Theta) ]        (8)

\log q_\Theta(\Theta|Y) \propto E_S[ \log p(Y, S, \Theta) ]        (9)

Eqs. (8) and (9) are, respectively, the VB-E and VB-M steps. Expectations are computed with respect to q_\Theta(\Theta|Y) in Eq. (8) and with respect to q_S(S|Y) in Eq. (9). In the VB-E step, the distribution of the latent signal s(t), for each t, is updated given the current distribution of the parameters \Theta. For reasons described below, s(t) has a Gaussian distribution, and in this step updating the distribution amounts to updating the mean and variance of that Gaussian. Therefore, in the VB-E step, estimating the means of s(t) at every t is equivalent to estimating the latent signals. In the VB-M step, the distributions of the model parameters \Theta are updated given the updated distributions of the latent signal s(t). These VB-E and VB-M steps are repeated until convergence. Note that we do not make any further assumptions about the factorization of \Theta and S. Any further conditional independencies in these sets are derived from the probabilistic graphical model of MDS shown in Fig. 1. The details of the derivation of the posterior probabilities using the graphical model are given in Appendix B. Fig. 2 shows a flow chart of the various steps involved in both the MDS-VB and MDS-MLE methods.

Fig. 2. Flow chart showing the major steps in the implementation of MDS. Wiener deconvolution is used to obtain an initial estimate of the latent signals, and a least squares estimation procedure is used to find an initial estimate of the model parameters. The estimates of the latent signals and model parameters are refined in the E and M steps, respectively. These steps are repeated until convergence. The significance of the model parameters is then assessed in the inference step.
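To make the role of the KL term in Eq. (4) concrete: because KL(q||p) ≥ 0, any tractable q yields a lower bound L(q) on log P(Y), and the bound is tight only when q equals the true posterior. A small numerical illustration for univariate Gaussians is given below; it uses the standard closed-form KL expression and is not part of the MDS derivation itself.

```python
import numpy as np

def kl_gauss(mu_q, var_q, mu_p, var_p):
    """KL(q||p) between univariate Gaussians q = N(mu_q, var_q) and p = N(mu_p, var_p);
    always >= 0, and equal to 0 iff the two distributions are identical."""
    return 0.5 * (np.log(var_p / var_q)
                  + (var_q + (mu_q - mu_p) ** 2) / var_p
                  - 1.0)

print(kl_gauss(0.0, 1.0, 0.0, 1.0))   # 0.0: q matches p, bound is tight
print(kl_gauss(0.5, 1.0, 0.0, 2.0))   # > 0: the bound L(q) is loose
```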

Choice of priors and inference

The Bayesian approach allows the specification of both informative and non-informative priors on the model parameters. The specification of these priors helps in regularizing the solution and avoids overfitting in cases where the number of parameters to be estimated is large compared to the number of observations; this is generally the case when the number of brain regions to be modeled is large. Since we do not have a priori information on these parameters, we specify non-informative conjugate priors on them. Here, we briefly explain the notion of non-informative conjugate priors (Bishop, 2006). Let z be a Gaussian random variable with mean μ and variance σ². If σ² → ∞, or equivalently the precision 1/σ² → 0, the distribution becomes flat and the random variable z can take any value between −∞ and ∞ with equal probability. Here, we refer to such distributions as non-informative. Let x and y be two random variables with probabilities p(x) and p(y), respectively. p(y) is said to be a conjugate prior for p(x) if the functional form of the posterior p(x|y) is the same as that of p(y). Specifying conjugate priors leads to elegant analytical solutions, and it also allows us to specify priors in such a way that we obtain sparse and interpretable solutions. For example, one can specify Gaussian priors on the elements of the connection matrices A and C. The prior on each element of A (or C_j) is

A_{i,j} \sim N(0, 1/\lambda_{i,j})        (10)

where \lambda_{i,j} is the prior precision for A_{i,j}. Such a specification of priors helps in automatic relevance determination (ARD) of the connections A_{i,j} between the regions (Tipping, 2001). During the learning of A and the \lambda's, a significant proportion of the \lambda_{i,j}'s go towards infinity, and the corresponding connections A_{i,j} have posterior distributions whose mean values shrink towards the prior mean of zero. The elements of the matrix A which do not have significant values become very small, and only the elements which are significant survive. Adopting this procedure therefore automatically identifies the relevant entries of the matrix A, hence the name "automatic relevance determination". This is important because, unlike in the MLE approach, inference on the connection weights (A and the C_j's) is now straightforward. The details of the prior specification for the various parameters are given in Appendix B. We test the significance of the parameters by thresholding the corresponding posterior probabilities at a p-value of 0.01 with Bonferroni correction to account for multiple comparisons.
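The shrinkage behavior described above can be illustrated with any ARD-style linear model. The toy sketch below uses scikit-learn's ARDRegression on synthetic data purely to show irrelevant weights being driven towards zero; it is not the MDS-VB implementation.

```python
import numpy as np
from sklearn.linear_model import ARDRegression

# Toy illustration: y depends on only 2 of 10 candidate predictors.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = 1.5 * X[:, 0] - 0.8 * X[:, 3] + 0.1 * rng.standard_normal(200)

ard = ARDRegression()          # per-coefficient precisions with Gamma hyperpriors
ard.fit(X, y)
print(np.round(ard.coef_, 2))  # irrelevant coefficients shrink towards zero
print(ard.lambda_)             # very large precisions flag the pruned (irrelevant) weights
```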

Simulated data sets

Data sets with modulatory effects and external stimuli

We assess the performance of MDS using a number of computer-simulated data sets generated at various SNRs (10, 5 and 0 dB), for different numbers of brain regions or nodes (M = 2, 3 and 5) and for different numbers of time samples (T = 200, 300 and 500).

Fig. 3. Simulated models with intrinsic and experimentally induced modulatory connections for (A) 2-node, (B) 3-node and (C) 5-node networks. Intrinsic connections are shown as solid lines; modulatory connections are shown as broken lines and highlighted with connecting black circles. A(i,j) and C(i,j) are the weights of the intrinsic and modulatory connections, respectively, between nodes i and j. D(i,i) is the strength of the external stimulus to the i-th node.

Fig. 3 shows the intrinsic and modulatory connectivity of three networks with 2, 3 and 5 nodes. For example, in the two-node network (Fig. 3A), node 1 receives an external input and there is an intrinsic causal connection from node 1 to node 2 with a weight of -0.3 (A(2,1) = -0.3). A modulatory input induces a connection from node 1 to node 2 with a weight of 0.5 (C(2,1) = 0.5), whose sign is opposite to that of the intrinsic connection. Similarly, in the five-node structure (Fig. 3C), node 1 receives the external input and has causal influences on nodes 2, 3 and 4 (matrix A). Nodes 4 and 5 have bidirectional influences. Modulatory inputs induce causal influences from node 1 to 2 and from node 3 to 2 (matrix C). Note that all three networks have intrinsic and modulatory connections from node 1 to 2 with weights -0.3 and 0.5, respectively. We simulated networks with these weights to explicitly test whether MDS could recover interactions that could be missed by GCA because of the opposing signs of the intrinsic and modulatory connection weights. We set the autocorrelations of the time series (diagonal elements of matrix A) to 0.7 (Ge et al., 2009; Roebroeck et al., 2005). Our simulations also modeled variations in HRFs across regions. Fig. 4 shows the simulated HRFs at each of the network nodes. The HRFs were constructed in such a way that the direction of the hemodynamic delays is confounded with the direction of the latent signal delays, making the task of recovering the network parameters more challenging. For example, in the 5-node network, node 1 drives nodes 2, 3 and 4 at the level of latent signals, but the HRF at node 1 peaks
later than that in nodes 2, 3 and 4. These HRFs were simulated using a linear combination of the canonical HRF and its temporal derivative. Analogous to most fMRI data acquisition paradigms, we assume that the sampling interval, also referred to as the repetition time
(TR), is 2 s. This corresponds to an embedding dimension of L = 16, which is also the length of the HRF. Note that the duration of the canonical HRF is approximately 32 s.

Fig. 4. Variable regional hemodynamic response functions used in the simulations for each node in the (A) 2-, (B) 3- and (C) 5-node networks shown in Fig. 3.
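A sketch of how a 2×16 basis matrix \Phi of this kind could be constructed (canonical double-gamma HRF plus its temporal derivative at TR = 2 s) is shown below; the double-gamma parameters are common defaults and are an assumption, not values reported in the paper.

```python
import numpy as np
from scipy.stats import gamma

TR, L = 2.0, 16
t = np.arange(L) * TR

def canonical_hrf(t, peak=6.0, under=16.0, ratio=1.0 / 6.0):
    """Double-gamma canonical HRF sampled at times t (seconds); parameter values
    are typical defaults, assumed here for illustration."""
    h = gamma.pdf(t, peak) - ratio * gamma.pdf(t, under)
    return h / h.sum()

h = canonical_hrf(t)
dh = np.gradient(h, TR)          # temporal derivative basis function
Phi = np.vstack([h, dh])         # 2 x 16 basis matrix; rows span the HRF subspace

# a region-specific HRF is then modeled as b_m @ Phi for a 1 x 2 weight vector b_m
b_m = np.array([1.0, 0.3])
hrf_m = b_m @ Phi
```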

Fig. 5 shows the experimental and modulatory inputs applied to the nodes shown in Fig. 3. The external input was simulated to reflect an event-related fMRI design with stimuli occurring at random instances, with the constraint that the time difference between two consecutive events is at least 2 TR (Fig. 5A). This input can also be a slow event-related or block design; in the MDS framework, there is no restriction on the nature of the experimental design. The modulatory input is assumed to be a boxcar function (Fig. 5B). The modulatory inputs indicate the time periods wherein the network configuration could change because of context-specific influences such as changes in attention, alertness and explicit experimental manipulations.

Fig. 5. Onset and duration of the event-related experimental stimuli (A) and modulatory inputs (B) used in the simulations.
The simulated data sets were generated using the model described in Eqs. (1)–(3). The latent noise covariance was fixed at Q = 0.1 I_M, where I_M is the identity matrix of size M. The observed noise variance at the m-th region for a given SNR was computed as

\sigma_m^2 = Var(y_m) \, 10^{-0.1 \cdot SNR}        (11)
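A direct implementation of the noise scaling in Eq. (11) might look as follows; the function name and interface are illustrative, not taken from the paper.

```python
import numpy as np

def add_observation_noise(y_clean, snr_db, rng=None):
    """Add Gaussian observation noise to each region's BOLD time series so that
    the noise variance follows Eq. (11): sigma_m^2 = Var(y_m) * 10**(-0.1 * SNR)."""
    rng = np.random.default_rng() if rng is None else rng
    y_noisy = np.empty_like(y_clean)
    for m in range(y_clean.shape[1]):                      # columns = regions
        sigma2 = np.var(y_clean[:, m]) * 10 ** (-0.1 * snr_db)
        y_noisy[:, m] = y_clean[:, m] + np.sqrt(sigma2) * rng.standard_normal(y_clean.shape[0])
    return y_noisy
```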

We assume that the canonical HRF and its temporal derivative span the space of HRFs. Therefore, they constitute the rows of \Phi, which is a 2×16 matrix. The coefficients of the matrices A and C for each network structure are shown in Fig. 3.

We generated 25 data sets for each SNR, network structure and number of time samples. The performance of the method was assessed using the performance metrics described in the next section.

Data sets without modulatory effects and external stimuli

To examine the performance of the methods when there are no modulatory or external stimulus effects, we simulated 25 data sets for the 3-node network shown in Fig. 3B at 5 dB SNR. We set the weights to the same values as in the previous case, except that there are no causal interactions from modulatory inputs. The weights corresponding to external stimuli were all set to zero. The diagonal
elements (autocorrelations) in matrix A were set to 0.8, 0.7 and 0.6, respectively. These data sets were created to provide a more appropriate, albeit less general, comparison of MDS with GCA.

Effects of fMRI down-sampling

Interactions between brain regions occur at finer time scales than the sampling intervals of fMRI signals: fMRI signals are typically sampled at TR = 1 or 2 s, while neuronal processes occur at millisecond resolution. To investigate the effects of fMRI down-sampling on MDS performance, we adopted the approach described by Deshpande et al. (2009). We generated data sets with a 1 ms sampling interval at 0 dB SNR for the network shown in Fig. 3A. We obtained neuronal signals in node 1 and node 2 with a delay of dn milliseconds between them. In this case, node 1 drives node 2 under both intrinsic and modulatory conditions with the weights shown in Fig. 3A. The autocorrelations in nodes 1 and 2 were set to 0.8 and 0.7, respectively. We then convolved the neuronal signal at node 1 with a canonical HRF, generated again at a 1 kHz sampling rate, and re-sampled to a sampling interval of TR = 2 s to obtain the fMRI signal. In node 2, we convolved the neuronal signal with an HRF that was delayed by dh milliseconds with respect to the HRF in node 1, and again re-sampled to TR = 2 s to obtain the fMRI signal. We obtained simulated data sets at various neuronal delays dn = {0, 200, 400, 600, 800, 1000} ms and HRF delays dh = {0, 500, 2500} ms (Deshpande et al., 2009). We also examined two cases for the HRF delays: (1) the HRF delay is in the same direction as the neuronal delay, and (2) the HRF delay is in the opposite direction of the neuronal delay. The second case represents the scenario where the HRF confounds the causal interactions at the neuronal level. We generated 25 simulated data sets for each combination of dn and dh. Supplementary Table S1 summarizes the characteristics of each data set used for evaluating the performance of MDS.
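A simplified sketch of this down-sampling simulation is given below. The AR coefficients (0.8, 0.7) follow the description above; the drive weight, HRF shape and noise handling are illustrative assumptions, and the separate modulatory pathway is omitted for brevity.

```python
import numpy as np
from scipy.stats import gamma
from scipy.signal import fftconvolve

def simulate_downsampled_pair(dn_ms, dh_ms, T_scans=200, TR=2.0, rng=None):
    """Latent signals at 1 ms resolution with node 1 driving node 2 after dn_ms,
    node 2's HRF delayed by dh_ms, then both re-sampled to TR seconds."""
    rng = np.random.default_rng() if rng is None else rng
    dt = 0.001                                   # 1 ms sampling interval
    n = int(T_scans * TR / dt)
    dn = int(dn_ms / 1000 / dt)
    # latent ("neuronal") signals: node 1 is AR(1); node 2 follows node 1 after dn samples
    s1 = np.zeros(n); s2 = np.zeros(n)
    for t in range(1, n):
        s1[t] = 0.8 * s1[t - 1] + rng.standard_normal()
        drive = s1[t - dn] if t >= dn else 0.0
        s2[t] = 0.7 * s2[t - 1] + 0.5 * drive + rng.standard_normal()
    # HRFs at 1 kHz; node 2's HRF shifted by dh_ms relative to node 1's
    tax = np.arange(0, 32, dt)
    hrf = gamma.pdf(tax, 6) - gamma.pdf(tax, 16) / 6
    dh = int(dh_ms / 1000 / dt)
    hrf2 = np.roll(hrf, dh); hrf2[:dh] = 0.0
    y1 = fftconvolve(s1, hrf)[:n]
    y2 = fftconvolve(s2, hrf2)[:n]
    step = int(TR / dt)                          # re-sample to TR = 2 s
    return y1[::step], y2[::step]
```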

Performance metrics

The performance of MDS in discovering the intrinsic and modulatory causal interactions in the simulated data sets was assessed using various performance metrics, such as sensitivity, false positive rate and accuracy in correctly identifying the causal intrinsic and modulatory interactions, where:

sensitivity = TP / (TP + FN)        (12)

false positive rate = FP / (TN + FP)        (13)

accuracy = (TP + TN) / (TP + FP + FN + TN)        (14)

where TP is the number of true positives, TN the number of true negatives, FN the number of false negatives and FP the number of false positives. These performance metrics are computed for each of the 25 data sets and then averaged to obtain the overall performance.
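For reference, Eqs. (12)–(14) can be computed from estimated and true binary connection matrices as in the sketch below; whether self-connections are excluded from the counts is an assumption made here for illustration.

```python
import numpy as np

def connection_metrics(est, true):
    """Sensitivity, false positive rate and accuracy (Eqs. 12-14) for binary
    connection matrices; the diagonal (self-connections) is excluded."""
    est, true = np.asarray(est, bool), np.asarray(true, bool)
    off = ~np.eye(true.shape[0], dtype=bool)
    tp = np.sum(est & true & off)
    tn = np.sum(~est & ~true & off)
    fp = np.sum(est & ~true & off)
    fn = np.sum(~est & true & off)
    sensitivity = tp / (tp + fn) if (tp + fn) else np.nan
    fpr = fp / (tn + fp) if (tn + fp) else np.nan
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, fpr, accuracy
```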

Results

Applying MDS – An example

We first illustrate the performance of MDS-MLE and MDS-VB by computing the estimated intrinsic and modulatory connections and the deconvolved (estimated) latent signals of the five-node network shown in Fig. 3C, simulated at 5 dB SNR. The MDS approach, using either MLE or VB, simultaneously estimates both the latent signals and the unknown parameters in the model using the E and M steps, respectively. The left and right panels in Fig. 6, respectively, show the actual and estimated latent signals and the actual and estimated BOLD signals at the five nodes in the network using MDS-MLE and MDS-VB. The estimated BOLD signal y_m at the m-th node using these methods was computed as follows:

y_m(t) = b_m' \Phi x_m'(t)        (15)

where b_m' are the estimated coefficients (using MLE or VB) corresponding to the basis functions spanning the subspace of HRFs, and x_m'(t) (using MLE or VB) is the estimated latent signal at the m-th node. As shown in this figure, both MDS-MLE and MDS-VB were able to recover the latent and BOLD signals at this SNR. Table 1 shows the mean square error (MSE) between the estimated and actual latent signals and between the estimated and actual BOLD-fMRI responses in each node using these two methods. The MSE in estimating these signals is very low for both methods. Figs. 7A and B, respectively, show the intrinsic and modulatory causal interactions estimated by MDS-MLE and MDS-VB in the simulated five-node network. MDS-VB correctly identified both the intrinsic (solid lines) and modulatory connections (dotted lines) in this network, as shown in Fig. 7B. MDS-MLE also correctly recovered both the intrinsic and modulatory networks, but it introduced an additional false modulatory connection from node 3 to node 1, as shown in Fig. 7A.

Table 1
Mean square error (MSE) between actual and estimated neuronal signals, and between actual and estimated BOLD signals, using MDS-MLE and MDS-VB at the five nodes of the network.

Nodes   Neuronal signals (MLE / VB)   BOLD signals (MLE / VB)
1       0.024 / 0.023                 0.027 / 0.027
2       0.024 / 0.024                 0.015 / 0.014
3       0.019 / 0.019                 0.025 / 0.024
4       0.017 / 0.017                 0.021 / 0.02
5       0.018 / 0.017                 0.02 / 0.02

Fig. 6. Left panel: actual and estimated latent signals at each of the nodes of the 5-node network shown in Fig. 3C. Right panel: estimated and actual BOLD-fMRI signals at each node. MDS, using both the MLE and VB approaches, accurately recovered the latent signals and predicted the fMRI signals based on the estimated model parameters and latent signals.

We next compare the performance of MDS with that of GCA using the same simulated data. This analysis was performed using the multivariate GCA toolbox developed by Seth (Seth, 2010). We applied GCA on the same data set to verify whether it can recover the causal connections (either intrinsic or modulatory). As shown in Fig. 7C, GCA missed both the intrinsic and modulatory interactions from node 1 to 2, but it was able to recover modulatory interactions from node 3 to 4 in addition to other connections. However, unlike MDS, GCA cannot distinguish between intrinsic and modulatory interactions. GCA missed the connection from node 1 to 2 because these nodes have both intrinsic and modulatory interactions with opposing actions. Since GCA does not model these interactions separately, the net connection strength between these nodes is not significant. On the other hand, MDS models these interactions explicitly and is therefore able to recover both types of connections. This example demonstrates that GCA cannot recover all the connections under these conditions, while both MDS methods recover all the connections and at the same time differentiate between the different types of interactions.
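The GCA results above were obtained with Seth's multivariate MATLAB toolbox. For readers who want a quick sense of a pairwise Granger test in Python, a minimal statsmodels example on synthetic data is shown below; it is not the toolbox or the analysis pipeline used in the paper.

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

# Toy check on two observed series y1, y2: tests whether the SECOND column
# Granger-causes the FIRST column of the input array.
rng = np.random.default_rng(1)
y2 = rng.standard_normal(300)
y1 = 0.6 * np.roll(y2, 1) + 0.5 * rng.standard_normal(300)   # y2 leads y1 by one sample
res = grangercausalitytests(np.column_stack([y1, y2]), maxlag=2)
fstat, pval = res[1][0]['ssr_ftest'][:2]
print(f"lag-1 F = {fstat:.2f}, p = {pval:.4f}")
```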

Fig. 7. (A) Intrinsic and modulatory connections estimated by MDS using maximum likelihood estimation (MDS-MLE) and (B) variational Bayes estimation (MDS-VB). (C) Causal interactions estimated by Granger causal analysis (GCA). MDS-VB correctly identified both the intrinsic and modulatory connections. MDS-MLE correctly estimated all the intrinsic and modulatory connections in the five-node network but also introduced a false modulatory connection from node 3 to node 1. GCA missed both the intrinsic and modulatory connections from node 1 to 2, for reasons described in the text.

Performance of MDS on simulated data with modulatory effects and external stimuli

We evaluated the performance of MDS-MLE and MDS-VB on simulated data sets by computing sensitivity, false positive rate and
accuracy in finding intrinsic and modulatory causal interactions as a function of SNR, network size and the number of time samples. Figs. 8–10 respectively show the performance of MDS-MLE and MDS-VB for time samples T = 500, 300 and 200. For each T and network size, the performance of MDS-MLE and MDS-VB is evaluated at SNR = 0, 5 and 10 dB. In each of these figures, panels A, B and C show the performance of MDS-MLE and MDS-VB with respect to sensitivity, false positive rate and accuracy in identifying the intrinsic and modulatory interactions for the 2-, 3- and 5-node networks, respectively. The performance of these methods improved with increases in SNR and the number of time samples (T).

Between the two MDS methods, MDS-VB showed superior performance compared to MDS-MLE across all SNRs, time samples and network sizes. MDS-VB showed significantly greater performance at low SNR and for shorter time series. For example, for SNR = 0 dB, T = 200 time points and 5 nodes (Fig. 10C), the sensitivity of MDS-VB in recovering intrinsic and modulatory interactions was about 0.75 and 0.6, respectively, while MDS-MLE had sensitivities of only about 0.3 and 0.5, respectively. The accuracy of MDS-VB is also high (>0.8) under all conditions (panel C in Figs. 8–10) because its sensitivities are high and its false positive rates are very low (panels A and B in Figs. 8–10). More generally, in cases with high noise, shorter sample length and larger network size, MDS-VB consistently outperforms MDS-MLE.

Comparison of MDS and GCA on simulated data with modulatory effects and external stimuli

Finally, we compared the performance of GCA with the MDS methods on the same data sets and performance metrics. Figs. 8–10, respectively, show the comparative performance of GCA with MDS-VB and MDS-MLE for T = 500, 300 and 200 at various SNRs and network sizes. The results suggest that the performance of GCA is poor compared to both MDS methods with respect to sensitivity and accuracy in identifying causal interactions between brain nodes. Since GCA cannot distinguish between intrinsic and modulatory interactions, we computed the performance metrics by considering both connection types. The performance of GCA declined with decreasing SNR for networks of size 3 and 5 (Figs. 8–10). The performance of GCA is worse even for the 2-node network, as shown in Figs. 8A, 9A and 10A.

Please cite this article as: Ryali, S., et al., Multivariate dynamical syste(2010), doi:10.1016/j.neuroimage.2010.09.052

The sensitivity of GCA for this network is less than 10% becauseintrinsic and modulatory interactions have weights with oppositesigns. Since GCA does not model these interactions explicitly, it doesnot detect the interactions between the two nodes. On the other hand,both MDS methods showed better sensitivity and accuracy inidentifying both types of interactions at all SNRs for this network.

Comparison of MDS and GCA on simulated data in the absence ofmodulatory effects and external stimuli

Table 2 shows the relative performance of MDS-VB, MDS-MLE andGCA on 25 data sets simulated for a 3-node network at 5 dB SNRwithout any modulatory effects and external stimuli. The perfor-mance of GCA improved and recovered the causal network withsensitivity of 0.9, FPR of 0 and accuracy of 0.98. In this case, theperformance of GCA is comparable to MDS, suggesting that in theabsence of modulatory effects and external stimuli GCA can performas well as MDS even in the presence of HRF variations.

Effects of fMRI down-sampling on MDS performance

We examined the performance of MDS-VB on simulated data in which the latent signals were generated at various delays using a sampling interval of 1 ms and convolved with HRFs having various delays. Causal interactions were then estimated based on the observed time series obtained at a sampling interval of 2 s. We examined the performance of MDS-VB under four different cases:

No latent signal delay but HRF is delayed by 500 and 2500 ms

In this case, there are no causal interactions between the two nodes with respect to the latent signals, but the observed fMRI time series are delayed with respect to each other because of delays in their respective HRFs. MDS-VB was accurate in that it did not recover any causal interactions (either intrinsic or modulatory) despite the variations in HRF.

No HRF delays

In this case, the HRFs in both nodes are identical. As shown in Fig. S1(A), the sensitivity of MDS-VB in recovering both intrinsic and modulatory interactions is above 0.9 (left panel), with FPRs below 0.1 (middle panel) and therefore accuracies above 0.9 (right panel) at all latent signal delays. The performance of MDS-VB improved with increasing latent signal delays.

Latent and HRF delays both in the same direction

In this scenario, HRF delays do not confound the causal interactions between the nodes at the latent signal level. For HRF delays of 500 and 2500 ms, the performance metrics shown in Figs. S1(B) and (C) suggest that MDS-VB is able to recover both intrinsic and modulatory causal interactions reliably. For the HRF delay of 2500 ms, there is a small drop in sensitivity for latent signal delays of 800 and 1000 ms.

HRF delays oppose the delays in latent signals

This is the most difficult situation for any method because the HRF delays confound the causal interactions at the latent signal level. Fig. S2(B) shows the performance of MDS-VB when the HRF in node 2 peaks 500 ms before the HRF in node 1, while node 1 drives node 2 at the latent signal level. The performance of MDS-VB improved with increasing latent signal delay. Fig. S2(C) shows the performance of MDS-VB for an HRF delay of 2500 ms. Although the sensitivities for latent signal delays from 200 to 600 ms are higher (left panel), they are accompanied by greater false positive rates (middle panel) and therefore poorer accuracies (right panel). The performance of MDS-VB in recovering causal interactions improved for latent signal delays of 800 and 1000 ms.



Fig. 8. (A) Sensitivity, false positive rate (FPR) and accuracy in identifying causal intrinsic and modulatory connections in the 2-node network (shown in Fig. 3A) at SNRs of 0, 5 and 10 dB using MDS-MLE, MDS-VB and GCA. (B) Similar results for the 3-node (shown in Fig. 3B) and (C) 5-node (shown in Fig. 3C) networks. The performance of MDS is superior to GCA for all 3 networks and SNRs. Among the MDS methods, the performance of MDS-VB is superior to MDS-MLE. Sample size T = 500 time points.

Fig. 9. (A) Sensitivity, FPR and accuracy in identifying causal intrinsic and modulatory connections in the 2-node network (shown in Fig. 3A) at SNRs of 0, 5 and 10 dB using MDS-MLE, MDS-VB and GCA. (B) Similar results for the 3-node (shown in Fig. 3B) and (C) 5-node (shown in Fig. 3C) networks. The performance of MDS is superior to GCA for all 3 networks and SNRs. Among the MDS methods, the performance of MDS-VB is superior to MDS-MLE. In contrast to Fig. 8, the sample size (T) here is 300 time points.

Discussion

We have developed a novel dynamical systems method to model intrinsic and modulatory interactions in fMRI data. MDS uses a vector autoregressive state-space model incorporating both intrinsic and modulatory causal interactions. Intrinsic interactions reflect causal influences independent of external stimuli and task conditions, while modulatory interactions reflect context-dependent influences. Our proposed MDS method overcomes key limitations of commonly used methods for estimating causal relations using fMRI data.

Critically, causal interactions in MDS are modeled at the level of latent signals, rather than at the level of the observed BOLD-fMRI signals. Our simulations clearly demonstrate that this has the added effect of eliminating the confounding effects of regional variability in the HRF. The parameters and latent variables of the state-space model were estimated using two different methods. In the MDS-MLE method, the statistical significance of the parameters of the state equation, which represent the causal interactions between multiple brain nodes, was tested using a bootstrap method. In the MDS-VB method, we used non-informative priors to facilitate automatic relevance detection. We first discuss findings from our simulations, and show that MDS-VB provides robust and accurate solutions even at low SNRs and smaller numbers of observed samples (time points). We then contrast the performance of MDS with the widely used GCA method. In this context, we highlight instances where GCA works reasonably well and where it fails. Finally, we discuss several important conceptual issues concerning the investigation of dynamic causal interactions in fMRI, contrasting MDS with other recently developed methods.

Performance of MDS on simulated data sets: contrasting MLE and VB approaches

In the following sections, we evaluate and discuss the performance of MDS under various scenarios. Importantly, we demonstrate, for the first time, that VB approaches provide better estimates of model parameters than MLE-based approaches. We investigated the performance of MDS-MLE and MDS-VB on simulated data sets generated at SNRs of 0, 5 and 10 dB for network structures of sizes 2, 3 and 5 and time samples of 200, 300 and 500. We simulated regional HRF variations in such a way that the hemodynamic response delays were in the opposite direction to the delays in the latent signals (Fig. 4). HRF delays could therefore influence the estimation of causal interactions when applied directly to the observed BOLD-fMRI signals.



Fig. 9. (A) Sensitivity, FPR and Accuracy in identifying causal intrinsic and modulatory connections in the 2-node network (shown in Fig. 3A) at SNRs of 0, 5 and 10 dB using MDS-MLE, MDS-VB and GCA. (B) Similar results for 3-node (shown in Fig. 3B) and (C) 5-node (shown in Fig. 3C) networks. The performance of MDS is superior to GCA for all 3 networks and SNRs. Among MDS methods, the performance of MDS-VB is superior to MDS-MLE. In contrast to Fig. 8, the sample size (T) here is 300 time points.


This makes the problem of estimating causal interactions particularly challenging and provides novel insights into the strengths and weaknesses of the multiple approaches used here.

The performance of MDS was found to be robust when tested under various simulated conditions. Specifically, MDS was able to reliably recover both intrinsic and modulatory causal interactions from the simulated data sets, and its performance was superior to that of conventional approaches such as GCA. Among the MDS methods, MDS-VB was superior to MDS-MLE with respect to performance metrics such as sensitivity, false positive rate and accuracy in identifying intrinsic and modulatory causal interactions (Figs. 7–10). MDS-VB showed significantly improved performance over MDS-MLE, especially under adverse conditions such as low SNRs, large network sizes and smaller numbers of observed samples (Fig. 10C).
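For reference, these metrics can be computed by comparing a thresholded estimate of the network with the ground-truth network; the short Python sketch below makes the definitions explicit. It is an illustrative helper, not the evaluation code used to produce Figs. 7–10.

import numpy as np

def connection_metrics(true_conn, est_conn):
    """Sensitivity, false positive rate and accuracy for binary connection matrices.

    true_conn, est_conn: boolean arrays of the same shape, True where a causal
    (intrinsic or modulatory) connection is present. Diagonal entries are ignored.
    """
    off = ~np.eye(true_conn.shape[0], dtype=bool)        # exclude self-connections
    t, e = true_conn[off], est_conn[off]
    tp = np.sum(t & e);  fn = np.sum(t & ~e)
    fp = np.sum(~t & e); tn = np.sum(~t & ~e)
    sensitivity = tp / (tp + fn)
    fpr = fp / (fp + tn)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, fpr, accuracy

# Toy 3-node example: one missed link and no false positives.
true_conn = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=bool)
est_conn  = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]], dtype=bool)
print(connection_metrics(true_conn, est_conn))    # (0.5, 0.0, 0.833...)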

The superior performance of MDS-VB can be attributed to the regularization imposed by the priors in this method. Our priors not only regularized the solution but also helped in achieving sparse solutions. By using sparsity-promoting priors, the weights corresponding to insignificant links are driven towards zero, thereby enabling automatic relevance detection (Tipping, 2001). This approach is useful not only for regularizing solutions when the number of unknown parameters is high, but also for providing sparse and interpretable solutions. This feature of VB can be especially important in analyzing networks with a large number of nodes, an aspect often overlooked in most analyses of causality in complex networks.

Another advantage of Bayesian analysis lies in computing the statistical significance of the network connections estimated by the MDS methods. In the MLE approach, we need to resort to a bootstrap approach, which can be computationally expensive. MDS-VB, on the other hand, provides posterior probabilities of each model parameter, as opposed to the point estimates in MLE, and these can be used to compute their statistical significance. From a computational perspective, MDS-VB is several orders of magnitude faster than MDS-MLE because it does not require nonparametric tests for statistical significance testing. Taken together, these findings suggest that MDS-VB is a superior and more powerful method than MDS-MLE.
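As an illustration of how such posteriors can be used, the sketch below computes, under the assumption of an approximately Gaussian marginal posterior for a single connection weight, the posterior probability that the weight lies on the same side of zero as its posterior mean; the numerical values are made up for illustration only.

from scipy.stats import norm

def posterior_prob_nonzero(mean, var):
    """Posterior probability that a connection weight lies on the same side of zero
    as its posterior mean, assuming a Gaussian marginal posterior N(mean, var)."""
    sd = var ** 0.5
    p_positive = 1.0 - norm.cdf(0.0, loc=mean, scale=sd)
    return p_positive if mean >= 0 else 1.0 - p_positive

# Made-up posterior summaries for two candidate connections.
print(posterior_prob_nonzero(0.45, 0.01))   # strong evidence for a non-zero weight
print(posterior_prob_nonzero(0.03, 0.04))   # weak evidence; weight may well be zero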

Comparison with GCA

We demonstrated the importance of modeling the influence of both external and modulatory stimuli for estimating causal networks by applying GCA to a five-node network. On this data set, GCA failed to detect both the modulatory and intrinsic connections between nodes 1 and 2 (Fig. 7C).



Fig. 10. (A) Sensitivity, FPR and Accuracy in identifying causal intrinsic and modulatory connections in the 2-node network (shown in Fig. 3A) at SNRs of 0, 5 and 10 dB using MDS-MLE, MDS-VB and GCA. (B) Similar results for 3-node (shown in Fig. 3B) and (C) 5-node (shown in Fig. 3C) networks. The performance of MDS is superior to GCA for all 3 networks and SNRs. Among MDS methods, the performance of MDS-VB is superior to MDS-MLE. In contrast to Fig. 8, the sample size (T) here is 200 time points.



As mentioned earlier, GCA missed this connection because the network has both intrinsic and modulatory connections between these nodes but with weights of opposite signs. In GCA, therefore, the net strength of this connection is very small and it did not survive a conservative test of statistical significance. Our MDS methods, on the other hand, uncovered both of these connections. This phenomenon is most obvious in our simulations of the 2-node network, in which GCA could not find causal interactions between the nodes (Figs. 8A, 9A and 10A); in this network too, the intrinsic and modulatory connections have weights of opposite signs. These results demonstrate the importance of explicitly modeling the influence of external and modulatory stimuli.


Table 2
Relative performance of MDS and GCA in the absence of modulatory effects.

Method     Sensitivity   FPR    Accuracy
MDS-VB     0.98          0.02   0.98
MDS-MLE    0.92          0.03   0.96
GCA        0.9           0      0.98


Overall, the performance of GCA, when applied to the data sets generated at the various SNRs, network sizes and numbers of observations, was found to be inferior to that of both MDS methods. This was true with respect to both sensitivity and accuracy in identifying causal interactions between multiple brain nodes (Figs. 8–10). Compared to MDS, the performance of GCA drops significantly at lower SNRs. These results suggest that MDS is more robust against observation noise than GCA. Our simulations therefore suggest that MDS-VB outperforms GCA for networks consisting of fewer than 6 nodes; more extensive simulations are needed to compare the performance of MDS with GCA for larger networks.

Conventional GCA methods do not take into account dynamic changes in modulatory inputs and their effect on context-dependent causal interactions. In order to compare GCA more directly with MDS, we examined causal interactions in the absence of modulatory influences. As expected, in this case, the performance of GCA was comparable to that of MDS. Together, these findings suggest that GCA can accurately recover causal interactions in the absence of modulatory effects. Although newer dynamic GCA methods have been proposed, they appear to be designed more for improving estimation of causal interactions rather than for examining context-dependent dynamic changes in causal interactions (Havlicek et al., 2010; Hemmelmann et al., 2009; Hesse et al., 2003; Sato et al., 2006).



Further simulation studies are needed to assess how well dynamic GCA can estimate context-specific modulatory effects.

We next contrast our findings using GCA and MDS in the context of the equivalence of ARIMA and structural time series models. GCA is based on the autoregressive integrated moving average (ARIMA) models proposed by Box and Jenkins, whereas MDS is a structural time series model (Box et al., 1994). In the econometrics literature, it is well known that linear structural time series models have equivalent ARIMA model representations (Box et al., 1994). This equivalence has been under-appreciated in the neuroimaging literature, as demonstrated by the recent discussion regarding the relative merits of GCA and DCM (Friston, 2009a,b; Roebroeck et al., 2009). Our detailed simulations suggest that, under certain conditions, GCA is able to recover much of the causal network structure in spite of the presence of HRF delay confounds. This is most clearly illustrated by the simulations shown in Fig. 7C, where we found that GCA could recover the network structure except for the intrinsic/modulatory connection from node 1 to 2. Our simulations also suggest that GCA may not be able to uncover causal connections when there is a conflict between intrinsic and modulatory connections (Figs. 7C, 8–10), but in other cases it is able to recover the underlying networks. In estimating the causal interactions, the model order estimated for GCA using the Akaike information criterion (AIC) was greater than 3. Note that in our simulations, the causal interactions at the latent signal level were generated using a VAR with model order 1 (Eq. (1)). It is plausible that in GCA the higher model order compensates for variations in HRF delay and for experimental effects such as context-specific modulatory connections (Deshpande et al., 2009). Our simulations suggest that this is indeed the case and that optimal model order selection in GCA results in improved estimation of causal interactions between nodes. Nevertheless, structural time series based models such as MDS and DCM can provide a better interpretation of network structure since they can distinguish between intrinsic and context-specific modulatory causal interactions in latent signals.
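For completeness, the sketch below shows a generic least-squares VAR fit with AIC-based model order selection of the kind referred to above; it is an illustrative implementation, not the GCA toolbox used in our simulations.

import numpy as np

def fit_var_aic(y, max_order=8):
    """Least-squares VAR fit for orders 1..max_order; return the order minimizing AIC.

    y: array of shape (M, T) holding an M-dimensional observed time series.
    Generic illustration of AIC-based order selection for a VAR model.
    """
    M, T = y.shape
    best = None
    for p in range(1, max_order + 1):
        # Build the lagged regressor matrix for y(t) ~ [y(t-1), ..., y(t-p)].
        X = np.vstack([y[:, p - k:T - k] for k in range(1, p + 1)])   # (M*p, T-p)
        Y = y[:, p:]                                                  # (M,   T-p)
        coef, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)
        resid = Y - coef.T @ X
        sigma = resid @ resid.T / (T - p)                             # residual covariance
        n_params = M * M * p
        aic = np.log(np.linalg.det(sigma)) + 2.0 * n_params / (T - p)
        if best is None or aic < best[1]:
            best = (p, aic)
    return best[0]

# Example call on white noise; a low order is expected in this case.
y = np.random.default_rng(0).standard_normal((3, 300))
print(fit_var_aic(y))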


Effects of down-sampling on MDS performance

In most fMRI studies, data are typically acquired with a repetition time (TR) of about 2 s. However, dynamical interactions between brain regions occur at faster time scales of 10–100 ms. To examine the effects of downsampling fMRI data on the performance of MDS, we first simulated interactions between nodes at a sampling rate of 1 kHz and then re-sampled the time series to 0.5 Hz after convolving with region-specific HRFs. MDS-VB was then applied to these data sets to estimate causal interactions between nodes. We also examined the influence of HRF delays on the estimation of causal interactions under four scenarios (Figs. S1 and S2), similar to the strategy used by Deshpande and colleagues (Deshpande et al., 2009) to study the effect of HRF variability on GCA. In the first scenario, there were no causal interactions between nodes but the HRFs were delayed between the nodes. In this case, MDS-VB performed accurately and did not infer any false causal interactions. This shows that MDS-VB can model and remove the effects of HRF variation while estimating causal interactions at the latent signal level. In the second scenario, we introduced causal interactions between nodes, but without HRF variations. MDS reliably estimated causal interactions for various delays in latent signals (Fig. S1A). In the third scenario, we introduced causal interactions between nodes and also varied the HRFs such that the delays in the latent signals and in the HRFs were in the same direction. In this case, MDS was able to recover both intrinsic and modulatory causal interactions accurately (Figs. S1B, S1C). In the fourth scenario, when delays in the latent signals and in the HRFs opposed each other, performance dropped significantly, just as with GCA (Deshpande et al., 2009). Further research is needed to examine whether causal interactions under this scenario are inherently unresolvable by MDS and other techniques such as DCM.
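A minimal sketch of this simulation strategy, with toy coupling strengths, a simple two-gamma HRF and a 50 ms latent delay (all illustrative values rather than those used to generate Figs. S1 and S2), is given below.

import numpy as np
from scipy.signal import fftconvolve
from scipy.stats import gamma

def hrf(t, shape):
    """Simple two-gamma HRF; the shape parameter roughly controls time-to-peak (illustrative)."""
    return gamma.pdf(t, shape) - 0.35 * gamma.pdf(t, shape + 10)

dt, T_sec, TR = 0.001, 120.0, 2.0            # 1 kHz simulation grid, TR = 2 s
t_hi = np.arange(0.0, T_sec, dt)
rng = np.random.default_rng(1)

# Latent signals at 1 kHz: node 1 drives node 2 with a 50 ms delay (toy coupling of 0.8).
lag = int(0.05 / dt)
s1 = fftconvolve(rng.standard_normal(t_hi.size), np.ones(200) / 200.0, mode="same")
s2 = 0.8 * np.roll(s1, lag) + 0.1 * rng.standard_normal(t_hi.size)

# Region-specific HRFs: node 2 peaks earlier than node 1, opposing the latent delay.
t_hrf = np.arange(0.0, 32.0, dt)
bold1 = fftconvolve(s1, hrf(t_hrf, 6.0))[: t_hi.size]
bold2 = fftconvolve(s2, hrf(t_hrf, 5.0))[: t_hi.size]

# Downsample the BOLD signals to the fMRI acquisition grid (0.5 Hz).
step = int(TR / dt)
y1, y2 = bold1[::step], bold2[::step]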


Comparison of MDS with other approaches

As noted above, like GCA, MDS can be used to estimate causal interactions between a large number of brain nodes. Unlike GCA, however, causal interactions are estimated on the underlying latent signals while simultaneously accounting for regional variations in the HRF. Furthermore, unlike GCA, MDS can differentiate between intrinsic and stimulus-induced modulatory interactions. Like DCM, MDS takes into account regional variations in the HRF while estimating causal interactions between brain regions. And like DCM, MDS also explicitly models external and modulatory inputs, allowing us to simultaneously estimate intrinsic and modulatory causal interactions between brain regions. Unlike DCM, however, MDS does not require the investigator to test multiple models and choose the one with the highest model evidence. This overcomes an important limitation of DCM: as the number of brain regions of interest increases, an exponentially large number of models needs to be examined; as a result, the computational burden of evaluating these models and identifying the appropriate model can become prohibitively high. MDS overcomes such problems and, as our study illustrates, incorporates the relative merits of both GCA and DCM while attempting to overcome their limitations.

Both DCM and MDS are state-space models, but DCM uses a deterministic state model (although a stochastic version has recently been developed (Daunizeau et al., 2009)), whereas MDS employs a stochastic model. Modeling latent interactions as a stochastic process is important for taking into account intrinsic variations in latent signals that are not induced by experimental stimuli. Another important difference is that MDS uses empirical basis functions to model variations in the HRF, whereas DCM uses a biophysical Balloon model (Friston et al., 2003). Since the Balloon model is nonlinear, several approximations are required to solve it. In contrast, empirical HRF basis functions allow MDS to use a linear dynamical systems framework. The relative accuracy of these approaches is currently not known.

One important advantage of MDS is that, unlike methods based on vector autoregressive modeling such as GCA, it does not assume that the fMRI time series is stationary. This is important because the dynamics of the latent signals can be altered significantly by experimental stimuli, leading to highly non-stationary signals. In GCA, the time series is tested for stationarity either by examining the autocorrelation of the time series or by investigating the presence of unit roots. If the time series is found to be nonstationary, one commonly used approach to remove the non-stationarity is to replace the original time series with the difference of the current and previous time points (Seth, 2010). A problem with this manipulation is that it acts as a high-pass filter that can significantly distort the estimated causal interactions (Bressler and Seth, 2010).
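The high-pass character of first differencing follows directly from its frequency response, |1 − exp(−iω)| = 2|sin(ω/2)|, which vanishes at zero frequency and doubles the amplitude at the Nyquist frequency; the short sketch below simply evaluates this response numerically.

import numpy as np

# Frequency response of the first-difference filter y'(t) = y(t) - y(t-1):
#   H(w) = 1 - exp(-i w),  |H(w)| = 2 |sin(w / 2)|
# Low frequencies are strongly attenuated and high frequencies amplified (up to 2x),
# which is why differencing can distort slow causal structure in BOLD time series.
w = np.linspace(0, np.pi, 5)                               # 0 ... Nyquist (radians/sample)
gain = np.abs(1.0 - np.exp(-1j * w))
print(np.allclose(gain, 2.0 * np.abs(np.sin(w / 2.0))))    # True
print(gain)                                                # ~[0, 0.77, 1.41, 1.85, 2.0]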

Two methods based on a dynamical systems approach for modeling fMRI data have been proposed recently (Ge et al., 2009; Smith et al., 2009). Smith and colleagues used a switching linear dynamical systems model wherein the modulatory inputs were treated as random variables (Smith et al., 2009). In contrast, MDS models them as deterministic quantities that are known for a given fMRI experiment. Modeling modulatory inputs as unknown random variables is useful for fMRI experiments in which the occurrence of modulatory inputs is unknown. However, for most fMRI studies the modulatory inputs are known, and modeling them as unknown quantities unnecessarily increases the number of parameters to be estimated. Also, the switching dynamical systems model makes additional assumptions in computing the probability distributions of the state variables (Murphy, 1998). Further, Smith and colleagues used an MLE approach to estimate latent signals and model parameters. As we show in this study, compared to MLE, a VB approach yields more robust, computationally efficient and accurate model estimation even when the SNR and the number of time points are low, as is generally the case with fMRI data.



Another difference is that Smith and colleagues combine the intrinsic and modulatory matrices, i.e., for every j-th modulatory input, the connection matrix (Aj = A + Cj) is estimated from the data. In MDS, we estimate the intrinsic matrix A and the modulatory matrices Cj separately, which explicitly dissociates intrinsic and modulatory effects on causal interactions between brain regions. Another difference lies in testing the statistical significance of the estimated causal connections: in MDS-MLE we use a non-parametric approach, whereas in MDS-VB the posterior probabilities of the model parameters are used for testing the significance of the causal interactions. Finally, it should be noted that the performance of this method under varying SNRs and sample sizes is not known since no simulations were performed.

Ge et al. (2009) used a different state-space approach to estimate causal interactions in the presence of external stimuli. They used vector autoregressive modeling for the state equation to model causal interactions among brain regions, whereas the observation model was nonlinear. They used an extended Kalman filtering approach to estimate the state variables and model parameters. This method was applied to local field potential data, so its usefulness for fMRI data is unclear. However, there are several differences between MDS and this approach. MDS has been developed explicitly for fMRI data to account for HRF variations in brain regions while simultaneously estimating causal interactions. In the work of Ge and colleagues, both state variables and unknown model parameters were treated as state variables, and extended Kalman filtering was used to obtain maximum likelihood estimates of these variables (Ge et al., 2009). In MDS, we have taken a different approach: state variables are separated from model parameters. This allowed us to use sparsity-promoting priors in the MDS-VB approach. Our results on simulated data suggest that MDS-VB outperforms MDS-MLE, especially at low SNRs and smaller numbers of temporal observations. In addition, Ge and colleagues used Kalman filtering to estimate state variables, while in MDS we used Kalman smoothing to estimate the latent signals (Ge et al., 2009). In Kalman smoothing, both past and future data are used to estimate the latent signals, whereas filtering uses only past values to estimate the current values. In general, smoothing provides better estimates of the latent signals than filtering (Bishop, 2006). Finally, although Ge and colleagues validated their approach on two three-node toy models, the performance of this method is not known under varying conditions such as different SNRs, network sizes and numbers of data samples.


Conclusions

The Bayesian multivariate dynamical systems framework we have developed here provides a robust method for estimating and interpreting causal network interactions in simulated BOLD-fMRI data. Extensive computer simulations demonstrate that this MDS method is more accurate and robust than GCA, and that among the MDS methods developed here MDS-VB exhibits superior performance over MDS-MLE. Critically, MDS estimates both intrinsic and experimentally induced modulatory interactions in the latent signals, rather than in the observed BOLD-fMRI signals. Unlike DCM, our proposed MDS framework does not require testing multiple models and may therefore be more useful for analyzing networks with a large number of nodes and connections. One limitation of this work is that our simulations were based on data sets created using the same model as the one used to estimate causal interactions. In this vein, preliminary analysis using simulations with delayed latent signals at millisecond temporal resolution suggests that MDS can accurately recover intrinsic and modulatory causal interactions in the presence of confounding delays in the HRF. Future studies will examine the performance of MDS using more realistic simulations in which causal influences are generated independently of any one particular model, as well as the application of MDS to real experimental fMRI data (Menon et al., in preparation).

Acknowledgments

This research was supported by grants from the National Institutes of Health (HD047520, HD059205, HD057610) and the National Science Foundation (BCS-0449927).

Appendix A

In this appendix, we provide detailed equations for estimating the model parameters and latent states of MDS using an expectation maximization algorithm.

Solving MDS using maximum likelihood estimation

The state-space and observation Eqs. (1)–(3) can be expressed in standard state-space form so that Kalman filtering and smoothing recursive equations can be used to estimate the probability distribution of the latent signals, which constitutes the E-step of our EM algorithm (Penny et al., 2005).

Let
$$x(t) = \left[\, s'(t)\;\; s'(t-1)\;\; \ldots\;\; s'(t-L+1) \,\right]' \qquad (A.1)$$
Eqs. (1)–(3) can then be written in terms of x(t) as
$$x(t) = \tilde{G}(t)\,x(t-1) + \tilde{D}\,\tilde{u}(t) + \tilde{w}(t) \qquad (A.2)$$
$$y(t) = B\,\Phi\,x(t) + e(t) \qquad (A.3)$$
where
$$\tilde{G}(t) = \tilde{A} + \sum_{j=1}^{J} v_j(t)\,\tilde{C}_j, \qquad
\tilde{A} = \begin{pmatrix} A & 0_{M \times M(L-1)} \\ \Psi & 0_M \end{pmatrix}, \qquad
\tilde{C}_j = \begin{pmatrix} C_j & 0_{M \times M(L-1)} \\ \Psi & 0_M \end{pmatrix} \qquad (A.4)$$
$\Psi$ is the $M(L-2) \times ML$ delay matrix that fills the lower rows of $\tilde{A}$. Similarly,
$$\tilde{D} = \begin{pmatrix} D & 0_{M \times (L-1)} \\ 0_{M(L-2) \times ML} & 0_M \end{pmatrix} \qquad (A.5)$$
$$\tilde{u}(t) = \left[\, u'(t)\;\; 0_{1, M(L-1)} \,\right]' \qquad (A.6)$$
$$\tilde{w}(t) = \left[\, w'(t)\;\; 0_{1, M(L-1)} \,\right]' \qquad (A.7)$$
$$\tilde{w}(t) \sim N\left(0, \tilde{Q}\right) \qquad (A.8)$$
$$\tilde{Q} = \begin{pmatrix} Q & 0_{M \times (L-1)} \\ 0_{M(L-2) \times ML} & 0_M \end{pmatrix} \qquad (A.9)$$
$$B = \begin{pmatrix} b_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & b_M \end{pmatrix} \qquad (A.10)$$
$$\Phi = \begin{pmatrix} \cdots & & \\ \vdots & \ddots & \vdots \\ & & \cdots \end{pmatrix} \qquad (A.11)$$
$$e(t) = \left[\, e_1(t)\;\; e_2(t)\;\; \cdots\;\; e_M(t) \,\right]' \qquad (A.12)$$
$$e(t) \sim N\left(0, R\right) \qquad (A.13)$$



$$R = \mathrm{diag}\left(\sigma_1^2, \sigma_2^2, \ldots, \sigma_M^2\right) \qquad (A.14)$$

Let x(0) be the initial state, uncorrelated with the state noise $\tilde{w}(t)$ and the observation noise e(t), and normally distributed with mean $\mu_o$ and covariance $\Sigma_o$. Eqs. (A.2) and (A.3) are now in standard linear state-space form. Therefore, Kalman filtering and smoothing recursions can be used to carry out the E-step.

E-step

In the E-step, the probability distribution of the latent signal x(t), t = 1, 2, …, T, given the observed data y(t), t = 1, 2, …, T, is computed. Since the state noise, observation noise and initial state x(0) are assumed to be Gaussian, the latent signals x(t), t = 1, 2, …, T, are also normally distributed, and their means and covariances can be estimated by the forward (filtering) and backward (smoothing) recursion steps of the Kalman estimation framework. In the filtering step, the mean and covariance of x(t) are computed given the observations y(τ), τ = 1, 2, …, t. In the smoothing step, these means and covariances are updated so that they represent the mean and covariance at each time t given all the data y(t), t = 1, 2, …, T.

Kalman filtering

In the filtering step, the goal is to compute the following posterior distribution of x(t) given the observations y(τ), τ = 1, 2, …, t, and the parameters of the model

$$p\left(x(t) \mid y(1), y(2), \ldots, y(t)\right) = N\left(x_t^t, \Sigma_t^t\right) \qquad (A.15)$$

The mean and covariance of this distribution can be computed using the following forward recursive steps.

$$x_t^{t-1} = \tilde{G}(t)\,x_{t-1}^{t-1} + \tilde{D}\,\tilde{u}(t) \qquad (A.16)$$
$$\Sigma_t^{t-1} = \tilde{G}(t)\,\Sigma_{t-1}^{t-1}\,\tilde{G}(t)' + \tilde{Q} \qquad (A.17)$$
$$K(t) = \Sigma_t^{t-1} E' \left( E\,\Sigma_t^{t-1} E' + R \right)^{-1} \qquad (A.18)$$
where $E = B\Phi$,
$$x_t^{t} = x_t^{t-1} + K(t)\left( y(t) - E\,x_t^{t-1} \right) \qquad (A.19)$$
$$\Sigma_t^{t} = \Sigma_t^{t-1} - K(t)\,E\,\Sigma_t^{t-1} \qquad (A.20)$$
The above recursion is initialized using $x_1^0 = \mu_o$ and $\Sigma_1^0 = \Sigma_o$.
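A compact NumPy transcription of the forward recursions (A.16)–(A.20) is sketched below; for brevity the coupling matrix is passed in as a single fixed array (in MDS it is the time-varying $\tilde{G}(t)$), and the matrix names follow the appendix. This is an illustrative sketch rather than our implementation.

import numpy as np

def kalman_filter(y, G, D, u, E, Q, R, mu0, Sigma0):
    """Forward recursions (A.16)-(A.20) for the embedded model
       x(t) = G x(t-1) + D u(t) + w(t),  y(t) = E x(t) + e(t),  E = B Phi.
    G is taken as time-invariant here; in MDS it is G(t) = A~ + sum_j v_j(t) C~_j.
    Returns the filtered means x_t^t and covariances Sigma_t^t."""
    n, T = mu0.size, y.shape[1]
    xf = np.zeros((n, T)); Pf = np.zeros((T, n, n))
    x_prev, P_prev = mu0, Sigma0
    for t in range(T):
        # Prediction step (A.16)-(A.17)
        x_pred = G @ x_prev + D @ u[:, t]
        P_pred = G @ P_prev @ G.T + Q
        # Kalman gain (A.18) and measurement update (A.19)-(A.20)
        S = E @ P_pred @ E.T + R
        K = P_pred @ E.T @ np.linalg.inv(S)
        xf[:, t] = x_pred + K @ (y[:, t] - E @ x_pred)
        Pf[t] = P_pred - K @ E @ P_pred
        x_prev, P_prev = xf[:, t], Pf[t]
    return xf, Pf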

Kalman smoothing

In the smoothing step, the goal is to compute the posterior distribution of x(t) given all the observations y(τ), τ = 1, 2, …, T, and the parameters

$$p\left(x(t) \mid y(1), y(2), \ldots, y(T)\right) = N\left(x_t^T, \Sigma_t^T\right) \qquad (A.21)$$

The mean $x_t^T$ and covariance $\Sigma_t^T$ at each time t can be estimated using the following backward recursions.

$$x_t^{T} = x_t^{t} + J_t\left( x_{t+1}^{T} - \tilde{G}(t)\,x_t^{t} \right) \qquad (A.22)$$
$$\Sigma_t^{T} = \Sigma_t^{t} + J_t\left( \Sigma_{t+1}^{T} - \Sigma_{t+1}^{t} \right) J_t' \qquad (A.23)$$


where $J_t$ is defined as
$$J_t = \Sigma_t^{t}\,\tilde{G}(t)'\left( \Sigma_{t+1}^{t} \right)^{-1} \qquad (A.24)$$

The above backward recursions are initialized by noting that, for t = T, $x_T^T$ and $\Sigma_T^T$ are obtained from Eqs. (A.19) and (A.20), respectively.
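The corresponding backward (Rauch–Tung–Striebel) pass, Eqs. (A.22)–(A.24), can be sketched as follows; it consumes the filtered means and covariances from the forward pass above and re-forms the one-step predictions, including the input term, when computing the smoother gain. As with the filtering sketch, the coupling matrix is treated as time-invariant for brevity.

import numpy as np

def kalman_smoother(xf, Pf, G, D, u, Q):
    """Backward recursions (A.22)-(A.24) given the filtered means xf and covariances Pf
    from the forward pass; G is assumed time-invariant as in the filtering sketch."""
    n, T = xf.shape
    xs = xf.copy(); Ps = Pf.copy()
    for t in range(T - 2, -1, -1):
        # Re-form the one-step prediction at t+1 (cf. (A.16)-(A.17))
        x_pred = G @ xf[:, t] + D @ u[:, t + 1]
        P_pred = G @ Pf[t] @ G.T + Q
        J = Pf[t] @ G.T @ np.linalg.inv(P_pred)            # smoother gain (A.24)
        xs[:, t] = xf[:, t] + J @ (xs[:, t + 1] - x_pred)  # (A.22)
        Ps[t] = Pf[t] + J @ (Ps[t + 1] - P_pred) @ J.T     # (A.23)
    return xs, Ps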

M-step

In the M-step, the goal is to find the unknown parameters Θ = {A, C_1, …, C_J, D, Q, B, R} given the data and the current posterior distributions of x(t), t = 1, 2, …, T. The parameters Θ can be estimated by maximizing the expected complete log-likelihood of the data, and the resulting estimates are therefore called maximum likelihood estimates.

The complete log-likelihood of the data is given by

$$\begin{aligned} \mathcal{L} &= \log p\left( x(1), x(2), \ldots, x(T), y(1), y(2), \ldots, y(T) \mid \Theta \right) \\ &= \log p\left( x(1) \mid \Theta \right) + \sum_{t=2}^{T} \log p\left( x(t) \mid x(t-1), \Theta \right) + \sum_{t=1}^{T} \log p\left( y(t) \mid x(t), \Theta \right) \end{aligned} \qquad (A.25)$$

Estimation of A, Cj's and D

The complete log-likelihood that depends on the parameters A, C and D is given by

$$\begin{aligned} \mathcal{L}\left(A, C_1, \ldots, C_J, D\right) \propto -0.5 \sum_{t=2}^{T} & \left( x_s(t) - \Big(A + \sum_{j=1}^{J} v_j(t) C_j\Big) F x(t-1) - d\,u(t) \right)' \\ & \times\, Q^{-1} \left( x_s(t) - \Big(A + \sum_{j=1}^{J} v_j(t) C_j\Big) F x(t-1) - d\,u(t) \right) \end{aligned} \qquad (A.26)$$

where $F = \left[ I_M \;\; 0_{M \times M(L-1)} \right]$, $d = \mathrm{diag}(D)$ and $x_s(t) = x(1{:}M, t) = s(t)$. Taking expectations of Eq. (A.26) with respect to p(x(t) | y(1), y(2), …, y(T)) and then differentiating the resulting expression with respect to A, C and d results in the following coupled linear equations:

$$\left[ \hat{A} \;\; \hat{C}_1, \ldots, \hat{C}_J \;\; \hat{d} \right]
\begin{pmatrix}
\sum_{t=2}^{T} F(t) P(t-1) F(t)' & \sum_{t=2}^{T} F(t)\, x_{t-1}^{T} u(t) \\
\sum_{t=2}^{T} u(t)\, \left(x_{t-1}^{T}\right)' F(t)' & \sum_{t=2}^{T} u(t)^2
\end{pmatrix}
= \left[ \sum_{t=2}^{T} P_s(t, t-1) F(t)' \;\;\; \sum_{t=2}^{T} m_s(t)\, u(t) \right] \qquad (A.27)$$

where
$$P(t) = \Sigma_t^{T} + x_t^{T} \left( x_t^{T} \right)' \qquad (A.28)$$
$$m_s(t) = x_t^{T}(1{:}M) \qquad (A.29)$$
$$F(t) = \left[ I_M \;\; v_1(t) I_M \;\; \ldots \;\; v_J(t) I_M \right]' F \qquad (A.30)$$
($m_s(t)$ is the first M elements of $x_t^T$),
$$P(t, t-1) = J_{t-1} \Sigma_t^{T} + x_t^{T} \left( x_{t-1}^{T} \right)' \qquad (A.31)$$
and $P_s(t, t-1)$ is the first M×M sub-matrix of P(t, t−1).



Estimation of Q

Taking expectations of Eq. (A.26) with respect to p(x(t) | y(1), y(2), …, y(T)) and then differentiating the resulting expression with respect to Q, the estimate of Q is given by

$$\begin{aligned} \hat{Q} = \frac{1}{T-1} \sum_{t=2}^{T} \Big( & P_s(t) - P_s(t, t-1) F' G(t)' - m_s(t)\, u(t)'\, d - G(t) F P_s(t, t-1) \\ & + G(t) F P(t-1) F' G(t)' + G(t) F x_{t-1}^{T} u(t)'\, d - d\, u(t)\, m_s(t)' \\ & + d\, u(t) \left( x_{t-1}^{T} \right)' F' G(t)' + d'\, u(t)^2\, d \Big) \end{aligned} \qquad (A.32)$$

where $P_s(t)$ is the first M×M sub-matrix of P(t). Note that the estimated values of A ($\hat{A}$), C ($\hat{C}$) and d ($\hat{d}$) obtained by solving Eq. (A.27) are used in Eq. (A.32) in place of A, C and d.

Estimation of B

Each row vector $b_m$, m = 1, 2, …, M, can be estimated (we assume the noise covariance matrix R to be diagonal) by maximizing the conditional expectation of the complete log-likelihood given in Eq. (A.25). By taking the derivative of the conditional expectation and equating it to zero, the estimate of $b_m$ is given by

$$\hat{b}_m' = \left( \Phi \sum_{t=1}^{T} P_m(t)\, \Phi' \right)^{-1} \Phi \sum_{t=1}^{T} y_m(t)\, x_t^{T}(m) \qquad (A.33)$$

where $P_m(t) = E\left( s_m(t)\, s_m'(t) \mid y(1), y(2), \ldots, y(T) \right)$, which can be easily found from P(t); $y_m(t)$ and $x_t^T(m)$ are the m-th elements of the vectors y(t) and $x_t^T$, respectively.

Estimation of R

The diagonal observation covariance matrix R can be estimated by maximizing the conditional expectation of the complete log-likelihood given in Eq. (A.25). The estimates of the diagonal components of R are given by

$$R(m, m) = \frac{1}{T} \sum_{t=1}^{T} \left( y_m^2(t) - 2\, \hat{b}_m' \Phi\, y_m(t)\, x_t^{T}(m) + \mathrm{trace}\left( \hat{b}_m' \Phi\, P_r(t)\, \Phi\, \hat{b}_m \right) \right), \quad m = 1, 2, \ldots, M \qquad (A.34)$$

Estimation of μo and Σo

The maximum likelihood estimates of the initial state mean $\mu_o$ and covariance $\Sigma_o$ are given by
$$\hat{\mu}_o = x_1^{T} \qquad (A.35)$$
$$\hat{\Sigma}_o = \Sigma_1^{T} \qquad (A.36)$$

The above E and M steps are repeated until the change in the log-likelihood of the data between two iterations falls below a specified threshold.

Appendix B

Solving MDS using VB framework

In VB, the goal is to find the posterior distributions of the latent variables $q_S(S|Y)$ and parameters $q_\Theta(\Theta|Y)$ by maximizing the lower bound on the log evidence L(q) given in Eq. (5).


VB-E-step

In this step, the posterior distributions of the latent variables $q_S(S|Y)$ are estimated given the current posterior probability of the model parameters $q_\Theta(\Theta|Y)$. As in the MLE approach, we compute the posteriors of the embedded latent signals x(t), from which the posterior of s(t) can be obtained. The distribution over these latent variables is obtained using a sequential algorithm similar to the Kalman smoothing used in the E-step of the MLE approach. In the VB version of Kalman smoothing, the point estimates of the parameters are replaced by expectations of the type E(ZWZ′), where Z is a parameter of the model and W a matrix. Although these expectations are straightforward to compute, they are computationally expensive for higher-order models. We therefore use the approximation E(AWA′) = E(A)WE(A′), which gives qualitatively similar results and is computationally efficient. This approach was also taken in (Cassidy and Penny, 2002). As a result, the VB-E step is the same as the E-step of the MLE approach. Therefore, the mean and covariance of x(t) are given by $x_t^T$ and $\Sigma_t^T$.

VB-M step

In this step, the posterior distributions of the model parameters $q_\Theta(\Theta|Y)$ are estimated given the current posterior probability of the latent variables $q_S(S|Y)$. Using the probabilistic graphical model in Fig. 1, one can show that the joint posterior distribution of the parameters $q_\Theta(\Theta|Y)$ further factorizes as

$$q_\Theta(\Theta \mid Y) = q\left( A, C_1, \ldots, C_J, D, Q \right)\, q\left( B, R \right) \qquad (B.1)$$

In this work, we also assume the state and observation noise covariance matrices (Q and R) to be diagonal. Therefore, the distributions of the elements in the rows of A, C_1, …, C_J, D and B can be inferred separately. Consider the state equation for the m-th node

$$s_m(t) = \left( a_m + \sum_{j=1}^{J} v_j(t)\, c_{j,m} \right) s(t-1) + d_m u(t) + w_m(t), \qquad w_m(t) \sim N\left(0, \beta_m^{-1}\right) \qquad (B.2)$$

where $a_m$ and $c_{j,m}$ are the m-th rows of A and $C_j$, respectively, and $\beta_m = 1/Q(m,m)$. In terms of the embedded signal x(t), the above equation can be written as

$$s_m(t) = \theta_m' \left[ F(t)\, x(t-1);\; u(t) \right] + w_m(t) \qquad (B.3)$$

where $\theta_m' = [a_m, c_{1,m}, \ldots, c_{J,m}, d_m]$ and $F(t) = [I_M\; v_1(t) I_M\; \ldots\; v_J(t) I_M]' F$. We assume the following Gaussian-Gamma conjugate priors for $\theta_m$ and $\beta_m$

$$p\left( \theta_m, \beta_m \mid \alpha \right) = N\left( 0, \left( \beta_m A_\alpha \right)^{-1} \right) \mathrm{Ga}\left( a_o, b_o \right) \qquad (B.4)$$

where $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_{2M+1}]$ are the hyperpriors on each element of $\theta_m$ and $A_\alpha = \mathrm{diag}(\alpha)$.

Let the prior on α be
$$p(\alpha) = \prod_{i=1}^{2M+1} \mathrm{Ga}\left( c_o, d_o \right) \qquad (B.5)$$

Therefore, by applying Eq. (9), the joint posterior for $\theta_m$ and $\beta_m$ is given by

$$q\left( \theta_m, \beta_m \mid Y \right) = N\left( \bar{\theta}_m, \beta_m^{-1} \Sigma_m \right) \mathrm{Ga}\left( a_{m,N}, b_{m,N} \right) \qquad (B.6)$$



where
$$\Sigma_m^{-1} = \begin{pmatrix} \sum_{t=2}^{T} F(t) P(t-1) F(t)' & \sum_{t=2}^{T} F(t)\, x_{t-1}^{T} u(t) \\ \sum_{t=2}^{T} u(t)\, \left(x_{t-1}^{T}\right)' F(t)' & \sum_{t=2}^{T} u(t)^2 \end{pmatrix} \qquad (B.7)$$

$$\bar{\theta}_m = \Sigma_m \begin{pmatrix} \sum_{t=2}^{T} F(t)\, E\left( s_m(t)\, x(t-1) \right) \\ \sum_{t=2}^{T} P_s(t, t-1)\, F(t)' \end{pmatrix} \qquad (B.8)$$

$$a_{m,N} = a_o + \frac{T + 2M}{2} \qquad (B.9)$$

$$b_{m,N} = b_o + 0.5 \left( \sum_{t=2}^{T} E\left( s_m^2(t) \right) - \bar{\theta}_m' \Sigma_m^{-1} \bar{\theta}_m \right) \qquad (B.10)$$

The posterior for the hyperparameters α is given by

$$q(\alpha \mid Y) = \prod_{i=1}^{2M+1} \mathrm{Ga}\left( c_N, d_{N_i} \right) \qquad (B.11)$$

where
$$c_N = c_o + \frac{1}{2} \qquad (B.12)$$
$$d_{N_i} = d_o + \frac{1}{2} \left( \bar{\theta}_m^2(i)\, \frac{a_{m,N}}{b_{m,N}} + \Sigma_m(i, i) \right) \qquad (B.13)$$

The posteriors for $\theta_m$, $\beta_m$ and α are estimated for each m = 1, 2, …, M, from which the posteriors for A, C_1, …, C_J, D and Q are computed.

Similarly, the posterior distribution of the model parameters in the output equation is computed. Since R is assumed to be diagonal, the observation equation for the m-th node is given by

$$y_m(t) = b_m \Phi\, x_m(t) + e_m(t), \qquad e_m(t) \sim N\left( 0, \lambda_m^{-1} \right) \qquad (B.14)$$

where $\lambda_m = 1/R(m,m)$. Again assuming Gaussian-Gamma conjugate priors for $b_m$ and $\lambda_m$,

$$p\left( b_m, \lambda_m \mid \alpha \right) = N\left( 0, \left( \lambda_m A_\alpha \right)^{-1} \right) \mathrm{Ga}\left( a_o, b_o \right) \qquad (B.15)$$

where $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_P]$ are the hyperpriors on each element of $b_m$ and $A_\alpha = \mathrm{diag}(\alpha)$.

Let the prior on α be
$$p(\alpha) = \prod_{i=1}^{P} \mathrm{Ga}\left( c_o, d_o \right) \qquad (B.16)$$

By applying Eq. (9), the joint posterior for bm and λm is given by

$$q\left( b_m, \lambda_m \mid Y \right) = N\left( \bar{b}_m, \lambda_m^{-1} V_m \right) \mathrm{Ga}\left( a_{m,N}, b_{m,N} \right) \qquad (B.17)$$

$$V_m^{-1} = \Phi \sum_{t=1}^{T} P_m(t)\, \Phi' + A_\alpha \qquad (B.18)$$

$$\bar{b}_m = V_m \Phi \sum_{t=1}^{T} y_m(t)\, x_t^{T}(m) \qquad (B.19)$$

$$a_{m,N} = a_o + \frac{T + P - 1}{2} \qquad (B.20)$$


$$b_{m,N} = b_o + 0.5 \left( \sum_{t=2}^{T} E\left( y_m^2(t) \right) - \bar{b}_m' V_m^{-1} \bar{b}_m \right) \qquad (B.21)$$

$$c_N = c_o + \frac{1}{2} \qquad (B.22)$$

$$d_{N_i} = d_o + \frac{1}{2} \left( \bar{b}_m^2(i)\, \frac{a_{m,N}}{b_{m,N}} + V_m(i, i) \right) \qquad (B.23)$$

$$A_\alpha(i, i) = \frac{c_N}{d_{N_i}} \qquad (B.24)$$

The posteriors for $b_m$, $\lambda_m$ and α are estimated for each m = 1, 2, …, M. In this work, we set the hyperparameters $a_o$, $b_o$, $c_o$ and $d_o$ to $10^{-3}$.

The VB-E and VB-M steps are repeated until convergence.
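The fixed-point cycle over Eqs. (B.17)–(B.24) for a single node's observation weights can be sketched as follows; the sufficient statistics are assumed to have been computed in the VB-E step, and the function is an illustrative transcription rather than our implementation.

import numpy as np

def vb_update_observation_weights(S, h, sum_y2, T, n_iter=50,
                                  a0=1e-3, b0=1e-3, c0=1e-3, d0=1e-3):
    """Fixed-point cycle over Eqs. (B.17)-(B.24) for one node's observation weights.

    S       : P x P matrix  Phi * sum_t P_m(t) * Phi'   (data second moment)
    h       : P vector      Phi * sum_t y_m(t) x_t^T(m) (data cross moment)
    sum_y2  : scalar        sum_t E[y_m(t)^2]
    The sufficient statistics come from the VB-E step; hyperparameters are set
    to 1e-3 as specified above.
    """
    P = h.size
    A_alpha = np.eye(P)                          # initial hyperprior precisions
    for _ in range(n_iter):
        V_inv = S + A_alpha                      # (B.18)
        V = np.linalg.inv(V_inv)
        b_bar = V @ h                            # (B.19)
        a_N = a0 + (T + P - 1) / 2.0             # (B.20)
        b_N = b0 + 0.5 * (sum_y2 - b_bar @ V_inv @ b_bar)       # (B.21)
        c_N = c0 + 0.5                           # (B.22)
        d_N = d0 + 0.5 * (b_bar**2 * a_N / b_N + np.diag(V))    # (B.23)
        A_alpha = np.diag(c_N / d_N)             # (B.24)
    return b_bar, a_N / b_N, A_alpha             # posterior mean, E[lambda_m], ARD precisions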

Appendix C

Initialization

The above iterative procedure (E and M steps) needs to be initialized. In this work, we estimate the initial values of the latent signals s(t) at each node using the Wiener deconvolution method of Glover (1999), wherein the canonical HRF is used for the deconvolution step. We then estimate the initial values of A, C, d and Q by solving Eq. (1) by least squares, assuming that the s(t)'s are the true values. Similarly, the parameters B and R are estimated from Eq. (3) by a least squares approach. The EM algorithm is then run using these initial values until the required convergence is obtained.
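A generic frequency-domain version of this Wiener deconvolution step, with an assumed regularization constant (the HRF array and the noise-to-signal ratio below are illustrative inputs, not values from our implementation), is sketched here.

import numpy as np

def wiener_deconvolve(y, hrf, noise_to_signal=0.1):
    """Frequency-domain Wiener deconvolution of a BOLD time series with a canonical HRF.

    Generic implementation in the spirit of Glover (1999); noise_to_signal is an
    illustrative regularization constant.
    """
    n = y.size
    H = np.fft.rfft(hrf, n)                               # HRF transfer function
    Y = np.fft.rfft(y, n)
    W = np.conj(H) / (np.abs(H) ** 2 + noise_to_signal)   # Wiener filter
    return np.fft.irfft(W * Y, n)                         # estimate of the latent signal s(t)

# The deconvolved s(t) at each node can then be plugged into Eqs. (1) and (3),
# and initial values of A, C, d, Q, B and R obtained by ordinary least squares.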

Appendix D. Supplementary data

Supplementary data to this article can be found online at doi:10.1016/j.neuroimage.2010.09.052.

References

Abler, B., Roebroeck, A., Goebel, R., Hose, A., Schonfeldt-Lecuona, C., Hole, G., Walter, H., 2006. Investigating directed influences between activated brain areas in a motor-response task using fMRI. Magn. Reson. Imaging 24, 181–185.
Bishop, C., 2006. Pattern Recognition and Machine Learning. Springer.
Box, G.E.P., Jenkins, G.M., Reinsel, G.C., 1994. Time Series Analysis: Forecasting and Control. Pearson Education.
Bressler, S.L., Menon, V., 2010. Large-scale brain networks in cognition: emerging methods and principles. Trends Cogn. Sci.
Bressler, S.L., Seth, A.K., 2010. Wiener–Granger causality: a well established methodology. Neuroimage.
Cassidy, M.J., Penny, W.D., 2002. Bayesian nonstationary autoregressive models for biomedical signal analysis. IEEE Trans. Biomed. Eng. 49, 1142–1152.
Daunizeau, J., Friston, K.J., Kiebel, S.J., 2009. Variational Bayesian identification and prediction of stochastic nonlinear dynamic causal models. Physica D 238, 2089–2118.
Deshpande, G., Hu, X., Stilla, R., Sathian, K., 2008. Effective connectivity during haptic perception: a study using Granger causality analysis of functional magnetic resonance imaging data. Neuroimage 40, 1807–1814.
Deshpande, G., Sathian, K., Hu, X., 2009. Effect of hemodynamic variability on Granger causality analysis of fMRI. Neuroimage.
Friston, K., 2009a. Causal modelling and brain connectivity in functional magnetic resonance imaging. PLoS Biol. 7, e33.
Friston, K., 2009b. Dynamic causal modeling and Granger causality. Comments on: The identification of interacting networks in the brain using fMRI: model selection, causality and deconvolution. Neuroimage.
Friston, K.J., 2009c. Modalities, modes, and models in functional neuroimaging. Science 326, 399–403.
Friston, K.J., Harrison, L., Penny, W., 2003. Dynamic causal modelling. Neuroimage 19, 1273–1302.
Fuster, J.M., 2006. The cognit: a network model of cortical representation. Int. J. Psychophysiol. 60, 125–132.
Ge, T., Kendrick, K.M., Feng, J., 2009. A novel extended Granger causal model approach demonstrates brain hemispheric differences during face recognition learning. PLoS Comput. Biol. 5, e1000570.
Glover, G.H., 1999. Deconvolution of impulse response in event-related BOLD fMRI. Neuroimage 9, 416–429.
Goebel, R., Roebroeck, A., Kim, D.S., Formisano, E., 2003. Investigating directed cortical interactions in time-resolved fMRI data using vector autoregressive modeling and Granger causality mapping. Magn. Reson. Imaging 21, 1251–1261.
Guo, S., Wu, J., Ding, M., Feng, J., 2008. Uncovering interactions in the frequency domain. PLoS Comput. Biol. 4, e1000087.
Havlicek, M., Jan, J., Brazdil, M., Calhoun, V.D., 2010. Dynamic Granger causality based on Kalman filter for evaluation of functional network connectivity in fMRI data. Neuroimage.
Hemmelmann, D., Ungureanu, M., Hesse, W., Wustenberg, T., Reichenbach, J.R., Witte, O.W., Witte, H., Leistritz, L., 2009. Modelling and analysis of time-variant directed interrelations between brain regions based on BOLD-signals. Neuroimage.
Hesse, W., Moller, E., Arnold, M., Schack, B., 2003. The use of time-variant EEG Granger causality for inspecting directed interdependencies of neural assemblies. J. Neurosci. Methods 124, 27–44.
Koller, D., Friedman, N., 2009. Probabilistic Graphical Models: Principles and Techniques. The MIT Press.
Makni, S., Beckmann, C., Smith, S., Woolrich, M., 2008. Bayesian deconvolution of [corrected] fMRI data using bilinear dynamical systems. Neuroimage 42, 1381–1396.
Mechelli, A., Price, C.J., Noppeney, U., Friston, K.J., 2003. A dynamic causal modeling study on category effects: bottom-up or top-down mediation? J. Cogn. Neurosci. 15, 925–934.
Murphy, K.P., 1998. Switching Kalman Filters. Technical report, DEC/Compaq Cambridge Research Labs.
Passingham, R.E., Stephan, K.E., Kotter, R., 2002. The anatomical basis of functional localization in the cortex. Nat. Rev. Neurosci. 3, 606–616.
Penny, W., Ghahramani, Z., Friston, K., 2005. Bilinear dynamical systems. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360, 983–993.
Prichard, D., Theiler, J., 1994. Generating surrogate data for time series with several simultaneously measured variables. Phys. Rev. Lett. 73, 951–954.
Rabiner, L.R., 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 257–285.
Rajapakse, J.C., Zhou, J., 2007. Learning effective brain connectivity with dynamic Bayesian networks. Neuroimage 37, 749–760.
Ramsey, J.D., Hanson, S.J., Hanson, C., Halchenko, Y.O., Poldrack, R.A., Glymour, C., 2009. Six problems for causal inference from fMRI. Neuroimage 49, 1545–1558.
Roebroeck, A., Formisano, E., Goebel, R., 2005. Mapping directed influence over the brain using Granger causality and fMRI. Neuroimage 25, 230–242.
Roebroeck, A., Formisano, E., Goebel, R., 2009. The identification of interacting networks in the brain using fMRI: model selection, causality and deconvolution. Neuroimage.
Sato, J.R., Junior, E.A., Takahashi, D.Y., de Maria Felix, M., Brammer, M.J., Morettin, P.A., 2006. A method to produce evolving functional connectivity maps during the course of an fMRI experiment using wavelet-based time-varying Granger causality. Neuroimage 31, 187–196.
Seth, A.K., 2005. Causal connectivity of evolved neural networks during behavior. Network 16, 35–54.
Seth, A.K., 2010. A MATLAB toolbox for Granger causal connectivity analysis. J. Neurosci. Methods 186, 262–273.
Smith, J.F., Pillai, A., Chen, K., Horwitz, B., 2009. Identification and validation of effective connectivity networks in functional magnetic resonance imaging using switching linear dynamic systems. Neuroimage.
Sridharan, D., Levitin, D.J., Menon, V., 2008. A critical role for the right fronto-insular cortex in switching between central-executive and default-mode networks. Proc. Natl. Acad. Sci. U. S. A. 105, 12569–12574.
Tipping, M., 2001. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 1, 211–244.
Valdes-Sosa, P.A., Sanchez-Bornot, J.M., Lage-Castellanos, A., Vega-Hernandez, M., Bosch-Bayard, J., Melie-Garcia, L., Canales-Rodriguez, E., 2005. Estimating brain functional connectivity with sparse multivariate autoregression. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360, 969–981.
