Leverage, asymmetry and heavy tails in the high-dimensional factor stochastic volatility model
Mengheng Li∗
University of Technology Sydney, UTS Business School, Australia
Marcel Scharth†
University of Sydney Business School, Australia
We develop a factor stochastic volatility model that incorporates leverage effects, return asymmetry, and heavy tails across all systematic and idiosyncratic model components. Our model leads to a flexible high-dimensional dependence structure that allows for time-varying correlations, tail dependence and volatility response to both systematic and idiosyncratic return shocks. We develop an efficient Markov chain Monte Carlo (MCMC) algorithm for posterior estimation based on the particle Gibbs, ancestor sampling and particle efficient importance sampling methods and an interweaving strategy. To obtain parsimonious specifications in practice, we build computationally efficient model selection directly into our estimation algorithm. We validate the performance of our proposed estimation method via simulation studies with different model specifications. An empirical study for a sample of US stocks shows that return asymmetry is a systematic phenomenon and that our model outperforms other factor models for value-at-risk evaluation.
Keywords: Generalised hyperbolic skew Student's t-distribution; Metropolis-Hastings algorithm; Importance sampling; Particle filter; Particle Gibbs; State space model; Time-varying covariance matrix; Factor model
JEL Classification: C11; C32; C53; C55; G32
∗Corresponding author. Email: [email protected]; PO BOX 123, Broadway NSW 2007, Australia
†Email: [email protected]
1 Introduction
The vast literature on financial time series provides ample evidence for the presence of time-varying volatilities and correlations, volatility co-movements, leverage effects, return asymmetry,
and heavy tails in asset return series. The class of stochastic volatility (SV) models incorporates
these stylised facts into a range of univariate and multivariate specifications. Shephard and Pitt
(1997) and Durbin and Koopman (1997) among many others, develop estimation procedures for
SV models with Student’s t errors. Koopman and Hol Uspensky (2002) and Yu (2005) discuss
leverage effects in SV models. Asai et al. (2006) and Chib et al. (2009) review several approaches
for multivariate stochastic volatility modelling.
This paper aims to model a wide range of stylised facts in high-dimensional settings. We fol-
low Pitt and Shephard (1999a), Chib et al. (2006), Kastner et al. (2017) and Kastner (2019) and
consider a factor stochastic volatility (FSV) framework. FSV models represent each return series
as a linear combination of factor (systematic) innovations and an asset-specific (idiosyncratic)
innovation, all subject to SV. To balance parsimony and flexibility in this multivariate framework, we apply Bayesian variable selection to all systematic and idiosyncratic components,
each of which follows a univariate SV model with leverage effects proposed by Nakajima and
Omori (2012). We term the latter GHSt-SVL because it incorporates skewness and heavy tails
via the generalised hyperbolic skew Student’s t errors of Aas and Haff (2006).
The motivation for adopting GHSt-SVL as the univariate basis of our FSV specification is
twofold. First, it ensures that each marginal return series is consistent with a flexible specification
found to be empirically relevant in univariate studies. Second, the GHSt-SVL specification for factors
introduces flexibility into the dependence structure of the model. By incorporating heavy-tailed
and asymmetric factor innovations and factor leverage effects, our model can account for tail
co-movements (Beine et al., 2010; Oh and Patton, 2017) and higher correlations for downside
moves (Ang and Chen, 2002; Patton, 2004).
We develop an efficient Markov Chain Monte Carlo (MCMC) algorithm for our FSV model
based on the particle Gibbs method of Andrieu et al. (2010). Our sampling scheme for latent
SV processes builds on the efficient importance sampling (EIS) algorithm of Richard and Zhang
(2007) and the particle Gibbs with ancestral sampling (PGAS) algorithm of Lindsten et al.
(2014). The EIS method constructs a globally optimal approximation to the conditional posterior
distribution of the SV processes which is typically orders of magnitude more accurate than
methods based on local approximation, such as the multi-move sampler of Nakajima and Omori
(2012). Using PGAS as an efficient way to implement the EIS proposal within our MCMC scheme, our approach contrasts with the literature that uses Gibbs-type algorithms for multivariate SV models with rich model specifications, as in Nakajima (2017). In this regard, our paper is similar to Grothe et al. (2017), who advocate the use of particle Gibbs for general state space models.
Studies on SV models can be split into two strands: (1) rich model specifications in univariate settings; and (2) simple model specifications in multivariate settings. On the computational front, the former includes the pseudo-marginal Metropolis-Hastings algorithm of Andrieu and Roberts (2009) and Andrieu et al. (2010), Gibbs sampling with data augmentation (Kim et al., 1998; Kastner and Fruhwirth-Schnatter, 2014), and Metropolis-within-Gibbs sampling (Gilks et al., 1995; Geweke and Tanizaki, 2001; Koop et al., 2007; Watanabe and Omori, 2004), whereas
the latter includes Chib et al. (2006), Chib et al. (2009), Kurose and Omori (2016), and more
recently Kastner et al. (2017) and Kastner (2019). An attempt to bridge these two strands of
studies is the multivariate model of Nakajima (2017). However, this model suffers from the curse of dimensionality, making estimation extremely challenging for large portfolio sizes. Also, Gruber
et al. (2016) and Gruber et al. (2017) tackle the dimensionality problem via a coupled system of
univariate models, using highly structured simultaneous graphical dynamic linear models where
estimation is accelerated via GPU parallelisation and variational Bayes. In comparison, we
consider factor analysis such that the number of parameters grows linearly with the number
of assets and conduct exact Bayesian inference. In particular, the factor structure leads to a
convenient sampling scheme which reduces to vectorised parallel treatment of many univariate
series conditional on factor loadings. As many have noted, however, the mixing of factor loadings can be poor because the loading matrix and the factors enter the likelihood in product form. Among others, Chib et al. (2006) propose marginalising out the factors via a Rao-Blackwellisation procedure
to improve mixing. Alternatively, Kastner et al. (2017) and Kastner (2019) implement the
ancillarity-sufficiency interweaving strategy developed by Yu and Meng (2011) to boost mixing
efficiency, utilising the dependence structure between loadings and factors. Yet little is known about how these two strategies perform in high-dimensional FSV models with rich model specifications. We
thus implement both and compare them within our proposed sampling framework.
We organise our discussion as follows. Section 2 introduces our FSV model that allows for
leverage effects, asymmetry and heavy tails in all model components. Section 3 introduces the
proposed efficient MCMC estimation procedure. Section 4 presents simulation studies under
different model specifications. Section 5 presents empirical applications for a portfolio of stock
returns based on the components of the S&P100. Section 6 concludes.
2 The factor stochastic volatility model
Our FSV model is based on the basic FSV model developed in Aguilar and West (2000) and
Chib et al. (2006). The latter takes the form $y_t = \Lambda f_t + u_t$ with Gaussian factors $f_t$ and Gaussian idiosyncratic components $u_t$, whose log variances are modelled by AR(1) processes. Efficient sampling of the basic FSV model has also been studied by Kastner et al. (2017) and Kastner (2019), among others. In this section, we introduce our FSV model that assigns a very
flexible specification to both factor and idiosyncratic components to account for other features
of financial return series beyond time-varying volatility, and discuss identification.
2.1 Leverage effect, asymmetry and heavy tails in FSV models
We propose an FSV model with leverage effects, asymmetry and heavy tails (FSV-LAH). Suppose that for $t = 1, \ldots, T$ we have a vector of returns $y_t \in \mathbb{R}^N$ that is assumed to be driven by systematic factors $f_t \in \mathbb{R}^n$ with loadings $\Lambda \in \mathbb{R}^{N \times n}$ and idiosyncratic components $u_t \in \mathbb{R}^N$. Let $E(\nu_{i,t}) = 0$, $E(\nu_{i,t}^2) = 1$, and $E(\nu_{i,t}\nu_{j,t}) = 0$ for $i \neq j$, $i, j = 1, \ldots, n+N$. The model is given by
$$
y_t = \Lambda f_t + u_t, \qquad (f_t', u_t')' = \operatorname{diag}\left[\exp\left(\tfrac{h_{1,t}}{2}\right), \ldots, \exp\left(\tfrac{h_{n+N,t}}{2}\right)\right]\nu_t. \qquad (1)
$$
The FSV model above features $h_{i,t}$ and $\nu_{i,t}$, $i = 1, \ldots, n$, as factor log variances and systematic shocks, and, for $i = n+1, \ldots, n+N$, as idiosyncratic log variances and idiosyncratic shocks.
FSV-LAH becomes the standard FSV model when $\nu_t$ is normal. Let $z_t = (f_t', u_t')'$ for all $t$. For all $i$, our proposed FSV-LAH model further reads
$$
z_{i,t} = \exp\left(\tfrac{h_{i,t}}{2}\right)\nu_{i,t}, \qquad \nu_{i,t} = \alpha_i + \beta_i W_{i,t}\gamma_i + \sqrt{W_{i,t}}\,\gamma_i\,\varepsilon_{i,t},
$$
$$
h_{i,t+1} = \mu_i(1-\phi_i) + \phi_i h_{i,t} + \eta_{i,t}, \qquad h_{i,1} \sim N\left(\mu_i, \frac{\sigma_i^2}{1-\phi_i^2}\right),
$$
$$
\begin{pmatrix}\varepsilon_{i,t}\\ \eta_{i,t}\end{pmatrix} \sim N\left(\begin{pmatrix}0\\ 0\end{pmatrix}, \begin{pmatrix}1 & \rho_i\sigma_i\\ \rho_i\sigma_i & \sigma_i^2\end{pmatrix}\right), \qquad W_{i,t} \sim IG\left(\frac{\zeta_i}{2}, \frac{\zeta_i}{2}\right), \qquad (2)
$$
where each $\nu_{i,t}$ follows the generalised hyperbolic skew Student's t (GHSt) distribution, which we write as a mean-variance Gaussian mixture. The mixing variable $W_{i,t}$ follows the inverse gamma (IG) distribution. We standardise $\nu_{i,t}$ by choosing $\alpha_i = -\beta_i\gamma_i\zeta_i/(\zeta_i - 2)$ and $\gamma_i = \left(E(W_{i,t}) + \beta_i^2\operatorname{Var}(W_{i,t})\right)^{-1/2}$, so that $\nu_{i,t}$ has zero mean and unit variance. The restriction $\zeta_i > 4$ is imposed to ensure the existence of a finite variance. The asymmetry parameter $\beta_i$ and the degrees of freedom $\zeta_i$ jointly determine the asymmetry and heavy-tailedness of $\nu_{i,t}$.
Suppressing the subscript i, we make some remarks:
Remark 1. Aas and Haff (2006) detail the GHSt distribution, including its density function $f_\nu$, $k$-th moment $E(|\nu|^k)$, and an EM algorithm for parameter estimation. The GHSt distribution is well suited to modelling financial data. Besides explicitly modelling return asymmetry, it enables a flexible specification of tail behaviour. In the tails, we have $f_\nu(\nu) \propto |\nu|^{-\zeta/2-1}\exp(-|\beta\nu| + \beta\nu)$ as $\nu \to \pm\infty$. So, depending on the values of $\beta$ and $\zeta$, we can allow for a polynomially decaying tail, say the left tail for negative returns, while having an exponentially decaying right tail.
Remark 2. The univariate SV model with leverage effects and GHSt-distributed errors, or GHSt-SVL, was originally proposed by Nakajima and Omori (2012) and extended by Nakajima (2017) to low-dimensional multivariate settings. That $f_\nu$ can be written as a Gaussian mean-variance mixture facilitates a Gibbs sampling procedure for estimation. The leverage effect $E(\nu_t\eta_t/\sigma)$ is modelled via $E(\varepsilon_t\eta_t) = \sigma\rho$.
Remark 3. In the supplementary appendix, we show that $E(\nu_t\eta_t/\sigma) = L_e(\beta, \zeta)\rho$, where the multiplier

$$
L_e(\beta, \zeta) = \frac{\Gamma(\zeta/2 - 1/2)}{\Gamma(\zeta/2)}\sqrt{\frac{(\zeta-2)^2(\zeta-4)}{2\zeta^2 + (4\beta^2 - 12)\zeta + 16}}.
$$

Basic algebra shows that $L_e(\beta, \zeta) \in (0, 1)$ for all $\beta$ and admissible $\zeta$, with $\partial L_e/\partial\zeta > 0$, $\partial^2 L_e/\partial\zeta^2 < 0$, $\partial L_e/\partial|\beta| < 0$, and $\partial^2 L_e/\partial\beta^2 < 0$. Given $\beta$, a large $\zeta$ makes $f_\nu$ less skewed and lighter tailed (Aas and Haff, 2006); consequently, $L_e(\beta, \zeta)$ tends to one and the leverage effect tends to $\rho$. Given $\zeta > 4$, the magnitude of the leverage effect decreases to zero as $|\beta|$ grows even when $\rho \neq 0$. This means that if the return shock $\nu_t$ puts a large weight on the mixing variable $W_t$ (i.e. large $|\beta|$), the leverage effect vanishes.
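The multiplier is straightforward to evaluate numerically; a minimal sketch (the function name is ours, not the paper's):

```python
from math import lgamma, exp, sqrt

def leverage_multiplier(beta, zeta):
    """L_e(beta, zeta) from Remark 3, scaling rho in the leverage
    effect E(nu_t eta_t / sigma) = L_e(beta, zeta) * rho; zeta > 4."""
    log_ratio = lgamma(zeta / 2 - 0.5) - lgamma(zeta / 2)   # Gamma ratio, log scale
    inner = (zeta - 2) ** 2 * (zeta - 4) / (2 * zeta**2 + (4 * beta**2 - 12) * zeta + 16)
    return exp(log_ratio) * sqrt(inner)

le = leverage_multiplier(0.5, 10)
```

Numerically, $L_e$ lies in $(0,1)$, increases towards one as $\zeta$ grows, and shrinks as $|\beta|$ grows, matching the comparative statics above.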
Because the GHSt-SVL model is used for both systematic and idiosyncratic components, our FSV-LAH model accommodates a dependence structure that reflects "stylised facts" including volatility clustering, return asymmetry, leverage effects and heavy-tailedness. It is empirically important
to study the roles that they play in a high-dimensional setting, because a vast literature has
shown that such “stylised facts” are central to forecasting performance (see e.g., Nelson, 1991,
Yu, 2005 and Nakajima and Omori, 2012). Later in Section 3.1.3, we introduce Bayesian variable
selection to shrink ρi’s and βi’s so that we can also distinguish the systematic sources of leverage
effects and return asymmetry from idiosyncratic ones.
2.2 Identification of factors and loadings
To mitigate order dependence in the identification (of rotation, sign and scale) of factor models
(see e.g. Bai and Wang, 2015), we propose a pre-estimation step to pin down restrictions in Λ.
The first step relies on principal component analysis (PCA) applied to $y_t$, which produces $\Lambda^{PCA}$. Rotation is identified by setting the $i-1$ smallest loadings in absolute value in the $i$-th column of $\Lambda^{PCA}$ to zero, $i = 1, \ldots, n$; we fix the location of these zero loadings throughout. The sign of the $i$-th factor and the $i$-th column of $\Lambda$ are identified by restricting the largest entry in absolute value in the $i$-th column of $\Lambda^{PCA}$ to be positive; that is, $\Lambda_{k_i i} > 0$, where $k_i = \arg\max_k |\Lambda^{PCA}_{ki}|$, $k = 1, \ldots, N$, and $X_{ij}$ denotes the $(i,j)$-th element of matrix $X$. The scale is identified by restricting $\mu_i = 0$ for $i = 1, \ldots, n$.
With $k_i = i$ and without the PCA step, the identification scheme is identical to the one in Kastner et al. (2017), who argue that this scheme avoids giving the $n$ leading variables a disproportionate effect on factor dynamics. We choose the zero restrictions on $\Lambda$ using the extra
PCA step, because PCs are order-invariant. As a result, the identified factor space is close to
the one spanned by the unique order-invariant PCs.
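The pre-estimation step can be sketched as follows (an illustrative implementation using sample-covariance PCA; the function name and the smoke-test data are ours, not the authors' code):

```python
import numpy as np

def pca_loading_restrictions(y, n):
    """Sketch of the Section 2.2 pre-estimation step: PCA loadings with
    the i-1 smallest |entries| of column i set to zero, and the sign of
    each column fixed so that its largest |entry| is positive."""
    cov = np.cov(y, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)            # ascending eigenvalues
    order = np.argsort(eigval)[::-1][:n]            # n largest components
    L = eigvec[:, order] * np.sqrt(eigval[order])   # N x n PCA loadings
    for i in range(n):                              # column i+1 in 1-based terms
        small = np.argsort(np.abs(L[:, i]))[:i]     # i-1 smallest (1-based count)
        L[small, i] = 0.0
        k_i = np.argmax(np.abs(L[:, i]))
        if L[k_i, i] < 0:                           # enforce Lambda_{k_i, i} > 0
            L[:, i] *= -1.0
    return L

# smoke test on simulated factor-model data (illustrative sizes)
rng = np.random.default_rng(0)
f = rng.standard_normal((500, 3))
true_Lam = rng.standard_normal((20, 3))
y = f @ true_Lam.T + 0.1 * rng.standard_normal((500, 20))
L = pca_loading_restrictions(y, 3)
```

The returned matrix fixes the locations of the zero loadings and the sign restrictions used throughout the MCMC run.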
3 Bayesian estimation
This section introduces our proposed Bayesian estimation procedure for the FSV-LAH model.
Conditional on $\Lambda$ and the system parameters, $f_t$ and $u_t$ can be sampled directly. This gives $n+N$ univariate GHSt-SVL models, which can be analysed in parallel within each MCMC run on a computer with a multi-threaded CPU. We first provide a sampling algorithm for this univariate problem that is more efficient, though slightly more computationally intensive, than the one used in Nakajima and Omori (2012) and Nakajima (2017), and then a sampling procedure for drawing $\Lambda$ from its full conditional posterior distribution when $N$ and/or $n$ become large.
3.1 Efficient estimation of the GHSt-SVL model
In each run of the full MCMC sampler, we obtain a sample of $z_t = (f_t', u_t')' \in \mathbb{R}^{n+N}$ conditional on $\Lambda$, with each $z_{i,t}$ treated as a separate GHSt-SVL model. Suppressing the subscript $i$ in Sections 3.1.1 and 3.1.2, letting $\theta = (\sigma, \rho, \phi, \mu, \beta, \zeta)$ collect the model parameters and $x_{1:t}$ denote $x_1, x_2, \ldots, x_t$, we propose a Gibbs sampler for $z_t$ that iterates over
1. sampling $(h_{1:T}, W_{1:T}) \,|\, z_{1:T}, \theta$;
2. sampling $\theta \,|\, z_{1:T}, h_{1:T}, W_{1:T}$.
The key feature here is the joint sampling of the entire trajectories $h_{1:T}$ and $W_{1:T}$ as a single block via an efficient particle Gibbs algorithm. In comparison, Nakajima and Omori (2012) sample $h_t|W_t$ using a multi-move sampler (Shephard and Pitt, 1997) and subsequently sample $W_t|h_t$ using a Metropolis-Hastings (MH) step. Our procedure avoids the autocorrelation caused by iterating between $h_t|W_t$ and $W_t|h_t$.
3.1.1 Particle Gibbs with ancestor sampling for sampling $(h_{1:T}, W_{1:T}) \,|\, z_{1:T}, \theta$
Let $x_t$ denote $(h_t, W_t)'$ and suppress $\theta$. To sample from $p(x_{1:T}|z_{1:T})$, we modify the particle Gibbs with ancestor sampling (PGAS) method developed by Lindsten et al. (2014). Compared with the usual forward-filtering backward-smoothing approach with importance sampling (see Shephard and Pitt, 1997 and Watanabe and Omori, 2004), particle Gibbs (PG; see e.g. Doucet et al., 2001 and Chopin et al., 2013) can be computationally advantageous if a small number of particles is used in the one-way simulation from $t = 1$ to $T$ with sequential Monte Carlo.
Although the basic PG with input $x_{1:T}^{old}$ (the old draw in the Markov chain) and output $x_{1:T}^{new}$ (the new draw) is straightforward to implement, it suffers from the well-documented problem of particle degeneration (see e.g. Pitt and Shephard, 1999b and Snyder et al., 2008): $p(x_{1:T}^{new} = x_{1:T}^{old})$ is bounded away from zero, which leads to poor mixing. According to Lindsten et al. (2014), PGAS circumvents this global degeneration problem by cutting $x_{1:T}^{old}$ into pieces $x_{t_1:t_2}$, where $t_1 < t_2$ and $t_1, t_2 = 1, \ldots, T$. Consequently, $x_{1:T}^{new} \neq x_{1:T}^{old}$ almost surely. Lindsten et al. (2014) show that PGAS improves the mixing of latent processes in a standard SV model.
We briefly discuss the adoption of PGAS in the GHSt-SVL model, with details left in the supplementary appendix. Suppose at $t-1$ we have a particle system containing $M$ particles $\{x_{1:t-1}^i\}_{i=1}^M$ and associated weights $\{\omega_{t-1}^i\}_{i=1}^M$ which approximates the filtering density $p(x_{1:t-1}|z_{1:t-1})$ by $\hat p(x_{1:t-1}|z_{1:t-1}) \propto \sum_{i=1}^M \omega_{t-1}^i D(x_{1:t-1} - x_{1:t-1}^i)$, where $D(\cdot)$ is the Dirac delta function. PGAS propagates the particle system by first sampling $\{a_t^i, x_t^i\}_{i=1}^M$ from

$$
I_t(a_t, x_t) \propto \omega_{t-1}^{a_t}\, p(x_t | x_{t-1}^{a_t}, z_{t-1}), \qquad (3)
$$

where $p(x_t|x_{t-1}, z_{t-1})$ is the transition density of $(h_t, W_t)'$, i.e.

$$
p(x_t|x_{t-1}, z_{t-1}) \propto \exp\left(\frac{\mu(1-\phi) + \phi h_{t-1} + \rho\sigma\varepsilon_{t-1}}{(1-\rho^2)\sigma^2}\, h_t - \frac{h_t^2}{2(1-\rho^2)\sigma^2}\right)\cdot W_t^{-\frac{\zeta}{2}-1}\exp\left(-\frac{\zeta}{2}W_t^{-1}\right)
$$

with $\varepsilon_{t-1} = (z_{t-1}e^{-h_{t-1}/2} - \alpha - \beta W_{t-1}\gamma)/(\sqrt{W_{t-1}}\,\gamma)$ accounting for the leverage effect (Yu, 2005), and where $a_t$ indexes the ancestor particle, i.e. $x_{1:t}^i = (x_{1:t-1}^{a_t^i}, x_t^i)$. Second, we augment the particle system with $x_{1:T}^{old}$ by assigning $x_t^{M+1} = x_t^{old}$ and sample its ancestor according to $\mathbb{P}(a_t^{M+1} = i) \propto \omega_{t-1}^i\, p(x_t^{old}|x_{t-1}^i, z_{t-1})$, which rewrites the history of $x_{1:t}^{old}$ as $x_{1:t}^{M+1} = (x_{1:t-1}^{a_t^{M+1}}, x_t^{M+1})$. The augmented system at each $t$ is then re-weighted according to

$$
\omega_t^i \propto p(z_t|x_t^i), \qquad \text{for } i = 1, \ldots, M+1. \qquad (4)
$$

Once the propagation reaches $t = T$, PGAS samples a new path $x_{1:T}^{new}$ from $\hat p(x_{1:T}|z_{1:T}) \propto \sum_{i=1}^{M+1}\omega_T^i D(x_{1:T} - x_{1:T}^i)$.
When propagating the particle system, PGAS breaks the dependence of $x_{1:T}^{new}$ on $x_{1:T}^{old}$, i.e. the ancestral dependence, which allows for a much smaller number of particles than PG while remaining effective. It is however important to notice that PGAS does not break the dependence between $x_{t_1:t_2}^{new}$ and $x_{t_1:t_2}^{old}$ for some $t_1$ and $t_2$ (ancestor sampling does not solve local particle degeneration). In the next section, we propose an efficient importance sampling procedure that ensures local efficiency and uses a small number of particles. This helps reduce the computational burden from fewer function evaluations, and the Monte Carlo noise that is non-negligible in high-dimensional models (Dang et al., 2015).
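The propagation, ancestor-sampling and re-weighting steps above can be sketched for a generic scalar state space model as follows. This is a bootstrap-proposal illustration with a plain SV example, not the paper's EIS-based GHSt-SVL implementation; all function names and parameter values are ours:

```python
import numpy as np

def pgas_step(z, x_old, M, init_sample, trans_sample, trans_logpdf, obs_loglik, rng):
    """One PGAS sweep (Lindsten et al., 2014) for a generic scalar state
    space model; the user-supplied functions must be vectorised over
    particles.  Returns a new trajectory given the conditioned path x_old."""
    T = len(z)
    X = np.empty((M + 1, T))                 # row M holds the conditioned path
    X[:M, 0] = init_sample(M, rng)
    X[M, 0] = x_old[0]
    logw = obs_loglik(z[0], X[:, 0])
    for t in range(1, T):
        w = np.exp(logw - logw.max()); w /= w.sum()
        anc = rng.choice(M + 1, size=M, p=w)            # ancestors, free particles
        x_prop = trans_sample(X[anc, t - 1], rng)
        # ancestor sampling for the conditioned particle x_old[t]
        logp = np.log(w) + trans_logpdf(x_old[t], X[:, t - 1])
        p = np.exp(logp - logp.max()); p /= p.sum()
        anc_cond = rng.choice(M + 1, p=p)
        X[:, :t] = X[np.append(anc, anc_cond), :t]      # rewrite histories
        X[:M, t] = x_prop
        X[M, t] = x_old[t]
        logw = obs_loglik(z[t], X[:, t])                # re-weight, as in (4)
    w = np.exp(logw - logw.max()); w /= w.sum()
    return X[rng.choice(M + 1, p=w)].copy()

# toy SV model: h_{t+1} = phi h_t + sigma eta, z_t = exp(h_t/2) eps
rng = np.random.default_rng(0)
phi, sig, T = 0.95, 0.2, 100
h = np.zeros(T)
for t in range(1, T):
    h[t] = phi * h[t - 1] + sig * rng.standard_normal()
z = np.exp(h / 2) * rng.standard_normal(T)

h_new = pgas_step(
    z, np.zeros(T), M=20,
    init_sample=lambda M, r: sig / np.sqrt(1 - phi**2) * r.standard_normal(M),
    trans_sample=lambda xp, r: phi * xp + sig * r.standard_normal(xp.shape),
    trans_logpdf=lambda xn, xp: -0.5 * ((xn - phi * xp) / sig) ** 2,
    obs_loglik=lambda zt, x: -0.5 * (x + zt**2 * np.exp(-x)),
    rng=rng,
)
```

Replacing the bootstrap transition proposal by the EIS proposal of Section 3.1.2 is exactly the modification the paper makes.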
3.1.2 Approximating $p(h_{1:T}, W_{1:T} \,|\, z_{1:T}, \theta)$ via efficient importance sampling
Drawing $x_t$ from the transition density $p(x_t|x_{t-1}, z_{t-1})$ renders the effective sample size small and thus requires a large number of particles (in PG particularly) and constant resampling to avoid particle degeneration, at a cost to efficiency. The remedy is to incorporate an importance sampling (IS) step that uses a proposal density $q(x_t|x_{t-1}, z_{t-1}, \mathcal{F})$, where $\mathcal{F}$ is a chosen information set, to generate "informative" draws.

To maximise efficiency while utilising the small number of particles allowed by PGAS, we propose choosing $\mathcal{F} = z_{1:T}$ for the GHSt-SVL model. This is motivated by the fact that we can express the likelihood of the GHSt-SVL model as
$$
L(z_{1:T}) = \int \frac{p(z_{1:T}|x_{1:T})\,p(x_{1:T})}{q(x_{1:T}|z_{1:T})}\, q(x_{1:T}|z_{1:T})\, dx_{1:T}
= \int \frac{p(z_1|x_1)p(x_1)}{q(x_1|z_{1:T})} \prod_{t=2}^{T} \frac{p(z_t|x_t)\,p(x_t|x_{t-1}, z_{t-1})}{q(x_t|x_{t-1}, z_{1:T})}\, q(x_{1:T}|z_{1:T})\, dx_{1:T}, \qquad (5)
$$
where $p(z_t|x_t)$ is Gaussian with mean $e^{h_t/2}(\alpha + \beta W_t\gamma)$ and variance $\gamma^2 W_t e^{h_t}$; and $q(x_{1:T}|z_{1:T})$ is a global approximating density that can be factorised into terms of the form $q(x_t|x_{t-1}, z_{1:T})$. It is evident that choosing $\mathcal{F} = z_{1:T}$ has an informational advantage over other choices, such as the one in Pitt and Shephard (1999b) with $\mathcal{F} = \{z_t, z_{t+1}\}$. Draws from $q(x_t|x_{t-1}, z_{1:T})$ are the best "guesses" as they are informed by the full data. Consequently, the effective sample size is large even when the number of particles is small. Scharth and Kohn (2016) show that the (frequentist) efficient importance sampling (EIS) method of Richard and Zhang (2007) and Jung and Liesenfeld (2001) can be used in a particle marginal Metropolis-Hastings (PMMH) algorithm (Andrieu et al., 2010) for SV models to marginalise the log variance process and accurately estimate the likelihood using as few as two particles.
It is worth noticing that $p(z_{1:T}|x_{1:T})p(x_{1:T})$ contains the product of Gaussian and IG densities, which are exponential family distributions that are closed under multiplication. We use this conjugacy property and choose $q(x_{1:T}|z_{1:T})$ to also be a product of Gaussian and IG densities, so that $q(x_{1:T}|z_{1:T})$ approximates $p(z_{1:T}|x_{1:T})p(x_{1:T})$ well. Because the integrand in (5) can be factorised, we choose

$$
q(x_t|x_{t-1}, z_{1:T}) \propto \exp\left(b_t h_t - \frac{1}{2}c_t h_t^2\right)\cdot W_t^{s_t}\exp\left(r_t W_t^{-1}\right). \qquad (6)
$$
Let $\delta_t = (b_t, c_t, s_t, r_t)'$ collect the proposal density parameters, which are functions of $z_{1:T}$. In the supplementary appendix, we show that given $\delta_t$ the candidate draws of $h_t$ and $W_t$ can be generated from $N(\mu_{h_t}, v_{h_t})$, with $v_{h_t} = [(1-\rho^2)\sigma^2]/[1 + (1-\rho^2)\sigma^2 c_t]$ and $\mu_{h_t} = v_{h_t} b_t + v_{h_t}[\mu(1-\phi) + \phi h_{t-1} + \rho\sigma\varepsilon_{t-1}]/[(1-\rho^2)\sigma^2]$, and from $IG(s_t + \zeta/2, r_t + \zeta/2)$, respectively. To determine $\delta_t$, we minimise the squared distance between $\log[p(z_t|x_t)p(x_t|x_{t-1}, z_{t-1})]$ and $\log q(x_t|x_{t-1}, z_{1:T})$. The latter being linear in $\delta_t$ conveniently allows us to write the minimisation problem as an OLS regression, with solution $\hat\delta_t$ (ignoring the constant) given by $(\mathcal{X}'\mathcal{X})^{-1}\mathcal{X}'\mathcal{Y}$. For $j = 1, \ldots, J$, $\mathcal{X}$ is $J \times 5$ with $j$-th row $\left(1, h_t^{(j)}, -(h_t^{(j)})^2/2, \log W_t^{(j)}, 1/W_t^{(j)}\right)$ and $\mathcal{Y}$ is $J \times 1$ with $j$-th element

$$
-\frac{\left(z_t - \exp\left(h_t^{(j)}/2\right)\left(\alpha + \beta W_t^{(j)}\gamma\right)\right)^2}{2\gamma^2 W_t^{(j)}\exp\left(h_t^{(j)}\right)} - g\left(\delta_{t+1}, z_{t-1}, h_t^{(j)}, h_{t-1}^{(j)}, W_t^{(j)}, W_{t-1}^{(j)}\right),
$$

where $h_t^{(j)}$ and $W_t^{(j)}$ are the $j$-th draws from $N(\mu_{h_t}^{old}, v_{h_t}^{old})$ and $IG(s_t^{old} + \zeta/2, r_t^{old} + \zeta/2)$, respectively, with the proposal density parameters $\delta_t^{old}$ obtained in the previous run of the Markov chain.
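The per-period regression step can be illustrated in isolation (our sketch; the backward term $g(\cdot)$ is omitted here and the target values are synthetic, so this only demonstrates the regression mechanics):

```python
import numpy as np

def eis_ols(h, W, log_target):
    """Fit the EIS proposal parameters delta_t = (b_t, c_t, s_t, r_t) by
    regressing the log target on the kernel basis of proposal (6):
    constant + b*h - 0.5*c*h^2 + s*log(W) + r/W."""
    X = np.column_stack([np.ones_like(h), h, -0.5 * h**2, np.log(W), 1.0 / W])
    coef, *_ = np.linalg.lstsq(X, log_target, rcond=None)
    return coef[1:]                        # drop the constant

# synthetic check: recover known coefficients from J = 50 draws
rng = np.random.default_rng(1)
h = rng.standard_normal(50)
W = rng.gamma(3.0, size=50) + 0.1
b, c, s, r = 0.3, 1.2, 0.8, -0.5
y = 0.7 + b * h - 0.5 * c * h**2 + s * np.log(W) + r / W
delta = eis_ols(h, W, y)
```

Because the log target is exactly linear in the basis here, OLS recovers the coefficients to machine precision; in the actual algorithm the fit is approximate.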
Importantly, $g\left(\delta_{t+1}, z_{t-1}, h_t^{(j)}, h_{t-1}^{(j)}, W_t^{(j)}, W_{t-1}^{(j)}\right)$ is a function (see the supplementary appendix) that depends on $\delta_{t+1}$, which unfortunately creates a computational overhead because $\delta_{t+1}$ has to be determined before $\delta_t$, resulting in an un-parallelisable backward recursion (see also Richard and Zhang, 2007). This backward shift of $\delta_{t+1}$, a function of $z_{1:T}$, is however crucial for obtaining an efficient proposal density that enables us to generate draws of $x_t$ using the information in $z_{1:T}$. In our simulation and empirical studies, we choose $J = 50$ to keep the computational cost of computing $\mathcal{X}'\mathcal{X}$ to a minimum. Lastly, with $q(x_t|x_{t-1}, z_{1:T})$ determined, PGAS
with EIS, or PGAS-EIS, changes step (3) and step (4) into

$$
I_t(a_t, x_t) \propto \omega_{t-1}^{a_t}\, q(x_t|x_{t-1}^{a_t}, z_{1:T}), \qquad \text{and} \qquad \omega_t^i \propto \frac{p(z_t|x_t^i)\,p(x_t^i|x_{t-1}^i, z_{t-1})}{q(x_t^i|x_{t-1}^i, z_{1:T})}.
$$
Analysing the optimal number of particles $M$ is beyond the scope of this research. Nevertheless, we use the criterion in Scharth and Kohn (2016), who apply EIS to estimate a standard SV model via PMMH: $M$ is chosen such that the IS estimate of the variance of $\log L(z_{1:T})$ equals approximately one; see also Pitt et al. (2012). In our simulation studies (except for one extreme case) and empirical applications, the resulting $M$ varies around 10. So we choose $M = 20$ to be conservative, still a very small number compared to other studies using particle methods.
3.1.3 Sampling $\theta \,|\, z_{1:T}, h_{1:T}, W_{1:T}$ and variable selection in the FSV-LAH model
Given $h_{i,1:T}$, $W_{i,1:T}$, and $z_{i,1:T}$, we can sample $\theta_i$ series-by-series. We focus here on $\sigma_i$, $\rho_i$ and $\beta_i$, and sample the remaining parameters according to Nakajima and Omori (2012). We shrink some $\rho_i$'s and $\beta_i$'s to zero via Bayesian variable selection, which avoids $2^{2(n+N)}$ model comparisons and keeps the model parsimonious in a data-driven way. The non-zero $\rho_i$'s and $\beta_i$'s also identify the systematic and idiosyncratic sources of leverage effects and return asymmetry.
Let $p_0(\cdot)$ and $p(\cdot|\cdot)$ denote the prior and conditional posterior distribution, respectively. For the $i$-th series, the joint posterior distribution $p(\sigma_i, \rho_i|\cdot)$ is

$$
p(\sigma_i, \rho_i|\cdot) \propto p_0(\sigma_i, \rho_i)\,\sigma_i^{-T}(1-\rho_i^2)^{-\frac{T-1}{2}} \exp\left[-\frac{(1-\phi_i^2)\bar h_{i,1}^2}{2\sigma_i^2} - \sum_{t=1}^{T-1}\frac{(\bar h_{i,t+1} - \phi_i\bar h_{i,t} - \rho_i\sigma_i\varepsilon_{i,t})^2}{2\sigma_i^2(1-\rho_i^2)}\right],
$$
where $\bar h_{i,t} = h_{i,t} - \mu_i$. Re-parametrising $\vartheta_i = \rho_i\sigma_i$ and $\varpi_i = \sigma_i^2(1-\rho_i^2)$ as in Jacquier et al. (2004) to facilitate the shrinkage prior and posterior used in Bayesian variable selection, we introduce an auxiliary variable $\Delta$ and define the joint prior as $p_0(\vartheta_i|\varpi_i, \Delta)\,p_0(\varpi_i)\,p_0(\Delta)$, where $p_0(\varpi_i) = IG(s_0, r_0)$ and

$$
p_0(\vartheta_i|\varpi_i, \Delta) = \Delta D(\vartheta_i) + (1-\Delta)N(\vartheta_0, v_\vartheta^2\varpi_i).
$$

Here $\Delta$ with $p_0(\Delta) = \mathrm{Beta}(\varsigma_1, \varsigma_2)$ indicates a non-zero probability mass at $\vartheta_i = 0$ (Clyde and George, 2004), i.e. no leverage effect. For $i = 1, \ldots, n+N$, a new $\varpi_i$ is generated from $p(\varpi_i|\cdot) = IG(s_0 + T/2, r_0 + r_i/2)$ with $r_i = \sum_{t=1}^{T-1}(\bar h_{i,t+1} - \phi_i\bar h_{i,t})^2 - \mu_{\vartheta_i}^2/\sigma_{\vartheta_i}^2 + \vartheta_0^2/v_\vartheta^2$, and $\vartheta_i$ from

$$
p(\vartheta_i|\cdot) = \frac{(1-\Delta_i)D(\vartheta_i)}{\Delta_i\tilde\sigma_{\vartheta_i}^2 + 1 - \Delta_i} + \frac{\Delta_i\tilde\sigma_{\vartheta_i}^2}{\Delta_i\tilde\sigma_{\vartheta_i}^2 + 1 - \Delta_i}\,N(\mu_{\vartheta_i}, \sigma_{\vartheta_i}^2\varpi_i), \qquad (7)
$$

with $\tilde\sigma_{\vartheta_i}^2 = \sigma_{\vartheta_i}\exp(\mu_{\vartheta_i}^2/2\sigma_{\vartheta_i}^2\varpi_i)$ and $\Delta_i$ drawn from $p(\Delta_i|\cdot) = \mathrm{Beta}\left(\varsigma_1 + \sum_{i=1}^{n+N}\mathbb{1}(\vartheta_i \neq 0),\; \varsigma_2 + n + N - \sum_{i=1}^{n+N}\mathbb{1}(\vartheta_i \neq 0)\right)$, where $\mathbb{1}(\cdot)$ denotes the indicator function, and with posterior variance and mean given by

$$
\sigma_{\vartheta_i}^2 = \left(\frac{1}{v_\vartheta^2} + \sum_{t=1}^{T-1}\varepsilon_{i,t}^2\right)^{-1}, \qquad \mu_{\vartheta_i} = \sigma_{\vartheta_i}^2\left(\frac{\vartheta_0}{v_\vartheta^2} + \sum_{t=1}^{T-1}\varepsilon_{i,t}(\bar h_{i,t+1} - \phi_i\bar h_{i,t})\right).
$$

The Markov chain is updated by $\sigma_i = \sqrt{\vartheta_i^2 + \varpi_i}$ and $\rho_i = \vartheta_i/\sigma_i$.
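The reparameterisation is invertible, which is what the final update relies on; a one-line numerical check with illustrative values:

```python
import numpy as np

# Jacquier et al. (2004) reparameterisation and its inverse (illustrative values)
sigma, rho = 0.2, -0.6
vartheta = rho * sigma                       # vartheta_i = rho_i * sigma_i
varpi = sigma**2 * (1 - rho**2)              # varpi_i = sigma_i^2 (1 - rho_i^2)
sigma_back = np.sqrt(vartheta**2 + varpi)    # sigma_i = sqrt(vartheta_i^2 + varpi_i)
rho_back = vartheta / sigma_back             # rho_i = vartheta_i / sigma_i
assert np.isclose(sigma_back, sigma) and np.isclose(rho_back, rho)
```

The identity holds because $\vartheta_i^2 + \varpi_i = \rho_i^2\sigma_i^2 + \sigma_i^2(1-\rho_i^2) = \sigma_i^2$.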
It is standard to show that with a normal prior, the asymmetry parameter $\beta_i$ has a normal posterior. So we can modify the prior in the same way as above and obtain a shrinkage posterior of the form (7). The supplementary appendix details how to sample the remaining parameters.
3.2 Sampling factors and factor loadings
Define $\tilde y_{i,t} = y_{i,t} - (\alpha_{n+i} + \beta_{n+i}W_{n+i,t}\gamma_{n+i})e^{h_{n+i,t}/2}$ for $i = 1, \ldots, N$ and $F_{i,t} = (\alpha_i + \beta_i W_{i,t}\gamma_i)e^{h_{i,t}/2}$ for $i = 1, \ldots, n$. Conditional on the log variances $h_{1:T} = \{h_{1,1:T}, \ldots, h_{n+N,1:T}\}$ and mixing components $W_{1:T} = \{W_{1,1:T}, \ldots, W_{n+N,1:T}\}$, we have $\tilde y_t|f_t \sim N(\Lambda f_t, U_t)$ and $f_t \sim N(F_t, V_t)$, where $\tilde y_t = (\tilde y_{1,t}, \ldots, \tilde y_{N,t})'$, $F_t = (F_{1,t}, \ldots, F_{n,t})'$, $U_t = \operatorname{diag}(W_{n+1,t}\gamma_{n+1}^2 e^{h_{n+1,t}}, \ldots, W_{n+N,t}\gamma_{n+N}^2 e^{h_{n+N,t}})$, and $V_t = \operatorname{diag}(W_{1,t}\gamma_1^2 e^{h_{1,t}}, \ldots, W_{n,t}\gamma_n^2 e^{h_{n,t}})$. So we can draw $f_t$ from $N(\mu_{f_t}, \Sigma_{f_t})$ with

$$
\Sigma_{f_t} = \left(\Lambda' U_t^{-1}\Lambda + V_t^{-1}\right)^{-1}, \qquad \mu_{f_t} = \Sigma_{f_t}\left(\Lambda' U_t^{-1}\tilde y_t + V_t^{-1}F_t\right).
$$
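Conditional on the diagonal matrices $U_t$ and $V_t$, the draw of $f_t$ is a standard Gaussian update; a minimal numpy sketch (the function name and test values are illustrative, not the authors' code):

```python
import numpy as np

def draw_factor(y_tilde, Lam, U_diag, V_diag, F, rng):
    """Draw f_t ~ N(mu_f, Sigma_f) with
    Sigma_f = (Lam' U^-1 Lam + V^-1)^-1 and
    mu_f = Sigma_f (Lam' U^-1 y_tilde + V^-1 F).
    U_diag and V_diag hold the diagonals of U_t and V_t."""
    prec = (Lam.T / U_diag) @ Lam + np.diag(1.0 / V_diag)   # posterior precision
    rhs = Lam.T @ (y_tilde / U_diag) + F / V_diag
    mu = np.linalg.solve(prec, rhs)
    C = np.linalg.cholesky(prec)                            # prec = C C'
    # f = mu + C'^{-1} eps has covariance prec^{-1} = Sigma_f
    return mu + np.linalg.solve(C.T, rng.standard_normal(len(F)))

# illustrative dimensions and values
rng = np.random.default_rng(3)
N, n = 6, 2
Lam = rng.standard_normal((N, n))
U = np.full(N, 0.5)                  # idiosyncratic variances at time t
V = np.full(n, 1.0)                  # factor variances at time t
F = np.array([0.3, -0.2])            # conditional factor means F_t
y_tilde = rng.standard_normal(N)
f_draw = draw_factor(y_tilde, Lam, U, V, F, rng)
```

Sampling through the Cholesky factor of the precision avoids forming $\Sigma_{f_t}$ explicitly, which matters when $n$ is moderately large.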
We assume an independent normal prior $p_0(\Lambda_{ij}) = N(\Lambda_0, v_\Lambda^2)$ for all free elements of $\Lambda$. Because $\Lambda$ and $f_t$ enter the likelihood in product form, efficient sampling usually relies on a Rao-Blackwellisation (RB) procedure that marginalises $f_t$ and samples $\Lambda$ using an MH algorithm (Chib et al., 2006) based on $p(\Lambda|\cdot) \propto p(\tilde y_{1:T}|\Lambda)p_0(\Lambda)$. Notice that the conditional posterior is given by

$$
p(\tilde y_{1:T}|\Lambda) \propto \exp\left(-\frac{1}{2}\sum_{t=1}^{T}\left[\log|\Omega_t| + (\tilde y_t - \Lambda F_t)'\Omega_t^{-1}(\tilde y_t - \Lambda F_t)\right]\right), \qquad (8)
$$

with $\Omega_t = \Lambda V_t\Lambda' + U_t$. A multivariate Student's t proposal is constructed for importance sampling. Details are given in the supplementary appendix.
The RB method may become infeasible when $N$ or $n$ grows, due to the cost of constructing the high-dimensional proposal density. Alternatively, we follow Kastner et al. (2017) and implement the ancillarity-sufficiency interweaving strategy (ASIS) developed by Yu and Meng (2011). In the proposed FSV-LAH model (1)-(2), ASIS for sampling $\Lambda$ contains two steps: ancillary data augmentation (ADA) and sufficient data augmentation (SDA). The ADA step samples $\Lambda$ row-by-row as in Bayesian least squares, based on the fact that $p(\tilde y_t|\Lambda)$ is Gaussian given $f_t$, $W_t$ and $h_t$. With ADA only, the mixing of the Markov chain is known to be very slow, if it converges at all (Chib et al., 2006).
After the ADA step, we rewrite the FSV-LAH model equivalently as

$$
\tilde y_t = \tilde\Lambda \tilde f_t + u_t,
$$

where $\tilde\Lambda_{ji} = \Lambda_{ji}/\Lambda_{k_i i}$ for $j = 1, \ldots, N$ and $i = 1, \ldots, n$, and where $\tilde f_{i,t} = \Lambda_{k_i i} f_{i,t} = \exp(\tilde h_{i,t}/2)\nu_{i,t}$ with $\tilde h_{i,t+1} = \log(\Lambda_{k_i i}^2)(1-\phi_i) + \phi_i\tilde h_{i,t} + \eta_{i,t}$ and $\tilde h_{i,t} = h_{i,t} + \log(\Lambda_{k_i i}^2)$. For each $i$, SDA samples $\Lambda_{k_i i}$ based on $\tilde h_{i,1:T}$ only, because the latter is a sufficient statistic for $p(\log(\Lambda_{k_i i}^2)|\cdot)$, conditional on the other parameters.
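As a quick numerical check of this reparameterisation (our sketch with illustrative values $\phi_i = 0.9$ and $\Lambda_{k_i i} = 1.7$; the factor log variance has $\mu_i = 0$ by the scale identification):

```python
import numpy as np

rng = np.random.default_rng(2)
phi, lam = 0.9, 1.7
T = 200
eta = 0.3 * rng.standard_normal(T)
h = np.zeros(T)
for t in range(T - 1):
    h[t + 1] = phi * h[t] + eta[t]        # factor log variance, mu_i = 0
f_scale = np.exp(h / 2)                   # scale of f_{i,t}

# SDA reparameterisation: h_tilde = h + log(lam^2) follows an AR(1)
# with unconditional mean log(lam^2)
h_tilde = h + np.log(lam**2)
lhs = h_tilde[1:]
rhs = np.log(lam**2) * (1 - phi) + phi * h_tilde[:-1] + eta[:-1]
assert np.allclose(lhs, rhs)
# and lam * exp(h/2) equals exp(h_tilde/2), so lam * f = f_tilde for any shock
assert np.allclose(lam * f_scale, np.exp(h_tilde / 2))
```

The two assertions confirm that absorbing $\log(\Lambda_{k_i i}^2)$ into the log variance leaves the model observationally unchanged, which is what makes $\tilde h_{i,1:T}$ sufficient for the diagonal loading.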
With the bijective mapping between Λ and Λ, ASIS iterates over ADA and SDA steps to
generate posterior samples of Λ. Basu’s theorem suggests that conditional on other parameters,
the draws obtained from ADA and SDA steps are independent. Implementation details are left
in the supplementary appendix. Section 4 provides comparisons between RB and ASIS applied
to the FSV-LAH model.
3.3 Prior distributions
We follow the priors specified in Jacquier et al. (2004) and Nakajima and Omori (2012) for
parameters related to the GHSt-SVL specification for factor and idiosyncratic dynamics, and
Clyde and George (2004) for parameters in selection priors. Table 1 summarises the prior
distributions we use in our simulation and empirical studies. The supplementary appendix describes a carefully chosen initialisation procedure that accelerates the convergence of the Markov chain.
Table 1: Prior distributions

Parameter | Distribution | Parameter values
$(\phi_k + 1)/2$ | $\mathrm{Beta}(s_1, s_2)$ | $s_1 = 20$, $s_2 = 1.5$
$\mu_{n+i}$ | $N(\mu_0, v_\mu^2)$ | $\mu_0 = -10$, $v_\mu^2 = 100$
$\Upsilon, \Delta$ | $\mathrm{Beta}(\varsigma_1, \varsigma_2)$ | $\varsigma_1 = 1$, $\varsigma_2 = 1$
$\beta_k|\Upsilon$ | $\Upsilon D(\beta_k) + (1-\Upsilon)N(\beta_0, v_\beta^2)$ | $\beta_0 = 0$, $v_\beta^2 = 10$
$\varpi_k$ | $IG(s_0, r_0)$ | $s_0 = 2.5$, $r_0 = 0.025$
$\vartheta_k|\varpi_k, \Delta$ | $\Delta D(\vartheta_k) + (1-\Delta)N(\vartheta_0, v_\vartheta^2\varpi_k)$ | $\vartheta_0 = 0$, $v_\vartheta^2 = 100$
$\zeta_k$ | $\mathrm{Gamma}(r_1, r_2)\mathbb{1}(\zeta_k > 4)$ | $r_1 = 20$, $r_2 = 1.25$
$\Lambda_{ij}$ | $N(\Lambda_0, v_\Lambda^2)$ | $\Lambda_0 = 0$, $v_\Lambda^2 = 100$

1 Subscripts are $k = 1, \ldots, n+N$, $i = 1, \ldots, N$ and $j = 1, \ldots, n$. Notice $\mu_j = 0$ due to the identification restriction.
2 $\Upsilon$ and $\Delta$ are auxiliary parameters used in the selection priors introduced in Section 3.1.3.
3 For all $k$, we have $p_0(\vartheta_k, \varpi_k, \Delta) = p_0(\vartheta_k|\varpi_k)p_0(\varpi_k)p_0(\Delta)$, resulting from $\vartheta_k = \rho_k\sigma_k$ and $\varpi_k = \sigma_k^2(1-\rho_k^2)$.
4 $p_0(\Lambda_{k_j j})$ is given by $N(\Lambda_0, v_\Lambda^2)\mathbb{1}(\Lambda_{k_j j} > 0)$ due to the identification restriction.
4 Simulation study
We conduct three simulation studies (S1, S2 and S3). S1 compares our PGAS-EIS sampler, a
particle method, to the standard Gibbs approach developed by Nakajima and Omori (2012). S2
compares RB with ASIS for sampling factor loadings. S3 studies the selection of leverage effects
and asymmetry. A simulation study on univariate GHSt-SVL models is left in the supplementary
appendix. All computations are carried out by the object-oriented matrix language Ox of Doornik
(2007) and the library of state space functions SsfPack of Koopman et al. (2008) on a desktop
with quad-core Intel(R) Core(TM) i5-7500 3.40GHz CPU.
4.1 Simulation 1: PGAS-EIS vs Gibbs
Nakajima and Omori (2012) develop a Gibbs sampler for the univariate GHSt-SVL model, which
we call MM-MH. MM-MH alternately samples the log variance h1:T using a multi-move sampler
and the mixing variable $W_t$ using an MH step. One could estimate the FSV-LAH model without
variable selection by plugging MM-MH into the R package factorstochvol of Kastner (2019).
Being easy to implement, Gibbs approaches are common in the literature on SV models.
To see how our PGAS-EIS sampler competes with MM-MH, the first simulation study varies the triple $(\beta_i, \rho_i, \zeta_i)$ in the GHSt-SVL dynamics (2), because these jointly determine the leverage effect, asymmetry and heavy-tailedness of $z_{i,t}$, where $i = 1, \ldots, n+N$ and $z_t = (f_t', u_t')'$.
Table 2: Simulation 1 DGP values

Scenario | $(\beta_i, \rho_i, \zeta_i)$ | Description
S1.1 | $(-1, -0.9, 5)$ | excess asymmetry, excess leverage effect, heavy tails
S1.2 | $(-0.5, -0.3, 5)$ | moderate asymmetry, moderate leverage effect, heavy tails
S1.3 | $(-1, -0.9, 30)$ | excess asymmetry, excess leverage effect, light tails
S1.4 | $(-0.5, -0.3, 30)$ | moderate asymmetry, moderate leverage effect, light tails

Subscripts are $i = 1, \ldots, n+N$. Si.j denotes scenario j in Simulation i. A light tail here means the tail is much lighter than in the heavy-tail scenarios, although it still shows power decay.
100 repetitions are generated under the empirically plausible values $\phi_i = 0.95$, $\sigma_i = 0.1$, $T = 2000$, $N = 50$, $n = 3$ and all elements of $\Lambda$ fixed at 0.5. All selection priors are switched off in S1, i.e. $\Upsilon = \Delta = 0$. The DGPs are summarised in Table 2. We consider PGAS-EIS, PGAS in which particle propagation uses the bootstrap filter (Doucet et al., 2001), and MM-MH. Lastly, ASIS is used for sampling the factor loadings.
The mixing of the Markov chain generated by each method is evaluated by the
inefficiency factor (IF; see Chib, 2001). The IF is the number of correlated draws needed to
achieve the same inferential accuracy as one independent draw; a small IF is thus preferred.
Figure 1 shows the distribution of the IFs of the parameters in the dynamics of zt across all
repetitions. Both PGAS-EIS and PGAS outperform MM-MH under all DGPs, so incorporating
ancestor sampling clearly improves efficiency. Notably, the two particle methods achieve a
hundreds-fold improvement for ζ. The estimate of ρ from MM-MH suffers from inefficiency in
S1.1 and S1.3 due to the “disappearance of the leverage effect” discussed in Section 2.1, whereas
the two particle methods are much less affected. The use of efficient importance sampling in
PGAS-EIS delivers a further two- to five-fold reduction in IFs relative to PGAS. Moreover,
although the mixing of Markov chains for SV models depends on the data at hand (Kastner and
Frühwirth-Schnatter, 2014; Hosszejni and Kastner, 2019), the distribution of IFs for PGAS-EIS
shows much smaller variability than for the other two methods, highlighting the anchoring effect
of the EIS proposal density.
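The IF can be estimated from a chain's sample autocorrelations; the truncated-sum estimator below is one simple choice, not necessarily the estimator used in the paper. The effective sample size used later in this section is then the number of MCMC draws divided by the IF.

```python
import numpy as np

def inefficiency_factor(draws, max_lag=100):
    """Estimate IF = 1 + 2 * sum of autocorrelations of an MCMC chain.
    The truncation at the first non-positive autocorrelation is a simple
    stabilising rule; the paper's exact estimator may differ.
    ESS is then len(draws) / IF."""
    x = np.asarray(draws, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = x @ x / n
    # sample autocorrelations at lags 1..max_lag
    acf = np.array([(x[:n - k] @ x[k:]) / (n * var)
                    for k in range(1, max_lag + 1)])
    cut = np.argmax(acf <= 0) if np.any(acf <= 0) else max_lag
    return max(1.0, 1.0 + 2.0 * acf[:cut].sum())
```

For an i.i.d. chain the IF is close to one; for a persistent AR(1) chain with coefficient 0.9 it is close to the theoretical value (1 + 0.9)/(1 − 0.9) = 19, which is why hundreds-fold IF differences between samplers translate directly into hundreds-fold differences in usable draws.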
Table 3 shows the maximum IFs of the last-period values of the factor processes. PGAS-EIS
performs best, with maximum IFs smaller than 30. PGAS generally outperforms MM-MH
by obtaining two-digit IFs for the factors, but it still shows inefficiency for ht and Wt when the
DGP features heavy and light tails, respectively. Although ancestor sampling avoids global
particle degeneracy, local degeneracy at certain periods, say the last period, can still occur.
Using EIS largely mitigates this problem and boosts efficiency.

Figure 1: Inefficiency factors of latent process parameters estimated by PGAS-EIS, PGAS and
MM-MH. Each column indicates a parameter of interest, and each row indicates a DGP. Each of the
three boxplots in one graph shows the distribution of 5,300 inefficiency factors of a particular parameter
from all 53 factor and idiosyncratic latent processes across 100 repetitions, estimated, from left
to right, by PGAS-EIS, PGAS and MM-MH.
Despite the potential of PGAS and PGAS-EIS in FSV-LAH models, a central concern for
practitioners is the time cost. Table 4 shows the computation time for 1,000 MCMC runs of
each method under different DGPs, across the 100 repetitions. Not surprisingly, MM-MH is the
least time-consuming, as it involves no particle propagation and thus avoids thousands of function
evaluations. The computation time of PGAS-EIS consists of (i) the overhead of constructing the
EIS proposal density and (ii) the time for function evaluations in particle propagation, whereas
PGAS only incurs the latter. The number of function evaluations depends on the number of particles
and resamplings. Although PGAS already reduces the optimal number of particles needed in
propagation compared with basic PG (Lindsten et al., 2014; Scharth and Kohn, 2016), PGAS-EIS
uses the EIS proposal density to anchor the particle evolution, which minimises Monte Carlo noise
and further reduces the number of particles. To see how PGAS-EIS trades off overhead cost
against the number of particles, Table 5 reports the maximum optimal number of particles (no.p) (Pitt
Table 3: Maximum IFs for the factors, their log variances and mixing variables
f1,2000 f2,2000 f3,2000 h1,2000 h2,2000 h3,2000 W1,2000 W2,2000 W3,2000
S1.1: excess asymmetry, excess leverage effect, heavy tails
PGAS-EIS    17    22    14    12    21    19    15    26    17
PGAS        65    87    63    84   172   133    72   153    84
MM-MH      153   320   142   108   274   245   138   472   182

S1.2: moderate asymmetry, moderate leverage, heavy tails
PGAS-EIS    15    18    14    14    18    15    15    17    16
PGAS        58    84    51    66    98   144    43   160    38
MM-MH      166   268   172    47   157   110    69   631   343

S1.3: excess asymmetry, excess leverage, light tails
PGAS-EIS    21    18    18    15    17    27    15    22    13
PGAS        92   104    79    45   236   132    67   121    41
MM-MH      107   308   233    55   340   175   768   382  1042

S1.4: moderate asymmetry, moderate leverage, light tails
PGAS-EIS    24    18    18    12    19    23    34    21    28
PGAS        45    66    30    45    83   101   156   180    70
MM-MH      146   169   106   112   223   197   653   470   844
Each column shows the maximum inefficiency factor for the last-period factors, their log variances and the IG mix-
ing variables across 100 repetitions. Each panel corresponds to one DGP with varying asymmetry, leverage effect
and heavy-tailedness, estimated using the different sampling schemes.
Table 4: Execution time of different sampling schemes
            S1.1     S1.2     S1.3     S1.4     S2.1     S2.2     S2.3     S2.9

Approaches for sampling ft, ht, Wt, and associated parameters
PGAS-EIS    8-12     6-13     8-16     7-21
PGAS       18-33    21-38    16-41    17-35
MM-MH       3-9      4-9      5-11     5-13

Approaches for sampling Λij, for i = 1, ..., 50 and j = 1, 2, 3
ASIS        3-11     5-11     6-13    12-35
RB          4-9      6-11    11-23   42-371
Reported are the 10-th and 90-th percentile computation time, in seconds, for 1,000 MCMC runs across 100 rep-
etitions. Each column corresponds to a different DGP. The first panel compares the sampling schemes for latent
processes and their associated parameters, while using ASIS for factor loadings. The second panel compares the
sampling scheme for factor loadings while using PGAS-EIS for latent processes.
17
Table 5: Number of particles and per particle execution time
S1.1 S1.2 S1.3 S1.4
            no.p   p.p.t    no.p   p.p.t    no.p   p.p.t    no.p   p.p.t

PGAS-EIS     15    0.63      13    0.54      47    0.37      22    0.37
PGAS        677    0.11     465    0.07     939    0.08     720    0.06
The table reports the maximum optimal number of particles (no.p) and per particle execution time (p.p.t, in sec-
onds) needed to complete one iteration of particle Gibbs for propagating (hi,t, Wi,t) across all i = 1, ..., 53 and all
MCMC runs. Each pair of columns corresponds to the no.p and p.p.t for the generated dataset with the maximum
inefficiency factor for f1,2000 under one DGP.
et al., 2012) and the per particle execution time (p.p.t) used for one iteration in the sampling of
(hi,t, Wi,t)′, across all i, t and repetitions. Due to the construction of the EIS proposal density,
the p.p.t of PGAS-EIS is roughly five times that of PGAS, under S1.1 and S1.2 in particular.
However, EIS reduces the required no.p by a factor of more than 25, contributing to the lower
total time cost of PGAS-EIS.
The number of resamplings is another concern in particle methods. To avoid degeneracy, resam-
pling has to take place when the particle effective sample size (PESS) of the particle system
falls below a threshold, say, 75% of the total number of particles. However, over-resampling
increases the number of function evaluations and causes a loss of efficiency due to information
impoverishment (Chib, 2001). Figure 2 shows resamplings triggered by the PESS falling below the
75% threshold for PGAS-EIS and PGAS. With ancestor sampling, PGAS avoids complete de-
generacy, yet it resamples frequently so that the particle system does not get stuck locally. In
comparison, because EIS anchors the evolution of the particle system, PGAS-EIS shows a linear
decline in PESS that calls for only 4 or 5 resamplings when T = 2000, greatly reducing function
evaluations and preventing information impoverishment.
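The PESS-triggered rule can be sketched as follows; the 75% threshold matches the text, while the multinomial resampling scheme is our illustrative choice rather than the paper's implementation.

```python
import numpy as np

def maybe_resample(particles, log_w, threshold=0.75, rng=None):
    """Resample only when the particle effective sample size (PESS) falls
    below `threshold` times the number of particles, the 75% rule used in
    the text. Illustrative sketch, not the paper's implementation."""
    rng = rng if rng is not None else np.random.default_rng()
    w = np.exp(log_w - log_w.max())      # normalise weights stably
    w /= w.sum()
    pess = 1.0 / np.sum(w ** 2)          # effective sample size of weights
    if pess < threshold * len(w):
        idx = rng.choice(len(w), size=len(w), p=w)   # multinomial resampling
        return particles[idx], np.zeros(len(w)), True  # weights reset to equal
    return particles, log_w, False
```

With an anchored proposal, the weights stay nearly uniform for long stretches, the PESS declines slowly, and this rule fires only a handful of times per sweep, which is exactly the behaviour of PGAS-EIS in Figure 2.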
To measure efficiency with computation time taken into account, we follow Hosszejni and
Kastner (2019) and define the effective sampling rate (ESR) as the effective sample size (ESS) per
second of runtime. The ESS of a parameter is the total number of MCMC runs divided
by its IF; it is thus the number of independent draws with which one could reach the same
inferential accuracy as with the whole posterior sample. Figure 3 is the ESR counterpart to Figure
1. PGAS-EIS shows a multifold improvement in ESR compared to PGAS and MM-MH, with the
latter two showing surprisingly comparable performances. The smallest improvement amounts
to roughly a threefold increase in the ESR of βi and ρi under S1.1, meaning that PGAS-EIS delivers the
same inferential accuracy as PGAS or MM-MH with only a third of their runtime.

Figure 2: Particle effective sample size and resampling frequency of PGAS-EIS and PGAS. Re-
sults are from the last MCMC run of the first repetition. Each column indicates a DGP, and the top
and bottom rows show results from PGAS-EIS and PGAS, respectively.
Figure 3: Effective sampling rates of latent process parameters estimated by PGAS-EIS, PGAS
and MM-MH. Each column indicates a parameter of interest, and each row indicates a DGP. Each of
the three boxplots in one graph shows the distribution of 5,300 effective sampling rates of a particular
parameter from all 53 factor and idiosyncratic latent processes across 100 repetitions, estimated,
from left to right, by PGAS-EIS, PGAS and MM-MH.
Table 6: Simulation 2 DGP values
Scenario S2.1 S2.2 S2.3 S2.4 S2.5 S2.6 S2.7 S2.8 S2.9
n     3    3    3    4    8   20    8   16   40
N    10   20   50   10   20   50   10   20   50
Si.j denotes scenario j in Simulation i. n and N denote the number of factors and assets, respectively.
4.2 Simulation 2: ASIS vs. RB
As discussed in Section 3.2, both RB and ASIS lead to efficiency gains for Λ by breaking down
the dependence between the factors and the loadings (Chib et al., 2006; Kastner et al., 2017). It is of
interest to compare them under different model dimensions. In S2, 100 repetitions are generated
for each combination of n and N; we set βi = −0.5, ρi = −0.3 and ζi = 5 for all i = 1, ..., n + N
and keep the other parameters at the empirically plausible values used in S1. Table 6 summarises
the DGPs. Notice that S2.3 is the same as S1.2.
Table 7 reports the maximum IF, the minimum acceptance rate of the acceptance-
rejection algorithm used in both RB and ASIS, and the maximum duration of no update for Λk11, i.e.
the leading loading on the first factor as introduced in Section 2.2. Firstly, we notice that our
results show larger IFs when using ASIS than those found by Kastner et al. (2017) and Kastner
(2019). This is because in FSV models each interweaving swap updates both fi,t and hi,t for
i = 1, ..., n; in FSV-LAH models, however, the interweaving cannot update Wi,t for i = 1, ..., n,
inevitably leaving correlated samples.
Secondly, based on IFs from S2.1, S2.2 and S2.4, RB seems to outperform ASIS when number
of elements in Λ is relatively small, with a rule-of-thumb nN < 60. The performance of RB
quickly deteriorates as nN increases, while ASIS manages to maintain functionality even when
the number of variables and factors increases. For example, IFs of ASIS seem to be independent
of factor dimensions because under S2.2 and S2.8 they are essentially the same, while that of
RB increases by 40 times.
In our experiments, we find this to be largely a numerical problem. Despite integrating out the factor
innovations, RB depends on an nN-dimensional MH algorithm whose initial step is to numerically
optimise (8), whereas ASIS relies on simple Bayesian least squares, with only the n leading loadings
Λkii requiring univariate MH treatment. This means that one additional factor (variable)
Table 7: Comparisons of efficiency between ASIS and RB approaches
Scenario S2.1 S2.2 S2.3 S2.4 S2.5 S2.6 S2.7 S2.8 S2.9
Maximum IFs
ASIS     36    27    41    25    33    52    28    25    42
RB       14    34   272    18   320   664    73  1224  6021

Minimum acceptance rate
ASIS    71%   64%   82%   61%   70%   74%   86%   66%   74%
RB      37%   33%   28%   31%    6%    0%   21%    4%    0%

Maximum duration of no update
ASIS     24    31    28    44    35    24    48    25    30
RB       67    82   720    46   883  2653   478  1805  9434
The table reports the maximum IF, the minimum acceptance rate and the maximum duration of no update of Λk11,
over all MCMC runs across 100 repetitions under each DGP.
increases the dimension of the MH sampler in RB by N (n), but adds only one (zero) univariate
MH problem in ASIS. Furthermore, Λ appears in both the mean and the covariance of the collapsed
likelihood (8), rendering the latter multimodal. As a result of this dimensionality and multimodality,
RB quickly loses functionality as nN increases. Table 7 shows that in some
repetitions of S2.5, S2.6, S2.8 and S2.9 the Markov chains of Λ generated by RB get completely
stuck. Figure 4 gives examples of the Markov chains generated by RB and ASIS. As nN increases
from 30 to 1000, the decay of the autocorrelation function delivered by RB goes from
faster than that of ASIS to nearly zero, while that of ASIS shows only negligible changes. Lastly, Table
4 shows that RB is also less time-consuming than ASIS when nN is small. As nN
increases, however, the computational burden of the numerical optimisation used to construct the proposal
density grows exponentially, making RB much less appealing than ASIS.
4.3 Simulation 3: Detection of leverage effect and asymmetry
S3 studies the effect of the variable selection introduced in Section 3.1.3 on the asymmetry and leverage
effect parameters. Let βi and ρi define the factor (i.e. systematic) asymmetry and leverage effect
parameters, respectively, if i ≤ n, and the idiosyncratic ones otherwise. Table 8 summarises DGPs
featuring some, all or none of these effects. Each DGP is repeated 100 times, with the other
parameters kept at their empirically plausible values from S1 and S2.

For each DGP, we also consider different degrees of heavy-tailedness, because βi, ρi and ζi
jointly determine the leverage effect and asymmetry of zi,t, as discussed in Section 2.1. For all ρi and βi,
Figure 4: The Markov chains generated by RB and ASIS. From the leftmost column to the rightmost,
each graph shows the chain (in red) of Λk11 in the first repetition of S2.1, S2.7, S2.5 and S2.6, together
with the estimated autocorrelation function (in black), indicated on the left y-axis.
i = 1, ..., n + N, and across all repetitions, Table 9 reports the minimum and maximum posterior
probability of ι being zero, ι ∈ {ρ, β}, when its DGP value is zero and non-zero, respectively; i.e.
mini P(ιi = 0|·) when ιi^DGP = 0 and maxi P(ιi = 0|·) when ιi^DGP ≠ 0. The former (latter)
measures under-shrinking (over-shrinking): a low (high) value indicates a failure to shrink truly
zero effects to zero (a tendency to shrink truly non-zero effects to zero). Interestingly, the upper
panel of the table shows that ρi tends to be under-shrunk when zi,t has heavier tails, whereas βi
tends to be under-shrunk when zi,t has lighter tails. This suggests that, given non-zero asymmetry,
heavy tails dilute the data information on ρi, in line with the discussion in Section 2.1. In contrast, lighter
Table 8: Simulation 3 DGP values
Scenario Description: for i ∈ 1, ..., n+N with n = 3 and N = 5
S3.1    If i is odd, ρi = 0 and βi = −0.5; if i is even, ρi = −0.3 and βi = 0. That is,
        leverage effect and return asymmetry are present in some zi,t's, where zt = (f′t, u′t)′.
S3.2    ρi = 0 if i is odd and −0.3 otherwise; βi = −0.5 for all i.
S3.3    βi = 0 if i is odd and −0.5 otherwise; ρi = −0.3 for all i.
S3.4    For all i, ρi = −0.3 and βi = −0.5.
S3.5    For all i, ρi = βi = 0.
Subscripts are for i = 1, ..., n+N. Si.j denotes scenario j in Simulation i. Each DGP is simulated under three cases:
ζi = 5 (heavy tail), ζi = 10 (moderate tail) and ζi = 30 (light tail) for i ∈ {1, ..., n+N}.
Table 9: Posterior zero probability of leverage effect and asymmetry
           ζi = 5   ζi = 10   ζi = 30      ζi = 5   ζi = 10   ζi = 30
        mini P(ρi = 0|·) where ρi^DGP = 0     mini P(βi = 0|·) where βi^DGP = 0

S3.1     0.64     0.88     0.92            0.96     0.63     0.23
S3.2     0.71     0.84     0.94            –        –        –
S3.3     –        –        –               0.94     0.82     0.45
S3.5     0.78     0.92     0.90            0.96     0.56     0.14

        maxi P(ρi = 0|·) where ρi^DGP ≠ 0     maxi P(βi = 0|·) where βi^DGP ≠ 0

S3.1     0.16     0.17     0.08            0.18     0.27     0.41
S3.2     0.24     0.21     0.10            0.11     0.18     0.28
S3.3     0.17     0.08     0.11            0.23     0.46     0.37
S3.4     0.20     0.16     0.15            0.09     0.15     0.33
Reported are the lowest or highest posterior probabilities of ρi and βi being zero under different DGP values ρi^DGP
and βi^DGP, across 100 repetitions with different degrees of heavy-tailedness.
tails bring less variation in βiWi,t, an additive term in the mean-variance Gaussian mixture
representation of the GHSt-distributed innovations, rendering variable selection on the βi's more
ambiguous. The latter effect also occasionally causes zero βi's to be selected as non-zero, as
seen in the bottom panel. This is of little concern in our empirical study, however, because the
sample size there is larger than T = 2,000; and even if asymmetry is over-shrunk, that only
strengthens the finding that asymmetry is mostly a systematic phenomenon.
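A quick way to see this dilution is to compare the variance of the IG(ζ/2, ζ/2) mixing variable across the tail scenarios; the closed form below is the standard inverse gamma variance, finite for ζ > 4.

```python
def mixing_variance(zeta):
    """Variance of W ~ IG(zeta/2, zeta/2), finite for zeta > 4:
    Var(W) = 2 * zeta**2 / ((zeta - 2)**2 * (zeta - 4))."""
    return 2 * zeta**2 / ((zeta - 2) ** 2 * (zeta - 4))

# The additive asymmetry term beta_i * W_t varies far less when tails are
# light, which is why selection on beta_i becomes more ambiguous.
print(mixing_variance(5))    # heavy tails:  about 5.56
print(mixing_variance(30))   # light tails:  about 0.09
```

A roughly sixty-fold drop in Var(W) between ζ = 5 and ζ = 30 means the data carry far less information about βi in the light-tailed scenarios, consistent with the under-shrinking pattern in Table 9.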
Lastly, we study the predictive performance of models with different numbers of factors,
following Kastner (2019). Using a HPC Linux cluster, we re-estimate the model over expanding
windows and accumulate log predictive likelihoods to compute Bayes factors. Figure 5 shows
the log predictive Bayes factor in favour of multi-factor models over the zero-factor model,
normalised to increase readability and comparability across repetitions. Naturally,
most of the predictive likelihood gain comes from the first factor, but under S3.4 with heavy-tailed
zi,t's, adding the second and third factors can also significantly improve the predictive performance
in some repetitions. Under all DGPs, the number of factors can easily be determined to be
three, with negligible changes in the Bayes factor from additional factors.
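The accumulation step itself is simple; the sketch below illustrates it, taking the per-period log predictive likelihoods (which in the paper come from the re-estimated expanding-window models) as given inputs.

```python
import numpy as np

def cumulative_log_bayes_factor(logpred_model, logpred_benchmark):
    """Accumulate one-period-ahead log predictive likelihood differences of a
    multi-factor model against the zero-factor benchmark, as used to build
    the curves in Figure 5. Inputs are arrays of per-period log predictive
    likelihoods, here taken as given."""
    diff = np.asarray(logpred_model) - np.asarray(logpred_benchmark)
    return np.cumsum(diff)
```

An upward-drifting curve indicates a persistent predictive advantage; normalising each curve by the gain from the first factor, as in Figure 5, makes the incremental value of the second and third factors comparable across repetitions.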
Figure 5: Normalised log predictive Bayes factor in favour of multi-factor models over the zero-factor model. For all 100 repetitions, each graph in light colour illustrates the accumulated log
predictive Bayes factors that are normalised with respect to the likelihood gains from adding the
first factor, with their averages indicated by the black line. The predictive Bayes factor is based on
estimating the model with y1:1000 and accumulating the one-period-ahead log predictive likelihood for
y1001:2000.
5 Empirical study
In the empirical study, yt consists of 80 stock return series with sufficient trading history from
constituents in S&P100 index. Our sample period has a total of T = 4, 252 trading days from
23/06/2001 to 15/06/2018. Tickers are given in the supplementary appendix and return series
are ordered in yt alphabetically. In what follows, we summarise estimation results from a 4-
factor FAS-LAH model, discuss the effect of selection on leverage effects and asymmetry, and
conduct a forecasting exercise with competing models.
5.1 Estimation results
This section reports the main estimation results of the FSV-LAH model. Figure 6 shows the posterior
estimates of hi,t and Wi,t for i = 1, ..., 4, i.e. the factor log variances and their IG mixing components.
The peaks of the log variances in 2008 correspond to the GFC, but only h3,t responds to the dot-com
bubble in the early 2000s. Jumps observed in W1,t and W2,t suggest that these are risk factors
that facilitate systematic anomalies in asset returns, whereas the effect of W4,t is more muted.
However, we should interpret the Wi,t's with caution. As in (2), the IG mixing component appears
in the likelihood both additively and multiplicatively, and interacts with the SV process. As a
result, the role of the Wi,t's is much more involved than that of the additive jumps in Chib et al. (2006).

Figure 7 shows the posterior estimates of the loadings on the i-th factor for i = 1, ..., 4, i.e.
the i-th column of Λ, denoted Λ·i, sorted in ascending order and excluding the loadings
Figure 6: Posterior estimate of factor log variances and the IG mixing components. The top four
figures show the 5-th percentile, the 95-th percentile and the median of the posterior distributions of
hi,t; the bottom four figures show the posterior median of Wi,t.
with identifying zero restrictions. The x-axis indicates the j-th return series in yt. Under the
identification scheme in Section 2.2, almost all returns load positively on the factors; the exceptions
appear in f3,t, which has a few negative loadings whose 90% credible intervals do not cover zero.
f4,t is loaded mainly by around 15 return series. Interestingly, most of these stocks belong to
the production and manufacturing sector.
Figure 7: Posterior estimate of factor loadings. Each figure shows the 5-th percentile, the 95-th percentile
and the median of the loadings on fi,t, i.e. Λ·i, sorted in ascending order.
As shown in the simulation study, both ASIS and RB can efficiently sample Λ and ft,
but their actual performance depends on the dimensionality nN. Table 10 shows the column-
wise maximum IF and minimum ESR of the factor loadings, and the component-wise last-period
maximum IF and minimum ESR of the factors, their log variances and their IG mixing processes.
Despite the similar efficiency for the factor dynamics due to the use of PGAS-EIS, in our
Table 10: IF and ESR of loadings and factors from a 4-factor FSV-LAH model

            Λ·1    Λ·2    Λ·3    Λ·4    fi,4252   hi,4252   Wi,4252

Maximum inefficiency factor
ASIS         22     37     26     44       17        21        39
RB           92    268    364    230       46        34        58

Minimum effective sampling rate
ASIS       2.84   1.69   2.40   1.42     3.67      2.98      1.60
RB         0.45   0.16   0.11   0.18     0.91      1.22      0.71
The first four columns report the maximum IF and minimum ESR of the elements in the four columns of Λ, respectively.
The last three columns show the last-period maximum IF and minimum ESR of the factors, their log variances and
their IG mixing processes, over i = 1, ..., 4.
sample nN = 320 makes RB less efficient than ASIS, as seen from the multifold increase in IF
and decrease in ESR for the factor loadings. RB suffers from the burden of numerically finding the
nN-dimensional and multimodal optimum of the marginalised conditional posterior (8).
Figure 8 shows the posterior estimates of the leverage effect parameters ρi, the asymmetry
parameters βi and the degrees of freedom ζi, sorted in ascending order across model components.
We use symbols to denote the parameters associated with the factor dynamics. The fact that the
first two factors have quite heavy tails supports tail dependence, or stronger correlation during
market downturns, as documented by Ang and Chen (2002) and Patton (2004) among others. Via
the Bayesian variable selection, the second factor captures most of the return asymmetry,
meaning that skewness is clearly a systematic phenomenon. Leverage effects, however, are mostly
idiosyncratic. The relative magnitudes of the leverage effects captured by the fourth and the first
factors are much less pronounced than the factor return asymmetry. Notice that these results
are subject to identification restrictions, which we mitigate by aligning the factor rotation with the
order-invariant principal components, as introduced in Section 2.2.
5.2 Leverage effect, asymmetry and heavy tails
The Bayesian variable selection used in FSV-LAH conveniently avoids model comparisons, as
it determines zero leverage effects and asymmetry in a data-driven way. It is still insightful,
however, to study the implications of including or excluding these features. We consider three other model
specifications: (1) FSV allows for neither heavy tails, leverage effects nor asymmetry; (2) FSV-H
allows only for heavy tails in the factor and idiosyncratic components; and (3) FSV-LAH* allows
for all features but does not use variable selection to determine zero leverage effects and zero
asymmetry. FSV has been studied by Kastner et al. (2017) and Kastner (2019) among others.
FSV-H extends the model of Chib et al. (2006) by allowing for heavy-tailed factors.

Figure 8: Posterior estimates of leverage effect, asymmetry and heavy-tailedness. From left
to right, the figures show the 5-th percentile, the 95-th percentile and the median of ρi, βi and ζi, respectively,
sorted in ascending order for i = 1, ..., 84. Circle, triangle, nabla and diamond indicate the
parameters associated with the dynamics of f1,t, f2,t, f3,t and f4,t, respectively.
Figure 9 shows the log predictive Bayes factors in favour of multi-factor models over a zero-
factor model under the different model specifications. This exercise can be seen as a pseudo out-of-
sample density forecast competition, because one-step-ahead predictive likelihoods are accumu-
lated when constructing the log predictive Bayes factor (Kastner, 2019). Increasing the number
of factors from zero to one substantially improves the predictive power of all models, and FSV-LAH*
slightly outperforms FSV-LAH. The differences among the Bayes factors start to materialise once
the number of factors reaches two. When models have more than three factors, FSV-H outperforms
FSV-LAH* but is dominated by FSV-LAH, suggesting that too much flexibility introduced via
leverage effects and asymmetry can harm forecasting through overfitting, while a parsimonious
model can still benefit from a richer structure. FSV-LAH uses Bayesian variable selection to
achieve this balance automatically. We determine the number of factors based on the diminishing
marginal gain from adding more than four factors in FSV-LAH. Should FSV be our benchmark
model, we would choose three factors.
Figure 10 shows the cross-sectional average of the posterior medians of communality, i.e. the
proportion of total variance explained by the factor variance. The communalities of all models
increase during turbulent times. The communality of FSV is uniformly lower than that of the other
three models, as FSV does not distinguish systematic anomalies from idiosyncratic volatilities. En-
riching FSV with heavy tails, FSV-H clearly increases the communality, suggesting that
Figure 9: Log predictive Bayes factor in favour of multi-factor models over the zero-factor model.
The first half of the sample is used as the initial estimation period, and the Bayes factor is obtained
by accumulating the one-step-ahead log predictive likelihood over the second half of the sample. Circles
with a solid line, triangles with a dashed line, nablas with a dash-dotted line and squares with a dotted line indicate
the log predictive Bayes factors obtained from FSV-LAH, FSV-H, FSV and FSV-LAH*, respectively.
heavy-tailed factors capture more systematic movements than Gaussian factors. The communalities
obtained from FSV-LAH* and FSV-LAH are very similar, although at the beginning and the end
of the sample period the factors in the former explain slightly more variation in yt. Although
FSV-LAH sets many ρi's and βi's to zero, as seen in Figure 8, allowing them to be non-zero does
not add extra explanatory power to the factors.
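For a given posterior draw, the communality of series i at time t follows directly from the factor decomposition of the covariance matrix. The helper below is a sketch in which the factor variances and idiosyncratic variances are taken as inputs; in the paper they would come from posterior draws of the SV and mixing processes.

```python
import numpy as np

def communality(Lambda, var_f, var_u):
    """Per-series communality: the share of total return variance explained
    by the factors, comm_i = (Lambda V_f Lambda')_ii /
    ((Lambda V_f Lambda')_ii + var_u_i), for diagonal factor variance V_f.
    Sketch only; inputs are one set of (time-t) variances."""
    # diag(Lambda @ diag(var_f) @ Lambda') without forming the full matrix
    sys_var = np.einsum('ij,j,ij->i', Lambda, var_f, Lambda)
    return sys_var / (sys_var + var_u)
```

Averaging this quantity over the 80 series, per posterior draw and time period, yields curves like those in Figure 10; rising factor variances in turbulent periods mechanically push the communality up.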
Figure 10: Cross-sectional average communality over time. The figure shows the posterior median
of communality, i.e. the proportion of total variance explained by the factor variance, averaged over the 80
stock returns. Solid, dash-dotted, dashed and dotted lines indicate FSV-LAH, FSV, FSV-H
and FSV-LAH*, respectively.
Figure 11 shows the posterior medians of the correlation matrices from the four models using con-
tour maps. Pairwise correlations are sorted to increase comparability. We consider the correlation
matrices on 28/03/2006, when the Fed target rate increased to 4.75% right after Bernanke took
over the helm from Greenspan; on 22/09/2008, one week after Lehman Brothers filed for
bankruptcy; and on 17/12/2015, when the Fed raised its target rate for the first time since 2008.
An interesting observation is that on 28/03/2006 and 22/09/2008, when volatility is low
and high, respectively, FSV-LAH*, though it overfits, shows smaller correlations than FSV-LAH.
On 17/12/2015, which corresponds to the spike in communality after 2015 seen in Figure 10,
the correlation estimate from FSV-LAH* slightly exceeds that of FSV-LAH. The fact that the overfitting
FSV-LAH* does not always lead to larger correlations sheds light on possible model misspeci-
fication. By striking a balance between FSV-H and FSV-LAH*, FSV-LAH thus serves as a better
modelling framework.
Figure 11: Contour map of sorted posterior median of correlation matrices. From left to right
are posterior median of correlation matrices obtained from FSV-LAH, FSV, FSV-H and FSV-LAH*,
respectively. The top, middle and bottom panels are correlation matrices on 28/03/2006, 22/09/2008
and 17/12/2015, respectively. Pairwise correlations are sorted to increase readability.
5.3 Competing models and value-at-risk
We carry out a rolling-window out-of-sample forecasting exercise with window size T = 2,000
to assess the value-at-risk (VaR) forecasting performance of our FSV-LAH model against the
following competing factor models: (1) FSV and FSV-H, introduced in the previous section;
(2) CNS (Chib et al., 2006) is an FSV model with heavy tails in the idiosyncratic component ut
and additive jumps that admit a factor structure; (3) FSV-FF is an FSV-LAH model with ft
replaced by the three Fama-French (FF) factors (Fama and French, 1993); (4) FSV-FFE stacks the FF
factors over yt in the FSV-LAH model and restricts Λ such that only (f∗i,t, y′t)′ loads on fi,t,
for i = 1, ..., 3, where f∗i,t is the i-th FF factor; (5) CKL (Chan et al., 1999) is an FSV model with Λ, ft
and ht estimated nonparametrically using rolling windows; (6) DFMG (Santos and Moura, 2014)
is a dynamic factor multivariate GARCH model that uses the FF factors, with volatility extracted
from a GARCH model and time-varying loadings estimated by the Kalman filter conditional on
the GARCH volatility; (7) FCO (Oh and Patton, 2017) is a factor copula model with marginals
estimated from GARCH models and a copula implied by a static Gaussian 2-factor model; and (8)
FCOt is the same as FCO but with GARCHt marginals.
For each s ∈ {2001, ..., 4252 − h}, we simulate 10,000 h-step-ahead return vectors ys+h|s to
construct the empirical predictive distribution of the portfolio, Fs+h|s(y). The h-step-ahead VaR
at nominal level α is forecast by VaRs+h|s(α) = F⁻¹s+h|s(α). For the FSV models and CNS,
ys+1, ..., ys+h are generated conditional on the parameter vector θ simulated from the posterior. For
the other models, we keep θ at its maximum likelihood estimate and derive Fs+h|s(y) recursively.

We define a hit process such that Ih,s(α) = 1 if (1/N) Σ_{i=1}^N yi,s+h|s < VaRs+h|s(α) and 0
otherwise. Well-behaved VaR forecasts should deliver a serially independent Ih,s(α), with Ih,s(α) ⊥
Ih,s−1(α) in particular, and have the correct unconditional coverage ratio, i.e. P(Ih,s(α) = 1) =
E(Ih,s(α)) = α (Lopez and Walter, 2001). Based on Ih,s(α) and the hit rate (HR), i.e. the sam-
ple average of Ih,s(α), Christoffersen (1998) develops likelihood ratio tests – LRind for serial
independence, LRuc for unconditional coverage and LRcc for conditional coverage.
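A minimal sketch of this backtesting step: hits are computed from portfolio returns and VaR forecasts, and the standard LRuc, LRind and LRcc statistics are evaluated against their asymptotic χ²(1) and χ²(2) distributions. The exact implementation in the paper may differ; this version assumes the hit sequence contains both hits and non-hits.

```python
import numpy as np
from math import erfc, exp, sqrt

def var_backtest(returns, var_forecasts, alpha):
    """Kupiec unconditional coverage (LRuc) and Christoffersen independence
    (LRind) tests from a hit sequence; LRcc = LRuc + LRind. Sketch of the
    standard tests. chi2 survival: df=1 -> erfc(sqrt(x/2)), df=2 -> exp(-x/2)."""
    hits = (np.asarray(returns) < np.asarray(var_forecasts)).astype(int)
    n, n1 = len(hits), int(hits.sum())
    n0 = n - n1
    pihat = n1 / n

    def ll(p, zeros, ones):          # Bernoulli log-likelihood, 0*log(0) := 0
        out = 0.0
        if zeros: out += zeros * np.log(1 - p)
        if ones:  out += ones * np.log(p)
        return out

    # LRuc: observed hit rate versus the nominal level alpha
    lruc = -2 * (ll(alpha, n0, n1) - ll(pihat, n0, n1))
    # LRind: first-order Markov dependence of the hit sequence
    h0, h1 = hits[:-1], hits[1:]
    n00 = int(np.sum((h0 == 0) & (h1 == 0))); n01 = int(np.sum((h0 == 0) & (h1 == 1)))
    n10 = int(np.sum((h0 == 1) & (h1 == 0))); n11 = int(np.sum((h0 == 1) & (h1 == 1)))
    p01 = n01 / (n00 + n01)
    p11 = n11 / (n10 + n11)
    p1 = (n01 + n11) / (n00 + n01 + n10 + n11)
    lrind = -2 * (ll(p1, n00 + n10, n01 + n11)
                  - ll(p01, n00, n01) - ll(p11, n10, n11))
    lrcc = lruc + lrind
    return {'hit_rate': pihat,
            'p_uc': erfc(sqrt(lruc / 2)),    # chi2(1) survival
            'p_ind': erfc(sqrt(lrind / 2)),  # chi2(1) survival
            'p_cc': exp(-lrcc / 2)}          # chi2(2) survival
```

A forecast passes when the hit rate is close to α (large p_uc), hits do not cluster (large p_ind), and both hold jointly (large p_cc), which is how the entries of Table 11 are read.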
Table 11 summarises the test results for h ∈ {1, 5} and α ∈ {0.01, 0.05}. Not surprisingly, FSV-
LAH uniformly outperforms FSV. This result suggests that modelling leverage effects, return
asymmetry and heavy tails improves forecasting power for financial returns. This can also
be seen by comparing FCOt, which incorporates Student's t marginals, with FCO. Moreover,
comparing both FSV-H and FSV-LAH* with FSV-LAH when h = 5 or α = 0.01 suggests that
the Bayesian variable selection embedded in the latter model improves forecasting by balancing
parsimony and flexibility. Furthermore, compared with FSV-LAH, CNS performs equally well
when h = 1, but worse when h = 5. To explain this, we note that CNS uses additive jumps
Table 11: Coverage test for value-at-risk
                 α = 0.05, h = 1                       α = 0.01, h = 1
            HR     LRuc   LRind  LRcc            HR     LRuc   LRind  LRcc

FSV-LAH    0.052   0.24   0.48   0.39           0.011   0.56   0.22   0.40
FSV        0.063   0.16   0.06*  0.05**         0.012   0.24   0.02** 0.03**
FSV-H      0.054   0.67   0.13   0.29           0.014   0.13   0.01** 0.01**
FSV-LAH*   0.049   0.54   0.05** 0.12           0.021   0.20   0.37   0.29
CNS        0.055   0.22   0.18   0.19           0.016   0.13   0.08*  0.07*
FSV-FF     0.034   0.02** 0.78   0.06*          0.006   0.00** 0.17   0.00**
FSV-FFE    0.052   0.33   0.40   0.43           0.010   0.92   0.45   0.54
CKL        0.072   0.00** 0.16   0.01**         0.022   0.09*  0.03** 0.02**
DFMG       0.064   0.02** 0.57   0.06*          0.018   0.08*  0.16   0.08*
FCO        0.059   0.07*  0.02** 0.01**         0.018   0.09*  0.00** 0.00**
FCOt       0.055   0.09*  0.43   0.20           0.011   0.81   0.16   0.36

                 α = 0.05, h = 5                       α = 0.01, h = 5

FSV-LAH    0.048   0.30   0.58   0.50           0.015   0.16   0.18   0.15
FSV        0.064   0.05** 0.17   0.06*          0.018   0.08*  0.07*  0.04**
FSV-H      0.062   0.09*  0.34   0.15           0.016   0.21   0.02** 0.03**
FSV-LAH*   0.087   0.00** 0.72   0.01**         0.015   0.08*  0.70   0.03**
CNS        0.059   0.32   0.10*  0.16           0.006   0.03** 0.51   0.07*
FSV-FF     0.042   0.14   0.57   0.29           0.026   0.02** 0.05** 0.01**
FSV-FFE    0.053   0.48   0.22   0.37           0.015   0.25   0.08*  0.11
CKL        0.088   0.00** 0.34   0.02**         0.030   0.05** 0.18   0.06*
DFMG       0.057   0.32   0.05** 0.09*          0.007   0.12   0.06** 0.06*
FCO        0.061   0.07*  0.01** 0.01**         0.014   0.15   0.00*  0.01**
FCOt       0.053   0.83   0.08*  0.21           0.013   0.45   0.04** 0.9*
The table shows the p-values of unconditional coverage test, independence test and conditional coverage test for an
equally weighted portfolio with 80 assets. α is the nominal level of VaR forecast daily (h = 1) or weekly (h = 5).
Rejections at 5% and 10% level are indicated by single and double asterisk, respectively.
for return anomalies in yt, whereas FSV-LAH features both additive and multiplicative swings
modelled by the IG mixing components, which also interact with the SV processes. Lastly, FSV-
FF, DFMG and CKL replace ft with observed FF factors. Though the former two are quite
flexible, the fact that they are outperformed by FSV-LAH and FSV-FFE may suggest that
signals in FF factors are eclipsed by their noises, leading to inaccuracy and misspecification.
Interestingly, FSV-LAH and FSV-FFE perform comparably well. This result indicates that FF
factors do not bring extra information in forecasting VaR in our sample, and how factors are
loaded is of secondary importance.
6 Conclusion and remarks
Advances in Monte Carlo methods such as the efficient importance sampling (EIS) algorithm
of Richard and Zhang (2007) and the particle Gibbs method of Andrieu et al. (2010) allow us
to design efficient MCMC algorithms for the estimation of flexible factor stochastic volatility
models in many dimensions. Our framework connects findings in empirical literature that doc-
uments commonality in the third moments of financial returns and higher pairwise correlations
during market downturns to the literature on multivariate financial time series that aims to
address the curse of dimensionality and overcome computational challenges towards the estima-
tion of increasingly more accurate models. Extensions of the current model include exploring
coupled systems of smaller subsets of yt within each set PGAS-EIS can deliver better posterior
approximation. This can be approached using the framework of simultaneous graphical dynamic
linear models proposed by Gruber et al. (2016). Also, it will be interesting to combine our
sampling algorithm with the order-invariant formulation of factor models in Chan et al. (2018)
and Kaufmann and Schumacher (2019) and study the effect order dependence has on forecast-
ing covariance matrix. Lastly, our empirical study reveals that some factors are loaded by only
a subset of assets. The sparse loading matrix of Kastner (2019) can be directly implemented
within our framework to maintain a more parsimonious model specification.
References
Aas, K. and I. H. Haff (2006). The generalized hyperbolic skew Student’s t-distribution. Journalof Financial Econometrics 4 (2), 275–309.
Aguilar, O. and M. West (2000). Bayesian dynamic factor models and portfolio allocation.Journal of Business & Economic Statistics 18 (3), 338–357.
Andrieu, C., A. Doucet, and R. Holenstein (2010). Particle Markov chain Monte Carlo methods.Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72 (3), 269–342.
Andrieu, C. and G. Roberts (2009). The pseudo-marginal approach for efficient Monte Carlocomputations. The Annals of Statistics 37 (2), 697–725.
Ang, A. and J. Chen (2002). Asymmetric correlations of equity portfolios. Journal of FinancialEconomics 63 (3), 443–494.
Asai, M., M. McAleer, and J. Yu (2006). Multivariate stochastic volatility: a review. Econo-metric Reviews 25 (2-3), 145–175.
Bai, J. and P. Wang (2015). Identification and Bayesian estimation of dynamic factor models.Journal of Business & Economic Statistics 33 (2), 221–240.
Beine, M., A. Cosma, and R. Vermeulen (2010). The dark side of global integration: Increasingtail dependence. Journal of Banking & Finance 34 (1), 184–192.
Chan, J., R. Leon-Gonzalez, and R. W. Strachan (2018). Invariant inference and efficient com-putation in the static factor model. Journal of the American Statistical Association 113 (522),819–828.
Chan, L. K., J. Karceski, and J. Lakonishok (1999). On portfolio optimization: Forecastingcovariances and choosing the risk model. Review of Financial Studies 12 (5), 937–974.
Chib, S. (2001). Markov chain Monte Carlo methods: computation and inference. Handbook ofeconometrics 5, 3569–3649.
Chib, S., F. Nardari, and N. Shephard (2006). Analysis of high dimensional multivariate stochas-tic volatility models. Journal of Econometrics 134 (2), 341–371.
Chib, S., Y. Omori, and M. Asai (2009). Multivariate stochastic volatility. In Handbook ofFinancial Time Series, pp. 365–400. Springer.
Chopin, N., S. S. Singh, et al. (2013). On the particle Gibbs sampler. CREST.
Christoffersen, P. F. (1998). Evaluating interval forecasts. International Economic Review, 841–862.
Clyde, M. and E. I. George (2004). Model uncertainty. Statistical Science, 81–94.
Dang, D.-M., K. R. Jackson, and M. Mohammadi (2015). Dimension and variance reductionfor Monte Carlo methods for high-dimensional models in finance. Applied Mathematical Fi-nance 22 (6), 522–552.
Doornik, J. A. (2007). Object-Oriented Matrix Programming Using Ox, 3rd ed. London: Tim-berlake Consultants Press.
Doucet, A., N. De Freitas, and N. Gordon (2001). An introduction to sequential Monte Carlomethods. In Sequential Monte Carlo methods in practice, pp. 3–14. Springer.
Durbin, J. and S. J. Koopman (1997). Monte Carlo maximum likelihood estimation for non-Gaussian state space models. Biometrika 84 (3), 669–684.
Fama, E. F. and K. R. French (1993). Common risk factors in the returns on stocks and bonds.Journal of Financial Economics 33 (1), 3–56.
Geweke, J. and H. Tanizaki (2001). Bayesian estimation of state-space models using theMetropolis–Hastings algorithm within Gibbs sampling. Computational Statistics & Data Anal-ysis 37 (2), 151–170.
Gilks, W. R., N. Best, and K. Tan (1995). Adaptive rejection Metropolis sampling within Gibbssampling. Applied Statistics, 455–472.
Grothe, O., T. S. Kleppe, and R. Liesenfeld (2017). The Gibbs sampler with particle efficientimportance sampling for state-space models.
Gruber, L., M. West, et al. (2016). GPU-accelerated Bayesian learning and forecasting insimultaneous graphical dynamic linear models. Bayesian Analysis 11 (1), 125–149.
Gruber, L., M. West, et al. (2017). Bayesian forecasting and portfolio decisions using simulta-neous graphical dynamic linear models. Econometrics and Statistics 3, 3–22.
Hosszejni, D. and G. Kastner (2019). Approaches toward the Bayesian estimation of the stochas-tic volatility model with leverage. Bayesian Statistics and New Generations, 75.
Jacquier, E., N. G. Polson, and P. E. Rossi (2004). Bayesian analysis of stochastic volatilitymodels with fat-tails and correlated errors. Journal of Econometrics 122 (1), 185–212.
Jung, R. C. and R. Liesenfeld (2001). Estimating time series models for count data using efficientimportance sampling. AStA Advances in Statistical Analysis 4 (85), 387–407.
Kastner, G. (2019). Sparse Bayesian time-varying covariance estimation in many dimensions.Journal of Econometrics 210 (1), 98–115.
Kastner, G. and S. Frühwirth-Schnatter (2014). Ancillarity-sufficiency interweaving strategy(ASIS) for boosting MCMC estimation of stochastic volatility models. Computational Statis-tics & Data Analysis 76, 408–423.
Kastner, G., S. Frühwirth-Schnatter, and H. F. Lopes (2017). Efficient Bayesian inferencefor multivariate factor stochastic volatility models. Journal of Computational and GraphicalStatistics 26 (4), 905–917.
Kaufmann, S. and C. Schumacher (2019). Bayesian estimation of sparse dynamic factor modelswith order-independent and ex-post mode identification. Journal of Econometrics 210 (1),116–134.
Kim, S., N. Shephard, and S. Chib (1998). Stochastic volatility: likelihood inference and com-parison with ARCH models. The Review of Economic Studies 65 (3), 361–393.
Koop, G., D. J. Poirier, and J. L. Tobias (2007). Bayesian econometric methods. Cambridge University Press.
Koopman, S. J. and E. Hol Uspensky (2002). The stochastic volatility in mean model: empiricalevidence from international stock markets. Journal of Applied Econometrics 17 (6), 667–689.
Koopman, S. J., N. Shephard, and J. A. Doornik (2008). SsfPack 3.0: Statistical algorithms formodels in state space form. London: Timberlake Consultants Press.
Kurose, Y. and Y. Omori (2016). Dynamic equicorrelation stochastic volatility. ComputationalStatistics & Data Analysis 100, 795–813.
Lindsten, F., M. I. Jordan, and T. B. Schön (2014). Particle Gibbs with ancestor sampling.Journal of Machine Learning Research 15 (1), 2145–2184.
Lopez, J. A. and C. A. Walter (2001). Evaluating covariance matrix forecasts in a value-at-riskframework. The Journal of Risk 3, 69–97.
Nakajima, J. (2017). Bayesian analysis of multivariate stochastic volatility with skew returndistribution. Econometric Reviews 36 (5), 546–562.
Nakajima, J. and Y. Omori (2012). Stochastic volatility model with leverage and asymmetricallyheavy-tailed error using GH skew Student’s t-distribution. Computational Statistics & DataAnalysis 56 (11), 3690–3704.
Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econo-metrica: Journal of the Econometric Society, 347–370.
Oh, D. H. and A. J. Patton (2017). Modeling dependence in high dimensions with factor copulas.Journal of Business & Economic Statistics 35 (1), 139–154.
Patton, A. J. (2004). On the out-of-sample importance of skewness and asymmetric dependencefor asset allocation. Journal of Financial Econometrics 2 (1), 130–168.
Pitt, M. and N. Shephard (1999a). Time varying covariances: a factor stochastic volatilityapproach. Bayesian statistics 6, 547–570.
Pitt, M. K., R. dos Santos Silva, P. Giordani, and R. Kohn (2012). On some properties ofMarkov chain Monte Carlo simulation methods based on the particle filter. Journal of Econo-metrics 171 (2), 134–151.
Pitt, M. K. and N. Shephard (1999b). Filtering via simulation: Auxiliary particle filters. Journalof the American Statistical Association 94 (446), 590–599.
Richard, J.-F. and W. Zhang (2007). Efficient high-dimensional importance sampling. Journalof Econometrics 141 (2), 1385–1411.
Santos, A. A. and G. V. Moura (2014). Dynamic factor multivariate GARCH model. ComputationalStatistics & Data Analysis 76, 606–617.
Scharth, M. and R. Kohn (2016). Particle efficient importance sampling. Journal of Economet-rics 190 (1), 133–147.
Shephard, N. and M. K. Pitt (1997). Likelihood analysis of non-Gaussian measurement timeseries. Biometrika 84 (3), 653–667.
Snyder, C., T. Bengtsson, P. Bickel, and J. Anderson (2008). Obstacles to high-dimensionalparticle filtering. Monthly Weather Review 136 (12), 4629–4640.
Watanabe, T. and Y. Omori (2004). A multi-move sampler for estimating non-Gaussian timeseries models: Comments on Shephard & Pitt (1997). Biometrika, 246–248.
Yu, J. (2005). On leverage in a stochastic volatility model. Journal of Econometrics 127 (2),165–178.
Yu, Y. and X.-L. Meng (2011). To center or not to center: That is not the question—Anancillarity–sufficiency interweaving strategy (ASIS) for boosting MCMC efficiency. Journalof Computational and Graphical Statistics 20 (3), 531–570.