
Computational Statistics and Data Analysis 55 (2011) 248–260


journal homepage: www.elsevier.com/locate/csda

The hierarchical-likelihood approach to autoregressive stochastic volatility models

Woojoo Lee a, Johan Lim a, Youngjo Lee a,∗, Joan del Castillo b

a Department of Statistics, Seoul National University, Seoul 151-747, Republic of Korea
b Universitat Autonoma de Barcelona, Barcelona 08192, Spain

Article info

Article history:
Received 18 March 2009
Received in revised form 28 February 2010
Accepted 20 April 2010
Available online 27 April 2010

Keywords:
Autoregressive stochastic volatility model
Hierarchical generalized linear model
Hierarchical likelihood
Sparse matrix computation
Prediction

Abstract

Many volatility models used in financial research belong to a class of hierarchical generalized linear models with random effects in the dispersion. Therefore, the hierarchical-likelihood (h-likelihood) approach can be used. However, the dimension of the Hessian matrix is often large, so techniques of sparse matrix computation are useful to speed up the procedure of computing the inverse matrix. Using numerical studies we show that the h-likelihood approach gives better long-term prediction for volatility than the existing MCMC method, while the MCMC method gives better short-term prediction. We show that the h-likelihood approach gives comparable estimations of fixed parameters to those of existing methods.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

The basic model by Black and Scholes (1973) for pricing and hedging options and other derivative securities can be improved when the volatility is specified to follow some latent stochastic process. This leads to a diffusion process with a stochastically varying volatility parameter, initially introduced by Hull and White (1987), Scott (1987) and Wiggins (1987). These articles suggest a geometric Ornstein–Uhlenbeck (OU) process for dealing with volatility. Stein and Stein (1991) derive a closed-form solution for the distribution of stock prices when the volatility parameter of the diffusion process is driven by an arithmetic OU process. Heston (1993) extends the model, allowing arbitrary correlation between volatility and spot-asset returns. The closed-form solutions provided by these affine models made these articles popular among practitioners.

In research into empirical derivatives, unspecified parameters for the diffusion process have to be estimated from the observed data, which are generally available at fixed points in time. The stochastic Euler scheme, often used for simulation purposes, provides a straightforward discrete-time approximation for diffusion processes. The geometric OU assumption leads to the autoregressive stochastic volatility (ARSV) model introduced by Taylor in 1986: see also Taylor (2005). The estimation procedures for this model have been considered extensively in the literature. Recently, Castillo and Lee (2008) showed that many volatility models in which volatilities over time are independent of each other can be viewed as a generalized linear model (GLM) with random effects in the dispersion. They showed that the h-likelihood approach provides an efficient procedure to estimate SV models with independent random effects. In this paper we extend their results to SV models with autoregressive random effects. In the ARSV models it is hard to implement the h-likelihood method proposed by Lee and Nelder (1996, 2006) because it involves high-dimensional matrix inversion. Using the banded structure of the Hessian matrix, we show in the Appendix how the sparse matrix technique can be used to give an algorithm for inferences.

To estimate the ARSV model, several methods, both Bayesian and non-Bayesian (likelihood based), have been proposed. The Bayesian approach to estimating the model was first proposed by Jacquier et al. (1994). However, Harvey et al. (1994) and Ruiz (1994) suggested a quasi-maximum likelihood (QML) estimator based on the Kalman filter. Meyer et al. (2003) and Shimada and Tsukuda (2005) employ approximated linear filtering methods based on Laplace approximation to produce a maximum likelihood estimator. Joe et al. (submitted for publication) use composite likelihood methods based on low-dimensional margins. Sandmann and Koopman (1998) propose the use of the Monte Carlo likelihood approach and Kim et al. (1998) propose approximation of the likelihood function using Markov Chain Monte Carlo (MCMC) methods, while Fridman and Harris (1998) use numerical integration. All these approaches have their strengths and weaknesses. The QML methods are fast but not accurate, whereas approximate maximum likelihood methods may be accurate but are computationally demanding. In the ARSV models the prediction of volatility is often of interest. However, the volatility cannot be observed directly, so classical likelihood theory has to be extended to allow for prediction of volatilities: see Lee et al. (2006). To use the Kalman filter algorithm, a crucial feature is the formulation of an ARSV model in a linear state space form. In the QML approach, the Kalman filter algorithm is used, assuming normality for volatility. We show that the h-likelihood approach gives a better prediction of volatility than the QML method by using the true distribution of volatility.

This paper extends the idea suggested by Castillo and Lee (2008) of viewing the ARSV models as the GLM with varying random effects, and applying the h-likelihood method to estimate it. Section 2 introduces the ARSV models considered in this paper. In Section 3, we study the h-likelihood estimation (HLE) method to estimate the ARSV model and compare it with other existing methods. In Section 4, we show how the HLE method can be extended to various models. In Section 5, we study the h-likelihood prediction procedure for volatility. In Section 6, we explain how to improve the Laplace approximation when its accuracy is poor. In Section 7, we numerically compare the proposed HLE method to some existing methods, showing that our method provides comparable results to those of other existing methods. We also show that the h-likelihood method gives a better long-term prediction for future volatility than Bayesian MCMC. Concluding remarks follow in Section 8.

∗ Corresponding author. Tel.: +82 2 880 6568; fax: +82 2 883 6144.

0167-9473/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2010.04.014

2. The autoregressive stochastic volatility models

Let $s_t$ be the price of an asset and $r_t = \log(s_t/s_{t-1})$ be the return of the asset at time $t$. The standard formulation of the returns is

$$r_t = \mu + \sigma_t \varepsilon_t, \tag{1}$$

where $\mu$ is the mean return, $\varepsilon_t$ are independent and identically distributed (i.i.d.) random errors from the standard normal distribution, $N(0, 1)$ (Gaussian white noise), and the volatilities $\{\sigma_t\}$ follow a positive stationary stochastic process, stochastically independent of $\{\varepsilon_t\}$. From these assumptions the distribution of returns is a mixture of normal distributions with higher kurtosis than that of the normal distribution, and their autocorrelations are zero at all positive lags (Taylor, 2005).

Volatility cannot be observed directly because it is a latent variable that is not traded. Hence, the return process in (1) has two sources of noise and has to be considered a random-effect model. In the hierarchical GLM approach, first introduced by Lee and Nelder (1996), random effects appear linearly in the linear predictor $b_t = u(\sigma_t)$,

$$b_t = \gamma + \phi b_{t-1} + w_t. \tag{2}$$

Several different link functions $u(\cdot)$ can be considered.

SV models involve specifying a stochastic model for volatility. A large class of parametric Lévy models is obtained by assuming

$$b_t = \log(\sigma_t^2) = \gamma + w_t,$$

where $\exp(w_t)$ follows a generalized inverse Gaussian distribution; see Castillo and Lee (2008).

The ARSV model often assumes the logarithm as the link function and AR(1) serial correlation for volatilities:

$$\log \sigma_t^2 = \gamma + \phi \log \sigma_{t-1}^2 + w_t, \tag{3}$$

where $w_t$ is Gaussian white noise with variance $\sigma_w^2$. The logarithmic scale allows an alternative view of this model as a linear (non-Gaussian) state space model since, from (1),

$$\log (r_t - \mu)^2 = \log \sigma_t^2 + \log \varepsilon_t^2, \tag{4}$$

and the random effects $b_t = \log(\sigma_t^2)$ appear on the same scale in (3) and (4). This structure allows use of the (linear) Kalman filter in the estimation procedures, as in the QML approach of Harvey et al. (1994) and the Monte Carlo likelihood (MCL) estimator of Sandmann and Koopman (1998). However, the Kalman filter algorithm cannot be applied to non-logarithmic link functions.

Several link functions $u(\cdot)$ also appear in financial models for pricing and hedging derivative securities. The stochastic Euler scheme provides discrete-time approximations for the diffusion processes used by Stein and Stein (1991) and Heston (1993), with the identity function as the link function: $\sigma_t = \gamma + \phi \sigma_{t-1} + w_t$. In the same way, in order to avoid negative values, the square function may be appropriate, giving $\sigma_t^2 = \gamma + \phi \sigma_{t-1}^2 + w_t$. The estimation method developed in this paper can be applied to any link function considered in (2). In particular, it can be used in the same way for identity, logarithm and square functions with AR(1) serial correlation.
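As an illustration of (1) and (3), the following is a minimal simulation sketch of the log-link ARSV model. The function name `simulate_arsv`, its arguments, and the choice to draw the initial log-variance from the stationary distribution are assumptions made for this example; they are not taken from the paper.

```python
import math
import random


def simulate_arsv(n, gamma, phi, sigma_w, mu=0.0, seed=None):
    """Simulate r_t = mu + sigma_t * eps_t with
    log(sigma_t^2) = gamma + phi * log(sigma_{t-1}^2) + w_t."""
    rng = random.Random(seed)
    # Assumption: start the log-variance from its stationary distribution
    # N(gamma / (1 - phi), sigma_w^2 / (1 - phi^2)), which needs |phi| < 1.
    b = rng.gauss(gamma / (1.0 - phi), sigma_w / math.sqrt(1.0 - phi ** 2))
    returns, log_vars = [], []
    for _ in range(n):
        b = gamma + phi * b + rng.gauss(0.0, sigma_w)   # Eq. (3)
        eps = rng.gauss(0.0, 1.0)                       # Gaussian white noise
        returns.append(mu + math.exp(b / 2.0) * eps)    # Eq. (1)
        log_vars.append(b)
    return returns, log_vars


# Example call with one illustrative parameter set.
r, b = simulate_arsv(500, gamma=-0.821, phi=0.9, sigma_w=0.675, seed=1)
```

The sketch returns both the returns and the latent log-variances, so that estimates of the random effects can later be compared with the simulated truth.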


3. The h-likelihood procedure

Throughout this paper $f_\theta(\cdot)$ denotes probability functions with fixed unknown parameters $\theta$. This section introduces the h-likelihood approach for estimating the ARSV model given by (1) and (3). Below, we assume that $y_t = r_t - \mu$, where $\mu$ is estimated by the sample mean $\bar{r}$ or, for simplicity, $\mu = 0$ following Jacquier et al. (1994) and Kim et al. (1998). Let $y = (y_1, y_2, \ldots, y_n)$ be the observed returns, for a sample of size $n$, and $b = (b_1, b_2, \ldots, b_n)$ be the unobserved log-volatilities (random effects). Let $\alpha = (\gamma, \phi, \sigma_w)$ be the parameter vector. For the hierarchical GLM, Lee and Nelder (1996) and Lee and Nelder (2006) introduce the h-likelihood

$$h = h(b, \alpha) = \sum_{t=1}^{n} \log f(y_t \mid b_t) + \log f_\alpha(b_1, b_2, \ldots, b_n) = -\frac{1}{2} \sum_{t=1}^{n} \left\{ y_t^2 \exp(-b_t) + b_t + \frac{1}{\sigma_w^2}(b_t - \gamma - \phi b_{t-1})^2 + \log \sigma_w^2 \right\}. \tag{5}$$

In the ARSV models, aside from the fixed parameters, the volatilities cannot be observed directly and therefore have to be estimated. In the state space linear models with normal $b_t$, the volatilities are the states of the system that can be predicted with the predictive density provided by the Kalman filter. This is the QML approach, but volatility $b_t$ is not normal. In the h-likelihood approach the volatilities (random effects) are estimated by solving the score equation $\partial h/\partial b = (\partial h/\partial b_1, \partial h/\partial b_2, \ldots, \partial h/\partial b_n)' = 0$, using the Fisher scoring method. The first derivatives of $h = h(b, \alpha)$ with respect to volatilities are given by

$$h_t = \frac{\partial h}{\partial b_t} = \frac{1}{2}\left(y_t^2 \exp(-b_t) - 1\right) - \frac{1}{\sigma_w^2}(b_t - \gamma - \phi b_{t-1}) + \frac{\phi}{\sigma_w^2}(b_{t+1} - \gamma - \phi b_t), \tag{6}$$

for $t \le n - 1$, and

$$h_n = \frac{\partial h}{\partial b_n} = \frac{1}{2}\left(y_n^2 \exp(-b_n) - 1\right) - \frac{1}{\sigma_w^2}(b_n - \gamma - \phi b_{n-1}). \tag{7}$$

To be specific, the estimate of $b$ at step $k$, $b^{(k)}$, is updated until the algorithm converges by

$$b^{(k+1)} = b^{(k)} + H^{-1} (\partial h/\partial b)\big|_{b = b^{(k)}}, \tag{8}$$

where $H$ is a banded matrix whose $(s, t)$th element, $h_{st} = -\partial^2 h/\partial b_t \partial b_s$, is

$$h_{st} = \begin{cases} (1/2)\, y_t^2 \exp(-b_t) + (1/\sigma_w^2)(1 + \phi^2) & \text{if } t = s, \\ -(1/\sigma_w^2)\,\phi & \text{if } |t - s| = 1, \\ 0 & \text{if } |t - s| > 1, \end{cases} \tag{9}$$

for any $s \le n - 1$ and $t \le n - 1$, and $h_{nn} = (1/2)\, y_n^2 \exp(-b_n) + (1/\sigma_w^2)$. To start up the process, the initial state is often fixed at the unconditional expectation of $b$ (Pollock, 2003), and for a large $n$, the results will be dominated by the information from the data (Wei, 2006). Here we plug the average of $b$ into the value of $b_0$, which is needed to evaluate $\partial h/\partial b_1$. The resulting estimators $\hat{b}$ correspond to the smoothing estimators of volatilities in the Kalman filter method. Here $n$ is often large, so direct inversion of $H$ is numerically too slow, and therefore we use the banded structure of $H$ for fast computation, as described in the Appendix. The inverse of the information matrix from the h-likelihood gives the standard error estimators for random effects. Because there are no fixed parameters for the mean, the standard errors for $\hat{b}_i - b_i$ can be obtained from $\sqrt{(H^{-1})_{ii}}$; see Lee and Ha (in press) for more detailed discussion.

To estimate the fixed parameter $\alpha$, Lee and Nelder (1996) propose the use of the adjusted profile h-likelihood,

$$h_P(\alpha) = \left[ h - \frac{1}{2} \log\{\det(H/2\pi)\} \right]\Big|_{b = \hat{b}} = h(\hat{b}(\alpha), \alpha) - \frac{1}{2} \log\{\det(H(\hat{b}(\alpha), \alpha)/2\pi)\}, \tag{10}$$

where $\hat{b}$ is the solution to $\partial h/\partial b = 0$. The standard errors for the fixed parameter can be obtained from the inverse of the information matrix from $h_P(\alpha)$.

In summary, the h-likelihood method estimates $(\alpha, b)$ as follows: (i) given the current estimate of $\alpha$, estimate the random effects $b$ by maximizing the h-likelihood (5); (ii) given the estimate of $b$, update the estimate of $\alpha$ by maximizing the adjusted profile h-likelihood (10). Here, the log-likelihood $m = \log \int \exp(h)\, db$ for $\alpha$ can be obtained by integration. In the ARSV models there is no explicit form for $m$, so an approximation has to be used to compute the maximum likelihood (ML) estimators for $\alpha$. Here $h_P(\alpha)$ is the Laplace approximation to $m$. In our approach we need to compute the matrix inverse $H^{-1}$. Because $H$ is a large-scale banded matrix whose $(s, t)$th element $h_{st} = 0$ if $|t - s| > 1$, we employ sparse matrix computation techniques to speed up the computation of the inverse matrix. The detailed computations for the score equation and Hessian matrix are reported in the Appendix.
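The two steps above can be sketched in a few lines of code. The example below evaluates the score (6)-(7) and the tridiagonal matrix (9), applies the Newton update (8) through a tridiagonal (Thomas) solve instead of an explicit inverse, and evaluates (10) using the pivots of the same elimination for $\log\det H$. It is only a sketch under stated assumptions: all function names are hypothetical, $b_0$ is held fixed at a user-supplied value rather than set to the average of $b$, and the Thomas algorithm stands in for the sparse technique of the Appendix, which is not reproduced here.

```python
import math


def score_and_hessian(y, b, gamma, phi, sw2, b0):
    """Score (6)-(7) and tridiagonal H of (9); b0 is a fixed start value."""
    n = len(y)
    # AR(1) residuals w_t = b_t - gamma - phi * b_{t-1}, with b_0 = b0.
    w = [b[t] - gamma - phi * (b[t - 1] if t > 0 else b0) for t in range(n)]
    g, diag = [], []
    for t in range(n):
        last = t == n - 1
        e = y[t] ** 2 * math.exp(-b[t])
        g.append(0.5 * (e - 1.0) - w[t] / sw2
                 + (0.0 if last else phi * w[t + 1] / sw2))
        diag.append(0.5 * e + (1.0 if last else 1.0 + phi ** 2) / sw2)
    off = [-phi / sw2] * (n - 1)   # elements with |t - s| = 1
    return w, g, diag, off


def solve_tridiagonal(diag, off, rhs):
    """Thomas algorithm for a symmetric tridiagonal system; returns (x, pivots)."""
    n = len(diag)
    piv, z = [diag[0]], [rhs[0]]
    for t in range(1, n):                      # forward elimination
        m = off[t - 1] / piv[t - 1]
        piv.append(diag[t] - m * off[t - 1])
        z.append(rhs[t] - m * z[t - 1])
    x = [0.0] * n
    x[-1] = z[-1] / piv[-1]
    for t in range(n - 2, -1, -1):             # back substitution
        x[t] = (z[t] - off[t] * x[t + 1]) / piv[t]
    return x, piv


def fit_b(y, gamma, phi, sw2, b0, tol=1e-10, max_iter=200):
    """Newton update (8) for the mode of h in b, given alpha."""
    b = [b0] * len(y)
    for _ in range(max_iter):
        _, g, diag, off = score_and_hessian(y, b, gamma, phi, sw2, b0)
        step, _ = solve_tridiagonal(diag, off, g)   # H^{-1} (dh/db)
        b = [bt + st for bt, st in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    return b


def adjusted_profile(y, gamma, phi, sw2, b0):
    """h_P(alpha) of (10); log det(H) is the sum of the log pivots."""
    b = fit_b(y, gamma, phi, sw2, b0)
    w, _, diag, off = score_and_hessian(y, b, gamma, phi, sw2, b0)
    h = -0.5 * sum(yt ** 2 * math.exp(-bt) + bt + wt ** 2 / sw2 + math.log(sw2)
                   for yt, bt, wt in zip(y, b, w))
    _, piv = solve_tridiagonal(diag, off, [0.0] * len(y))
    log_det = sum(math.log(p) for p in piv) - len(y) * math.log(2.0 * math.pi)
    return h - 0.5 * log_det
```

Both the solve and the determinant cost $O(n)$ operations here, which is the practical reason for exploiting the banded structure; step (ii) would then pass `adjusted_profile` to a numerical optimizer over $(\gamma, \phi, \sigma_w^2)$.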


Shimada and Tsukuda (2005) and Meyer et al. (2003) also propose using the Laplace approximation to compute the marginal likelihood $m$ successively. Even though their proposed method uses the Laplace approximation, it is not the same as the h-likelihood method. Let $y_{(t)} = \{y_1, \ldots, y_t\}$. They use the second-order Taylor expansion around the mode of $f(y_t \mid b_t) f(b_t \mid y_{(t-1)})$ as well as an additional normal approximation to the conditional distribution of $b_t$ given $y_{(t-1)}$ in order to apply the Kalman filter. In the h-likelihood procedure, however, we do not use such an additional normal approximation to the conditional distribution. Furthermore, when computing smoothing estimates of volatility, they need an additional approximate smoothing step. In the h-likelihood method, the smoothing estimates of volatility are the mode estimates described above and, hence, we do not need such an additional step.

Skaug and Fournier (2006) proposed the use of automatic differentiation (AD) to derive the first-order and second-order derivatives for the Laplace approximation. However, their implementation is not freely available, so we do not make a comparison. In contrast, we derive explicit forms of the derivatives and solve the resulting equations analytically. Thus, our method can be easily extended to allow a restricted maximum likelihood procedure (Lee et al., 2006).

Joe et al. (submitted for publication) propose the use of composite likelihood methods based on the sum of log-likelihoods of low-dimensional margins. They have proved that the composite likelihood estimators for the ARSV model are consistent and asymptotically normal as the length of the time series increases to infinity. The bivariate composite likelihood with index $s$ (BCL($s$)), including all bivariate marginal pairs of lag $s$ and smaller, is

$$m_{BCL(s)} = \sum_{j=1}^{n-s} \sum_{\ell=1}^{s} \log f(y_j, y_{j+\ell}).$$

They also consider trivariate composite likelihood. From their simulation studies, BCL(4) is the best among the composite likelihood methods. Although this method avoids high-dimensional integration, it cannot be used for predicting volatility.

Sandmann and Koopman (1998) and Kim et al. (1998) propose approximating the likelihood function using Markov Chain Monte Carlo (MCMC) methods. Fridman and Harris (1998) developed a recursive numerical integration procedure that directly calculates the marginal likelihood. We use the Laplace approximation. Performances of these approximate ML estimators are reported numerically in Section 7. Fridman and Harris (1998) show that their method performs better than the Bayesian MCMC method in various simulated settings. However, their method depends on a subjectively chosen a priori grid that may not be optimal. A major drawback of their approach is that it cannot be used for prediction of volatility. Using numerical studies we show that the h-likelihood approach gives comparable fixed-parameter estimators to Fridman and Harris's (1998) method.

4. Extensions

In this section, we show how the h-likelihood method for the standard ARSV model can be extended to models with heavy-tailed error distributions, leverage effect models and multivariate SV models.

SV models with heavy-tailed distributions have been proposed to deal with the many financial time series that show larger kurtosis than is allowed by the standard ARSV models with normal error distribution. Lee and Nelder (2006) showed that such a heavy-tailed distribution for $\varepsilon_t$ can be generated by introducing random effects in the dispersion of $\varepsilon_t$. For example, if $\varepsilon_t \mid u_t \sim N(0, \exp(u_t))$ and $\exp(u_t) \sim k/\chi^2_k$, then the marginal distribution of $\varepsilon_t$ becomes Student's t-distribution with $k$ degrees of freedom. Here the h-likelihood for the ARSV model with t-distribution is

$$h = h(b, u, \alpha) = \sum_{t=1}^{n} \log f(y_t \mid b_t, u_t) + \log f_\alpha(b_1, b_2, \ldots, b_n) + \log f_\alpha(u_1, u_2, \ldots, u_n)$$

$$= \sum_{t=1}^{n} \Big[ -\frac{1}{2}\Big\{ y_t^2 \exp(-b_t - u_t) + b_t + u_t + \frac{1}{\sigma_w^2}(b_t - \gamma - \phi b_{t-1})^2 + \log \sigma_w^2 \Big\} + (-k/2 - 1)\, u_t - \frac{k}{2\exp(u_t)} + u_t - \log \Gamma(k/2) + \frac{k}{2} \log(k/2) \Big].$$

Then, $\hat{b}$ and $\hat{u}$ are obtained by solving $\partial h/\partial b = 0$ and $\partial h/\partial u = 0$, respectively. For estimating the fixed parameter $\alpha$, we can use the adjusted profile h-likelihood

$$h(\hat{b}(\alpha), \hat{u}(\alpha), \alpha) - \frac{1}{2} \log\{\det(H(\hat{b}(\alpha), \hat{u}(\alpha), \alpha)/2\pi)\}.$$

Here $H$ consists of four $n \times n$ submatrices, which are a banded matrix and three diagonal matrices. The inverse of $H$ can be obtained by combining inversions of the block matrices. Thus, the proposed sparse techniques in the Appendix can be applied to each block matrix.

Harvey and Shephard (1996) introduced a leverage effect into the SV model by allowing a dependence between $\varepsilon_t$ and $w_t$. This dependence allows the SV model to pick up the kind of asymmetric behavior that is often found in stock prices. This model can be formulated as (1) and (2) with

$$\begin{pmatrix} \varepsilon_t \\ w_t \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho\sigma_w \\ \rho\sigma_w & \sigma_w^2 \end{pmatrix} \right).$$


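Here the h-likelihood is given by

$$h(b, \alpha) = \sum_{t=1}^{n} \log f(y_t, b_t \mid y_{(t-1)}, b_{(t-1)}),$$

where $\alpha = (\gamma, \phi, \rho, \sigma_w^2)$.

Because $\varepsilon_t \mid w_t \sim N(\rho w_t/\sigma_w, 1 - \rho^2)$, $f(y_t \mid b_t, b_{t-1}, w_t)$ is a normal density function with mean $\exp(b_t/2)\rho w_t/\sigma_w$ and variance $\exp(b_t)(1 - \rho^2)$. Because

$$f(y_t, b_t \mid y_{(t-1)}, b_{(t-1)}) = f(y_t \mid b_t, b_{t-1}) f(b_t \mid b_{t-1}) = f(y_t \mid b_t, b_{t-1}, w_t) f(b_t \mid b_{t-1}),$$

we can compute

$$\log f(y_t, b_t \mid y_{(t-1)}, b_{(t-1)}) = -\frac{1}{2} \log\{2\pi \exp(b_t)(1 - \rho^2)\} - \frac{\{y_t - \exp(b_t/2)\rho(b_t - \gamma - \phi b_{t-1})/\sigma_w\}^2}{2 \exp(b_t)(1 - \rho^2)} - \frac{1}{2} \log(2\pi\sigma_w^2) - \frac{(b_t - \gamma - \phi b_{t-1})^2}{2\sigma_w^2}.$$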
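The factorization above translates directly into code: the log density is the sum of a conditional normal term for $y_t$ and an AR(1) normal term for $b_t$. The sketch below is only illustrative; the helper names `normal_logpdf` and `leverage_log_density` are assumptions of this example.

```python
import math


def normal_logpdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def leverage_log_density(y_t, b_t, b_prev, gamma, phi, rho, sigma_w):
    """log f(y_t, b_t | past) for the leverage model, term by term."""
    w_t = b_t - gamma - phi * b_prev                     # AR(1) innovation
    cond_mean = math.exp(b_t / 2.0) * rho * w_t / sigma_w
    cond_var = math.exp(b_t) * (1.0 - rho ** 2)
    return (normal_logpdf(y_t, cond_mean, cond_var)                    # f(y_t | b_t, b_{t-1}, w_t)
            + normal_logpdf(b_t, gamma + phi * b_prev, sigma_w ** 2))  # f(b_t | b_{t-1})
```

Summing this quantity over $t$ gives the h-likelihood of the leverage model; setting `rho=0` recovers the contribution of the standard ARSV model, including the normalizing constants.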

Again, $H$ is a banded matrix, so our sparse techniques given in the Appendix can also be applied to this type of model.

Liesenfeld and Richard (2003) studied a multivariate SV model for portfolio allocation and asset pricing. This multivariate SV model accounts not only for the volatility dynamics of individual asset returns, but also for the typically observed serial dependence between them. Let $y_t = (y_{1,t}, \ldots, y_{p,t})^T$ denote a vector of $p$ asset returns. A multivariate one-factor SV model for $y_t$ is

$$\begin{aligned} y_t &= D r_t + e_t, \\ r_t &= \exp(b_t/2)\,\varepsilon_t, \\ b_t &= \gamma + \phi b_{t-1} + w_t, \end{aligned}$$

where $D = (d_1, \ldots, d_p)^T$ denotes a vector of factor loadings, $r_t$ a latent factor following a univariate SV process and $e_t = (e_{1,t}, \ldots, e_{p,t})^T$ a vector of serially independent idiosyncratic errors with mean 0 and covariance $\Sigma_e = \mathrm{diag}(\sigma^2_{e,j})$ $(j = 1, \ldots, p)$. In order to achieve identifiability, Liesenfeld and Richard (2003) used the restriction $d_1 = 1$. Here the h-likelihood is

$$h(b, \alpha) = \sum_{t=1}^{n} \log f_\alpha(y_t \mid b_t) + \log f_\alpha(b_1, b_2, \ldots, b_n) = -\frac{1}{2} \sum_{t=1}^{n} \left\{ y_t^T (D D^T \exp(b_t) + \Sigma_e)^{-1} y_t + \log |D D^T \exp(b_t) + \Sigma_e| + \frac{1}{\sigma_w^2}(b_t - \gamma - \phi b_{t-1})^2 + \log \sigma_w^2 \right\}.$$

Since $H$ has the same banded structure as in the standard ARSV model, we can use the sparse matrix techniques given in the Appendix.

5. Prediction of volatilities

In addition to the smoothing estimate of volatilities based on the whole sample, various methods allow algorithms for the one-step-ahead prediction. This is also possible in the h-likelihood approach. The h-likelihood prediction procedure consists of two repeated steps with initialization $b_0 = \gamma/(1 - \phi)$:

$$\begin{aligned} b_{1p} &= \gamma + \phi b_0, \\ b_{1u} &= \operatorname*{argmin}_{b_1} \left( y_1^2 \exp(-b_1) + b_1 + \frac{1}{\sigma_w^2}(b_1 - \gamma - \phi b_0)^2 \right), \\ b_{2p} &= \gamma + \phi b_{1u}, \\ b_{2u} &= \operatorname*{argmin}_{b_2} \left( y_2^2 \exp(-b_2) + b_2 + \frac{1}{\sigma_w^2}(b_2 - \gamma - \phi b_{1u})^2 \right), \\ &\;\;\vdots \end{aligned}$$

where argmin stands for the value that attains the minimum value of the function. Here $b_{tp}$ is the one-step-ahead prediction for volatility $b_t$ based on the information up to time $t - 1$, while $b_{tu}$ is the updated estimate for volatility $b_t$ based on the information up to time $t$. Note that we use $b_{(t-1)u}$ when we compute both $b_{tp}$ and $b_{tu}$ for $t = 2, \ldots, n$. Our updating procedure gives an estimate of $b_t$ that maximizes $f(y_t \mid b_t) f(b_t \mid b_{(t-1)u})$. Furthermore, because

$$f(b_t \mid y_t, b_{(t-1)u}) \propto f(y_t \mid b_t) f(b_t \mid b_{(t-1)u}),$$

$b_{tu}$ is based on the current observation $y_t$ and $b_{(t-1)u}$. The updating procedure can be computed by applying the Newton–Raphson algorithm in one dimension.
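The recursion above can be sketched as a short filter. In this example the one-dimensional Newton–Raphson update minimizes $y_t^2 \exp(-b) + b + (b - b_{tp})^2/\sigma_w^2$, using the fact that $\gamma + \phi b_{(t-1)u} = b_{tp}$. The function names, tolerance and iteration cap are assumptions of this sketch rather than details from the paper.

```python
import math


def update_step(y_t, b_pred, sigma_w2, tol=1e-12, max_iter=100):
    """1-D Newton-Raphson for argmin_b y_t^2 exp(-b) + b + (b - b_pred)^2 / sigma_w2."""
    b = b_pred
    for _ in range(max_iter):
        e = y_t ** 2 * math.exp(-b)
        grad = -e + 1.0 + 2.0 * (b - b_pred) / sigma_w2   # first derivative
        hess = e + 2.0 / sigma_w2                         # second derivative, always > 0
        step = grad / hess
        b -= step
        if abs(step) < tol:
            break
    return b


def h_filter(y, gamma, phi, sigma_w2):
    """Return the predictions b_{tp} and the updated estimates b_{tu}."""
    b_upd = gamma / (1.0 - phi)            # initialization b_0
    predictions, updates = [], []
    for y_t in y:
        b_pred = gamma + phi * b_upd       # b_{tp}: uses information up to t - 1
        b_upd = update_step(y_t, b_pred, sigma_w2)   # b_{tu}: adds y_t
        predictions.append(b_pred)
        updates.append(b_upd)
    return predictions, updates
```

Because the objective is strictly convex in $b$, the one-dimensional iteration has a unique solution for each $t$; predictions beyond the sample could be produced by iterating only the prediction step.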


6. Improving the Laplace approximation

Shun and McCullagh (1995) showed that first-order Laplace approximation is valid if the dimension of the integral is increasing with $O(n^{1/3})$. In ARSV models, however, such theoretical support is not available because the dimension of the integral is $O(n)$. Thus, when the accuracy of the Laplace approximation is poor, Skaug and Fournier (2006) proposed the use of importance sampling to improve on the Laplace approximation (10). First, draw $M$ independent random vectors $b^{(1)}, \ldots, b^{(M)}$ from a multivariate normal distribution with mean $\hat{b}(\alpha)$ and covariance matrix $H^{-1}(\alpha)$, with $H$ as defined in (9), and then approximate the marginal likelihood $m$ by

$$L_I(\alpha) = M^{-1} \sum_{j=1}^{M} \exp(h(b^{(j)}, \alpha)) / q(b^{(j)}, \hat{b}(\alpha), H^{-1}(\alpha)),$$

where $q(\cdot, \mu, \Sigma)$ denotes a multivariate normal density function with mean $\mu$ and covariance matrix $\Sigma$. However, this can be computationally intensive.

7. Numerical study

To understand the finite sample performance of the h-likelihood method, we perform a simulation study using the univariate ARSV model. We select six sets of parameters as in Jacquier et al. (1994) and Sandmann and Koopman (1998) to generate 500 data sets of time length $T = 500$. The values of the autoregressive parameter $\phi$ are set at 0.90, 0.95 and 0.98 and the values of $\sigma_w$ are selected so that the coefficient of variation (cv) takes the values $\sqrt{10}$ and 1; $\gamma$ is chosen such that the expected variance $E(\sigma_t^2) = 0.0009$. From (1) and (3), the parameters can be interpreted in the following way (Taylor, 2005):

$$\mathrm{var}(y_t) = E(\sigma_t^2) = \exp\left[ \frac{\gamma}{1 - \phi} + \frac{\sigma_w^2}{2(1 - \phi^2)} \right] \tag{11}$$

and

$$\mathrm{kurtosis}(y_t) = 3\left(1 + \mathrm{cv}^2(\sigma_t^2)\right) = 3 \exp\left[ \frac{\sigma_w^2}{1 - \phi^2} \right]. \tag{12}$$
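Equations (11) and (12) can be inverted to recover $\gamma$ and $\sigma_w$ from $\phi$, $\mathrm{cv}^2$ and $E(\sigma_t^2)$: (12) gives $\sigma_w^2 = (1 - \phi^2)\log(1 + \mathrm{cv}^2)$, and substituting into (11) gives $\gamma = (1 - \phi)\{\log E(\sigma_t^2) - \tfrac{1}{2}\log(1 + \mathrm{cv}^2)\}$. The short sketch below applies this to the six designs; the function name `arsv_parameters` is hypothetical.

```python
import math


def arsv_parameters(phi, cv2, mean_var):
    """Solve (11)-(12) for (gamma, sigma_w) given phi, cv^2 and E(sigma_t^2)."""
    sigma_w2 = (1.0 - phi ** 2) * math.log(1.0 + cv2)                        # from (12)
    gamma = (1.0 - phi) * (math.log(mean_var) - 0.5 * math.log(1.0 + cv2))   # from (11)
    return gamma, math.sqrt(sigma_w2)


# The six simulation designs: cv^2 in {10, 1}, phi in {0.90, 0.95, 0.98}.
for cv2 in (10.0, 1.0):
    for phi in (0.90, 0.95, 0.98):
        gamma, sigma_w = arsv_parameters(phi, cv2, 0.0009)
        print(f"phi={phi:.2f} cv^2={cv2:4.1f} gamma={gamma:.3f} sigma_w={sigma_w:.3f}")
```

Up to rounding, the printed values should agree with the $\gamma$ and $\sigma_w$ columns listed for the six cases in Table 1.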

7.1. Prediction of volatilities

Given fixed parameter estimators, the smoothing estimates of volatilities can be obtained by solving Eq. (8). Using the results from Section 5, the one-step-ahead prediction can be obtained. Ruiz (1994) studied the QML method and Meyer and Yu (2000) studied the Bayesian MCMC method for estimation of the ARSV models.

For the six sets of true parameters in Table 1 and for time length $F = 10$, 30 and 50, we simulate 500 data sets and compare the h-likelihood, Bayesian MCMC and QML prediction approaches with the predictive root mean squared error (PRMSE),

$$\mathrm{PRMSE} = \sqrt{\frac{1}{F} \sum_{t=1}^{F} (b_{tp} - b_t)^2}.$$

In each simulated data set, the parameters $(\gamma, \phi, \sigma_w)$ are estimated to get the prediction of volatilities. The posterior means are reported as the Bayesian MCMC estimates, based on 10,000 iterations after a burn-in of 5000. Table 1 shows the means of the PRMSE over the 500 runs. The h-likelihood and Bayesian MCMC approaches are quite comparable. The h-likelihood predictors are uniformly better than the QML predictors. For smoothed volatilities (Durbin and Koopman, 2000) for the observed data $(y_1, \ldots, y_T)$, the MCMC method of Meyer and Yu (2000) is slightly better in PRMSE. We also study the prediction of future volatilities for $(y_{T+1}, \ldots, y_{T+F})$. For short-term prediction $F = 10$, the Bayesian MCMC approach is slightly better, but for long-term predictions $F = 30$ and 50, the h-likelihood method is slightly better in most settings.
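For concreteness, the PRMSE criterion defined above can be computed as follows. This is a minimal sketch; the function name `prmse` and the input check are choices made for this example.

```python
import math


def prmse(predicted, actual):
    """Predictive root mean squared error between b_{tp} and the true b_t."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))
```

In a simulation study, `actual` would hold the simulated log-volatilities over the horizon $F$ and `predicted` the corresponding predictions from one method; averaging the result over the simulated data sets gives the kind of entry reported in Table 1.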

Table 1. The mean PRMSE of volatility predictions from the 500 simulated data sets.

| H-likelihood | MCMC | QML | F | γ | φ | σw | Sample size |
|---|---|---|---|---|---|---|---|
| 0.775 | 0.748 | 0.873 | smooth | −0.821 | 0.9 | 0.675 | 500 |
| 0.646 | 0.624 | 0.742 | smooth | −0.411 | 0.95 | 0.484 | 500 |
| 0.509 | 0.491 | 0.596 | smooth | −0.164 | 0.98 | 0.308 | 500 |
| 0.540 | 0.531 | 0.629 | smooth | −0.736 | 0.9 | 0.363 | 500 |
| 0.457 | 0.448 | 0.548 | smooth | −0.368 | 0.95 | 0.26 | 500 |
| 0.368 | 0.355 | 0.449 | smooth | −0.147 | 0.98 | 0.166 | 500 |
| 1.248 | 1.233 | 1.267 | 10 | −0.821 | 0.9 | 0.675 | 500 |
| 1.072 | 1.061 | 1.100 | 10 | −0.411 | 0.95 | 0.484 | 500 |
| 0.819 | 0.784 | 0.976 | 10 | −0.164 | 0.98 | 0.308 | 500 |
| 0.707 | 0.706 | 0.725 | 10 | −0.736 | 0.9 | 0.363 | 500 |
| 0.624 | 0.614 | 0.647 | 10 | −0.368 | 0.95 | 0.26 | 500 |
| 0.498 | 0.479 | 0.526 | 10 | −0.147 | 0.98 | 0.166 | 500 |
| 1.409 | 1.410 | 1.415 | 30 | −0.821 | 0.9 | 0.675 | 500 |
| 1.292 | 1.322 | 1.301 | 30 | −0.411 | 0.95 | 0.484 | 500 |
| 1.060 | 1.042 | 1.193 | 30 | −0.164 | 0.98 | 0.308 | 500 |
| 0.775 | 0.787 | 0.781 | 30 | −0.736 | 0.9 | 0.363 | 500 |
| 0.722 | 0.735 | 0.729 | 30 | −0.368 | 0.95 | 0.26 | 500 |
| 0.610 | 0.600 | 0.620 | 30 | −0.147 | 0.98 | 0.166 | 500 |
| 1.472 | 1.477 | 1.475 | 50 | −0.821 | 0.9 | 0.675 | 500 |
| 1.393 | 1.449 | 1.397 | 50 | −0.411 | 0.95 | 0.484 | 500 |
| 1.197 | 1.214 | 1.326 | 50 | −0.164 | 0.98 | 0.308 | 500 |
| 0.804 | 0.824 | 0.807 | 50 | −0.736 | 0.9 | 0.363 | 500 |
| 0.767 | 0.799 | 0.772 | 50 | −0.368 | 0.95 | 0.26 | 500 |
| 0.672 | 0.679 | 0.679 | 50 | −0.147 | 0.98 | 0.166 | 500 |

7.2. Parameter estimation

We compare the h-likelihood estimator (HLE) with the Monte Carlo likelihood estimator of Sandmann and Koopman (1998) (MCL), the approximate ML estimator of Fridman and Harris (1998) (FHL), the quasi-maximum likelihood estimator of Harvey et al. (1994) (QML), the approximate ML estimator of Shimada and Tsukuda (2005) (STL), the MCMC estimator of Jacquier et al. (1994) and the composite likelihood estimator of Joe et al. (submitted for publication) (BCL(4)). When cv is small ($\sqrt{0.1}$), as pointed out by Sandmann and Koopman (1998), the ARSV model is almost indistinguishable from the constant volatility model, and all methods perform badly. In Table 2 we report the MCMC results from Jacquier et al. (1994). Table 2 shows that no method is uniformly better than the others. When cv is not small ($\sqrt{10}$ or 1), the h-likelihood method gives similar estimates of $\phi$ and $\sigma_w$ to those of Fridman and Harris (1998). In estimating $\phi$ and $\sigma_w$, the HLE has a uniformly smaller RMSE than the QML estimator. In comparing the HLE with BCL(4), the RMSEs for $\gamma$ and $\phi$ are comparable. Also, its performance is comparable to that of the other methods considered. However, the HLE shows serious bias in estimating $\gamma$. Joe et al. (submitted for publication) show that the reported results for QML and MCL of Sandmann and Koopman (1998) have mean squared errors for the parameter $\gamma$ that are much too small. We list Sandmann and Koopman's (1998) original results in Table 2 because our conclusion does not change. To check whether the biases decrease with the sample size, we repeat the simulation with $T = 1000$. Table 3 shows that the biases indeed decrease as the sample size increases.

Instead of $\gamma$, a natural parameter would be the expected value of the random effects $b_t$ in Eq. (2), that is, the unconditional expected value of the random effects,

$$\beta = E(b_t) = \frac{\gamma}{1 - \phi}. \tag{13}$$

This is exactly the equilibrium level of the process $\{b_t\}$. From the financial point of view, in pricing derivatives, which is one of the most important applications of the SV models, the estimation of the long-term volatility is of main interest, because the price depends on the volatility at the date of maturity of the product (in general measured in months or years). Thus, the correct estimation of $\beta$ is important for long-term estimation. Ruiz (1994) and Meyer and Yu (2000) also used this parameterization. Lee and Nelder (2004) showed that parameterizations for the marginal moments achieve insensitivity of estimates to model assumptions, compared with those for the conditional moments.

We run 500 data sets and compare the h-likelihood, Bayesian MCMC and QML approaches using the root mean squared error (RMSE),

$$\mathrm{RMSE} = \sqrt{\sum_{t} (\hat{\beta}_t - \beta)^2 / 500},$$

where $\hat{\beta}_t$ denotes the estimate of $\beta$ obtained from the $t$th simulated data set.

The delta method is used in the h-likelihood method to estimate $\beta$. For the Bayesian MCMC estimates, posterior means are used from 10,000 MCMC iterations. Table 4 shows that the h-likelihood method is uniformly better than the MCMC and QML methods in estimating $\beta$.
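As an illustration of the delta-method step and of the RMSE criterion, the following Python sketch maps estimates of (γ, φ) to β = γ/(1 − φ) and propagates their covariance to a standard error. It is not the authors' MATLAB implementation: the function names and the covariance matrix in the usage lines are hypothetical and serve only to show the calculation.

```python
import numpy as np

def beta_delta_method(gamma, phi, cov):
    """Return (beta, se) for beta = gamma / (1 - phi).

    cov is the 2 x 2 covariance matrix of the estimates (gamma, phi);
    the standard error follows from the first-order delta method.
    """
    beta = gamma / (1.0 - phi)
    # Gradient of beta with respect to (gamma, phi).
    grad = np.array([1.0 / (1.0 - phi), gamma / (1.0 - phi) ** 2])
    se = float(np.sqrt(grad @ np.asarray(cov, dtype=float) @ grad))
    return beta, se

def rmse(estimates, true_value):
    """Root mean squared error of replicate estimates around the true value."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((estimates - true_value) ** 2)))

# Hypothetical inputs: true values from one simulation design and an
# invented covariance matrix, used purely for illustration.
beta_hat, se_hat = beta_delta_method(-0.368, 0.95, [[0.01, 0.0], [0.0, 0.0001]])
```

Because the gradient contains 1/(1 − φ)², the standard error of β grows quickly as φ approaches 1, which is one reason to report it alongside the point estimate.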

7.3. Analysis of Pound/Dollar exchange rates

To illustrate the computational efficiency, we consider the Pound/Dollar daily exchange rate data from 01/10/81 to 28/06/85, which has been previously analyzed by Meyer and Yu (2000) and Durbin and Koopman (2000). Estimates and standard errors of $\beta$, $\phi$ and $\sigma_w$ from the ARSV model (2) are given in Table 5. The h-likelihood estimates for fixed parameters are implemented in MATLAB. We report in Table 5 the CPU time on a 3.21 GHz PC running Windows XP. The number of Newton–Raphson iterations is 30. QML estimates are obtained using Ruiz's (1994) method. Bayesian MCMC estimates


Table 2. Mean and root mean square error (RMSE) of the estimators. The numbers in each cell are the mean and the mean square error (in parentheses) for each of the estimation methods considered. The HLE, STL, MCL, FHL, MCMC, QMLE and BCL(4) are respectively obtained from Table 1 of Shimada and Tsukuda (2005), Table 2 of Sandmann and Koopman (1998), Table of Fridman and Harris (1998), Table 7 of Jacquier et al. (1994), Table 2 of Sandmann and Koopman (1998) and Table 1 of Joe et al. (submitted for publication).

T = 500

cv = √10  γ        φ        σw       γ        φ        σw       γ        φ        σw

TRUE      −0.821   0.9      0.675    −0.411   0.95     0.484    −0.164   0.98     0.308

HLE       −0.980   0.881    0.680    −0.555   0.933    0.497    −0.313   0.962    0.336
RMSE      (0.373)  (0.045)  (0.085)  (0.276)  (0.034)  (0.072)  (0.225)  (0.028)  (0.065)
STL       −0.905   0.880    0.727    −0.510   0.931    0.534    −0.259   0.965    0.343
RMSE      (0.278)  (0.037)  (0.097)  (0.226)  (0.031)  (0.089)  (0.172)  (0.023)  (0.066)
FHL       −0.896   0.890    0.685    −0.505   0.940    0.495    −0.100   0.986    0.320
RMSE      (0.280)  (0.034)  (0.080)  (0.180)  (0.020)  (0.070)  (0.080)  (0.010)  (0.050)
MCL       −0.837   0.915    0.579    −0.417   0.953    0.436    −0.166   0.977    0.290
RMSE      (0.034)  (0.025)  (0.119)  (0.021)  (0.020)  (0.077)  (0.010)  (0.020)  (0.053)
MCMC      −0.679   0.916    0.562    −0.464   0.940    0.460    −0.190   0.980    0.350
RMSE      (0.220)  (0.026)  (0.120)  (0.160)  (0.020)  (0.055)  (0.080)  (0.010)  (0.060)
QML       −0.821   0.884    0.703    −0.410   0.938    0.502    −0.164   0.970    0.321
RMSE      (0.032)  (0.055)  (0.170)  (0.032)  (0.045)  (0.130)  (0.00)   (0.032)  (0.10)
BCL(4)    −0.931   0.890    0.675    −0.541   0.930    0.504    −0.334   0.960    0.348
RMSE      (0.370)  (0.040)  (0.130)  (0.310)  (0.040)  (0.130)  (0.310)  (0.040)  (0.130)

cv = 1    γ        φ        σw       γ        φ        σw       γ        φ        σw

TRUE      −0.736   0.9      0.363    −0.368   0.95     0.26     −0.147   0.98     0.166

HLE       −1.089   0.853    0.398    −0.613   0.917    0.290    −0.397   0.946    0.201
RMSE      (0.770)  (0.103)  (0.105)  (0.551)  (0.045)  (0.084)  (0.703)  (0.098)  (0.075)
STL       −0.926   0.872    0.422    −0.526   0.927    0.303    −0.278   0.961    0.200
RMSE      (0.424)  (0.059)  (0.108)  (0.390)  (0.053)  (0.089)  (0.246)  (0.034)  (0.067)
FHL       −0.870   0.880    0.370    −0.510   0.930    0.280    −0.090   0.987    0.180
RMSE      (0.430)  (0.050)  (0.080)  (0.306)  (0.040)  (0.070)  (0.060)  (0.015)  (0.040)
MCL       −0.745   0.897    0.325    −0.372   0.93     0.233    −0.148   0.97     0.161
RMSE      (0.022)  (0.100)  (0.080)  (0.011)  (0.102)  (0.075)  (0.010)  (0.071)  (0.050)
MCMC      −0.870   0.880    0.350    −0.560   0.920    0.280    −0.220   0.970    0.230
RMSE      (0.340)  (0.046)  (0.067)  (0.340)  (0.046)  (0.065)  (0.140)  (0.020)  (0.080)
QML       −0.736   0.845    0.417    −0.368   0.906    0.302    −0.147   0.942    0.203
RMSE      (0.001)  (0.187)  (0.221)  (0.001)  (0.182)  (0.173)  (0.001)  (0.170)  (0.158)
BCL(4)    −1.006   0.860    0.393    −0.648   0.910    0.300    −0.327   0.960    0.186
RMSE      (0.630)  (0.090)  (0.120)  (0.550)  (0.070)  (0.120)  (0.480)  (0.070)  (0.110)

Table 3. Mean and root mean square error (RMSE) of the estimators. The numbers in each cell are the mean and the mean square error (in parentheses) for HLE with T = 1000.

T = 1000

cv = √10  γ        φ        σw       γ        φ        σw       γ        φ        σw

TRUE      −0.821   0.9      0.675    −0.411   0.95     0.484    −0.164   0.98     0.308

HLE       −0.881   0.893    0.664    −0.462   0.944    0.485    −0.220   0.973    0.321
RMSE      (0.189)  (0.023)  (0.056)  (0.134)  (0.016)  (0.047)  (0.101)  (0.012)  (0.040)

cv = 1    γ        φ        σw       γ        φ        σw       γ        φ        σw

TRUE      −0.736   0.9      0.363    −0.368   0.95     0.26     −0.147   0.98     0.166

HLE       −0.808   0.890    0.365    −0.431   0.941    0.277    −0.207   0.972    0.179
RMSE      (0.178)  (0.024)  (0.047)  (0.142)  (0.019)  (0.040)  (0.100)  (0.014)  (0.033)

Table 4. The mean RMSE of β from the 500 simulated data sets.

The mean RMSE of β                 Cases
H-likelihood   MCMC    QML         β        φ      σw      sample

0.323          0.402   0.334       −8.210   0.9    0.675   500
0.456          0.827   0.465       −8.220   0.95   0.484   500
0.719          1.262   0.726       −8.200   0.98   0.308   500
0.185          0.315   0.200       −7.360   0.9    0.363   500
0.252          0.575   0.266       −7.360   0.95   0.26    500
0.391          0.730   0.403       −7.350   0.98   0.166   500


Table 5. Comparison of h-likelihood estimates with Bayesian MCMC and QML estimates for Pound/Dollar data.

             H-likelihood        MCMC                QML
             Mean      s.e.      Mean      s.e.      Mean      s.e.

β            −1.003    0.251     −0.708    0.300     −0.964
φ            0.968     0.009     0.980     0.011     0.989     0.008
σw           0.186     0.025     0.153     0.031     0.086     0.033

variance     0.480               0.662               0.452
kurtosis     5.147               5.418               4.207

time (sec)   68                  1173                11.9

are obtained using the BUGS program of Meyer and Yu (2000) and they are based on 100,000 iterations after a burn-in period of 10,000. In Table 5, using (11) and (12), the variance and kurtosis estimators from the h-likelihood, MCMC and QML methods are reported to check which estimates provide a better match for the sample variance 0.506 and the sample kurtosis 6.846. The three methods give similar estimates, but the MCMC method is computationally very inefficient.

8. Concluding remarks

In this paper we have shown how to use sparse matrix techniques to enhance the computation for the ARSV model and various extensions. The simulation study shows that, for fixed parameter estimation, the performance of the h-likelihood method is comparable to that of the others, but it shows non-negligible bias in estimating $\gamma$. However, with a natural parameterization of $\beta$ it outperforms the MCMC method. For predicting volatilities, the Bayesian MCMC method has lower PRMSE than the h-likelihood method for short-term prediction, but the h-likelihood method shows better performance for long-term prediction.

Appendix. Computation of the score equation and the Hessian matrix

An $n \times n$ matrix $A = (a_{ij})$ is a banded matrix with bandwidth $p$ if $a_{ij} = 0$ for $|i - j| > p$, for some constant $p$. That is, all entries are zero outside a diagonally bordered band. Given a vector $(a_j)$ and a constant $c$, we denote by $\mathrm{BD}[a_j, c]$ the banded matrix with $p = 1$ whose diagonal elements are $a_j$ and whose one-step off-diagonal elements are the constant $c$. For a non-singular matrix of functions $A(x)$, we have

$$\frac{d}{dx} \log\{\det(A(x))\} = \operatorname{trace}\left(A(x)^{-1} \frac{dA(x)}{dx}\right).$$

The h-likelihood estimation requires three sparse matrix computation techniques. The first two are from Section 4.3 of Golub and Loan (1996). Below, we explain the techniques we use:

(T1) the computation of $A^{-1}y$, where $A$ is an $n \times n$ symmetric banded matrix with bandwidth $p$;
(T2) the inverse of the symmetric banded matrix $A$; and
(T3) finally, the computation of $BA$.

For (T1), $A^{-1}y$ can be computed by solving the linear equation $Ax = y$. This linear equation can be solved in the order of $n(p^2 + 3p)$ operations using the Cholesky decomposition of $A$ (Golub and Loan, 1996). For (T2), we iteratively solve the linear systems $Ax = e_k$ for $k = 1, 2, \ldots, n$, where $e_k$ is the vector whose $k$th element is 1 and whose other elements are 0. Thus, it requires about $n^2(p^2 + 3p)$ operations. For (T3), suppose that $\alpha_k$ is the $k$th off-diagonal element of $A$ for $k = 1, \ldots, p$, and $\beta_k$ is its $k$th diagonal element. Then, simple algebra shows that the $k$th column vector of $C = BA$, say $C(:, k)$, is, for $k = 1, 2, \ldots, n$,

$$C(:, k) = \beta_k B(:, k) + \sum_{i=1}^{p} \alpha_i \{B(:, k-i) + B(:, k+i)\},$$

where $B(:, j)$ is a zero vector for $j \le 0$ (and likewise for $j > n$).
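The three techniques can be sketched in Python for the tridiagonal case $p = 1$, that is, for matrices of the form BD[a_j, c]. This is an illustrative sketch rather than the authors' MATLAB code: it relies on SciPy's `solveh_banded`, which assumes a symmetric positive definite matrix, and the helper names are hypothetical.

```python
import numpy as np
from scipy.linalg import solveh_banded

def bd_dense(diag, c):
    """Dense BD[a_j, c], used here only to check the banded routines."""
    n = len(diag)
    return np.diag(diag) + c * (np.eye(n, k=1) + np.eye(n, k=-1))

def to_upper_banded(diag, c):
    """Pack BD[a_j, c] into SciPy's upper banded storage (2 x n)."""
    ab = np.zeros((2, len(diag)))
    ab[0, 1:] = c      # superdiagonal
    ab[1, :] = diag    # main diagonal
    return ab

def banded_solve(diag, c, y):
    """(T1): x = A^{-1} y via a banded Cholesky solve."""
    return solveh_banded(to_upper_banded(diag, c), y)

def banded_inverse(diag, c):
    """(T2): A^{-1}, obtained by solving A x = e_k for every k at once."""
    return solveh_banded(to_upper_banded(diag, c), np.eye(len(diag)))

def right_multiply(B, diag, c):
    """(T3): C = B A, built column by column from shifted columns of B."""
    C = B * diag                   # beta_k * B(:, k)
    C[:, 1:] += c * B[:, :-1]      # alpha_1 * B(:, k - 1)
    C[:, :-1] += c * B[:, 1:]      # alpha_1 * B(:, k + 1)
    return C

# Small diagonally dominant (hence positive definite) example.
rng = np.random.default_rng(0)
diag_demo = 3.0 + rng.random(6)
A_demo = bd_dense(diag_demo, -1.0)
y_demo = rng.random(6)
B_demo = rng.random((4, 6))
```

The banded routines never form the dense matrix, so their cost grows linearly in $n$ for (T1) rather than cubically.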

Smoothing estimates of volatilities

Consider the h-likelihood $h(b, \alpha)$ in (5). The score equations for the h-likelihood and the Hessian matrix have the expressions (6)–(9). The estimate of $b$ at step $k$, $b^{(k)}$, is updated until the algorithm converges by

$$b^{(k+1)} = b^{(k)} + H^{-1} \left.\frac{\partial h}{\partial b}\right|_{b = b^{(k)}},$$

where $H = \mathrm{BD}\left[\frac{1}{2} y_t^2 \exp(-b_t) + \frac{1}{\sigma_w^2}(1 + \phi^2),\; -\frac{1}{\sigma_w^2}\phi\right]$. Thus, using the technique for (T1) we can update the random-effect estimators.
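A minimal Python sketch of this Newton–Raphson update is given below. Several points are assumptions rather than statements from the paper: the h-likelihood is taken to condition on the first random effect, so the first and last diagonal entries of $H$ differ from the interior BD form; a step-halving safeguard is added; and the starting value, tolerance and simulated data are arbitrary illustrative choices.

```python
import numpy as np
from scipy.linalg import solveh_banded

def h_value(b, y, gamma, phi, s2):
    """h-likelihood up to constants, with the AR(1) part started at t = 2 (assumption)."""
    e = b[1:] - gamma - phi * b[:-1]
    return np.sum(-0.5 * b - 0.5 * y**2 * np.exp(-b)) - 0.5 * np.sum(e**2) / s2

def score(b, y, gamma, phi, s2):
    """Gradient of h with respect to b."""
    e = b[1:] - gamma - phi * b[:-1]
    g = 0.5 * y**2 * np.exp(-b) - 0.5
    g[1:] -= e / s2            # term from (b_t - gamma - phi b_{t-1})
    g[:-1] += phi * e / s2     # term from (b_{t+1} - gamma - phi b_t)
    return g

def neg_hessian(b, y, phi, s2):
    """H = -d2h/db2 in upper banded storage, with boundary-corrected diagonal."""
    diag = 0.5 * y**2 * np.exp(-b) + (1.0 + phi**2) / s2
    diag[0] -= 1.0 / s2        # b_1 has no predecessor term
    diag[-1] -= phi**2 / s2    # b_n has no successor term
    ab = np.zeros((2, b.size))
    ab[0, 1:] = -phi / s2
    ab[1] = diag
    return ab

def smooth(y, gamma, phi, s2, tol=1e-6, max_iter=100):
    """Newton-Raphson with step halving; each step is one banded solve (T1)."""
    b = np.full(y.size, np.log(np.mean(y**2)))
    for _ in range(max_iter):
        g = score(b, y, gamma, phi, s2)
        if np.max(np.abs(g)) < tol:
            break
        step = solveh_banded(neg_hessian(b, y, phi, s2), g)
        h0, t = h_value(b, y, gamma, phi, s2), 1.0
        # Halve the step while h decreases by more than rounding noise.
        while t > 1e-8 and h_value(b + t * step, y, gamma, phi, s2) < h0 - 1e-10 * (1.0 + abs(h0)):
            t *= 0.5
        b = b + t * step
    return b

# Simulated example with fixed parameters taken from one simulation design.
rng = np.random.default_rng(0)
n, gamma, phi, s2 = 200, -0.368, 0.95, 0.26**2
b_true = np.empty(n)
b_true[0] = gamma / (1.0 - phi)
for t in range(1, n):
    b_true[t] = gamma + phi * b_true[t - 1] + np.sqrt(s2) * rng.standard_normal()
y = np.exp(b_true / 2.0) * rng.standard_normal(n)
b_hat = smooth(y, gamma, phi, s2)
```

Because $H$ is tridiagonal, each iteration costs a number of operations proportional to $n$, which is what makes the update practical for long series.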


Parameter estimation

Let $h^*(b, \alpha)$ be defined by

$$h^*(b, \alpha) = h(b, \alpha) - \frac{1}{2} \log\{\det(H(b, \alpha)/2\pi)\}.$$

Then, the profile h-likelihood in (10) becomes

$$h_P(\alpha) \equiv h^*(b(\alpha), \alpha), \tag{14}$$

where $b(\alpha)$ is the solution to $\partial h/\partial b = 0$. Here, $b = (b_1, b_2, \ldots, b_n)$ are the random effects and $\alpha = (\alpha_1, \alpha_2, \alpha_3) = (\gamma, \phi, \sigma_w^2)$ are the fixed parameters. Below, we use $b$ instead of $b(\alpha)$ for notational simplicity.
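The adjustment term in $h^*$ needs only the log-determinant of the banded matrix $H$, which can be read off a banded Cholesky factor without forming $H^{-1}$. The Python sketch below shows one way to do this with SciPy's `cholesky_banded`; the function names are hypothetical and the example matrix is arbitrary.

```python
import numpy as np
from scipy.linalg import cholesky_banded

def logdet_banded(ab):
    """log det of a symmetric positive definite matrix in upper banded storage."""
    chol = cholesky_banded(ab)   # factor in the same storage; last row is its diagonal
    return 2.0 * np.sum(np.log(chol[-1]))

def adjusted_h(h_val, ab):
    """h* = h - (1/2) log det(H / (2 pi)) for an n x n banded H."""
    n = ab.shape[1]
    return h_val - 0.5 * (logdet_banded(ab) - n * np.log(2.0 * np.pi))

# Arbitrary tridiagonal example: diagonal in row 1, superdiagonal in row 0.
rng = np.random.default_rng(0)
ab_demo = np.zeros((2, 8))
ab_demo[1] = 3.0 + rng.random(8)
ab_demo[0, 1:] = -1.0
logdet_demo = logdet_banded(ab_demo)
```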

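The score equations for the profile h-likelihood and the Hessian matrix have the expressions

$$\frac{\partial h_P}{\partial \alpha_j} = \sum_{t=1}^{n} \frac{\partial h^*}{\partial b_t} \frac{\partial b_t}{\partial \alpha_j} + \frac{\partial h^*}{\partial \alpha_j} = 0 \tag{15}$$

and

$$\begin{aligned}
\frac{\partial^2 h_P}{\partial \alpha_i \partial \alpha_j} = {} & \sum_{t,s=1}^{n} \frac{\partial^2 h^*}{\partial b_t \partial b_s} \left(\frac{\partial b_t}{\partial \alpha_i}\right) \left(\frac{\partial b_s}{\partial \alpha_j}\right) + \sum_{t=1}^{n} \frac{\partial h^*}{\partial b_t} \left(\frac{\partial^2 b_t}{\partial \alpha_i \partial \alpha_j}\right) \\
& + \sum_{t=1}^{n} \left(\frac{\partial^2 h^*}{\partial \alpha_j \partial b_t}\right) \left(\frac{\partial b_t}{\partial \alpha_i}\right) + \sum_{t=1}^{n} \left(\frac{\partial^2 h^*}{\partial \alpha_i \partial b_t}\right) \left(\frac{\partial b_t}{\partial \alpha_j}\right) + \frac{\partial^2 h^*}{\partial \alpha_i \partial \alpha_j}.
\end{aligned} \tag{16}$$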

A.1. Details of the score equation

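In this section, we show the detailed computation of the terms in the score equation (15). First,

$$\frac{\partial h^*}{\partial b_t} = \frac{1}{2} y_t^2 \exp(-b_t) - \frac{1}{2} - \frac{1}{\sigma_w^2}(b_t - \gamma - \phi b_{t-1}) + \frac{\phi}{\sigma_w^2}(b_{t+1} - \gamma - \phi b_t) - \frac{1}{2} \operatorname{trace}\left(H^{-1} \frac{\partial H}{\partial b_t}\right), \tag{17}$$

where $\partial H/\partial b_t = \mathrm{BD}\left[-\frac{1}{2} y_t^2 \exp(-b_t),\, 0\right]$.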
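The trace term in (17) can be evaluated from the diagonal of $H^{-1}$ alone. The Python sketch below rests on one reading of the notation, namely that $\partial H/\partial b_t$ has a single non-zero entry, the $t$th diagonal element $-\frac{1}{2} y_t^2 \exp(-b_t)$, so that the trace equals that entry times $(H^{-1})_{tt}$. The helper names are hypothetical, and $H$ is taken in the interior BD form without boundary corrections.

```python
import numpy as np
from scipy.linalg import solveh_banded, cholesky_banded

def h_banded(b, y, phi, s2):
    """H = BD[y_t^2 exp(-b_t)/2 + (1 + phi^2)/s2, -phi/s2] in upper banded storage."""
    ab = np.zeros((2, b.size))
    ab[0, 1:] = -phi / s2
    ab[1] = 0.5 * y**2 * np.exp(-b) + (1.0 + phi**2) / s2
    return ab

def logdet(ab):
    """log det H from the diagonal of the banded Cholesky factor."""
    return 2.0 * np.sum(np.log(cholesky_banded(ab)[-1]))

def trace_terms(b, y, phi, s2):
    """Vector of trace(H^{-1} dH/db_t), t = 1..n, via the diagonal of H^{-1} (T2)."""
    ab = h_banded(b, y, phi, s2)
    inv_diag = np.diag(solveh_banded(ab, np.eye(b.size)))
    return -0.5 * y**2 * np.exp(-b) * inv_diag

# Arbitrary illustrative inputs.
rng = np.random.default_rng(1)
n_demo, phi_demo, s2_demo = 30, 0.95, 0.26**2
b_demo = rng.normal(-7.0, 0.5, n_demo)
y_demo = np.exp(b_demo / 2.0) * rng.standard_normal(n_demo)
tr_demo = trace_terms(b_demo, y_demo, phi_demo, s2_demo)
```

By the log-determinant identity stated at the start of the Appendix, each entry of `tr_demo` should equal the derivative of $\log\det H$ with respect to the corresponding $b_t$, which gives a simple finite-difference check.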

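Now we compute $\partial h^*/\partial \alpha_j$ for each $\alpha_j$. We first consider the case $\alpha_j = \gamma$,

$$\frac{\partial h^*}{\partial \gamma} = \frac{1}{\sigma_w^2} \sum_{t=1}^{n} (b_t - \gamma - \phi b_{t-1}), \tag{18}$$

since $\partial H/\partial \gamma = 0$, from (9). For the case $\alpha_j = \phi$,

$$\frac{\partial h^*}{\partial \phi} = \frac{1}{\sigma_w^2} \sum_{t=1}^{n} b_{t-1}(b_t - \gamma - \phi b_{t-1}) - \frac{1}{2} \operatorname{trace}\left(H^{-1} \frac{\partial H}{\partial \phi}\right) - \frac{1}{2} \sum_{t=1}^{n} \operatorname{trace}\left(H^{-1} \frac{\partial H}{\partial b_t}\right) \left(\frac{\partial b_t}{\partial \phi}\right), \tag{19}$$

where $\dfrac{\partial H}{\partial \phi} = \mathrm{BD}\left[\dfrac{2\phi}{\sigma_w^2},\, -\dfrac{1}{\sigma_w^2}\right]$.

Finally, for the case $\alpha_j = \sigma_w^2$,

$$\frac{\partial h^*}{\partial \sigma_w^2} = \frac{1}{2\sigma_w^4} \sum_{t=1}^{n} (b_t - \gamma - \phi b_{t-1})^2 - \frac{n}{2\sigma_w^2} - \frac{1}{2} \operatorname{trace}\left(H^{-1} \frac{\partial H}{\partial \sigma_w^2}\right) - \frac{1}{2} \sum_{t=1}^{n} \operatorname{trace}\left(H^{-1} \frac{\partial H}{\partial b_t}\right) \frac{\partial b_t}{\partial \sigma_w^2}, \tag{20}$$

where $\dfrac{\partial H}{\partial \sigma_w^2} = \mathrm{BD}\left[-\dfrac{1+\phi^2}{\sigma_w^4},\, \dfrac{\phi}{\sigma_w^4}\right]$.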

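To compute $\partial b_t/\partial \alpha_j$, we differentiate both sides of (6) with respect to $\alpha_j$ and we have, for $\alpha_j = \gamma$,

$$-\frac{1}{2} y_t^2 \exp(-b_t) \frac{\partial b_t}{\partial \gamma} - \frac{1}{\sigma_w^2} \left\{ \left(\frac{\partial b_t}{\partial \gamma} - 1 - \phi \frac{\partial b_{t-1}}{\partial \gamma}\right) - \phi \left(\frac{\partial b_{t+1}}{\partial \gamma} - 1 - \phi \frac{\partial b_t}{\partial \gamma}\right) \right\} = 0. \tag{21}$$

Thus, $\partial b_t/\partial \gamma$ is the solution to

$$\mathrm{BD}\left[-y_t^2 \exp(-b_t) - \frac{2}{\sigma_w^2}(1 + \phi^2),\; \frac{2\phi}{\sigma_w^2}\right] \left(\frac{\partial b_t}{\partial \gamma}\right) = \frac{2(\phi - 1)}{\sigma_w^2}. \tag{22}$$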
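For readers checking the algebra, the step from (21) to (22) can be written out as follows. The shorthand $d_t = \partial b_t/\partial \gamma$ is introduced here only for brevity and is not notation from the paper. Collecting the coefficients of $d_{t-1}$, $d_t$ and $d_{t+1}$ in (21) gives

$$\begin{aligned}
0 &= -\frac{1}{2} y_t^2 e^{-b_t} d_t - \frac{1}{\sigma_w^2} \left\{ (d_t - 1 - \phi d_{t-1}) - \phi (d_{t+1} - 1 - \phi d_t) \right\} \\
  &= \left[ -\frac{1}{2} y_t^2 e^{-b_t} - \frac{1 + \phi^2}{\sigma_w^2} \right] d_t + \frac{\phi}{\sigma_w^2} (d_{t-1} + d_{t+1}) + \frac{1 - \phi}{\sigma_w^2}.
\end{aligned}$$

Moving the constant term to the right-hand side and multiplying both sides by 2 yields a tridiagonal system with diagonal $-y_t^2 e^{-b_t} - 2(1 + \phi^2)/\sigma_w^2$, off-diagonal $2\phi/\sigma_w^2$ and right-hand side $2(\phi - 1)/\sigma_w^2$, which is (22). The same collection of terms, applied to the derivatives of (6) with respect to $\phi$ and $\sigma_w^2$, gives the corresponding systems for those parameters.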

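The computations for the cases $\alpha_j = \phi$ or $\sigma_w^2$ are similar to the above. Thus, below, we only report the results. For the case $\alpha_j = \phi$, $\partial b_t/\partial \phi$ is the solution to

$$\mathrm{BD}\left[-y_t^2 \exp(-b_t) - \frac{2}{\sigma_w^2}(1 + \phi^2),\; \frac{2\phi}{\sigma_w^2}\right] \left(\frac{\partial b_t}{\partial \phi}\right) = \frac{2}{\sigma_w^2} \left\{ -b_{t-1} + \phi b_t - (b_{t+1} - \gamma - \phi b_t) \right\}. \tag{23}$$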


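For the case $\alpha_j = \sigma_w^2$, $\partial b_t/\partial \sigma_w^2$ is the solution to

$$\mathrm{BD}\left[-\frac{1}{2} y_t^2 \exp(-b_t) - \frac{1 + \phi^2}{\sigma_w^2},\; \frac{\phi}{\sigma_w^2}\right] \left(\frac{\partial b_t}{\partial \sigma_w^2}\right) = -\frac{1}{\sigma_w^4}(b_t - \gamma - \phi b_{t-1}) + \frac{\phi}{\sigma_w^4}(b_{t+1} - \gamma - \phi b_t). \tag{24}$$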

A.2. Details of the Hessian matrix

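This section shows details of the computation of the Hessian matrix (16) to estimate $\alpha = (\gamma, \phi, \sigma_w^2)$.

$$\frac{\partial^2 h^*}{\partial b_t^2} = -\frac{1}{2} y_t^2 \exp(-b_t) - \frac{1}{\sigma_w^2} - \frac{\phi^2}{\sigma_w^2} + \frac{1}{2} \operatorname{trace}\left(H^{-1} \frac{\partial H}{\partial b_t} H^{-1} \frac{\partial H}{\partial b_t}\right) - \frac{1}{2} \operatorname{trace}\left(H^{-1} \frac{\partial^2 H}{\partial b_t^2}\right). \tag{25}$$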

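Second derivatives with respect to $\gamma$

$$\frac{\partial^2 h^*}{\partial \gamma \partial b_t} = \frac{1 - \phi}{\sigma_w^2} + \frac{1}{2} \operatorname{trace}\left(H^{-1} \frac{\partial H}{\partial b_t} H^{-1} \frac{\partial H}{\partial b_t}\right) \left(\frac{\partial b_t}{\partial \gamma}\right) - \frac{1}{2} \operatorname{trace}\left(H^{-1} \frac{\partial^2 H}{\partial b_t^2}\right) \left(\frac{\partial b_t}{\partial \gamma}\right), \tag{26}$$

$$\frac{\partial^2 h^*}{\partial \gamma^2} = -\frac{n}{\sigma_w^2}, \tag{27}$$

and $\partial^2 b_t/\partial \gamma^2$ is the solution to

$$\mathrm{BD}\left[-\frac{1}{2} y_t^2 \exp(-b_t) - \frac{1 + \phi^2}{\sigma_w^2},\; \frac{\phi}{\sigma_w^2}\right] \left(\frac{\partial^2 b_t}{\partial \gamma^2}\right) = -\frac{1}{2} y_t^2 \exp(-b_t) \left(\frac{\partial b_t}{\partial \gamma}\right)^2. \tag{28}$$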

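Second derivatives with respect to $\phi$

$$\begin{aligned}
\frac{\partial^2 h^*}{\partial \phi \partial b_t} = {} & \frac{b_{t-1}}{\sigma_w^2} + \frac{1}{\sigma_w^2}(b_{t+1} - \gamma - \phi b_t) - \frac{\phi b_t}{\sigma_w^2} + \frac{1}{2} \operatorname{trace}\left(H^{-1} \frac{\partial H}{\partial b_t} H^{-1} \frac{\partial H}{\partial b_t}\right) \left(\frac{\partial b_t}{\partial \phi}\right) \\
& - \frac{1}{2} \operatorname{trace}\left(H^{-1} \frac{\partial^2 H}{\partial b_t^2}\right) \left(\frac{\partial b_t}{\partial \phi}\right) + \frac{1}{2} \operatorname{trace}\left(H^{-1} \frac{\partial H}{\partial \phi} H^{-1} \frac{\partial H}{\partial b_t}\right) - \frac{1}{2} \operatorname{trace}\left(H^{-1} \frac{\partial^2 H}{\partial b_t \partial \phi}\right),
\end{aligned} \tag{29}$$

where $\dfrac{\partial^2 H}{\partial \phi \partial b_t} = 0$,

$$\frac{\partial^2 h^*}{\partial \phi^2} = -\frac{1}{\sigma_w^2} \sum_{t=1}^{n} b_{t-1}^2 + \frac{1}{2} \operatorname{trace}\left(H^{-1} \frac{\partial H}{\partial \phi} H^{-1} \frac{\partial H}{\partial \phi}\right) - \frac{1}{2} \operatorname{trace}\left(H^{-1} \frac{\partial^2 H}{\partial \phi^2}\right), \tag{30}$$

and $\partial^2 b_t/\partial \phi^2$ is the solution to

$$\mathrm{BD}\left[-y_t^2 \exp(-b_t) - 2\,\frac{1 + \phi^2}{\sigma_w^2},\; \frac{2\phi}{\sigma_w^2}\right] \left(\frac{\partial^2 b_t}{\partial \phi^2}\right) = -y_t^2 \exp(-b_t) \left(\frac{\partial b_t}{\partial \phi}\right)^2 + \frac{8\phi}{\sigma_w^2} \frac{\partial b_t}{\partial \phi} + \frac{4}{\sigma_w^2} \left\{ -\frac{\partial b_{t-1}}{\partial \phi} + b_t - \frac{\partial b_{t+1}}{\partial \phi} \right\}. \tag{31}$$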

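Second derivatives with respect to $\sigma_w^2$

$$\begin{aligned}
\frac{\partial^2 h^*}{\partial \sigma_w^2 \partial b_t} = {} & \frac{1}{\sigma_w^4}(b_t - \gamma - \phi b_{t-1}) - \frac{\phi}{\sigma_w^4}(b_{t+1} - \gamma - \phi b_t) + \frac{1}{2} \operatorname{trace}\left(H^{-1} \frac{\partial H}{\partial b_t} H^{-1} \frac{\partial H}{\partial b_t}\right) \left(\frac{\partial b_t}{\partial \sigma_w^2}\right) \\
& - \frac{1}{2} \operatorname{trace}\left(H^{-1} \frac{\partial^2 H}{\partial b_t^2}\right) \left(\frac{\partial b_t}{\partial \sigma_w^2}\right) + \frac{1}{2} \operatorname{trace}\left(H^{-1} \frac{\partial H}{\partial \sigma_w^2} H^{-1} \frac{\partial H}{\partial b_t}\right) - \frac{1}{2} \operatorname{trace}\left(H^{-1} \frac{\partial^2 H}{\partial b_t \partial \sigma_w^2}\right),
\end{aligned}$$

$$\frac{\partial^2 h^*}{\partial (\sigma_w^2)^2} = -\frac{1}{\sigma_w^6} \sum_{t=1}^{n} (b_t - \gamma - \phi b_{t-1})^2 + \frac{n}{2\sigma_w^4} + \frac{1}{2} \operatorname{trace}\left(H^{-1} \frac{\partial H}{\partial \sigma_w^2} H^{-1} \frac{\partial H}{\partial \sigma_w^2}\right) - \frac{1}{2} \operatorname{trace}\left(H^{-1} \frac{\partial^2 H}{\partial (\sigma_w^2)^2}\right), \tag{32}$$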


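and $\partial^2 b_t/\partial (\sigma_w^2)^2$ is the solution to

$$\begin{aligned}
\mathrm{BD}\left[-\frac{1}{2} y_t^2 \exp(-b_t) - \frac{1 + \phi^2}{\sigma_w^2},\; \frac{\phi}{\sigma_w^2}\right] \left(\frac{\partial^2 b_t}{\partial (\sigma_w^2)^2}\right) = {} & -\frac{1}{2} y_t^2 \exp(-b_t) \left(\frac{\partial b_t}{\partial \sigma_w^2}\right)^2 + \frac{2}{\sigma_w^6}(b_t - \gamma - \phi b_{t-1}) - \frac{2\phi}{\sigma_w^6}(b_{t+1} - \gamma - \phi b_t) \\
& - \frac{2}{\sigma_w^4} \left(\frac{\partial b_t}{\partial \sigma_w^2} - \phi \frac{\partial b_{t-1}}{\partial \sigma_w^2}\right) + \frac{2\phi}{\sigma_w^4} \left(\frac{\partial b_{t+1}}{\partial \sigma_w^2} - \phi \frac{\partial b_t}{\partial \sigma_w^2}\right).
\end{aligned}$$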

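Mixed derivatives with respect to $\gamma$ and $\phi$

$$\frac{\partial^2 h^*}{\partial \gamma \partial \phi} = -\frac{1}{\sigma_w^2} \sum_{t=1}^{n} b_{t-1}, \tag{33}$$

and $\partial^2 b_t/\partial \gamma \partial \phi$ is the solution to

$$\mathrm{BD}\left[-\frac{1}{2} y_t^2 \exp(-b_t) - \frac{1 + \phi^2}{\sigma_w^2},\; \frac{\phi}{\sigma_w^2}\right] \left(\frac{\partial^2 b_t}{\partial \gamma \partial \phi}\right) = -\frac{1}{2} y_t^2 \exp(-b_t) \left(\frac{\partial b_t}{\partial \gamma}\right) \left(\frac{\partial b_t}{\partial \phi}\right) - \frac{1}{\sigma_w^2} \left(\frac{\partial b_{t-1}}{\partial \gamma} + \frac{\partial b_{t+1}}{\partial \gamma}\right) + \frac{1}{\sigma_w^2} + \frac{2\phi}{\sigma_w^2} \left(\frac{\partial b_t}{\partial \gamma}\right). \tag{34}$$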

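Mixed derivatives with respect to $\gamma$ and $\sigma_w^2$

$$\frac{\partial^2 h^*}{\partial \gamma \partial \sigma_w^2} = -\frac{1}{\sigma_w^4} \sum_{t=1}^{n} (b_t - \gamma - \phi b_{t-1}), \tag{35}$$

and $\partial^2 b_t/\partial \gamma \partial \sigma_w^2$ is the solution to

$$\mathrm{BD}\left[-\frac{1}{2} y_t^2 \exp(-b_t) - \frac{1 + \phi^2}{\sigma_w^2},\; \frac{\phi}{\sigma_w^2}\right] \left(\frac{\partial^2 b_t}{\partial \gamma \partial \sigma_w^2}\right) = -\frac{1}{2} y_t^2 \exp(-b_t) \left(\frac{\partial b_t}{\partial \gamma}\right) \left(\frac{\partial b_t}{\partial \sigma_w^2}\right) - \frac{1 + \phi^2}{\sigma_w^4} \left(\frac{\partial b_t}{\partial \gamma}\right) + \frac{\phi}{\sigma_w^4} \left(\frac{\partial b_{t-1}}{\partial \gamma} + \frac{\partial b_{t+1}}{\partial \gamma}\right) - \frac{\phi - 1}{\sigma_w^4}.$$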

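Mixed derivatives with respect to $\phi$ and $\sigma_w^2$

$$\frac{\partial^2 h^*}{\partial \phi \partial \sigma_w^2} = -\frac{1}{\sigma_w^4} \sum_{t=1}^{n} b_{t-1}(b_t - \gamma - \phi b_{t-1}) - \frac{1}{2} \operatorname{trace}\left(H^{-1} \frac{\partial^2 H}{\partial \sigma_w^2 \partial \phi}\right) + \frac{1}{2} \operatorname{trace}\left(H^{-1} \frac{\partial H}{\partial \sigma_w^2} H^{-1} \frac{\partial H}{\partial \phi}\right), \tag{36}$$

where

$$\frac{\partial^2 H}{\partial \sigma_w^2 \partial \phi} = \mathrm{BD}\left[-\frac{2\phi}{\sigma_w^4},\; \frac{1}{\sigma_w^4}\right], \quad \text{and} \quad \frac{\partial H}{\partial \sigma_w^2} = \mathrm{BD}\left[-\frac{1 + \phi^2}{\sigma_w^4},\; \frac{\phi}{\sigma_w^4}\right], \tag{37}$$

and

$$\frac{\partial^2 H}{\partial \sigma_w^2 \partial b_t} = 0. \tag{38}$$

$\partial^2 b_t/\partial \phi \partial \sigma_w^2$ is the solution to

$$\begin{aligned}
\mathrm{BD}\left[-y_t^2 \exp(-b_t) - 2\,\frac{1 + \phi^2}{\sigma_w^2},\; \frac{2\phi}{\sigma_w^2}\right] \left(\frac{\partial^2 b_t}{\partial \phi \partial \sigma_w^2}\right) = {} & -y_t^2 \exp(-b_t) \left(\frac{\partial b_t}{\partial \phi}\right) \left(\frac{\partial b_t}{\partial \sigma_w^2}\right) + y_t^2 \exp(-b_t) \left(\frac{\partial b_t}{\partial \phi}\right) \left(\frac{1}{\sigma_w^2}\right) \\
& + \frac{4\phi}{\sigma_w^2} \left(\frac{\partial b_t}{\partial \sigma_w^2}\right) - \frac{2\phi}{\sigma_w^2} \left(\frac{\partial b_{t+1}}{\partial \sigma_w^2}\right) - \frac{2\phi}{\sigma_w^2} \left(\frac{\partial b_{t-1}}{\partial \sigma_w^2}\right).
\end{aligned}$$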

References

Black, F., Scholes, M., 1973. The pricing of options and corporate liabilities. Journal of Political Economy 81, 637–659.
Castillo, J., Lee, Y., 2008. GLM-methods for volatility models. Statistical Modelling 8, 263–283.
Durbin, J., Koopman, S.J., 2000. Time series analysis of non-Gaussian observations based on state space models from both classical and Bayesian perspectives (with discussion). Journal of Royal Statistical Society Series B 62, 3–56.


Fridman, M., Harris, L., 1998. A maximum likelihood approach for the non-Gaussian stochastic volatility models. Journal of Business and Economic Statistics 16, 284–291.
Golub, G., Loan, C.V., 1996. Matrix computation, third edition. The Johns Hopkins University Press, London.
Harvey, A., Ruiz, E., Shephard, N., 1994. Multivariate stochastic variance models. Review of Economic Studies 61, 247–264.
Harvey, A., Shephard, N., 1996. Estimation of an asymmetric stochastic volatility model for asset returns. Journal of Business & Economic Statistics 96, 429–434.
Heston, S., 1993. A closed-form solution for options with stochastic volatility with applications to bond and currency options. The Review of Financial Studies 6, 327–343.
Hull, J., White, A., 1987. The pricing of options on assets with stochastic volatilities. The Journal of Finance 42, 281–300.
Jacquier, E., Polson, N., Rossi, P., 1994. Bayesian analysis of stochastic volatility models (with discussion). Journal of Business and Economic Statistics 12, 371–417.
Joe, H., Ng, T., Qu, J., Lee, Y., 2009. Composite likelihood approach to stochastic volatility models (submitted for publication).
Kim, S., Shephard, N., Chib, S., 1998. Stochastic volatility: likelihood inference and comparison with ARCH models. Review of Economic Studies 65, 361–393.
Lee, Y., Ha, I., 2010. Orthodox BLUP versus h-likelihood methods for inferences about random effects in Tweedie mixed models. Statistics and Computing (in press).
Lee, Y., Nelder, J.A., 1996. Hierarchical generalized linear models (with discussion). Journal of the Royal Statistical Society Series B 58, 619–678.
Lee, Y., Nelder, J.A., 2004. Conditional and marginal models: another view (with discussion). Statistical Science 19, 219–238.
Lee, Y., Nelder, J.A., 2006. Double Hierarchical generalized linear models (with discussion). Applied Statistics 55, 139–185.
Lee, Y., Nelder, J., Pawitan, Y., 2006. Generalized Linear Models with Random Effects: Unified Analysis via H-Likelihood. Chapman and Hall, London.
Liesenfeld, R., Richard, J.F., 2003. Univariate and multivariate stochastic volatility models: estimation and diagnostics. Journal of Empirical Finance 10, 505–531.
Meyer, R., Fournier, D., Berg, A., 2003. Stochastic volatility: Bayesian computation using automatic differentiation and extended Kalman filter. Econometrics Journal 6, 408–420.
Meyer, R., Yu, J., 2000. BUGS for a Bayesian analysis of stochastic volatility models. Econometrics Journal 3, 198–215.
Pollock, D.S.G., 2003. Recursive estimation in econometrics. Computational Statistics & Data Analysis 44, 37–75.
Ruiz, E., 1994. Quasi-maximum likelihood estimation of stochastic volatility models. Journal of Econometrics 63, 284–306.
Sandmann, G., Koopman, S., 1998. Estimation of stochastic volatility models via Monte Carlo maximum likelihood. Journal of Econometrics 87, 271–301.
Scott, L., 1987. Option pricing when the variance changes randomly: theory, estimation, and an application. The Journal of Financial and Quantitative Analysis 22, 419–438.
Skaug, H.J., Fournier, D.A., 2006. Automatic approximation of the marginal likelihood in non-Gaussian hierarchical models. Computational Statistics & Data Analysis 51, 699–709.
Stein, E., Stein, J., 1991. Stock price distributions with stochastic volatility: an analytic approach. The Review of Financial Studies 4, 727–752.
Shimada, J., Tsukuda, Y., 2005. Estimation of stochastic volatility models: an approximation to the nonlinear state space representation. Communications in Statistical Simulation and Computation 34, 429–450.
Shun, Z., McCullagh, P., 1995. Laplace approximation of high dimensional integrals. Journal of Royal Statistical Society, Ser. B 57, 749–760.
Taylor, S., 2005. Asset Price Dynamics, Volatility and Prediction. Princeton University Press.
Wei, W.S., 2006. Time Series Analysis: Univariate and Multivariate Methods. Addison Wesley.
Wiggins, J., 1987. Option values under stochastic volatility: theory and empirical estimates. Journal of Financial Economics 19, 351–372.