
R. Schmidt et al., 2006, Multivariate distribution models with generalized hyperbolic margins, Comput. Statist. Data Anal. 50, 2065–2096.

Multivariate distribution models with generalized hyperbolic margins

Rafael Schmidt 1

Department of Economic and Social Statistics, University of Cologne. Department of Statistics, The London School of Economics & Political Science.

Tomas Hrycej, Eric Stützle

Department of Information Mining, DaimlerChrysler AG - Research and Technology.

Abstract

Multivariate generalized hyperbolic distributions represent an attractive family of distributions (with exponentially decreasing tails) for multivariate data modelling. However, in a limited data environment, robust and fast estimation procedures are rare. An alternative class of multivariate distributions (with exponentially decreasing tails) is proposed which comprises affine-linearly transformed random vectors with stochastically independent and generalized hyperbolic marginals. The latter distributions possess good estimation properties and have attractive dependence structures which are explored in detail. In particular, dependencies of extreme events (tail dependence) can be modelled within this class of multivariate distributions. In addition, the necessary estimation and random-number generation procedures are provided. Various advantages and disadvantages of both types of distributions are discussed and illustrated via a simulation study.

Key words: Multivariate distributions, Generalized hyperbolic distributions, Affine-linear transforms, Copula, Tail dependence, Estimation, Random number generation

Email address: [email protected] (Rafael Schmidt). URL: www.uni-koeln.de/wiso-fak/wisostatsem/autoren/schmidt (Rafael Schmidt).

1 Department of Economic and Social Statistics, University of Cologne, Albertus-Magnus-Platz, 50923 Koeln, Germany. Tel.: +49 (0) 221 470 2283, Fax: +49 (0) 221 470 5074. This work was supported by a fellowship within the Postdoc-Programme of the German Academic Exchange Service (DAAD).


1 Introduction

Data analysis with generalized hyperbolic distributions has become quite popular in various areas of theoretical and applied statistics. Originally, Barndorff-Nielsen (1977) utilized this class of distributions to model grain size distributions of wind-blown sand (cf. also Barndorff-Nielsen and Blæsild (1981) and Olbricht (1991)). In financial econometrics the latter family of distributions has been used for multidimensional asset-return modelling. In this context, the generalized hyperbolic distribution replaces the Gaussian distribution, which is not able to describe the fat tails and the (distributional) skewness of most financial asset-return data. References are Eberlein and Keller (1995), Prause (1999), Bauer (2000), Bingham and Kiesel (2001), and Eberlein (2001).

Multivariate generalized hyperbolic distributions (in short: MGH distributions) were introduced and investigated by Barndorff-Nielsen (1978) and Blæsild and Jensen (1981). These distributions have attractive analytical and statistical properties, whereas robust and fast parameter estimation turns out to be difficult in higher dimensions. Furthermore, MGH distribution functions possess no parameter constellation for which they are the product of their marginal distribution functions. However, many applications require the multivariate distribution function to model both marginal dependence and independence. Because of these and other shortcomings (see also Section 4) we introduce and explore a new class of multivariate distributions, the so-called multivariate affine generalized hyperbolic distributions (in short: MAGH distributions). This class of distributions has an appealing stochastic representation and, in contrast to the MGH distributions, simpler estimation and simulation algorithms. Moreover, our simulation study reveals that the goodness-of-fit of the MAGH distribution is comparable to that of the MGH distribution. The one-dimensional margins of an MAGH distribution are even more flexible, owing to a richer parameter choice.

After a brief introduction of MGH and MAGH distributions in Sections 2 and 3, we start with a discussion of advantages and disadvantages of both types of distributions in data modelling and data analysis (see Section 4).

The analysis of the paper is divided into four different stages:

(1) Elaboration of statistical-mathematical properties of MGH and MAGH distributions (Section 5),

(2) Computational procedures for the parameter estimation of MGH and MAGH distributions (Section 6),

(3) Random number generation for MGH and MAGH distributions (Section 7),

(4) Simulation study and real data analysis (Sections 8 and 9).

The paper's main contributions on each stage are as follows: At the first stage we concentrate on the dependence structure of both distributions by utilizing the theory of copulae. In particular, we show that the dependence structure of an MAGH distribution is very appealing for data modelling. This is because the correlation matrix, as an important dependence measure, is more intuitive and easier to handle for an MAGH distribution than for an MGH distribution. Further, certain parameter constellations imply independent margins of the MAGH distribution, whereas the margins of an MGH distribution do not have this property. Moreover, in contrast to MGH distributions, MAGH distributions can model dependencies of extreme events (so-called tail dependence), which is an important property for financial risk analysis.


At the next stage we show that the parameters of the MAGH distribution can be estimated by a simple two-stage method. This procedure reduces to the estimation of the covariance matrix and of the parameters which are related to the univariate marginal distributions. Thus, generally speaking, the estimation simplifies to a one-dimensional estimation problem. In contrast, this type of estimation for MGH distributions can only be performed within the subclass of elliptically contoured distributions. The third stage establishes a fast and simple random-number generator which is based on the well-known rejection algorithm. In particular, we provide an explicit algorithm, avoiding difficult minimization routines, which outperforms the known algorithms in the literature for MGH distributions under particular parameter constellations. Finally, at the fourth stage we present a detailed simulation study to illustrate the suitability of MAGH distributions for data modelling. The Appendix contains various results and proofs which are mainly related to the tail behavior of MGH and MAGH distributions.

2 Multivariate generalized hyperbolic model (MGH)

In the first place, a subclass of MGH distributions, namely the hyperbolic distributions, has been introduced via so-called variance-mean mixtures of inverse Gaussian distributions. This subclass suffers from not having hyperbolically distributed marginals, i.e., the subclass is not closed with respect to passing to the marginal distributions. Therefore, and because of other theoretical aspects, Barndorff-Nielsen (1977) extended this class to the family of MGH distributions. Many different parametric representations of MGH density functions are provided in the literature, see e.g. Blæsild and Jensen (1981). The following density representation is appropriate in our context.

Definition 1 (MGH distribution) An n-dimensional random vector X is said to have a multivariate generalized hyperbolic (MGH) distribution with location vector µ ∈ IRn and scaling matrix Σ ∈ IRn×n if it has the stochastic representation X d= A′Y + µ for some lower triangular matrix A′ ∈ IRn×n such that A′A = Σ is positive-definite and Y has a density function of the form (y ∈ IRn):

fY(y) = c · K_{λ−n/2}(α√(1 + y′y)) / (1 + y′y)^{n/4−λ/2} · exp(αβ′y),  with  c = α^{n/2}(1 − β′β)^{λ/2} / ((2π)^{n/2} K_λ(α√(1 − β′β))).   (1)

Kν denotes the modified Bessel function of the third kind with index ν (cf. Magnus, Oberhettinger, and Soni (1966), pp. 65) and the parameter domain is ‖β‖2 < 1, α > 0 and λ ∈ IR (‖·‖2 denotes the Euclidean norm). The family of n-dimensional generalized hyperbolic distributions is denoted by MGHn(µ, Σ, ω), where ω := (λ, α, β).
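For concreteness, density (1) translates directly into code. The sketch below is only an illustration of formula (1) for the standardized case Σ = I, µ = 0; it assumes NumPy and SciPy (scipy.special.kv for the Bessel function Kν) and is not the authors' implementation.

```python
import numpy as np
from scipy.special import kv  # modified Bessel function K_nu (third kind)

def mgh_density(y, lam, alpha, beta):
    """Evaluate density (1) of MGH_n(0, I, omega) with omega = (lam, alpha, beta).
    y and beta are length-n arrays with ||beta|| < 1 and alpha > 0."""
    y, beta = np.asarray(y, float), np.asarray(beta, float)
    n = y.size
    q = 1.0 + y @ y
    c = (alpha ** (n / 2) * (1.0 - beta @ beta) ** (lam / 2)
         / ((2.0 * np.pi) ** (n / 2) * kv(lam, alpha * np.sqrt(1.0 - beta @ beta))))
    return c * kv(lam - n / 2, alpha * np.sqrt(q)) / q ** (n / 4 - lam / 2) * np.exp(alpha * beta @ y)
```

For a general vector X d= A′Y + µ the density follows from the transformation theorem as |Σ|^{−1/2} fY(A′^{−1}(x − µ)) (cf. Section 6).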

An important property of the above parameterization of the MGH density function is its invariance under affine-linear transformations. For λ = (n+1)/2 we obtain the multivariate hyperbolic density and for λ = −1/2 the multivariate normal inverse Gaussian density. Hence λ = 1 leads to hyperbolically distributed one-dimensional margins. It can be shown that MGH distributions with λ = 1 are closed with respect to passing to the marginal distributions and under affine-linear transformations. The latter subclass turns out to be important for practical applications (see also Section 8.1).

An MGH distribution belongs to the class of elliptically contoured distributions if and only if β = (0, . . . , 0)′. In this case the density function of X can be represented as

fX(x) = |Σ|^{−1/2} g((x − µ)′Σ^{−1}(x − µ)), x ∈ IRn,   (2)


for some density generator function g : IR+ → IR+. Consequently, the random vector Y in Definition 1 is spherically distributed. The density generator g in (2) is given by g(u) = c K_{λ−n/2}(α√(1 + u)) / (1 + u)^{n/4−λ/2}, u ∈ IR+, with some normalizing constant c. For a detailed treatment of elliptically contoured distributions, see Fang, Kotz, and Ng (1990) or Cambanis, Huang, and Simons (1981).

Remark. Usually the following representation of an MGH density is given in the literature:

fX(x) = c · K_{λ−n/2}(ᾱ√(δ² + (x − µ)′Σ̄^{−1}(x − µ))) / (ᾱ^{−1}√(δ² + (x − µ)′Σ̄^{−1}(x − µ)))^{n/2−λ} · exp(β̄′(x − µ)), x ∈ IRn,   (3)

with some normalizing constant c. The domain of variation 2 of the parameter vector ω̄ = (λ, ᾱ, δ, β̄) is as follows: λ, ᾱ ∈ IR, β̄, µ ∈ IRn, δ ∈ IR+, β̄′Σ̄β̄ < ᾱ², and Σ̄ ∈ IRn×n being a positive-definite matrix with determinant |Σ̄| = 1. The one-to-one mapping between the parameter vector ω corresponding to (1) and ω̄ corresponding to (3) is given by: λ = λ, µ = µ, α = ᾱδ, β = (1/α) · Aβ̄, A′A = Σ, and Σ = δ²Σ̄.

3 Multivariate affine generalized hyperbolic model (MAGH)

A disadvantage of multivariate generalized hyperbolic distributions (and of many other families of multivariate distributions) is that the margins Xi of X = (X1, . . . , Xn)′ are not mutually independent for any choice of the scaling matrix Σ. In other words, they do not allow the modelling of phenomena where random variables result as the sum of independent random variables. This shortcoming is serious since the independence may be an undisputable property of the problem for which the stochastic model is sought. Furthermore, in the case of asymmetry (i.e., β ≠ 0) the covariance matrix is in a complex relationship with the matrix Σ, as is shown in the next section.

Therefore we propose an alternative concept. Instead of a multivariate generalized hyperbolic distribution, a distribution is considered which is composed of n independent margins with univariate generalized hyperbolic distributions with zero location and unit scaling. Such a canonical random vector is then subject to an affine-linear transformation. As a consequence, the transformation matrix can be modelled proportionally to the square root of the covariance-matrix inverse even in the asymmetric case. This property holds, for example, for multivariate normal distributions.

Definition 2 (MAGH distribution) An n-dimensional random vector X is said to be multivariate affine generalized hyperbolic (MAGH) distributed with location vector µ ∈ IRn and scaling matrix Σ ∈ IRn×n if it has the following stochastic representation X d= A′Y + µ for some lower triangular matrix A ∈ IRn×n such that A′A = Σ is positive-definite and the random vector Y = (Y1, . . . , Yn)′ consists of mutually independent random variables Yi ∈ MGH1(0, 1, ωi), i = 1, . . . , n. In particular, the one-dimensional margins of Y are generalized hyperbolic distributed. The family of n-dimensional affine generalized hyperbolic distributions is denoted by MAGHn(µ, Σ, ω), where ω := (ω1, . . . , ωn) and ωi := (λi, αi, βi)′, i = 1, . . . , n.

2 This representation omits the limiting distributions obtained at the boundary of the parameter space; see e.g. Blæsild and Jensen (1981).


Observe that an MAGH distribution has independent margins if the scaling matrix Σ equals the identity matrix I. However, no MAGH distribution belongs to the class of elliptically contoured distributions for dimension n ≥ 2, as illustrated by the density contour plots in Figure 2.

Fig. 1. Contour plots of the bivariate density function of an MGH2(0, I, ω) distribution with parameters λ = 1, α = 1 and β = (0, 0)′ (left figure), β = (0.5, 0.25)′ (right figure).

Fig. 2. Contour plots of the bivariate density function of an MAGH2(0, I, ω) distribution with parameters λ = (1, 1)′, α = (1, 1)′ and β = (0, 0)′ (left figure), β = (0.5, 0.25)′ (right figure).

General affine transformations. The consideration of the lower triangular matrix A′ in the stochastic representations of Definitions 1 and 2 is essential since any other decomposition of the scaling matrix Σ would lead to a different class of distributions. This phenomenon and a possible extension are discussed below.

Only the elliptically contoured subclass of the MGH distributions is invariant with respect to different decompositions A′A = Σ. In particular, all decompositions of the scaling matrix Σ lead to the same distribution since they enter the characteristic function via the form Σ = A′A. Equation (2) also justifies the latter property. However, in the asymmetric or general affine case this equivalence does not hold anymore. In this case, for example, the matrix A can be sought via a singular value decomposition A = UWV′,


where W is a diagonal matrix having the square roots of the eigenvalues of Σ = A′A on its diagonal and where the matrix V consists of the corresponding eigenvectors of Σ. The matrices W and V are directly determined from Σ, whereas the matrix U might be some arbitrary matrix with orthonormal columns (rotation and flip). However, the most common case, of course, is U = I. Here the matrix A is directly computed from Σ utilizing its eigenvalues and eigenvectors. Consequently, every margin of Y is distributed according to a linear combination of the margins of X determined by the principal components (PC) (i.e., the eigenvectors) of the covariance matrix Σ: Y = A′^{−1}X = W^{−1}V′X.

4 MGH versus MAGH: Advantages and disadvantages

In this section we list and compare some advantages and disadvantages of MGH distributions and MAGH distributions. We start with the distributional flexibility to fit real data. An outstanding property of MAGH distributions is that, after an affine-linear transformation, all one-dimensional margins can be fitted separately via different generalized hyperbolic distributions. In contrast to this, the one-dimensional margins of MGH distributions are not that flexible since the parameters α and λ relate to the entire multivariate distribution and determine a strong structural behavior (see Definition 1). However, this structure causes a large subclass of MGH distributions to belong to the family of elliptically contoured distributions, which inherit many useful statistical and analytical properties from multivariate normal distributions. For example, the family of elliptically contoured distributions is closed under linear regression and under passing to the marginal distributions (see Cambanis, Huang, and Simons (1981)).

Regarding the dependence structure, the MAGH distributions may have independent margins for some parameter constellation (see Theorem 4). In particular, they support models which are based on a linear combination of independent factors. In contrast, the MGH distributions are not capable of modelling independent margins. They even yield "extremal" dependencies for bivariate distributions having correlation zero. Moreover, the correlation matrix of MAGH distributions is proportional to the scaling matrix Σ within a large subclass of asymmetric MAGH distributions (see Theorem 6), whereas Σ is hard to interpret for skewed MGH distributions. Further, the copula of MAGH distributions, being the dependence structure of an affine-linearly transformed random vector with independent components, is quite illustrative and possesses many appealing modelling properties. On the other hand, the copula structure of MGH distributions may suffer from inflexibility. Regarding the dependence of extreme events, the MAGH distributions can model tail dependence whereas MGH distributions are always tail independent. Therefore, MAGH distributions are especially suitable within the field of risk management.

Sections 6 and 8 reveal that, in contrast to MGH distributions, parameter estimation for MAGH distributions is considerably simpler and more robust. Even in an asymmetric environment, the parameters of MAGH distributions can be identified in a two-stage procedure, which has a considerable computational advantage in higher dimensions. The same procedure can be applied for elliptically contoured MGH distributions (β = 0). The random vector generation algorithms for MGH and MAGH distributions turn out to be equally efficient and fast, irrespective of the dimension.

The simulations in Sections 8 and 9 show that both distributions fit simulated and real data well. Thus, summarizing the above advantages and disadvantages, the MAGH distributions have much to recommend them regarding their parameter estimation, dependence structure, and random vector generation. However, it depends also on the kind of application and the user's taste which model to prefer.


5 Some properties of MGH and MAGH distributions

In this section we investigate and compare several statistical-mathematical properties of the MGH distribution and the MAGH distribution. In this context, the dependence structures will be of particular interest. It has already been mentioned that the one-dimensional marginals of both types of distributions are quite flexible. The copula technique combines these marginals with the respective dependence structure, leading to a multidimensional MGH or MAGH distribution. The dependence structure of affine-linearly transformed distributions, such as MAGH distributions, in terms of copulae has not attracted much attention in the literature yet. However, the usefulness of copulae has been shown for many applications, especially in finance (see Cherubini, Luciano, and Vecchiato (2004) for an overview). The main contribution of this section is a detailed analysis of the dependence structure of MGH and MAGH distributions. We consider the respective copulae and various dependence measures, such as the covariance and correlation coefficients, Kendall's tau, and tail dependence. It turns out that the dependence structure of MAGH distributions is quite different from the respective MGH counterpart, although, for example, the contour plot in Figure 2 does not reflect this fact. In particular, we show that the behavior of common extreme events is different and that for any parameter constellation the margins of the MGH distributions cannot be independent.

It is precisely the copula which encodes all information on the dependence structure, unencumbered by the information on the marginal distributions, and which couples the marginal distributions to give the joint distribution. In particular, if X = (X1, . . . , Xn)′ has joint distribution F with continuous marginals F1, . . . , Fn, then the distribution function of the vector (F1(X1), . . . , Fn(Xn))′ is a copula C, and F(x1, . . . , xn) = C(F1(x1), . . . , Fn(xn)).
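In empirical work the marginal distribution functions Fi are unknown; a standard device, sketched below under the assumption that NumPy and SciPy are available (this is a generic construction, not part of the paper), is to pass each margin through its rescaled empirical distribution function, which yields approximate observations from the copula C itself.

```python
import numpy as np
from scipy.stats import rankdata

def pseudo_observations(X):
    """Map each margin of the (N x n) sample X through its rescaled empirical
    distribution function; the resulting rows are approximate draws from the copula C of X."""
    n_obs = X.shape[0]
    return np.column_stack([rankdata(X[:, j]) / (n_obs + 1.0) for j in range(X.shape[1])])
```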

The MGH and MAGH copulae. According to Definitions 1 and 2, the MGH and MAGH distributions are represented by affine-linear transformations of random vectors following a standardized MGH distribution and MAGH distribution, respectively. Leaving the affine-linear transformation aside, we are interested in the dependence structure (copula) of the underlying random vector. In particular, we set the scaling matrix Σ = I and µ = 0. Note that the copula of an MGH or MAGH distribution does not depend on the location vector, i.e., µ is not a copula parameter.

Theorem 3 Let Y ∈ MGHn(0, I, ω). Then the copula density function of Y is given by

c(u1, . . . , un) = c · K_{λ−n/2}(α√(1 + y′y)) / (1 + y′y)^{n/4−λ/2} · ∏_{i=1}^{n} (1 + y_i²)^{1/4−λ/2} / K_{λ−1/2}(α√(1 + y_i²)) · exp(αβ′y) / ∏_{i=1}^{n} exp(αβ_i y_i),  evaluated at y_i = F_i^{−1}(u_i),

for u_i ∈ [0, 1], i = 1, . . . , n, and some normalizing constant c. Here F_i refers to the distribution function of the one-dimensional margin Y_i, i = 1, . . . , n. Let Y ∈ MAGHn(0, I, ω). Then the corresponding copula equals the independence copula, i.e., the copula density function is given by

c(u1, . . . , un) ≡ 1, u_i ∈ [0, 1], i = 1, . . . , n.

Proof. The first part follows from the copula definition. For the second part note that Y has independent margins if and only if Y possesses the independence copula C(u1, . . . , un) = u1 · . . . · un, according to Theorem 2.10.14 in Nelsen (1999). □

Figure 3 clearly reveals that, in contrast to the MGH distribution, the copula of an MAGH distribution shows strong dependence in the limiting corners of the respective quadrants. This property is investigated in more detail later when we discuss the concept of tail dependence.


Fig. 3. Copula density function c(u, v) of an MGH2(0, I, ω) distribution (left figure) with parameters λ = 1, α = 1 and β = (0, 0)′ and that of an MAGH2(0, I, ω) distribution (right figure) with an arbitrary parameter constellation.

Theorem 4 (Limiting cases) Let Σ(m) := (σij(m))_{i,j=1,2} be a sequence of symmetric positive-definite matrices and ρ(m) := σ12(m)/√(σ11(m) σ22(m)). Suppose that X(m) ∈ MGH2(µ, Σ(m), ω) or X(m) ∈ MAGH2(µ, Σ(m), ω) for every m ∈ IN, and let C(m) denote the corresponding copula. Denote by W(u1, u2) = max(u1 + u2 − 1, 0) and M(u1, u2) = min(u1, u2) the well-known lower and upper Fréchet copula bounds and let Π(u1, u2) = u1 · u2 be the product or independence copula. Then

i) C(m) → M pointwise if ρ(m) → 1 and σij(m) → σij ≠ 0, i, j = 1, 2, as m → ∞,

ii) C(m) → W pointwise if ρ(m) → −1 and σij(m) → σij ≠ 0, i, j = 1, 2, as m → ∞,

iii) if X(m) ∈ MGH2(µ, Σ(m), ω), then C(m) ≠ Π for each parameter constellation, and

iv) if X(m) ∈ MAGH2(µ, Σ(m), ω), then C(m) = Π if and only if Σ(m) = I.

Proof. i)+ii) Consider the Cholesky decomposition of Σ(m) and use the fact that the copula C of X is invariant under strictly increasing transformations of the margins since X possesses continuous marginal distribution functions. iii) Suppose X(m) ∈ MGH2(µ, Σ(m), ω). Then X(m) possesses the product copula if and only if it has independent margins (see Theorem 2.4.2 in Nelsen (1999)). According to Definition 1, X(m) does not have independent margins if Σ(m) is not a diagonal matrix. Thus it suffices to consider Σ(m) = I. Further, we can put β = 0 since β has no influence on the factorization of the density function of an MGH distribution (see formula (1)). However, in that case X(m) belongs to the family of elliptically contoured distributions. Therefore Theorem 4.11 in Fang, Kotz, and Ng (1990) implies that X(m) possesses independent margins if and only if X(m) has a bivariate normal distribution. Since normal distributions and MGH distributions are disjoint classes of distributions, the assertion follows. Part iv) follows from the definition of MAGH distributions. □

Remark. The results of Theorem 4 can be extended to n-dimensional MGH and MAGH distributions. However, for n ≥ 3 the lower Fréchet bound is not a copula function anymore; see Theorem 2.10.13 in Nelsen (1999) for an interpretation of the lower Fréchet bound in that case.

The covariance and correlation matrix. Among the large number of dependence measures for multivariate random vectors, the covariance and the correlation matrix are still the most favorite ones in most practical applications. However, according to Embrechts, McNeil, and Straumann (2002), these dependence measures should be considered with care for non-elliptically contoured distributions such as MAGH distributions.

Theorem 5 (Mean and covariance for MGH distributions) Let X ∈ MGHn(µ, Σ, ω) and define R_{λ,i}(x) := K_{λ+i}(x) / (x^i K_λ(x)). Then the mean vector and the covariance matrix of X are given by

E[X] = µ + α R_{λ,1}(√(α²(1 − β′β))) A′β   and

Cov[X] = R_{λ,1}(√(α²(1 − β′β))) Σ + [R_{λ,2}(√(α²(1 − β′β))) − R_{λ,1}²(√(α²(1 − β′β)))] · A′ββ′A / (1 − β′β).

For the symmetric case β = (0, . . . , 0)′ and λ = 1, the mean vector and the covariance matrix of X simplify to E[X] = µ and Cov[X] = K2(α)/(αK1(α)) · Σ.

Proof. Let X ∈ MGHn(µ, Σ, ω) with the parameter representation as in (3). Then X is distributed according to a variance-mean mixture of a multivariate normal distribution, i.e., X|(Z = z) ∼ N(µ + zΣ̄β̄, zΣ̄), where the mixing random variable Z is distributed according to a generalized inverse Gaussian distribution GIG(λ, δ, √(δ²(ᾱ² − β̄′Σ̄β̄))) (see e.g. Barndorff-Nielsen, Kent, and Sørensen (1982)). The respective mean vector and covariance can be calculated via this representation (see Eberlein and Prause (2002) for more details), utilizing the parameter mapping δ²(ᾱ² − β̄′Σ̄β̄) = α²(1 − β′β), Σ̄β̄δ² = αΣA^{−1}β and δ⁴Σ̄β̄β̄′Σ̄ = A′ββ′A. □

Theorem 6 (Mean and covariance for MAGH distributions) Let X ∈ MAGHn(µ, Σ, ω). Then the mean vector and the covariance matrix are given by

E[X] = A′e_Y + µ  and  Cov[X] = A′CA,

where e_Y = (E[Y1], . . . , E[Yn])′ with E[Yi] = R_{λi,1}(√(αi²(1 − βi²))) αi βi and C = diag(c11, . . . , cnn) with

cii = R_{λi,1}(√(αi²(1 − βi²))) + [R_{λi,2}(√(αi²(1 − βi²))) − R_{λi,1}²(√(αi²(1 − βi²)))] · βi² / (1 − βi²).

The covariance matrix Cov[X] is proportional to Σ if α = αi, β = βi and λ = λi for all i = 1, . . . , n.

Proof. The assertion follows immediately from Theorem 5 and Definition 2. □

Kendall's tau. The correlation coefficient is a measure of linear dependence between two random variables and is therefore not invariant under monotone increasing transformations. However, not only does "scale invariance" represent an indisputable requirement for a proper dependence measure in general (cf. Joe (1997), Chapter 5), but in practice "scale-invariant" dependence measures also play an increasing role in dependence modelling. Kendall's tau is the most famous one and therefore we determine it for MGH and MAGH distributions.

Definition 7 (Kendall's tau) Let X = (X1, X2)′ and X̃ = (X̃1, X̃2)′ be independent bivariate random vectors with common continuous distribution function F and copula C. Kendall's tau is defined by

τ = IP((X1 − X̃1)(X2 − X̃2) > 0) − IP((X1 − X̃1)(X2 − X̃2) < 0) = 4 ∫_{[0,1]²} C(u, v) dC(u, v) − 1.


Theorem 8 Let Σ = (σij)_{i,j=1,2} be a positive-definite matrix and ρ := σ12/√(σ11σ22).

i) If X ∈ MGH2(µ, Σ, ω) with β = 0, then τ = (2/π) arcsin(ρ).

ii) If X ∈ MAGH2(µ, Σ, ω) with stochastic representation X d= A′Y + µ, A′A = Σ, then for ρ ≠ 0

τ = (4/|c|) ∫_{IR²} f_{Y1}(x1) f_{Y2}((x2 − x1)/c) ∫_{−∞}^{x1} F_{Y2}((x2 − z)/c) f_{Y1}(z) dz d(x1, x2) − 1,   (4)

where c := sgn(ρ)√(1/ρ² − 1). Further, τ = 0 for ρ = 0.

The proof is given in the Appendix.

Remark. An explicit expression of Kendall's tau for MAGH distributions (given by (4)) cannot be expected. However, since the density functions of Y1 and Y2 are available, formula (4) yields a tractable numerical solution. Further, the values of Kendall's tau and the correlation coefficient for the class of bivariate MGH and MAGH distributions cover the entire interval [−1, 1]. For Kendall's tau, this can be seen from parts i) and ii) in Theorem 4 (note that Kendall's tau is a measure of concordance). For the correlation coefficient of an MGH distribution, Theorem 5 implies that the parameter ρ ∈ [−1, 1] corresponds to the correlation coefficient if β = 0. The conditions under which ρ equals the correlation coefficient for an MAGH distribution are stated in Theorem 6.
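Since (4) has no closed form, a simple Monte Carlo check is often the most convenient approximation of Kendall's tau in practice. The sketch below estimates τ from simulated vectors; it assumes SciPy's kendalltau and any user-supplied bivariate generator, for instance the MAGH generator sketched in Section 7 below, and is not part of the paper's methodology.

```python
import numpy as np
from scipy.stats import kendalltau

def kendalls_tau_mc(sample_bivariate, n=20_000, rng=None):
    """Monte Carlo estimate of Kendall's tau: draw n bivariate vectors with the
    user-supplied callable sample_bivariate(rng) and compute the sample version
    of Definition 7."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.array([sample_bivariate(rng) for _ in range(n)])
    tau, _ = kendalltau(X[:, 0], X[:, 1])
    return tau
```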

Tail dependence. A strong emphasis in this paper is put on the dependence structure of MGH and MAGH distributions. In this context we establish the following result about the dependence structure of extreme events (tail dependence or extremal dependence) related to the latter types of distributions. The importance of tail dependence, especially in financial risk management, is addressed in Hauksson et al. (2001) and Embrechts, Lindskog, and McNeil (2003). The following definition (according to Joe (1997), p. 33) represents one of many possible definitions of tail dependence.

Definition 9 (Tail dependence) Let X = (X1, X2)′ be a 2-dimensional random vector. We say that X is upper tail-dependent if

λU := lim_{u→1−} IP(X1 > F1^{−1}(u) | X2 > F2^{−1}(u)) > 0,   (5)

where the limit is assumed to exist and F1^{−1}, F2^{−1} denote the generalized inverse distribution functions of X1, X2, respectively. Consequently, we say that X is upper tail-independent if λU equals 0. Similarly, X is said to be lower tail-dependent (lower tail-independent) if λL > 0 (λL = 0), where λL := lim_{u→0+} IP(X1 ≤ F1^{−1}(u) | X2 ≤ F2^{−1}(u)), provided they exist.

Figure 3 reveals that bivariate standardized MGH distributions show more evidence of dependence in the upper-right and lower-left quadrants of their distribution function than MAGH distributions. However, the following theorem shows that MGH distributions are always tail-independent, whereas non-standardized MAGH distributions can even model tail dependence. For the sake of simplicity we restrict ourselves to the symmetric case β = 0.

Theorem 10 Let Σ = (σij)_{i,j=1,2} be a positive-definite matrix and ρ := σ12/√(σ11σ22). Suppose β = 0. Then

i) the MGH2(µ, Σ, ω) distributions are upper and lower tail-independent,

ii) the MAGH2(µ, Σ, ω) distributions are upper and lower tail-independent if α2 < α1√(1/ρ² − 1) or ρ ≤ 0, and

iii) the MAGH2(µ, Σ, ω) distributions are upper and lower tail-dependent if α2 > α1√(1/ρ² − 1) and ρ > 0.

The proof is given in the Appendix.

Remark. In addition to Theorem 10, it can be shown that an MAGH2(µ, Σ, ω) distribution (with β = 0) is tail-independent if α2 = α1√(1/ρ² − 1) and λ2 < λ1.

Other properties. It has already been mentioned that the family of MGH distributions is closed with respect to the marginal distributions. Similarly, if X = (X1, . . . , Xn)′ is an MAGH-distributed random vector, then the partitions X(1) = (X1, . . . , Xk)′ and X(2) = (Xk+1, . . . , Xn)′ are MAGH distributed. Further, the conditional distribution of X(2) given X(1) and regular affine-linear transformations of X are MAGH, too. For general partitions of X and singular affine-linear transformations, the latter properties also hold if the definition of an MAGH distribution (see Definition 2) allows for the stochastic representation X = A′Y + µ, where Y is not necessarily n-dimensional. However, in this case the stochastic representation is not unique anymore (which is comparable to elliptically contoured distributions). We may also allow for general decompositions of the scaling matrix Σ = A′A in Definition 2 in order to remain in the class of MAGH distributions after a permutation of the margins of X (cf. the discussion in Section 3).

6 Parameter estimation

Following a brief setup of the estimation procedure, we show that the parameters of the MAGH distribution can be estimated with a simple two-stage method. This method refers to the estimation of the covariance matrix and of the parameters which correspond to the univariate marginal distributions. Thus, generally speaking, the estimation reduces to a one-dimensional estimation problem. In contrast, for the MGH distributions one can perform this type of estimation only within the subclass of elliptically contoured distributions.

Minimizing the cross entropy. A common descriptive statistic to measure the similarity between two (multivariate) distribution or density functions f∗ and f, respectively, is given by the (directed) Kullback divergence (see Kullback (1959) and Ullah (1996)), which is defined as

HK(f, f∗) := ∫_{IRn} f∗(x) log(f∗(x)/f(x)) dx = ∫_{IRn} f∗(x) log f∗(x) dx − ∫_{IRn} f∗(x) log f(x) dx.

The Kullback divergence HK is zero if and only if the densities f and f∗ coincide. It is also additive across the marginal densities if the marginal distributions are stochastically independent. This is one desired property of a pseudo-distance measure for multidimensional random vectors. In our context, we use the Kullback divergence to measure the similarity between the "true" density f∗ and its approximation f. For example, minimizing the Kullback divergence by varying f is a common way to find a good approximation (or good fit) of the "true" density f∗. This kind of descriptive statistic is frequently used if common goodness-of-fit tests such as the χ²-test turn out to be too complicated. The latter is often the case for high-dimensional distributions. In Sections 8 and 9 we investigate the problem of goodness-of-fit for simulated and real data by means of the Kullback divergence and χ²-tests.

Note that the first term of HK is constant and can be dropped. The resulting expression is called cross entropy. Because f∗ is unknown, we approximate it by its empirical counterpart. In this case, minimizing the cross entropy (or Kullback divergence) coincides with the concept of maximum likelihood.
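In code, the empirical cross entropy is simply the average negative log-density of the candidate model over the sample; minimizing it over the model parameters is exactly maximum-likelihood estimation. A minimal sketch (NumPy assumed; any density function, e.g. the mgh_density sketch above, can be plugged in):

```python
import numpy as np

def empirical_cross_entropy(sample, density):
    """Empirical counterpart of -int f*(x) log f(x) dx: the average negative
    log-density of the candidate model f over the observed sample."""
    return -np.mean([np.log(density(x)) for x in sample])
```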

For computational reasons it may be advantageous to work with the standardized vector y = B(x − µ), where B := A′^{−1}. The density transformation theorem yields that the density function fX of X d= A′Y + µ ∈ MGH(µ, Σ, ω) is given by fX(x) = fA′Y+µ(x) = |B| fY(y), with |B| > 0 being the determinant of B and parameters η = (µ, B, ω) and ω = (λ, α, β).

While the location vector µ can take arbitrary values in IRn, B and ω are subject to various constraints. The matrix B must be triangular with a positive diagonal such that A′A is positive-definite. The parameter α is supposed to be positive and the vector β must fulfill ‖β‖2 ≤ 1.

Many approaches are possible regarding the latter optimization problem; inter alia, we mention two methods:

• constrained nonlinear optimization methods, and

• unconstrained nonlinear optimization methods after suitable parameter transformations.

We prefer the unconstrained approach for reasons of robustness. The following parameter transformations are appropriate. The matrix B can be sought of the form B = UD, where D is some diagonal matrix having strictly positive elements and U is a triangular matrix having only ones on its diagonal. In order to enforce the strict positivity of the diagonal elements dii, i = 1, . . . , n, of D, the following transformations are applied:

bii = dii = e^{νi}, i = 1, . . . , n,

bij = dii uij = e^{νi} uij, i = 1, . . . , n − 1, j = 2, . . . , n, j > i,

with unknown parameters νi and uij. The parameter α is estimated via the same exponential map. For the vector β we utilize the smooth transformation

β = γ · 1/(1 + exp(−‖γ‖2)) · 1/‖γ‖2.   (6)
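A sketch of these reparameterizations is given below (NumPy assumed). It implements the elementwise relations bii = e^{νi}, bij = e^{νi}uij and transformation (6); the function name and argument layout are illustrative only.

```python
import numpy as np

def constrained_parameters(nu, u_upper, log_alpha, gamma):
    """Map unconstrained variables (nu, u_upper, log_alpha, gamma) to the constrained
    parameters (B, alpha, beta); gamma must be non-zero for transformation (6)."""
    n = len(nu)
    U = np.eye(n)
    U[np.triu_indices(n, k=1)] = u_upper        # ones on the diagonal, u_ij above it
    B = np.diag(np.exp(nu)) @ U                 # b_ii = e^{nu_i}, b_ij = e^{nu_i} u_ij
    alpha = np.exp(log_alpha)                   # alpha > 0 via the same exponential map
    norm = np.linalg.norm(gamma)
    beta = gamma / (1.0 + np.exp(-norm)) / norm # transformation (6), ||beta||_2 < 1
    return B, alpha, beta
```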

The optimization algorithm we use belongs to the probabilistic ones (note that the objective function is non-convex) and consists of two characteristic phases:

1. In a global phase the objective function is evaluated at a random number of points being part of the search space.

2. In a local phase the samples are transformed to become candidates for local optimization routines.

Results concerning the convergence of probabilistic algorithms to the global minimum are, for example, given in Rinnooy Kan and Timmer (1987). The latter reference motivates us to use a Multi-Level Single-Linkage procedure. Here the global optimization method generates random samples from the search space by identifying the samples belonging to the same objective function attractor and eliminating multiple ones. For the remaining samples, in the local phase a conjugate gradient method (see Fletcher (1987)) is started. Further, a Bayesian stopping rule is applied in order to assess the probability that all attractors have been explored. For an account of the Bayesian stopping rule we refer the reader to Boender and Rinnooy Kan (1987).

Estimation of MAGH distributions. A big advantage of the MAGH distribution is that its parameters can be easily identified in a two-stage procedure comprising the following steps (a code sketch is given after the list):

(1) Compute the sample covariance matrix S of the random vector X. Transform X to the vector Y = BX with independent margins. The matrix B is obtained via the Cholesky decomposition S^{−1} = B′B.

(2) Identify the parameters λi, αi and βi which belong to the univariate marginal distributions of Y. The location vector µ can be obtained via B^{−1}e, with ei being the location parameter of Yi (see Theorem 5). The scaling matrix Σ equals B^{−1}DB^{−1}′, where the diagonal matrix D is determined by the scaling parameters of Yi on its diagonal.
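A minimal sketch of the two-stage procedure follows. It assumes NumPy; the univariate fitting step is left as a user-supplied placeholder fit_univariate_gh (for example the unconstrained cross-entropy minimization described above), expected to return the marginal parameters together with a location and a scaling value. The helper is illustrative and not the authors' implementation.

```python
import numpy as np

def fit_magh_two_stage(X, fit_univariate_gh):
    """Two-stage MAGH estimation: (1) decorrelate with the sample covariance,
    (2) fit each margin of the transformed data separately.
    X is an (N x n) data matrix; fit_univariate_gh(sample) is a placeholder that
    returns (lam_i, alpha_i, beta_i, location_i, scaling_i)."""
    # Stage 1: sample covariance S and transform Y = B X with S^{-1} = B'B
    S = np.cov(X, rowvar=False)
    B = np.linalg.cholesky(np.linalg.inv(S)).T      # B'B = S^{-1}
    Y = X @ B.T

    # Stage 2: univariate fits of the (approximately independent) margins of Y
    n = X.shape[1]
    omega, e, d = [], np.empty(n), np.empty(n)
    for i in range(n):
        lam_i, alpha_i, beta_i, e[i], d[i] = fit_univariate_gh(Y[:, i])
        omega.append((lam_i, alpha_i, beta_i))

    # Recover location and scaling of X from the marginal fits (step (2) above)
    B_inv = np.linalg.inv(B)
    mu = B_inv @ e
    Sigma = B_inv @ np.diag(d) @ B_inv.T
    return mu, Sigma, omega
```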

The latter procedure considerably simplifies the complexity of the numerical optimization. Note that the parameters of the one-dimensional MAGH distributions can be estimated via unconstrained optimization as explained above. We remark that the two-stage estimation may affect the (asymptotic) efficiency of the estimation. However, a theoretical analysis of the loss of efficiency goes beyond the scope of this article. Nevertheless, our empirical results show that this two-stage estimation is quite robust with respect to the finite-sample volatility of the corresponding estimators.

It is important for applications that the univariate densities are not necessarily identically parameterized. This means that the margins may have different parameters λi, αi, βi, i = 1, . . . , n. In other words, there is considerable freedom in choosing the parameters λ, α, and β. Therefore, in addition to a parameterization similar to that for MGH distributions (i.e., the same λ and α for all one-dimensional margins and different βi), two further extreme alternatives are possible:

• Minimum parameterization: Equal parameters λ, α and β for all one-dimensional margins.

• Maximum parameterization: Individual parameters λi, αi and βi for all one-dimensional margins.

The appropriate parameterization depends on the kind of application and the available data volume. The optimization procedure presented in this section is a special case of an identification algorithm for conditional distributions explored in Stützle and Hrycej (2001, 2002a, 2002b).

7 Sampling from MGH and MAGH distributions

Complementing the question of estimation as described in Section 6, we now provide an efficient and self-contained generator of random vectors for the families of MGH and MAGH distributions. The generator, which is based on a rejection algorithm, comprises several features which to our knowledge have not been published yet. In particular, we provide an explicit algorithm which avoids difficult minimization routines and outperforms the known algorithms in the literature for MGH distributions with general parameter constellations. In principle, the generation of random vectors reduces to the generation of one-dimensional random numbers. For MGHn(µ, Σ, ω) distributions this is possible via the following variance-mean mixture representation. Let the random variable Z be distributed according to a generalized inverse Gaussian distribution with parameters λ, χ, and ψ. In particular, the latter family is referred to as the GIG(λ, χ, ψ) distributions. Then X ∈ MGHn(µ, Σ, ω) is conditionally normally distributed with mixing random variable Z, i.e., X|(Z = z) ∼ Nn(µ + z∆β̄, z∆), where ∆ ∈ IRn×n is a symmetric positive-definite matrix with determinant |∆| = 1 and µ, β̄ ∈ IRn. The parameters Σ and ω = (λ, α, β) are given by

α = √((ψ + β̄′∆β̄)χ),  β = (1/√(ψ + β̄′∆β̄)) Lβ̄,  Σ = χ · ∆,

with Cholesky decomposition L′L = ∆. The inverse map is given by

χ = |Σ|^{1/n},  ψ = (α²/|Σ|^{1/n}) · (1 − β′β),  ∆ = Σ/|Σ|^{1/n},  and  β̄ = α · A^{−1}β,

with Cholesky decomposition A′A = Σ.

The sampling algorithm is now of the following form: A pseudo-random number is sampled from a random variable Z having a generalized inverse Gaussian distribution with parameters λ, χ, and ψ. Then an n-dimensional random vector X being conditionally normally distributed with mean vector µ + Z∆β̄ ("drift") and covariance matrix Z∆ (determinant |∆| = 1) is generated.

The density function of the generalized inverse Gaussian distribution GIG(λ, χ, ψ) is given by

fZ(x) = c · x^{λ−1} exp(−χ/(2x) − ψx/2), x > 0,   (7)

with normalizing constant c = (ψ/χ)^{λ/2}/(2Kλ(√(ψχ))). The range of the parameters is given by

i) χ > 0, ψ ≥ 0 if λ < 0, or

ii) χ > 0, ψ > 0 if λ = 0, or

iii) χ ≥ 0, ψ > 0 if λ > 0.

The following algorithm is formulated for generalized hyperbolic distributions MGHn(µ, Σ, ω) with parameter λ = 1. Section 8.1 justifies the restriction to this class of distributions. In this context the GIG(λ, χ, ψ) distribution is referred to as the inverse Gaussian distribution. However, the algorithm can be extended to general λ. Our empirical study shows that the algorithm outperforms the efficiency of the sampling algorithm proposed by Atkinson (1982), which is suited to a larger class of distributions (see also Prause (1999), Section 4.6). Moreover, the algorithm avoids tedious minimization routines and time-consuming evaluations of the Bessel function Kλ. The generation utilizes a rejection method (see Ross (1997), pp. 565) with a three-part rejection envelope. We define the envelope d : IR+ → IR+ by

d(x) :=  d1(x) = c·a1·exp(b1x),   if 0 < x < x1,
         d2(x) = c·a2,            if x1 ≤ x < x2,
         d3(x) = c·a3·exp(−b3x),  if x2 ≤ x < ∞,   (8)

with ai > 0, i = 1, 2, 3, bi > 0, i = 1, 3, and x1 ≤ x2 to be defined later. Let zi, i = 1, 3, denote the inflection points and z2 = √(χ/ψ) the mode of the unimodal density fZ.


density fZ . Further we require d1(z1) = fZ(z1), d2(z2) = fZ(z2), d3(z3) = fZ(z3). The pointsx1 > 0 and x2 > 0 correspond to the intersection points of d1, d2 and d2, d3, respectively, i.e.d1(x1) = d2(x1), d2(x2) = d3(x2).

Fig. 4. Three-part envelope d for the inverse Gaussian density function fZ with parameters χ = 1, ψ = 1.

Primarily, the rejection method requires the generation of random numbers with a density proportional to the envelope d(x), x > 0; the scaling factor s needed for this normalization is derived below.

Pseudo algorithm for generating an inverse Gaussian random number:

(1) Compute the two positive zeros z1 < z2 of ψ²z⁴ − 2χψz² − 4χz + χ² = 0.

(2) Set b1 = (χ/z1² − ψ)/2 and a1 = exp(−χ/z1).

(3) Set a2 = exp(−√(χψ)).

(4) If (ψ − χ/z2²)/2 > 0, then set b3 = (ψ − χ/z2²)/2 and a3 = exp(−χ/z2). Else set b3 = ψ/2 and a3 = 1.

(5) Set x1 = ln(a2/a1)/b1 and x2 = −ln(a2/a3)/b3.

(6) Set s = (a1/b1) exp(b1x1) − a1/b1 + (x2 − x1)a2 + (a3/b3) exp(−b3x2).

(7) Set k1 = (1/s)((a1/b1) exp(b1x1) − a1/b1) and k2 = k1 + (1/s)(x2 − x1)a2.

(8) Generate independent and uniformly distributed random numbers U and V on the interval [0, 1].

(9) If U ≤ k1 goto step 10. Else if k1 < U ≤ k2 goto step 11. Else goto step 12.

(10) Set x = (1/b1) ln((b1/a1) sU + 1). If V ≤ fZ(x)/d1(x) = (1/a1) exp(−((χx^{−1} + ψx)/2 + b1x)), then return x; else goto step 8.

(11) Set x = sU/a2 − (a1/(b1a2))(exp(b1x1) − 1) + x1. If V ≤ fZ(x)/d2(x) = (1/a2) exp(−(χx^{−1} + ψx)/2), then return x; else goto step 8.

(12) Set x = −(1/b3) ln[−(b3/a3){sU − (a1/b1)(exp(b1x1) − 1) − (x2 − x1)a2 − (a3/b3) exp(−b3x2)}]. If V ≤ fZ(x)/d3(x) = (1/a3) exp(−(χx^{−1} + ψx)/2 + b3x), then return x; else goto step 8.

Remark. In order to generate a sequence of inverse Gaussian random numbers, repeat from step 8.
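For illustration, the rejection scheme above can be transcribed into Python as follows. This is a sketch only (NumPy assumed): the two positive roots in step (1) are obtained numerically via numpy.roots, and no claim is made that this matches the authors' implementation.

```python
import numpy as np

def sample_inverse_gaussian(chi, psi, rng=None):
    """Draw one GIG(1, chi, psi) (inverse Gaussian) variate with the three-part
    envelope rejection scheme of steps (1)-(12); chi > 0, psi > 0."""
    rng = np.random.default_rng() if rng is None else rng

    # Step 1: the two positive zeros z1 < z2 of psi^2 z^4 - 2 chi psi z^2 - 4 chi z + chi^2 = 0
    roots = np.roots([psi ** 2, 0.0, -2.0 * chi * psi, -4.0 * chi, chi ** 2])
    pos = np.sort(roots[(np.abs(roots.imag) < 1e-10) & (roots.real > 0.0)].real)
    z1, z2 = pos[0], pos[-1]

    # Steps 2-4: envelope constants (d1 rising, d2 constant, d3 falling)
    b1, a1 = (chi / z1 ** 2 - psi) / 2.0, np.exp(-chi / z1)
    a2 = np.exp(-np.sqrt(chi * psi))
    if (psi - chi / z2 ** 2) / 2.0 > 0.0:
        b3, a3 = (psi - chi / z2 ** 2) / 2.0, np.exp(-chi / z2)
    else:
        b3, a3 = psi / 2.0, 1.0

    # Step 5: intersection points of the envelope pieces
    x1 = np.log(a2 / a1) / b1
    x2 = -np.log(a2 / a3) / b3

    # Steps 6-7: total envelope mass s and the break probabilities k1 < k2
    area1 = a1 / b1 * (np.exp(b1 * x1) - 1.0)
    s = area1 + (x2 - x1) * a2 + a3 / b3 * np.exp(-b3 * x2)
    k1 = area1 / s
    k2 = k1 + (x2 - x1) * a2 / s

    def log_f(x):
        # logarithm of the unnormalized density exp(-chi/(2x) - psi*x/2)
        return -(chi / x + psi * x) / 2.0

    while True:                                   # steps 8-12
        u, v = rng.uniform(size=2)
        if u <= k1:                               # piece d1 on (0, x1)
            x = np.log(b1 / a1 * s * u + 1.0) / b1
            accept = v <= np.exp(log_f(x) - b1 * x) / a1
        elif u <= k2:                             # piece d2 on [x1, x2)
            x = s * u / a2 - a1 / (b1 * a2) * (np.exp(b1 * x1) - 1.0) + x1
            accept = v <= np.exp(log_f(x)) / a2
        else:                                     # piece d3 on [x2, infinity)
            arg = -b3 / a3 * (s * u - area1 - (x2 - x1) * a2 - a3 / b3 * np.exp(-b3 * x2))
            x = -np.log(arg) / b3
            accept = v <= np.exp(log_f(x) + b3 * x) / a3
        if accept:
            return x
```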

So far we have generated random numbers from a univariate inverse Gaussian distribution. We turn now to the generation of multivariate generalized hyperbolic random vectors. For this we exploit the mixture representation introduced above.

Pseudo algorithm for generating an MGH vector:

(1) Set ∆ = L′L via Cholesky decomposition.

(2) Generate an inverse Gaussian random number Z with parameters χ and ψ.

(3) Generate a standard normal random vector N.

(4) Return X = µ + Z∆β̄ + √Z L′N.

Pseudo algorithm for generating an MAGH vector:

(1) Set Σ = A′A via Cholesky decomposition.

(2) Generate a random vector Y with independent MGH1(0, 1, ωi), i = 1, . . . , n, distributed components (see above).

(3) Return X = µ + A′Y.
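The two vector generators can then be sketched as follows, reusing sample_inverse_gaussian from above and the inverse parameter map χ = |Σ|^{1/n}, ψ = α²(1 − β′β)/|Σ|^{1/n}, ∆ = Σ/|Σ|^{1/n}, β̄ = αA^{−1}β. Both functions are illustrative only (λ = 1 throughout) and assume NumPy.

```python
import numpy as np

def sample_mgh(mu, Sigma, alpha, beta, rng=None):
    """Draw one MGH_n(mu, Sigma, (1, alpha, beta)) vector via the variance-mean mixture."""
    rng = np.random.default_rng() if rng is None else rng
    mu, Sigma, beta = np.asarray(mu, float), np.asarray(Sigma, float), np.asarray(beta, float)
    n = len(mu)
    det_root = np.linalg.det(Sigma) ** (1.0 / n)
    chi, psi = det_root, alpha ** 2 * (1.0 - beta @ beta) / det_root
    Delta = Sigma / det_root                        # |Delta| = 1
    A_lower = np.linalg.cholesky(Sigma)             # A' (lower triangular), A'A = Sigma
    beta_bar = alpha * np.linalg.solve(A_lower.T, beta)   # alpha * A^{-1} beta
    L_lower = np.linalg.cholesky(Delta)             # L' (lower triangular), L'L = Delta
    z = sample_inverse_gaussian(chi, psi, rng)      # inverse Gaussian mixing variable
    return mu + z * Delta @ beta_bar + np.sqrt(z) * L_lower @ rng.standard_normal(n)

def sample_magh(mu, Sigma, omega, rng=None):
    """Draw one MAGH_n(mu, Sigma, omega) vector, omega = ((alpha_1, beta_1), ...):
    independent MGH_1(0, 1, (1, alpha_i, beta_i)) margins pushed through X = mu + A'Y."""
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, float)
    A_lower = np.linalg.cholesky(Sigma)             # A' (lower triangular), A'A = Sigma
    y = np.empty(len(mu))
    for i, (alpha_i, beta_i) in enumerate(omega):
        # univariate case of the inverse map: chi = 1, psi = alpha^2 (1 - beta^2), drift = alpha*beta
        z = sample_inverse_gaussian(1.0, alpha_i ** 2 * (1.0 - beta_i ** 2), rng)
        y[i] = z * alpha_i * beta_i + np.sqrt(z) * rng.standard_normal()
    return mu + A_lower @ y
```

Combined with the kendalls_tau_mc sketch of Section 5, this provides, for example, a quick numerical illustration of Theorem 8 ii).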

Table 1 presents the empirical efficiency of the MGH random vector generator for various parameter constellations. In our framework, efficiency is defined by the following ratio:

Efficiency = (# of generated samples) / (# of algorithm passes including rejections).


χ/ψ    0.1     0.5     1       2       5       10
0.1    0.940   0.913   0.904   0.886   0.877   0.872
0.5    0.916   0.884   0.877   0.865   0.859   0.852
1      0.901   0.877   0.867   0.863   0.857   0.851
2      0.889   0.866   0.857   0.856   0.856   0.845
5      0.876   0.858   0.854   0.852   0.847   0.849
10     0.866   0.860   0.851   0.853   0.842   0.847

Table 1. Empirical efficiency of the MGH random number generator for λ = 1 and 10,000 generated samples.

8 Simulation and empirical study

A series of computational experiments with simulated data is performed in this section. The experiments disclose that

• MGH density functions with parameter λ ∈ IR seem to have very close counterparts in the MGH subclass with parameter λ = 1 (in short: MH distribution),

• the MAGH distribution can closely approximate the MGH distribution with similar parameter values.

8.1 General MGH distributions versus MGH distributions with parameter λ = 1 (MH)

Figure 5 shows the identification results of two univariate MGH distributions and two univariate MH distributions (i.e., λ = 1). All four plots simultaneously show the respective fit via an MGH distribution (bright dotted line) and via an MH distribution (dark dotted line). In both plots on the left side of the figure, samples were drawn from an MH distribution. The fit illustrates the phenomenon that, although the identification procedure with MGH densities frequently produces λ ≠ 1, the approximation of the original MH density function remains good. The plots on the right side illustrate the opposite case, namely, a good approximation of the MGH density function (λ ≠ 1) via an MH density function (λ = 1).

Consider now a bivariate MGH2(µ, Σ, ω) distribution with Σ = (σij)_{i,j=1,2}. The mutual tradeoff between λ, α, and the scaling parameters S1 := √σ11 and S2 := √σ22 of the corresponding distribution function is shown in Table 2. While all samples were drawn from an MGH distribution with parameter λ = 1 (MH distribution), MGH identification usually leads to an overestimation of λ, which is traded off by lower values of α, S1, and S2. In contrast to that, the parameter identifications via MH distributions are close to the reference values. However, the differences between the cross entropies are hardly discernible, showing that both parameter combinations correspond to densities which are close to each other.

The following conclusions can be drawn:

• The fit of both distributions, the MGH and the MH distribution, measured by cross entropy and visual closeness of the density plots, is satisfying. This implies the existence of multiple parameter constellations for MGH distributions which lead to quite similar density functions. Similar results have been observed for the corresponding tail functions.


Fig. 5. Univariate MGH and MH distributions. Reference densities (solid lines) and identified MGH and MH densities (bright and dark dotted lines) averaged from 100 random samples for various parameter constellations. The values m(·) denote the sample mean of the respective parameters. [Plots not reproduced; the legends report, per panel, the reference parameters (µ, √Σ, λ, α, β)′ and the averaged estimates (m(µ), m(√Σ), m(λ), m(α), m(β))′ of the MGH and MH fits:
(2.00, 1.00, 1.00, 1.03, 0.24)′: MGH fit (1.91, 1.01, 1.43, 1.24, 0.24)′, MH fit (1.91, 1.19, 1.00, 1.33, 0.26)′;
(2.00, 1.00, 1.00, 2.25, 0.11)′: MGH fit (1.98, 0.99, 1.51, 2.58, 0.11)′, MH fit (1.98, 1.10, 1.00, 2.67, 0.12)′;
(0.00, 1.00, 2.00, 1.00, 0.50)′: MGH fit (0.07, 1.07, 1.90, 1.05, 0.49)′, MH fit (0.06, 2.36, 1.00, 2.25, 0.54)′;
(0.00, 1.00, 6.00, 2.00, 0.00)′: MGH fit (0.01, 2.36, 3.64, 4.32, 0.00)′, MH fit (0.02, 3.66, 1.00, 5.76, 0.00)′.]


• Generalized hyperbolic densities seem to have very close counterparts in the class of MH distributions (even for large λ). Therefore, the class of MH distributions will be sufficiently rich for our considerations.

In view of the above results, only MH distributions and the corresponding MAH distributions (multivariate affine generalized hyperbolic distributions with λ = 1) will be considered in the next section.

8.2 MH distributions versus MAH distributions

Parameter estimates are compared for the following three classes of bivariate distributions:

(1) MH distributions.

(2) MAH distributions with minimal parameter configuration (same value of α and β for each margin).

(3) MAH distributions with maximal parameter configuration (different values of αi and βi for each margin).

All types of distributions in Table 3 have been identified from data sampled from an MH distribution. The two-stage algorithm introduced in Section 6 has been used for the identification of the MAHmax model (determining first the sample correlation matrix, then transforming the variables, and finally identifying the univariate distributions). The identification results are provided in Table 3.

The following conclusions can be drawn:

• For all three models, most parameters show an acceptable fit regarding the sample bias and the sample standard deviation. The relative variability of the estimates increases with decreasing α (fatter-tailed distributions). Such fatter-tailed distributions seem to be more ill-posed with respect to the estimation of individual parameters.

• The differences between the parameter estimates obtained either for the MH, the MAHmin, or the MAHmax distribution are negligible (although the data are drawn from an MH distribution).

• The fit in terms of the cross entropy does not differ significantly between the various models. As expected, the MAHmax estimates are closer to the MH reference distribution than are the MAHmin estimates in terms of the cross entropy (note that in one case they are even better than the MH estimates). The fitting capability of the MAHmax model comes at the expense of a larger variability and a sometimes larger bias ("overlearning effect").


Table 2
Bivariate MGH and MH distribution. For each parameter λ, α, S1 = √σ11, and S2 = √σ22 the table lists the reference value, the sample mean m(·), and the sample standard deviation σ(·) of the parameter estimations derived from 100 samples of sample-size 1000 each. In the last column, the sample mean and the sample standard deviation of the corresponding cross entropy H are provided.

      λ       m(λ)    σ(λ)    α       m(α)    σ(α)    S1      m(S1)   σ(S1)   S2      m(S2)   σ(S2)   m(H)    σ(H)
MGH   1.0000  1.1061  0.0571  0.3200  0.2724  0.0867  0.3200  0.2574  0.0735  0.3200  0.2595  0.0746  3.4525  0.0438
MH    1.0000  1.0000  0.0000  0.3200  0.3286  0.1228  0.3200  0.3229  0.1078  0.3200  0.3243  0.1119  3.4177  0.3450
MGH   1.0000  1.3745  0.1651  1.1456  0.8758  0.2085  1.0000  0.7096  0.1341  1.0000  0.7141  0.1343  3.8683  0.0455
MH    1.0000  1.0000  0.0000  1.1456  1.1609  0.2922  1.0000  0.9954  0.1912  1.0000  1.0037  0.1931  3.8683  0.0454
MGH   1.0000  2.1779  0.5777  2.2400  1.8217  0.5151  1.0000  0.6897  0.1603  1.0000  0.6931  0.1621  2.5389  0.0411
MH    1.0000  1.0000  0.0000  2.2400  2.4543  0.6451  1.0000  1.0475  0.1859  1.0000  1.0522  0.1862  2.5390  0.0411


Table 3
Bivariate MH and MAH distributions. Reference value, sample bias and sample standard deviation (in brackets) are provided for the respective parameter estimates derived from 100 samples of sample-size 1000 each. Denote Dep.Par. := σ12/(S1 S2). The last column lists the sample mean and the sample standard deviation of the cross entropy H.

Distribution   µ1             µ2             S1             S2             α1             α2             β1             β2             Dep.Par.       Cross Ent. H
value:         0.000          0.000          1.000          1.000          1.000          1.000          0.000          0.000          0.000
MAHmin         0.004 (0.085)  0.002 (0.087)  0.019 (0.208)  0.025 (0.213)  0.040 (0.265)  0.040 (0.265)  0.002 (0.032)  0.002 (0.032)  0.005 (0.037)  3.769 (0.039)
MAHmax         0.005 (0.111)  0.014 (0.112)  0.034 (0.323)  0.060 (0.300)  0.064 (0.398)  0.102 (0.385)  0.006 (0.045)  0.003 (0.044)  0.005 (0.048)  3.768 (0.039)
MH             0.003 (0.095)  0.007 (0.094)  0.023 (0.163)  0.029 (0.168)  0.042 (0.210)  0.042 (0.210)  0.005 (0.040)  0.001 (0.039)  0.003 (0.036)  3.753 (0.038)
value:         0.000          0.000          0.320          0.320          0.320          0.320          0.000          0.000          0.000
MAHmin         0.022 (0.830)  0.051 (0.644)  0.001 (0.126)  0.003 (0.131)  0.008 (0.137)  0.008 (0.137)  0.004 (0.115)  0.004 (0.115)  0.019 (0.109)  3.457 (0.112)
MAHmax         0.060 (0.464)  0.015 (0.200)  0.000 (0.205)  0.031 (0.194)  0.011 (0.214)  0.017 (0.203)  0.020 (0.111)  0.008 (0.122)  0.004 (0.214)  3.382 (0.142)
MH             0.071 (0.697)  0.100 (1.015)  0.003 (0.108)  0.004 (0.112)  0.009 (0.123)  0.009 (0.123)  0.004 (0.035)  0.007 (0.067)  0.012 (0.105)  3.418 (0.345)
value:         0.000          0.000          1.155          1.155          2.236          2.236          0.000          0.000          0.500
MAHmin         0.001 (0.120)  0.001 (0.075)  0.139 (0.238)  0.143 (0.191)  0.077 (0.626)  0.077 (0.626)  0.003 (0.042)  0.003 (0.042)  0.055 (0.027)  2.687 (0.040)
MAHmax         0.002 (0.110)  0.002 (0.127)  0.085 (0.294)  0.056 (0.320)  0.318 (1.083)  0.270 (0.969)  0.004 (0.064)  0.001 (0.053)  0.004 (0.145)  2.686 (0.040)
MH             0.003 (0.118)  0.002 (0.083)  0.159 (0.224)  0.128 (0.177)  0.128 (0.603)  0.128 (0.603)  0.003 (0.058)  0.000 (0.048)  0.054 (0.026)  2.680 (0.040)
value:         2.000          4.000          1.155          1.155          1.258          1.258          0.229          0.512          0.500
MAHmin         0.147 (0.201)  0.759 (0.129)  0.098 (0.253)  0.006 (0.234)  0.224 (0.279)  0.224 (0.279)  0.145 (0.034)  0.138 (0.034)  0.050 (0.029)  4.210 (0.050)
MAHmax         1.470 (0.173)  0.498 (0.171)  0.504 (0.511)  0.521 (0.192)  0.134 (0.483)  0.400 (0.303)  0.154 (0.039)  0.050 (0.014)  0.144 (0.123)  4.175 (0.048)
MH             0.201 (0.117)  0.260 (0.082)  0.031 (0.214)  0.186 (0.174)  0.182 (0.240)  0.182 (0.240)  0.128 (0.038)  0.042 (0.013)  0.007 (0.027)  4.128 (0.045)


9 Application to financial data

The MGH and MAGH distributions are now fitted to various asset-return data: Dax/Cac and Nikkei/Cac returns (both comprising 3989 samples) and Dax/Dow (1254 samples). In particular, the following distributions are used:

(1) MGH/MH,

(2) MAGH/MAH with minimum parameterization (denoted by (min)), that is, with all margins equally parameterized,

(3) MAGH/MAH with maximum parameterization (denoted by (max)), that is, with each margin individually parameterized.

For some of these distributions we have also estimated the symmetric counterpart (denoted by (sym)), i.e., β = 0. Further, we illustrate estimations following the affine-linear transformation method provided at the end of Section 3 (denoted by (PC)). The results are presented in Table 4. The dependence parameter Dep.Par. refers to the sub-diagonal elements of the normed matrix Σ, i.e., Dep.Par. := σij/√(σiiσjj) = σij/(SiSj).

We have already mentioned that the Kullback divergence or cross entropy is a widespread measure of the goodness-of-fit of multidimensional distributions (in particular, lower cross entropy signals a better fit), see Ullah (1996) for an overview. For small divergences between the distributions (i.e., good fits), cross entropy is approximately equal to another popular divergence measure, the χ2 divergence:

∫ f(x) log( f(x)/g(x) ) dx = ∫ f(x) log( (f(x) − g(x))/g(x) + 1 ) dx ≈ ∫ (f(x) − g(x))²/g(x) dx.

Thus, one cannot expect substantially different results if alternative divergence measures are applied.
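
As a rough numerical illustration (not from the paper), the two divergences can be compared for two nearby univariate normal densities standing in for f and g; both come out small, which is the point of the remark above.

```python
# Compare the Kullback divergence and the chi-square divergence for two
# nearby densities (stand-ins for a fitted and a reference model).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

f = norm(0.0, 1.00).pdf   # density f
g = norm(0.1, 1.05).pdf   # nearby density g

kl = quad(lambda x: f(x) * np.log(f(x) / g(x)), -12, 12)[0]   # Kullback divergence
chi = quad(lambda x: (f(x) - g(x)) ** 2 / g(x), -12, 12)[0]   # chi-square divergence
print(kl, chi)   # both are small and of the same order for these nearby densities
```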

Taking additional parameters or degrees of freedom, such as

• λ ≠ 1 (MAGH or MGH) instead of λ = 1 (MAH or MH),

• β ≠ 0 (MAGH or MGH) instead of β = 0 (symmetric MAGH or MGH),

• multiple λ, α, and β parameters (maximum parametrizations) instead of single ones (minimum parametrizations)

can be justified by means of the individual cross entropies. Consider the cross entropies Hi and Hj of variants i and j, with variant i having k additional parameters in contrast to variant j. The fact that the cross entropy times the sample size n is equal to the negative log-likelihood leads to the following log-likelihood ratio Ln between the variant j and the variant i:

Ln(i, j) = n(Hj −Hi).

The statistic 2Ln(i, j) is asymptotically χ2-distributed with k degrees of freedom (see Stuart and Ord (1994)) and can be used as a test for additional parameters.
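
For illustration, the test can be evaluated directly from the tabulated cross entropies. The sketch below is not the authors' code; it uses SciPy's χ2 quantile function, and the example numbers are the Dax/Cac cross entropies of the MAGH(min) and MAGH(max) models from Table 4 with k = 3 additional parameters and n = 3989 observations.

```python
# Likelihood-ratio test for k additional parameters, following the formulas above.
from scipy.stats import chi2


def lr_test(H_j, H_i, n, k, level=0.05):
    """Return 2*L_n(i,j) = 2*n*(H_j - H_i) and whether it exceeds the chi2(k) quantile."""
    stat = 2.0 * n * (H_j - H_i)   # H_j: restricted variant j, H_i: variant i with k extra parameters
    return stat, stat > chi2.ppf(1.0 - level, df=k)


# Dax/Cac, MAGH(min) vs. MAGH(max) from Table 4: cross entropies -6.183 and -6.208
print(lr_test(H_j=-6.183, H_i=-6.208, n=3989, k=3))   # (approx. 199.5, True)
```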


Table 4
Estimations from two-dimensional asset-return data (Dax-Cac, Dax-Dow, Nikkei-Cac). For the maximally parameterized models, the columns λ, α and β contain one entry per margin; for the MGH/MH models, the column β contains the two components of the vector β.

Data       Model             µ1        µ2       S1       S2       Dep.Par.  Corr.  λ             α             β                Cross Ent.
DaxCac     MAGHmin           0.0012    0.0008   0.0062   0.0060   0.689     0.709  0.889         0.675         -0.034           -6.183
DaxCac     MAHmin            0.0011    0.0008   0.0062   0.0059   0.688     0.709  1.000         0.699         -0.029           -6.183
DaxCac     MAGHmin sym.      0.0004    0.0003   0.0056   0.0054   0.689     0.709  1.002         0.631          0.000           -6.182
DaxCac     MAGHmax PC        0.0012    0.0005   0.0103   0.0046   0.941     0.709  0.812, 0.732  0.192, 1.363   0.037, 0.000    -6.208
DaxCac     MAHmax PC         0.0015    0.0007   0.0095   0.0042   0.898     0.709  1.000, 1.000  0.261, 1.315   0.039, 0.000    -6.208
DaxCac     MAGHmax PC sym.   0.0002    0.0001   0.0107   0.0048   0.999     0.709  0.635, 0.645  0.050, 1.427   0.000, 0.000    -6.275
DaxCac     MGH               0.0010    0.0005   0.0045   0.0044   0.672     0.709  1.134         0.535         -0.038, -0.017   -6.213
DaxCac     MH                0.0011    0.0006   0.0059   0.0058   0.673     0.709  1.000         0.689         -0.044, -0.019   -6.213
DaxCac     MGH sym.          0.0005    0.0003   0.0043   0.0042   0.672     0.709  1.143         0.504          0.000, 0.000    -6.212
DaxDow     MAGHmin           0.0025    0.0013   0.0137   0.0097   0.505     0.498  0.891         1.285         -0.067           -5.743
DaxDow     MAHmin            0.0021    0.0011   0.0124   0.0088   0.503     0.498  1.000         1.169         -0.056           -5.743
DaxDow     MAGHmin sym.      0.0002    0.0001   0.0137   0.0097   0.500     0.498  0.759         1.208          0.000           -5.741
DaxDow     MAGHmax PC        0.0009    0.0006   0.0145   0.0092   0.287     0.498  0.694, 0.654  1.014, 1.623  -0.001, 0.000    -5.746
DaxDow     MAHmax PC         0.0012    0.0002   0.0122   0.0082   0.373     0.498  1.000, 1.000  0.868, 1.561   0.032, 0.000    -5.746
DaxDow     MAGHmax PC sym.   0.0000    0.0001   0.0147   0.0093   0.277     0.498  0.650, 0.657  1.013, 1.634   0.000, 0.000    -5.746
DaxDow     MGH               0.0020    0.0003   0.0135   0.0097   0.489     0.498  1.087         1.359         -0.077, -0.010   -5.757
DaxDow     MH                0.0021    0.0004   0.0137   0.0098   0.489     0.498  1.000         1.346         -0.079, -0.015   -5.757
DaxDow     MGH sym.          0.0001    0.0001   0.0130   0.0093   0.488     0.498  1.097         1.289          0.000, 0.000    -5.755
NikkeiCac  MAGHmin           0.0000    0.0004   0.0028   0.0027   0.196     0.236  0.870         0.272         -0.011           -5.884
NikkeiCac  MAHmin           -0.0000    0.0004   0.0024   0.0023   0.199     0.236  1.000         0.245         -0.008           -5.884
NikkeiCac  MAGHmin sym.     -0.0001    0.0003   0.0028   0.0027   0.197     0.236  0.880         0.278          0.000           -5.883
NikkeiCac  MAGHmax PC        0.0008    0.0007   0.0084   0.0069   0.170     0.236  0.646, 0.902  0.700, 0.957   0.009, 0.000    -5.839
NikkeiCac  MAHmax PC         0.0008    0.0008   0.0067   0.0058   0.352     0.236  1.000, 1.000  0.564, 0.859   0.008, 0.000    -5.838
NikkeiCac  MAGHmax PC sym.   0.0000    0.0003   0.0090   0.0076   0.278     0.236  0.652, 0.639  0.714, 1.020   0.000, 0.000    -5.838
NikkeiCac  MGH               0.0004    0.0005   0.0032   0.0031   0.217     0.236  1.117         0.355         -0.026, -0.014   -5.885
NikkeiCac  MH                0.0004    0.0005   0.0043   0.0041   0.216     0.236  1.000         0.459         -0.028, -0.017   -5.885
NikkeiCac  MGH sym.         -0.0000    0.0003   0.0032   0.0031   0.217     0.236  1.118         0.349          0.000, 0.000    -5.885

In our case, a single additional parameter such as λ or β is justified on a significance level of 0.05 with Hj − Hi > 3.84146/3989 ≈ 0.001 for Dax/Cac and Nikkei/Cac data and with Hj − Hi > 3.84146/1254 ≈ 0.003 for Dax/Dow data. However, this is not the case for minimally parameterized variants; in particular, generalized asymmetric variants are not justified on this significance level.

Three additional parameters (λ, α, and β in the maximally parameterized variants) are justified on a significance level of 0.05 with Hj − Hi > 7.81473/3989 ≈ 0.002 for Dax/Cac and Nikkei/Cac data and with Hj − Hi > 7.81473/1254 ≈ 0.006 for Dax/Dow data. On the significance level 0.01, the critical differences are 0.003 for Dax/Cac and Nikkei/Cac data and 0.009 for Dax/Dow data. These significance thresholds are exceeded for Dax/Cac data; the flexibility of the maximally parameterized variants is obviously valuable for the modelling in this case.
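
The critical differences quoted above are the χ2 quantiles divided by the respective sample sizes; a short sketch (not part of the paper) reproducing this arithmetic with SciPy:

```python
# Critical cross-entropy differences as used in the text: chi2 quantile / sample size.
from scipy.stats import chi2

for n in (3989, 1254):                           # Dax/Cac and Nikkei/Cac vs. Dax/Dow sample sizes
    for k, level in ((1, 0.05), (3, 0.05), (3, 0.01)):
        print(n, k, level, round(chi2.ppf(1 - level, df=k) / n, 3))
# e.g. chi2.ppf(0.95, df=1) = 3.84146 and 3.84146/3989 ≈ 0.001, as quoted above.
```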

To evaluate the goodness-of-fit of the respective distribution models to the financial data, the χ2 test has been performed. While there are diverse recommendations (see Stuart and Ord (1994)) concerning the choice of the class intervals in the univariate case, the choice remains difficult for multivariate distributions. Unfortunately, the results of the test depend essentially on this choice. We have used some simple choices: 6², 8², and 10² intervals of width 0.005 or 0.01. For 6² and 8² intervals, the hypotheses that the Dax/Dow data arise from the individual MGH and MAGH distributions cannot be rejected on the one per cent significance level (for MGH distributions and 8² intervals the hypothesis cannot be rejected on the five per cent significance level). For the other data sets, the hypothesis could be rejected on the one per cent significance level. However, according to our analysis the hypothesis must be rejected for any other common parametric multidimensional distribution which is different from the hyperbolic family.

Due to the total or approximate symmetry of some distribution models, the dependence parameters can be roughly interpreted as "correlation coefficients" for MGH and MAGHmin distributions. They are even close to the corresponding sample correlation coefficient (column "Corr."). Note that such an interpretation is not possible for the MAGHmax model due to the different parameterization of the one-dimensional margins.

Conclusion

Summarizing the results, we have investigated an interesting new class of multidimensional distributions with exponentially decreasing tails to analyze high-dimensional data, namely the multivariate affine generalized hyperbolic distributions. These distributions are attractive regarding the estimation of unknown parameters and random vector generation. We illustrated that this class of distributions possesses an appealing dependence structure and we derived several dependence properties. Finally, an extensive simulation study showed the flexibility and robustness of the introduced model. Thus, the use of multivariate affine generalized hyperbolic distributions is recommended for multidimensional data modelling in statistical and financial applications.

Appendix

Proof (Theorem 8). i) Suppose X ∈ MGH2(µ, Σ, ω) with parameter β = 0. Then X belongs to the family of elliptically contoured distributions and the assertion follows by Theorem 2 in Lindskog, McNeil, and Schmock (2003).
ii) Suppose X ∈ MAGH2(µ, Σ, ω) with stochastic representation X d= A′Y + µ, A′A = Σ, copula C and parameter ρ. For ρ = 0, the assertion follows by Theorem 5.1.9 in Nelsen (1999) due to the independence of X1 and X2. For the remaining case we can assume ρ > 0, as for ρ < 0 the assertion is shown similarly. According to Theorem 5.1.3 in Nelsen (1999), Kendall's tau is a copula property, which justifies the assumption µ = 0. Further, Kendall's tau is invariant under strictly increasing transformations of the margins (see Theorem 5.1.9 in Nelsen (1999)) and therefore we may set

A′ =  ( 1  0 )
      ( 1  c )      with c := √(1/ρ² − 1).


Consequently, the distribution function of X = (X1, X2)′ has the form

FX(x1, x2) = IP(Y1 ≤ x1, Y1 + cY2 ≤ x2) = ∫_{−∞}^{x1} FY2((x2 − z)/c) fY1(z) dz,

and the corresponding density function is fX(x1, x2) = fY2((x2 − x1)/c) fY1(x1)/|c|. Thus, using the fact that

τ = 4 ∫_{[0,1]²} C(u, v) dC(u, v) − 1 = 4 ∫_{IR²} FX(x1, x2) fX(x1, x2) d(x1, x2) − 1,

formula (4) is shown. □

In order to prove Theorem 10 we first investigate the tail behavior of the univariate symmetric MGH and MAGH distributions. The tail of the distribution function F, as always, is denoted by F̄ := 1 − F.

Definition 11 (Semi-heavy tails) A continuous (symmetric) function g : IR → (0, ∞) is called semi-heavy tailed (or exponentially tailed) if it satisfies

g(x) ∼ c|x|^ν exp(−η|x|) as x → ±∞,   (9)

with ν ∈ IR, η > 0 and some positive constant c. The class of (symmetric) semi-heavy tailed functions is denoted by Lν,η.

Lemma 12 Let f be a density function such that f ∈ Lν,η, ν ∈ IR, η > 0. Then the corresponding distribution function F possesses the same asymptotic behavior as its density, i.e., F(x) ∼ c|x|^ν exp(−η|x|) as x → −∞ and F̄(x) ∼ c x^ν exp(−ηx) as x → ∞ for some positive constant c; write F ∈ Lν,η.

Proof. Consider e.g. the tail function F̄. Applying partial integration we obtain

F̄(x) = ∫_x^∞ f(u) du ∼ c ∫_x^∞ u^ν exp(−ηu) du = (c/η) x^ν exp(−ηx) + (cν/η) ∫_x^∞ u^{ν−1} exp(−ηu) du.

Thus, the proof is complete if we show that ∫_x^∞ u^{ν−1} exp(−ηu) du / (x^ν exp(−ηx)) = o(1) as x → ∞. Rewriting the latter quotient yields

0 ≤ (1/x) ∫_x^∞ (u/x)^{ν−1} exp(−η(u − x)) du = (1/x) ∫_0^∞ ((u + x)/x)^{ν−1} exp(−ηu) du.

The assertion is now immediate because

(u/x + 1)^{ν−1} ≤ (u + 1)^{ν−1} for ν ≥ 1   and   (u/x + 1)^{ν−1} ≤ 1 for ν < 1,

and the corresponding integrals exist. □

The next lemma is quite useful; it states that the tail of the convolution of two semi-heavy tailed distributions is determined by the heavier tail.


Lemma 13 Let F1 and F2 be distribution functions with F1 ∈ Lν1,η1 and F2 ∈ Lν2,η2, where 0 < η2 < η1 and ν1, ν2 ∈ IR. Then F1 ∗ F2 ∈ Lν2,η2 and, moreover,

lim_{t→∞} (1 − F1 ∗ F2(t)) / F̄2(t) = m1 := ∫_{−∞}^{∞} e^{η2 u} dF1(u).   (10)

Proof. For some fixed s > 1, we have

1 − F1 ∗ F2(t) = ∫_{−∞}^{∞} F̄2(t − u) dF1(u) = ∫_{−∞}^{t/s} F̄2(t − u) dF1(u) − ∫_{−∞}^{t−t/s} F̄2(u) dF1(t − u)

               = ∫_{−∞}^{t/s} F̄2(t − u) dF1(u) + ∫_{−∞}^{t−t/s} F̄1(t − u) dF2(u) + F̄2(t − t/s) F̄1(t/s),

where the last equality follows by partial integration. Thus, dominated convergence yields

lim_{t→∞} (1 − F1 ∗ F2(t)) / F̄2(t) = lim_{t→∞} ∫_{−∞}^{t/s} F̄2(t − u)/F̄2(t) dF1(u) = ∫_{−∞}^{∞} e^{η2 u} dF1(u) =: m1 < ∞,

because

0 ≤ ∫_{−∞}^{t−t/s} F̄1(t − u)/F̄2(t) dF2(u) ≤ (F̄1(t/s)/F̄2(t)) · F̄2(t − t/s) → 0 as t → ∞.

A consequence of the symmetric tails of F1 and F2 is that lim_{t→−∞} F1 ∗ F2(t)/F2(t) = m1. Hence, F1 ∗ F2 ∈ Lν2,η2 is proven. □

According to Barndorff-Nielsen and Blæsild (1981), the univariate MGH distributions have semi-heavy tails, in particular

MGH1(0, 1, ω) ∼ c|x|^{λ−1} exp((∓α + αβ)x) as x → ±∞,   (11)

with some positive constant c. Hence, in the symmetric case β = 0 we obtain MGH1(0, 1, ω) ∈ Lν,η with ν = λ − 1 and η = α. Now we are ready to prove Theorem 10.
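
As a quick numerical illustration (not part of the paper), the semi-heavy tail (11) can be observed in SciPy's generalized hyperbolic implementation, whose parameters p, a, b play the roles of λ, α, β in a different standardization: for a symmetric law the log-density becomes asymptotically linear with slope −α.

```python
# Tail check for a symmetric univariate GH density: log f(x) - (lambda-1)*log x ~ const - alpha*x.
import numpy as np
from scipy.stats import genhyperbolic

lam, alpha = 1.0, 2.0
x = np.linspace(5.0, 15.0, 200)
logpdf = genhyperbolic.logpdf(x, p=lam, a=alpha, b=0.0)

slope = np.polyfit(x, logpdf - (lam - 1.0) * np.log(x), 1)[0]
print(slope)   # close to -alpha = -2.0
```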

Proof (Theorem 10). We only show upper tail-dependence and upper tail-independence, respectively, as the lower counterpart is obtained similarly. Recall that tail dependence is a copula property and therefore we may put µ = 0.

i) Let X ∈ MGH2(0, Σ, ω) with β = 0. In that case X belongs to the family of elliptically contoured distributions. According to Theorem 6.8 in Schmidt (2002) the assertion follows because of the exponentially-tailed density generator.

ii) Let X ∈ MAGH2(0, Σ, ω) with stochastic representation X d= A′Y and Cholesky matrix A′ = (aij)i,j=1,2. Note that a11, a22 > 0. Then


IP(X2 > F_{X2}^{−1}(v) | X1 > F_{X1}^{−1}(v)) = IP(a11 Y1 > F_{X1}^{−1}(v), a12 Y1 + a22 Y2 > F_{X2}^{−1}(v)) / IP(a11 Y1 > F_{X1}^{−1}(v))

= (1/(1 − v)) ∫_{F_{Y1}^{−1}(v)}^{∞} IP(Y2 > (F_{X2}^{−1}(v) − a12 y)/a22) fY1(y) dy =: I,

because Y1 and Y2 are independent random variables. If ρ ≤ 0 then a12 ≤ 0 and upper tail-independence immediately follows by dominated convergence.

Consider now ρ > 0 and therefore a12 > 0. Let α2 < α1 √(1/ρ² − 1). Then α2/a22 < α1/a12. For all ε ∈ (0, 1] we conclude with u := 1 − v that

I ≤ ε + (1/u) ∫_{F_{Y1}^{−1}(1−u)}^{F_{Y1}^{−1}(1−uε)} IP(Y2 > (F_{X2}^{−1}(1 − u) − a12 y)/a22) fY1(y) dy

  ≤ ε + (1 − ε) IP(Y2 > (F_{X2}^{−1}(1 − u) − a12 F_{Y1}^{−1}(1 − uε))/a22).   (12)

Due to (11) we know that Fa12Y1 ∈ Lν1,η1 with ν1 = λ1 − 1 and η1 = α1/a12. Thus, Lemma 13 gives FX2 ∈ Lν2,η2 with ν2 = λ2 − 1 and η2 = α2/a22, as 0 < η2 < η1. Then the probability in (12) converges to zero as u → 0+ if F_{X2}^{−1}(1 − u) − a12 F_{Y1}^{−1}(1 − uε) = F_{a12Y1+a22Y2}^{−1}(1 − u) − F_{a12Y1}^{−1}(1 − uε) → ∞ as u → 0+. Put xu := F_{a12Y1}^{−1}(1 − uε) and yu := F_{a12Y1+a22Y2}^{−1}(1 − u). Then

u = (1/ε) F̄a12Y1(xu) = F̄X2(yu) ∼ (c1/ε) xu^{ν1} exp(−η1 xu) ∼ c2 yu^{ν2} exp(−η2 yu),   (13)

as u → 0+ and therefore xu, yu → ∞. The asymptotic behavior (13) implies yu − xu → ∞ as u → 0+ because 0 < η2 < η1. Hence, upper tail-independence is shown.

iii) Now suppose ρ > 0 and α2 > α1 √(1/ρ² − 1), which yields a12 > 0 and α2/a22 > α1/a12. According to Lemma 13 we have Fa12Y1 ∈ Lν1,η1 and FX2 ∈ Lν1,η1 with ν1 = λ1 − 1 and η1 = α1/a12. Notice that with u := 1 − v

I ≥ IP(Y2 > (F_{X2}^{−1}(1 − u) − a12 F_{Y1}^{−1}(1 − uε))/a22).   (14)

Further u = F̄a12Y1(xu) = F̄X2(yu) ∼ c1 xu^{ν1} exp(−η1 xu) ∼ c2 yu^{ν1} exp(−η1 yu). Hence

(c2/c1) (yu/xu)^{ν1} exp(−η1(yu − xu)) → 1 as u → 0+.

Suppose that lim sup_{u→0+} (yu − xu) = ∞. If ν1 ≥ 0, then the fact

lim inf_{u→0+} (c2/c1) (yu/xu)^{ν1} exp(−η1(yu − xu)) ≤ lim inf_{u→0+} (c2/c1) (2(yu − xu))^{ν1} exp(−η1(yu − xu)) = 0,

as u → 0+ would lead to a contradiction. On the other hand, if ν1 < 0 then

lim inf_{u→0+} (c2/c1) (yu/xu)^{ν1} exp(−η1(yu − xu)) ≤ lim inf_{u→0+} (c2/c1) exp(−η1(yu − xu)) = 0 as u → 0+

would also lead to a contradiction. Therefore we conclude that lim inf_{u→0+} IP(Y2 > (F_{X2}^{−1}(1 − u) − a12 F_{Y1}^{−1}(1 − uε))/a22) > 0, as Y2 is supported on IR. Finally, the limit lim_{v→1−} IP(X2 > F_{X2}^{−1}(v) | X1 > F_{X1}^{−1}(v)) exists due to the convexity of FX1 and FX2 for large arguments. □


Acknowledgement. The authors would like to thank Professor Friedrich Schmid and Professor Ulrich Stadtmüller for very helpful discussions. Further, the authors are grateful to Professor Gholamreza Nakhaeizadeh for supporting this work. The authors thank two anonymous referees for very constructive comments which improved an earlier version of this paper.

References

Atkinson, A.C., 1982, The simulation of generalized inverse Gaussian and hyperbolic random variables, SIAM J. Sci. Stat. Comput. 3, 502–515.
Barndorff-Nielsen, O. E., 1977, Exponentially decreasing distributions for the logarithm of particle size, Proceedings of the Royal Society London A, 401–419.
Barndorff-Nielsen, O. E., 1978, Hyperbolic distributions and distributions on hyperbolae, Scand. J. Statist., 151–157.
Barndorff-Nielsen, O. E., and P. Blæsild, 1981, Hyperbolic distributions and ramifications: Contributions to theory and application. In: C. Taillie, G. Patil, and B. Baldessari (Eds.), Statistical Distributions in Scientific Work, Dordrecht Reidel, vol. 4, 19–44.
Barndorff-Nielsen, O. E., J. Kent, and M. Sørensen, 1982, Normal Variance-Mean Mixtures and z Distributions, International Statistical Review 50, 145–159.
Bauer, C., 2000, Value at risk using hyperbolic distributions, J. Economics and Business 52, 455–467.
Bingham, N. H., and R. Kiesel, 2001, Modelling asset returns with hyperbolic distributions. In: J. Knight and S. Satchell (Eds.), Asset Return Distributions, Butterworth-Heinemann, 1–20.
Blæsild, P., and J. L. Jensen, 1981, Multivariate distributions of hyperbolic type. In: C. Taillie, G. Patil, and B. Baldessari (Eds.), Statistical Distributions in Scientific Work, Dordrecht Reidel, vol. 4, 45–66.
Boender, C.G.E., and A.H.G. Rinnooy Kan, 1987, Bayesian stopping rules for multistart global optimization methods, Mathematical Programming 37, 59–80.
Cambanis, S., S. Huang, and G. Simons, 1981, On the Theory of Elliptically Contoured Distributions, Journal of Multivariate Analysis 11, 368–385.
Cherubini, U., E. Luciano, and W. Vecchiato, 2004, Copula Methods in Finance. (John Wiley & Sons).
Eberlein, E., 2001, Application of generalized hyperbolic Lévy motions to finance. In: O.E. Barndorff-Nielsen, T. Mikosch, and S. Resnick (Eds.), Lévy Processes: Theory and Applications, 319–337.
Eberlein, E., and U. Keller, 1995, Hyperbolic distributions in finance, Bernoulli 1, 281–299.
Eberlein, E., and K. Prause, 2002, The generalized hyperbolic model: financial derivatives and risk measures. In: H. Geman, D. Madan, S. Pliska, and T. Vorst (Eds.), Mathematical Finance - Bachelier Congress 2000, 245–267.
Embrechts, P., F. Lindskog, and A. McNeil, 2003, Modelling Dependence with Copulas and Applications to Risk Management. In: S. Rachev (Ed.), Handbook of Heavy Tailed Distributions in Finance, Elsevier, Chapter 8, 329–384.
Embrechts, P., A. McNeil, and D. Straumann, 2002, Correlation and Dependency in Risk Management: Properties and Pitfalls. In: M.A.H. Dempster (Ed.), Risk Management: Value at Risk and Beyond, Cambridge University Press, Cambridge, 176–223.
Fang, K.T., S. Kotz, and K.W. Ng, 1990, Symmetric Multivariate and Related Distributions. (Chapman and Hall, London).
Fletcher, R., 1987, Practical Methods of Optimization. (Wiley, New York).
Hauksson, H.A., M. Dacorogna, T. Domenig, U. Mueller, and G. Samorodnitsky, 2001, Multivariate extremes, aggregation and risk estimation, Quantitative Finance 1, 79–95.
Joe, H., 1997, Multivariate Models and Dependence Concepts. (Chapman and Hall, London).
Kullback, S., 1959, Information Theory and Statistics. (Wiley, New York).
Lindskog, F., A. McNeil, and U. Schmock, 2003, Kendall's tau for elliptical distributions. In: G. Bohl, G. Nakhaeizadeh, S.T. Rachev, T. Ridder and K.H. Vollmer (Eds.), Credit Risk: Measurement, Evaluation and Management, Physica-Verlag, Heidelberg, 649–676.
Magnus, W., F. Oberhettinger, and R.P. Soni, 1966, Formulas and Theorems for the Special Functions of Mathematical Physics, vol. 52 of Die Grundlehren der mathematischen Wissenschaften in Einzeldarstellungen. (Springer Verlag, Berlin).
Nelsen, R.B., 1999, An Introduction to Copulas. (Springer Verlag).
Olbricht, W., 1991, On mergers of distributions and distributions with exponential tails, Computational Statistics & Data Analysis 12, 315–326.
Parzen, E., 1962, On estimation of probability density function and mode, Ann. Math. Stat. 35, 1065–1076.
Prause, K., 1999, The Generalized Hyperbolic Model: Estimation, Financial Derivatives, and Risk Measures, PhD Thesis, Albert-Ludwigs-Universität Freiburg.
Rinnooy Kan, A.H.G., and G.T. Timmer, 1987, Stochastic global optimization methods, Part I: Clustering methods, Part II: Multi-level methods, Mathematical Programming 39, 26–78.
Ross, S.M., 1997, Introduction to Probability Models. (Academic Press, San Diego), 6th edn.
Schmidt, R., 2002, Tail dependence for elliptically contoured distributions, Mathematical Methods of Operations Research 55, 301–327.
Stuart, A., and K. Ord, 1994, Kendall's Advanced Theory of Statistics, Volume I: Distribution Theory. (Arnold, London).
Stuart, A., and K. Ord, 1994, Kendall's Advanced Theory of Statistics, Volume II: Classical Inference and the Linear Model. (Arnold, London).
Stützle, E.A., and T. Hrycej, 2001, Forecasting of Conditional Distributions - An Application to the Spare Parts Demand Forecast. In: Proc. 2001 IASTED Int. Conf. on Artificial Intelligence and Soft Computing, Cancun, Mexico.
Stützle, E.A., and T. Hrycej, 2002a, Estimating Multivariate Conditional Distributions via Neural Networks and Global Optimization. In: Proc. 2001 IEEE Int. Joint Conference on Neural Networks, Honolulu, Hawaii.
Stützle, E.A., and T. Hrycej, 2002b, Modelling Future Demand by Estimating the Multivariate Conditional Distribution via the Maximum Likelihood Principle and Neural Networks. In: Proc. 2002 IASTED Int. Conf. on Modelling, Identification and Control, Innsbruck, Austria.
Ullah, A., 1996, Entropy, divergence and distance measures with econometric applications, Journal of Statistical Planning and Inference 49, 137–162.
