
International Conference in Memory of Two Eminent Social Scientists: C. Gini and M.O. Lorenz. 23-26 May 2005, University of Siena, Italy

Bivariate Income Distributions with Lognormal Conditionals

José María Sarabia (a), Enrique Castillo (b), Marta Pascual (a) and María Sarabia (c)

(a) Department of Economics, University of Cantabria, Santander, Spain
(b) Department of Applied Mathematics and Computational Sciences, University of Cantabria, Santander, Spain
(c) Department of Business Administration, University of Cantabria, Santander, Spain

Abstract

In this paper, the most general bivariate distribution with lognormal conditionals is fully characterized, using the methodology proposed by Arnold, Castillo and Sarabia (1999). The properties of the new family are studied in detail, including marginal and conditional distributions, regression functions, dependence measures, moments and inequality measures. The new distribution is very general, and contains the classical bivariate lognormal distribution as a particular case. Several subfamilies are studied. Some extensions and variations of the basic model are discussed. Finally, we present an empirical application: we estimate the basic model proposed in the paper and compare it with a classical model, using data from the European Community Household Panel in different periods of time.

Key words: Lognormal distribution, conditionally specified models, European Community Household Panel.

1 Introduction

The purpose of this paper is to study several classes of bivariate income distributions whose conditionals belong to the two- and three-parameter lognormal distribution, and to some of their extensions. The most natural definition of a bivariate lognormal distribution is in terms of the classical bivariate normal distribution. The bivariate lognormal distribution has the probability density function (pdf):

$$f(x, y; \mu, \Sigma) = \frac{1}{2\pi\, x y\, |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (z - \mu)^\top \Sigma^{-1} (z - \mu) \right\}, \quad x, y > 0, \tag{1}$$

where $z = (\log x, \log y)^\top$, $\mu = (\mu_1, \mu_2)^\top$ and $\Sigma$ is a $2 \times 2$ positive definite covariance matrix. Kmietowicz (1984) used this distribution to model the joint distribution of household size and income. An appealing feature of family (1) is that the marginal and conditional distributions are again lognormal, as in the bivariate normal case. Unfortunately, this distribution differs from the normal case in some respects. For instance, the range of the correlation coefficient is limited, and is narrower than in the normal case, which can seriously affect the practical use of this distribution. Some details about the dependence structure of (1) appear in Nalbach-Leniewska (1979). In this paper we present several bivariate versions of the lognormal distribution based on conditional specification. We use the methodology proposed by Arnold, Castillo and Sarabia (1992, 1999, 2001). Using this methodology, we obtain new classes of distributions that are broader and more flexible than the classical models, so that they can be used as an alternative to the bivariate lognormal distribution (1).

The paper is organized as follows. Section 2 presents the lognormal distribution and some of its generalizations. Section 3 studies the bivariate two- and three-parameter lognormal conditionals distribution. The properties of the new family are studied in detail, including marginal and conditional distributions, regression functions, dependence measures, moments and inequality measures. The new distribution is very general, and contains the classical bivariate lognormal distribution as a particular case. Several subfamilies are studied. Some extensions and variations of the basic model are discussed in Section 4. Finally, Section 5 presents an empirical application, in which we estimate the basic model proposed in the paper and compare it with a classical model, using data from the European Community Household Panel in different periods of time.

2 The Lognormal Distribution and Generalizations

Since the original works of Gibrat (1931) and Kalecki (1945), the lognormal distribution has been used for modelling income and wealth data and as a distribution of firm sizes, among other applications in economics and business. A random variable $X$ has a two-parameter lognormal distribution if its pdf has the form

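$$f(x; \mu, \sigma) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( \frac{\log x - \mu}{\sigma} \right)^2 \right\}, \quad x > 0, \tag{2}$$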

with $\mu \in \mathbb{R}$ and $\sigma > 0$. If $X$ has the density (2) we write $X \sim \mathcal{LN}(\mu, \sigma)$. Since $\log X \sim \mathcal{N}(\mu, \sigma^2)$, a random variable with pdf (2) admits the representation $X = \exp(\mu + \sigma Z)$, where $Z \sim \mathcal{N}(0, 1)$. When working with income and wealth data, one is often interested in models for values above a certain threshold $\delta$. If there exists a $\delta \in \mathbb{R}$ such that $\log(X - \delta) \sim \mathcal{N}(\mu, \sigma^2)$, then $X$ has a three-parameter lognormal distribution. In this case, the pdf is given by:

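$$f(x; \delta, \mu, \sigma) = \frac{1}{(x - \delta) \sigma \sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( \frac{\log(x - \delta) - \mu}{\sigma} \right)^2 \right\}, \quad x > \delta. \tag{3}$$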

If $X$ has the pdf (3) we write $X \sim \mathcal{LN}(\delta, \mu, \sigma)$. A detailed study of these distributions appears in Johnson, Kotz and Balakrishnan (1994), chapter 14. The books of Aitchison and Brown (1957) and Crow and Shimizu (1988) are devoted to this distribution. Kleiber and Kotz (2003) is a recent and important reference, especially for applications in economics and actuarial sciences.
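The representation $X = \delta + \exp(\mu + \sigma Z)$ behind (3) can be sketched numerically. The following Python fragment is only an illustration: the function names `lognormal3_pdf` and `lognormal3_sample` and the parameter values are assumptions made for this example, not part of the paper. It compares the density (3) with SciPy's shifted lognormal and checks that $\log(X - \delta)$ behaves like a $\mathcal{N}(\mu, \sigma^2)$ sample.

```python
import numpy as np
from scipy import stats


def lognormal3_pdf(x, delta, mu, sigma):
    """Three-parameter lognormal density (3); returns 0 where x <= delta."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.zeros_like(x)
    ok = x > delta
    z = (np.log(x[ok] - delta) - mu) / sigma
    out[ok] = np.exp(-0.5 * z**2) / ((x[ok] - delta) * sigma * np.sqrt(2.0 * np.pi))
    return out


def lognormal3_sample(n, delta, mu, sigma, rng):
    """Draw X = delta + exp(mu + sigma * Z) with Z ~ N(0, 1)."""
    return delta + np.exp(mu + sigma * rng.standard_normal(n))


# Illustrative parameter values (assumptions, not taken from the paper).
delta, mu, sigma = 1.0, 0.5, 0.4
rng = np.random.default_rng(0)
sample = lognormal3_sample(200_000, delta, mu, sigma, rng)

# SciPy's lognorm matches (3) with s = sigma, loc = delta, scale = exp(mu).
grid = np.linspace(1.2, 6.0, 50)
reference = stats.lognorm.pdf(grid, s=sigma, loc=delta, scale=np.exp(mu))
max_pdf_gap = np.max(np.abs(lognormal3_pdf(grid, delta, mu, sigma) - reference))

# log(X - delta) should have mean close to mu and standard deviation close to sigma.
log_mean = np.mean(np.log(sample - delta))
log_std = np.std(np.log(sample - delta))
```

Setting `delta = 0.0` gives the two-parameter case (2).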

We include three extensions of (2) that will be used later. The first one is a log version of the skew-normal distribution. This distribution will be called the log-skew-normal distribution, and has the pdf:

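$$f(x; \lambda, \mu, \sigma) = \frac{2}{x \sigma}\, \phi\left( \frac{\log x - \mu}{\sigma} \right) \Phi\left( \lambda\, \frac{\log x - \mu}{\sigma} \right), \quad x > 0, \tag{4}$$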

where $\phi(z)$ and $\Phi(z)$ denote the standard normal density and distribution functions, $\lambda \in \mathbb{R}$ is a parameter which governs the skewness of the density, $\mu \in \mathbb{R}$ and $\sigma > 0$. If $\lambda = 0$, (4) reduces to the lognormal distribution (2). We will denote the distribution (4) as $X \sim \mathcal{LSN}(\lambda, \mu, \sigma)$. Azzalini et al. (2003) have proposed this distribution for the analysis of family income data.
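As a quick numerical sanity check of the log-skew-normal density (4), the sketch below evaluates it in Python, verifies that it integrates to one for an arbitrary skewness value, and confirms the reduction to the lognormal density (2) when $\lambda = 0$. The function name and the parameter values are assumptions chosen for illustration.

```python
import numpy as np
from scipy import integrate, stats


def log_skew_normal_pdf(x, lam, mu, sigma):
    """Log-skew-normal density (4) for x > 0."""
    z = (np.log(x) - mu) / sigma
    return 2.0 / (x * sigma) * stats.norm.pdf(z) * stats.norm.cdf(lam * z)


# Illustrative values (assumptions, not from the paper).
lam, mu, sigma = 3.0, 0.0, 0.5

# Integrate in t = log x (so dx = exp(t) dt) over a window that holds
# essentially all of the mass.
total_mass, _ = integrate.quad(
    lambda t: log_skew_normal_pdf(np.exp(t), lam, mu, sigma) * np.exp(t),
    mu - 12.0 * sigma,
    mu + 12.0 * sigma,
)

# With lam = 0 the density should coincide with the lognormal density (2).
grid = np.linspace(0.1, 5.0, 40)
gap_at_zero_skew = np.max(
    np.abs(
        log_skew_normal_pdf(grid, 0.0, mu, sigma)
        - stats.lognorm.pdf(grid, s=sigma, scale=np.exp(mu))
    )
)
```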

The second one is the log version of the classical exponential-power distribution, proposed by Box and Tiao (1973) and widely used in robust Bayesian contexts. The pdf of this distribution is given by:

$$f(x; \beta, \mu, \sigma) = \frac{1}{2^{(3+\beta)/2}\, x \sigma\, \Gamma[(3+\beta)/2]} \exp\left\{ -\frac{1}{2} \left| \frac{\log x - \mu}{\sigma} \right|^{2/(1+\beta)} \right\}, \quad x > 0, \tag{5}$$

where $-1 < \beta \le 1$, $\mu \in \mathbb{R}$ and $\sigma > 0$. When $\beta = 0$ we obtain (2). If $X$ has the pdf (5), we write $X \sim \mathcal{LEP}(\beta, \mu, \sigma)$, and the distribution is called the log-exponential-power distribution. This distribution was proposed by Vianelli (1983).

The third generalization was proposed by Lye and Martin (1993) and Creedy, Lye and Martin (1997), and arises as the stationary distribution in a particular stochastic model. The pdf is given by

$$f(x; \theta) = \exp\left\{ \theta_1 x + \theta_2 \log x + \theta_3 (\log x)^2 + \theta_4 (\log x)^3 - \kappa(\theta) \right\}, \quad x > 0, \tag{6}$$

where $\exp\{\kappa(\theta)\}$ is the normalizing constant. When $\theta_1 = \theta_4 = 0$ we obtain (2), and when $\theta_3 = \theta_4 = 0$ we obtain the classical gamma distribution. A random variable $X$ with pdf (6) is denoted by $X \sim \mathcal{LM}(\theta)$. This distribution is very general and flexible, and can model multimodal distributions. Creedy, Lye and Martin (1997) estimated (6) for individual earnings from the US Current Population Survey, improving on the fit of the standard gamma and lognormal distributions.

3 The Bivariate Three-parameter Lognormal Conditionals Distribution

A distribution for a two-dimensional random vector $(X, Y)$ is said to be conditionally specified if, for every $y$, the conditional distribution of $X$ given $Y = y$ is a member of some pre-specified parametric family of distributions and, for every $x$, the conditional distribution of $Y$ given $X = x$ is a member of a, possibly different, pre-specified parametric family of distributions. In this paper we work with the parametric families (2) to (6). Let $(X, Y)$ be a two-dimensional random variable with support on a subset of $\mathbb{R}^2$. We want to consider all possible joint distributions for $(X, Y)$ with the following properties:

(a) For each $y$, the conditional distribution of $X$ given $Y = y$ is a three-parameter lognormal distribution with parameters $\delta_1(y)$, $\mu_1(y)$ and $\sigma_1(y)$, which may depend on $y$.

(b) For each $x$, the conditional distribution of $Y$ given $X = x$ is a three-parameter lognormal distribution with parameters $\delta_2(x)$, $\mu_2(x)$ and $\sigma_2(x)$, which may depend on $x$.

Consequently, we seek the most general random variable $(X, Y)$ such that the conditional distributions admit the following representation:

$$X \mid Y = y \sim \delta_1(y) + \exp\{\mu_1(y) + \sigma_1(y) Z\}, \tag{7}$$

$$Y \mid X = x \sim \delta_2(x) + \exp\{\mu_2(x) + \sigma_2(x) Z\}, \tag{8}$$

where $\delta_i(u)$, $\mu_i(u)$ and $\sigma_i(u)$, $i = 1, 2$, are unknown functions with $x > \delta_1(y)$, $y > \delta_2(x)$, and $Z \sim \mathcal{N}(0, 1)$. Writing the joint density as a product of marginals and conditionals in both possible ways, we obtain the functional equation:

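$$\frac{u_1(y)}{x - \delta_1(y)} \exp\left\{ -\frac{1}{2} \left[ \frac{\log(x - \delta_1(y)) - \mu_1(y)}{\sigma_1(y)} \right]^2 \right\} = \frac{u_2(x)}{y - \delta_2(x)} \exp\left\{ -\frac{1}{2} \left[ \frac{\log(y - \delta_2(x)) - \mu_2(x)}{\sigma_2(x)} \right]^2 \right\}, \tag{9}$$

where

$$u_1(y) = \frac{f_Y(y)}{\sigma_1(y) \sqrt{2\pi}}, \qquad u_2(x) = \frac{f_X(x)}{\sigma_2(x) \sqrt{2\pi}},$$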

and $f_X(x)$, $f_Y(y)$ represent the marginal densities. The solution of the functional equation (9) is not trivial. In this paper we consider two important models. The first model corresponds to constant and known $\delta_i(u) = \delta_i$ for $i = 1, 2$. In this case the three-parameter lognormal distribution belongs to a two-parameter exponential family, so that we can use some well-known results about conditional distributions. The second model corresponds to the choice $\mu_i(u) = \mu_i$, $i = 1, 2$. In the next sections we study these two cases.

3.1 The Bivariate Three-parameter Lognormal Conditionals Distribution with $\delta_i(\cdot)$ constant

If $\delta$ is known, it is clear that (3) is a two-parameter exponential family, and we can make use of a theorem due to Arnold and Strauss (1991), dealing with bivariate distributions with conditionals in prescribed exponential families. Consider two, possibly different, exponential families of densities $\{f_1(x; \theta) : \theta \in \Theta \subset \mathbb{R}^{\ell_1}\}$ and $\{f_2(y; \tau) : \tau \in T \subset \mathbb{R}^{\ell_2}\}$ where:

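$$f_1(x; \theta) = r_1(x)\, \beta_1(\theta) \exp\left\{ \sum_{i=1}^{\ell_1} \theta_i\, q_{1i}(x) \right\} \tag{10}$$

and

$$f_2(y; \tau) = r_2(y)\, \beta_2(\tau) \exp\left\{ \sum_{j=1}^{\ell_2} \tau_j\, q_{2j}(y) \right\}. \tag{11}$$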

The class of all bivariate pdfs $f(x, y)$ with conditionals in these prescribed exponential families can be obtained as follows.

Theorem 1 Let $f(x, y)$ be a bivariate density whose conditional densities satisfy:

$$f(x \mid y) = f_1(x; \theta(y))$$

and

$$f(y \mid x) = f_2(y; \tau(x))$$

for every $x$ and $y$, for some functions $\theta(y)$ and $\tau(x)$, where $f_1$ and $f_2$ are as defined in (10) and (11). It follows that $f(x, y)$ is of the form:

$$f(x, y) = r_1(x)\, r_2(y) \exp\left\{ q^{(1)}(x)^\top M\, q^{(2)}(y) \right\} \tag{12}$$

in which

$$q^{(1)}(x) = (1, q_{11}(x), \ldots, q_{1\ell_1}(x))^\top$$

and

$$q^{(2)}(y) = (1, q_{21}(y), \ldots, q_{2\ell_2}(y))^\top$$

and $M$ is a matrix of parameters of dimension $(\ell_1 + 1) \times (\ell_2 + 1)$ subject to the requirement that:

$$\int\!\!\int_{\mathbb{R}^2} f(x, y)\, dx\, dy = 1. \tag{13}$$

The term $e^{m_{00}}$ is the normalizing constant, which is a function of the other $m_{ij}$'s determined by the constraint (13).


Note that the class of densities with conditionals in the prescribed family is itself an exponential family with $(\ell_1 + 1) \times (\ell_2 + 1) - 1$ parameters. Upon partitioning the matrix $M$ in (12) in the following manner:

$$M = \left( \begin{array}{c|ccc} m_{00} & m_{01} & \cdots & m_{0\ell_2} \\ \hline m_{10} & & & \\ \vdots & & \tilde{M} & \\ m_{\ell_1 0} & & & \end{array} \right), \tag{14}$$

it can be verified that independent marginals will be encountered iff the matrix $\tilde{M} \equiv 0$. The elements of $\tilde{M}$ determine the dependence structure in $f(x, y)$.

Now, we may apply Theorem 1 to the case of the three-parameter lognormal distribution, where (10) and (11) are of the form (3). In this case we have $\ell_1 = \ell_2 = 2$ and the functions $r_1$, $r_2$, $q_{11}$, $q_{12}$, $q_{21}$ and $q_{22}$ are of the form:

$$r_1(x) = (x - \delta_1)^{-1} I(x > \delta_1),$$

$$r_2(y) = (y - \delta_2)^{-1} I(y > \delta_2),$$

$$q_{11}(x) = \log(x - \delta_1),$$

$$q_{12}(x) = [\log(x - \delta_1)]^2,$$

$$q_{21}(y) = \log(y - \delta_2),$$

$$q_{22}(y) = [\log(y - \delta_2)]^2.$$

Finally, substituting these functions in the general expression (12), we obtain the class of bivariate densities with three-parameter lognormal conditionals (assuming constant $\delta_i$), which is given by:

$$f(x, y; \delta, m) = (x - \delta_1)^{-1} (y - \delta_2)^{-1} \exp\left\{ -u_{\delta_1}(x)^\top M\, u_{\delta_2}(y) \right\}, \tag{15}$$

where $u_{\delta_i}(\cdot)$ denotes the vector:

$$u_{\delta_i}(z) = \left( 1, \log(z - \delta_i), [\log(z - \delta_i)]^2 \right)^\top, \quad i = 1, 2,$$

and $M = \{m_{ij}\}$ is a $3 \times 3$ matrix of parameters. The parameters $\{m_{ij}\}$ must be chosen so that (15) is integrable. Expanding formula (15) we obtain:

$$f(x, y; \delta, m) = [(x - \delta_1)(y - \delta_2)]^{-1} \exp\left\{ -\left( m_{00} + u(z_1, z_2) + v(z_1, z_2) \right) \right\} \tag{16}$$

where

$$u(x, y) = m_{10} x + m_{20} x^2 + m_{01} y + m_{02} y^2 + m_{11} x y,$$

$$v(x, y) = m_{12} x y^2 + m_{21} x^2 y + m_{22} x^2 y^2,$$

and

$$z_1 = \log(x - \delta_1), \tag{17}$$

$$z_2 = \log(y - \delta_2). \tag{18}$$
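The passage from the matrix form (15) to the expanded form (16) can be checked numerically. The Python sketch below is an illustration under stated assumptions: the array convention `M[i, j]` $= m_{ij}$, the function names, and the integration window are choices made here, and the parameter values are those quoted in the caption of Figure 1. It also computes $m_{00}$ by numerical integration, since $e^{m_{00}}$ is the normalizing constant.

```python
import numpy as np
from scipy import integrate


def exponent_matrix_form(z1, z2, M):
    """u(z1)^T M u(z2) with u(z) = (1, z, z^2), as in (15)."""
    a = np.array([1.0, z1, z1**2])
    b = np.array([1.0, z2, z2**2])
    return a @ M @ b


def exponent_expanded(z1, z2, M):
    """m00 + u(z1, z2) + v(z1, z2), as in (16)."""
    u = (M[1, 0] * z1 + M[2, 0] * z1**2 + M[0, 1] * z2
         + M[0, 2] * z2**2 + M[1, 1] * z1 * z2)
    v = M[1, 2] * z1 * z2**2 + M[2, 1] * z1**2 * z2 + M[2, 2] * z1**2 * z2**2
    return M[0, 0] + u + v


def joint_pdf(x, y, M, delta=(0.0, 0.0)):
    """Density (16); M[0, 0] must hold the normalizing term m00."""
    z1, z2 = np.log(x - delta[0]), np.log(y - delta[1])
    return np.exp(-exponent_expanded(z1, z2, M)) / ((x - delta[0]) * (y - delta[1]))


# Figure 1 parameters; M[i, j] = m_ij, with m00 left at 0 for now.
M = np.array([[0.0, -1.0, 2.0],
              [-1.0, -1.0, 1.0],
              [2.0, 1.0, 2.0]])

# In (z1, z2) coordinates the Jacobian cancels the 1/((x - d1)(y - d2)) factor,
# so exp(m00) is the integral of exp(-(u + v)) over the plane. The window
# [-6, 6]^2 is an assumption that captures essentially all of the mass here.
mass, _ = integrate.dblquad(
    lambda z2, z1: np.exp(-exponent_expanded(z1, z2, M)),
    -6.0, 6.0, lambda z1: -6.0, lambda z1: 6.0,
)
M[0, 0] = np.log(mass)   # with this m00, the density (16) integrates to one
```

Storing the parameters as a $3 \times 3$ array keeps the code close to (15); the expanded function mirrors (16) term by term.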

3.1.1 General Properties

The densities of the form (15) or (16) are called lognormal conditionals distributions. Note that, for given $\delta_1$ and $\delta_2$, (16) depends on eight parameters. The function $u(\cdot, \cdot)$ contains the terms that appear in the classical model (1) and the function $v(\cdot, \cdot)$ contains new terms that appear in these conditional models. The constant $\exp(m_{00})$ is the normalizing constant and it is a function of the rest of the parameters. In order to have a genuine joint pdf, sufficient conditions for integrability of (16) are that the parameters satisfy one of the following two sets of conditions:

$$m_{12} = m_{21} = m_{22} = 0, \quad m_{02} > 0, \quad m_{20} > 0, \quad m_{11}^2 < 4 m_{02} m_{20}, \tag{19}$$

$$m_{22} > 0, \quad m_{12}^2 < 4 m_{22} m_{02}, \quad m_{21}^2 < 4 m_{22} m_{20}. \tag{20}$$

If conditions (19) are satisfied, we recover the classical bivariate lognormal densities (1). If (20) holds, we encounter a new and highly flexible class of distributions with lognormal conditionals. An example of this new model appears in Figure 1. Some unexpected properties appear in this new model, for example multimodality. From expression (16), all the parameters of the new distribution can be identified. The conditional parameters $\mu_i(u)$ and $\sigma_i(u)$, $i = 1, 2$, are given by:

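$$\mu_1(y) = -\frac{m_{12} z_2^2 + m_{11} z_2 + m_{10}}{2 (m_{22} z_2^2 + m_{21} z_2 + m_{20})}, \tag{21}$$

$$\mu_2(x) = -\frac{m_{21} z_1^2 + m_{11} z_1 + m_{01}}{2 (m_{22} z_1^2 + m_{12} z_1 + m_{02})} \tag{22}$$

and

$$\sigma_1^2(y) = \frac{1}{2 (m_{22} z_2^2 + m_{21} z_2 + m_{20})}, \tag{23}$$

$$\sigma_2^2(x) = \frac{1}{2 (m_{22} z_1^2 + m_{12} z_1 + m_{02})}. \tag{24}$$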
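Formulas (21) and (23) can be checked against the joint density (16) without knowing $m_{00}$: for fixed $y$, the log of (16) and the lognormal log-density with parameters $\mu_1(y)$, $\sigma_1(y)$ should differ only by a term that is constant in $x$. The Python sketch below does this for the Figure 1 parameter values with $\delta_i = 0$; the function names, the array convention `M[i, j]` $= m_{ij}$ and the evaluation points are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Figure 1 parameters, M[i, j] = m_ij (m00 plays no role in the conditionals).
M = np.array([[0.0, -1.0, 2.0],
              [-1.0, -1.0, 1.0],
              [2.0, 1.0, 2.0]])


def conditional_x_given_y(y, M, delta2=0.0):
    """Return (mu1(y), sigma1(y)) from (21) and (23)."""
    z2 = np.log(y - delta2)
    quad_coef = M[2, 2] * z2**2 + M[2, 1] * z2 + M[2, 0]   # multiplies z1^2
    lin_coef = M[1, 2] * z2**2 + M[1, 1] * z2 + M[1, 0]    # multiplies z1
    return -lin_coef / (2.0 * quad_coef), np.sqrt(1.0 / (2.0 * quad_coef))


def log_joint(x, y, M, delta=(0.0, 0.0)):
    """log of (16) for an array of x values and a scalar y."""
    z1, z2 = np.log(x - delta[0]), np.log(y - delta[1])
    u1 = np.array([np.ones_like(z1), z1, z1**2])   # shape (3, n)
    u2 = np.array([1.0, z2, z2**2])
    return -(u1.T @ M @ u2) - z1 - z2


y0 = 1.7
x_grid = np.linspace(0.2, 4.0, 25)
mu1, sigma1 = conditional_x_given_y(y0, M)

# The difference below should not depend on x if (21) and (23) are correct.
diff = log_joint(x_grid, y0, M) - stats.lognorm.logpdf(x_grid, s=sigma1, scale=np.exp(mu1))
spread = np.ptp(diff)
```

The same check applies to (22) and (24) after swapping the roles of the two variables and transposing `M`.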

Consequently, the conditional distributions are:

$$X \mid Y = y \sim \mathcal{LN}(\delta_1, \mu_1(y), \sigma_1(y)), \tag{25}$$

$$Y \mid X = x \sim \mathcal{LN}(\delta_2, \mu_2(x), \sigma_2(x)), \tag{26}$$

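and the marginal pdfs follow by integrating (16) over the other variable, using the Gaussian integral $\int_{-\infty}^{\infty} e^{-A z^2 - B z}\, dz = \sqrt{\pi / A}\; e^{B^2 / (4A)}$ for $A > 0$. For $x > \delta_1$:

$$f_X(x; \delta_1, m) = \frac{1}{x - \delta_1} \sqrt{\frac{\pi}{m_{22} z_1^2 + m_{12} z_1 + m_{02}}}\; \exp\left\{ -\left[ (m_{20} z_1^2 + m_{10} z_1 + m_{00}) - \frac{(m_{21} z_1^2 + m_{11} z_1 + m_{01})^2}{4 (m_{22} z_1^2 + m_{12} z_1 + m_{02})} \right] \right\}, \tag{27}$$

and for $y > \delta_2$:

$$f_Y(y; \delta_2, m) = \frac{1}{y - \delta_2} \sqrt{\frac{\pi}{m_{22} z_2^2 + m_{21} z_2 + m_{20}}}\; \exp\left\{ -\left[ (m_{02} z_2^2 + m_{01} z_2 + m_{00}) - \frac{(m_{12} z_2^2 + m_{11} z_2 + m_{10})^2}{4 (m_{22} z_2^2 + m_{21} z_2 + m_{20})} \right] \right\}. \tag{28}$$

Note that (27) and (28) are not lognormal distributions if conditions (20) hold. These marginals depend on all eight parameters and are therefore highly flexible. The conditional moments of (16) are ($r = 1, 2, \ldots$):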

$$E[(X - \delta_1)^r \mid Y] = \exp\left\{ r \mu_1(Y) + r^2 \sigma_1^2(Y) / 2 \right\}, \tag{29}$$

$$E[(Y - \delta_2)^r \mid X] = \exp\left\{ r \mu_2(X) + r^2 \sigma_2^2(X) / 2 \right\}, \tag{30}$$

and the conditional Gini coefficients are:

$$G_{X|Y} = 2 \Phi\{\sigma_1(Y) / \sqrt{2}\} - 1, \tag{31}$$

$$G_{Y|X} = 2 \Phi\{\sigma_2(X) / \sqrt{2}\} - 1. \tag{32}$$

Combining (29)-(30) with (27)-(28), the moments of the marginal distributions as well as the correlation coefficient can be obtained.
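The conditional moments (29) and the conditional Gini coefficient (31) are the usual lognormal expressions evaluated at $\mu_1(y)$ and $\sigma_1(y)$. A minimal Monte Carlo sketch in Python follows; the numerical values standing in for $\mu_1(y)$ and $\sigma_1(y)$, the function names and the sample size are assumptions for illustration only.

```python
import numpy as np
from scipy import stats


def conditional_moment(r, mu, sigma):
    """E[(X - delta1)^r | Y = y] from (29), given mu = mu1(y), sigma = sigma1(y)."""
    return np.exp(r * mu + 0.5 * r**2 * sigma**2)


def conditional_gini(sigma):
    """Conditional Gini coefficient (31), given sigma = sigma1(y)."""
    return 2.0 * stats.norm.cdf(sigma / np.sqrt(2.0)) - 1.0


def sample_gini(values):
    """Gini index of a sample, computed from its sorted values."""
    v = np.sort(values)
    n = v.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * v) / (n * np.sum(v)) - (n + 1.0) / n


# Assumed values of mu1(y) and sigma1(y) at some fixed y (illustrative only).
mu_c, sigma_c = 0.1, 0.4
rng = np.random.default_rng(1)

# Draws of X - delta1 given Y = y, using representation (7).
draws = np.exp(mu_c + sigma_c * rng.standard_normal(400_000))

first_moment_ratio = np.mean(draws) / conditional_moment(1, mu_c, sigma_c)
second_moment_ratio = np.mean(draws**2) / conditional_moment(2, mu_c, sigma_c)
gini_gap = abs(sample_gini(draws) - conditional_gini(sigma_c))
```

Both ratios should be close to one and `gini_gap` close to zero, up to Monte Carlo error.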

Figure 1: Bivariate pdf and contour plots corresponding to model (16) with $m_{10} = m_{01} = -1$, $m_{20} = m_{02} = m_{22} = 2$, $m_{11} = -1$ and $m_{12} = m_{21} = 1$.

3.2 The Bivariate Three-parameter Lognormal Conditionals Distribution with $\mu_i(\cdot)$ constant

An important submodel corresponds to the choice $\mu_i(u) = \mu_i$, $i = 1, 2$, constant. In this case, the bivariate random variable $(X, Y)$ has the joint pdf ($\delta_i = 0$):

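$$f(x, y; \mu, \sigma, c) = \frac{\kappa(c)}{2\pi \sigma_1 \sigma_2\, x y} \exp\left\{ -\frac{1}{2} \left( z_1^2 + z_2^2 + c\, z_1^2 z_2^2 \right) \right\}, \quad x, y > 0, \tag{33}$$

where

$$z_1 = (\log x - \mu_1) / \sigma_1, \tag{34}$$

$$z_2 = (\log y - \mu_2) / \sigma_2. \tag{35}$$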

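The conditional distributions are

$$X \mid Y = y \sim \mathcal{LN}\left( \mu_1, \sigma_1^2 / (1 + c z_2^2) \right), \tag{36}$$

$$Y \mid X = x \sim \mathcal{LN}\left( \mu_2, \sigma_2^2 / (1 + c z_1^2) \right). \tag{37}$$

Here the second argument is the conditional variance of $\log X$ (respectively $\log Y$), so that the conditional scale parameters are $\sigma_1 / \sqrt{1 + c z_2^2}$ and $\sigma_2 / \sqrt{1 + c z_1^2}$.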

In this particular case we have an explicit expression for the normalizing constant. It can be verified that

$$\kappa(c) = \frac{\sqrt{2c}}{U(1/2, 1, 1/(2c))}, \tag{38}$$

where $U(a, b, z)$ represents the confluent hypergeometric function defined by ($a, z > 0$)

$$U(a, b, z) = \frac{1}{\Gamma(a)} \int_0^\infty e^{-tz}\, t^{a-1} (1 + t)^{b-a-1}\, dt. \tag{39}$$
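The normalizing constant (38) can be evaluated with SciPy's `scipy.special.hyperu`, which implements the function $U(a, b, z)$ of (39). The Python sketch below checks that (33) integrates to one: integrating $z_1$ out analytically leaves the one-dimensional integral $\kappa(c)\,(2\pi)^{-1/2} \int e^{-z^2/2} (1 + c z^2)^{-1/2}\, dz$, which is computed numerically. The function names and the values of $c$ are assumptions for illustration.

```python
import numpy as np
from scipy import integrate, special


def kappa(c):
    """Normalizing constant (38); requires c > 0."""
    return np.sqrt(2.0 * c) / special.hyperu(0.5, 1.0, 1.0 / (2.0 * c))


def total_mass(c):
    """Integral of (33) over x, y > 0, with z1 integrated out analytically."""
    integrand = lambda z: np.exp(-0.5 * z**2) / np.sqrt(1.0 + c * z**2)
    value, _ = integrate.quad(integrand, -np.inf, np.inf)
    return kappa(c) * value / np.sqrt(2.0 * np.pi)


# Illustrative values of the dependence parameter c (assumptions).
masses = {c: total_mass(c) for c in (0.25, 1.0, 2.0)}
```

Each entry of `masses` should be numerically indistinguishable from one if (38) is the correct constant for (33).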


3.3 Dependence Measures

In this section we study some dependence conditions for the conditional models. A distribution is said to be positively likelihood ratio dependent (a condition that implies positive quadrant dependence) if the density $f(x, y)$ satisfies the condition

$$f(x_1, y_1) f(x_2, y_2) \ge f(x_1, y_2) f(x_2, y_1) \tag{40}$$

for every $x_1 < x_2$, $y_1 < y_2$ in $S(X)$ and $S(Y)$, respectively (see Barlow and Proschan (1981)). Substituting the general pdf (12) in (40), we obtain the condition:

$$[q^{(1)}(x_1) - q^{(1)}(x_2)]^\top M\, [q^{(2)}(y_1) - q^{(2)}(y_2)] \ge 0. \tag{41}$$

In the case of the model (16), a general condition on the parameters $m_{ij}$ for (41) to hold cannot be obtained. In general, correlations of either sign that are not restricted in range can be encountered for this model. Scalar measures of dependence, such as the correlation coefficient, do not always capture all the dependence properties of a bivariate distribution. The local dependence function (Holland and Wang (1987) and Jones (1996)), defined by

$$\gamma(x, y) = \frac{\partial^2 \log f(x, y)}{\partial x\, \partial y}, \tag{42}$$

gives more detailed information. For the joint pdf (16) with $\delta_i = 0$, the local dependence function is

$$\gamma(x, y) = -\frac{m_{11} + 2 m_{21} \log x + 2 m_{12} \log y + 4 m_{22} \log x \log y}{x y}.$$

In particular, the local dependence function of the model (33) is given by:

$$\gamma(x, y) = -\frac{2c\, (\log x - \mu_1)(\log y - \mu_2)}{\sigma_1^2 \sigma_2^2\, x y}.$$
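The closed form of the local dependence function for model (33) can be compared with a finite-difference approximation of the mixed derivative in (42). The Python sketch below is an illustration; the function names, the step size `h`, and the parameter values are assumptions, and the additive constants of $\log f$ are dropped because they do not affect the mixed derivative.

```python
import numpy as np


def log_density_33(x, y, mu, sigma, c):
    """log of (33) up to an additive constant (kappa(c) and 2*pi*sigma1*sigma2 dropped)."""
    z1 = (np.log(x) - mu[0]) / sigma[0]
    z2 = (np.log(y) - mu[1]) / sigma[1]
    return -np.log(x) - np.log(y) - 0.5 * (z1**2 + z2**2 + c * z1**2 * z2**2)


def local_dependence_closed(x, y, mu, sigma, c):
    """Closed-form local dependence function of model (33)."""
    return (-2.0 * c * (np.log(x) - mu[0]) * (np.log(y) - mu[1])
            / (sigma[0]**2 * sigma[1]**2 * x * y))


def local_dependence_numeric(x, y, mu, sigma, c, h=1e-3):
    """Central finite-difference approximation of the mixed derivative in (42)."""
    g = lambda a, b: log_density_33(a, b, mu, sigma, c)
    return (g(x + h, y + h) - g(x + h, y - h)
            - g(x - h, y + h) + g(x - h, y - h)) / (4.0 * h**2)


# Illustrative parameter values and evaluation point (assumptions).
mu, sigma, c = (0.3, -0.2), (0.8, 1.1), 0.5
x0, y0 = 2.0, 0.7
closed = local_dependence_closed(x0, y0, mu, sigma, c)
numeric = local_dependence_numeric(x0, y0, mu, sigma, c)
gap = abs(closed - numeric)
```

The sign of $\gamma(x, y)$ changes across the lines $\log x = \mu_1$ and $\log y = \mu_2$, which is visible by evaluating `local_dependence_closed` in different quadrants.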

3.4 Estimation

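The family of densities (16) is a member of the exponential family with natural sufficient statistics:

$$\left( \sum z_{1i},\ \sum z_{1i}^2,\ \sum z_{2i},\ \sum z_{2i}^2,\ \sum z_{1i} z_{2i},\ \sum z_{1i} z_{2i}^2,\ \sum z_{1i}^2 z_{2i},\ \sum z_{1i}^2 z_{2i}^2 \right). \tag{43}$$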

However, inference for conditionally specified models is somewhat complicated because of the almost ubiquitous presence of an awkward normalizing constant: we know the shape of the likelihood but not the factor required to make it integrate to 1. One way to avoid dealing with the normalizing constant is to use both conditional distributions. Assume that $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a random sample from a conditionally specified model $f(x, y; \theta)$ such as (16). We define the pseudolikelihood estimate of $\theta$ to be the value of $\theta$ which maximizes the pseudolikelihood function defined by:

$$\tilde{\ell}(\theta) = \prod_{i=1}^{n} f_{X|Y}(x_i \mid y_i; \theta)\, f_{Y|X}(y_i \mid x_i; \theta). \tag{44}$$

According to Arnold and Strauss (1988), these estimators are consistent and asymptotically normal. In this kind of conditional model, these estimators are much easier to obtain than the maximum likelihood estimates.
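A minimal Python sketch of the pseudolikelihood (44) for model (16) with $\delta_i = 0$ is given below. It is not the authors' implementation: the function names, the synthetic data, the optimizer (`scipy.optimize.minimize` with L-BFGS-B) and the starting values are all assumptions made for illustration. The data are simulated from a classical bivariate lognormal, whose parameters in the $m_{ij}$ parameterization are known, and the classical submodel ($m_{12} = m_{21} = m_{22} = 0$) is then fitted by maximizing (44).

```python
import numpy as np
from scipy import optimize, stats


def log_pseudolikelihood(theta, x, y):
    """log of (44) for model (16) with delta_i = 0.

    theta = (m10, m20, m01, m02, m11, m12, m21, m22); m00 is not needed.
    """
    m10, m20, m01, m02, m11, m12, m21, m22 = theta
    z1, z2 = np.log(x), np.log(y)
    a1 = m22 * z2**2 + m21 * z2 + m20      # 1 / (2 sigma_1^2(y)), see (23)
    b1 = m12 * z2**2 + m11 * z2 + m10
    a2 = m22 * z1**2 + m12 * z1 + m02      # 1 / (2 sigma_2^2(x)), see (24)
    b2 = m21 * z1**2 + m11 * z1 + m01
    if np.any(a1 <= 0) or np.any(a2 <= 0):
        return -np.inf                      # conditionals are not proper here
    # Lognormal log-density = normal log-density of log(x) minus log(x).
    ll_x = stats.norm.logpdf(z1, -b1 / (2 * a1), np.sqrt(0.5 / a1)) - z1
    ll_y = stats.norm.logpdf(z2, -b2 / (2 * a2), np.sqrt(0.5 / a2)) - z2
    return float(np.sum(ll_x + ll_y))


# Synthetic data from a classical bivariate lognormal (illustrative values).
rng = np.random.default_rng(2)
mean = np.array([0.2, -0.1])
cov = np.array([[0.36, 0.24], [0.24, 0.64]])
logs = rng.multivariate_normal(mean, cov, size=20_000)
x, y = np.exp(logs[:, 0]), np.exp(logs[:, 1])

# True (m10, m20, m01, m02, m11): the exponent of (1) is (z - mean)^T P (z - mean) / 2.
P = np.linalg.inv(cov)
true_theta = np.array([-(P[0, 0] * mean[0] + P[0, 1] * mean[1]), P[0, 0] / 2,
                       -(P[1, 1] * mean[1] + P[0, 1] * mean[0]), P[1, 1] / 2,
                       P[0, 1]])


def objective(t5):
    """Negative mean log-pseudolikelihood of the classical submodel."""
    return -log_pseudolikelihood(np.concatenate([t5, np.zeros(3)]), x, y) / x.size


fit = optimize.minimize(
    objective, x0=np.array([0.0, 0.5, 0.0, 0.5, 0.0]), method="L-BFGS-B",
    bounds=[(None, None), (1e-3, None), (None, None), (1e-3, None), (None, None)],
)
max_error = np.max(np.abs(fit.x - true_theta))
```

Fitting the full eight-parameter model works the same way, except that the positivity of the two quadratic forms `a1` and `a2` has to be enforced over the whole sample rather than through simple bounds.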

4 Generalized Lognormal Conditionals

In this section we propose some bivariate conditional distributions based on extensions of the lognormal distribution.

4.1 Bivariate Distributions with Log-Skew-Normal Conditionals

We are interested in the form of the density of a two-dimensional random variable $(X, Y)$ such that:

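$$\text{for each } y > 0, \quad X \mid Y = y \sim \mathcal{LSN}(\lambda_1(y), \mu_1, \sigma_1), \tag{45}$$

$$\text{for each } x > 0, \quad Y \mid X = x \sim \mathcal{LSN}(\lambda_2(x), \mu_2, \sigma_2), \tag{46}$$

for some functions $\lambda_1(y)$ and $\lambda_2(x)$, where $\mu_i$ and $\sigma_i$, $i = 1, 2$, are fixed and known. We are able to identify two types of models from (45)-(46). The first solution corresponds to the independence case, with $\lambda_1(y) = \lambda_1$ and $\lambda_2(x) = \lambda_2$ constant. The second solution corresponds to the dependence case, with $\lambda_1(y) = \lambda (\log y - \mu_2) / \sigma_2$ and $\lambda_2(x) = \lambda (\log x - \mu_1) / \sigma_1$, and gives rise to the joint pdf:

$$f(x, y; \lambda, \mu, \sigma) = \frac{2}{\sigma_1 \sigma_2\, x y}\, \phi(z_1)\, \phi(z_2)\, \Phi(\lambda z_1 z_2), \tag{47}$$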

where the $z_i$ are defined in (34)-(35). The new distribution has some interesting properties. The density (47) has standard lognormal marginals together with log-skew-normal conditionals. The conditional moments are given by ($r = 1, 2, \ldots$),

$$E(X^r \mid Y = y) = 2 \exp(r \mu_1 + r^2 \sigma_1^2 / 2)\, \Phi\left( \frac{\lambda_1(y)\, \sigma_1 r}{\sqrt{1 + \lambda_1^2(y)}} \right), \tag{48}$$

$$E(Y^r \mid X = x) = 2 \exp(r \mu_2 + r^2 \sigma_2^2 / 2)\, \Phi\left( \frac{\lambda_2(x)\, \sigma_2 r}{\sqrt{1 + \lambda_2^2(x)}} \right). \tag{49}$$

In particular, the corresponding regression functions are non-linear. From (48)-(49) it is not hard to obtain expressions for the moments of this distribution. Multimodality is again possible in this model (see Arnold, Castillo and Sarabia (2002)). Figure 2 shows two examples of bivariate distributions with log-skew-normal conditionals, with one and two modes.
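The claim that (47) has lognormal marginals can be illustrated numerically: integrating $y$ out of (47) should return the $\mathcal{LN}(\mu_1, \sigma_1)$ density, because $\int 2 \phi(z) \Phi(a z)\, dz = 1$ for every real $a$. The Python sketch below performs this integration on a grid of $x$ values; the function names, the integration window and the parameter values are assumptions for illustration.

```python
import numpy as np
from scipy import integrate, stats


def joint_lsn_conditionals(x, y, lam, mu, sigma):
    """Joint density (47), with z_i as in (34)-(35)."""
    z1 = (np.log(x) - mu[0]) / sigma[0]
    z2 = (np.log(y) - mu[1]) / sigma[1]
    return (2.0 / (sigma[0] * sigma[1] * x * y)
            * stats.norm.pdf(z1) * stats.norm.pdf(z2) * stats.norm.cdf(lam * z1 * z2))


def marginal_x(x, lam, mu, sigma):
    """Integrate y out numerically, using t = log y on a wide finite window."""
    integrand = lambda t: joint_lsn_conditionals(x, np.exp(t), lam, mu, sigma) * np.exp(t)
    value, _ = integrate.quad(integrand, mu[1] - 12 * sigma[1], mu[1] + 12 * sigma[1],
                              points=[mu[1]])
    return value


# Illustrative parameter values (assumptions, not from the paper).
lam, mu, sigma = 2.5, (0.3, 1.0), (0.5, 0.7)
xs = np.array([0.5, 1.0, 2.0, 4.0])
numeric = np.array([marginal_x(v, lam, mu, sigma) for v in xs])
lognormal = stats.lognorm.pdf(xs, s=sigma[0], scale=np.exp(mu[0]))
max_gap = np.max(np.abs(numeric - lognormal))
```

Note that the marginals do not depend on $\lambda$, so the dependence parameter can be varied without changing them.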

4.2 Bivariate Distributions with Generalized Lognormal Conditionals

In this section the most general bivariate densities with generalized lognormal conditionals (given by (5) and (6)) are derived. First, we seek the bivariate random variable $(X, Y)$ such that the associated conditional distributions satisfy:

$$X \mid Y = y \sim \mathcal{LEP}(\beta_1, \mu_1, \sigma_1(y)), \tag{50}$$

$$Y \mid X = x \sim \mathcal{LEP}(\beta_2, \mu_2, \sigma_2(x)), \tag{51}$$

where $\sigma_1(y)$ and $\sigma_2(x)$ are unknown functions. Note that the parameters $\beta_i$ and $\mu_i$, $i = 1, 2$, are fixed and known. Using Theorem 1 (Arnold and Strauss) we get

$$f(x, y; \beta, \mu, \sigma, c) = \frac{\kappa(\beta, c)}{2\pi\, x y\, \sigma_1 \sigma_2} \exp\left\{ -\frac{1}{2} \left[ |z_1|^{\gamma_1} + |z_2|^{\gamma_2} + c\, |z_1|^{\gamma_1} |z_2|^{\gamma_2} \right] \right\}, \tag{52}$$

where $\gamma_i = 2 / (1 + \beta_i)$, $i = 1, 2$, the $z_i$ are defined in (34)-(35) and $c \ge 0$. The case $c = 0$ corresponds to independence between $X$ and $Y$.

Figure 2: Examples of bivariate distributions with Log-Skew-Normal Conditionals.

For the second family, we seek the most general bivariate density of $(X, Y)$ such that the conditional densities satisfy,

$$X \mid Y = y \sim \mathcal{LM}(\theta^{(1)}(y)), \tag{53}$$

$$Y \mid X = x \sim \mathcal{LM}(\theta^{(2)}(x)), \tag{54}$$

where $\theta^{(i)}(u) = (\theta_1^{(i)}(u), \theta_2^{(i)}(u), \theta_3^{(i)}(u), \theta_4^{(i)}(u))^\top$, $i = 1, 2$. Note that (6) is a four-parameter exponential family and in consequence Theorem 1 can be used. Defining the vector:

$$v(x) = \left( 1, x, \log x, (\log x)^2, (\log x)^3 \right)^\top \tag{55}$$

the most general density that satisfies (53) and (54) is given by,

$$f(x, y; m) = \exp\left\{ v(x)^\top M\, v(y) \right\}, \quad x, y > 0, \tag{56}$$

where $M = \{m_{ij}\}$ is a $5 \times 5$ matrix of parameters. The new distribution depends on 24 parameters, and includes many conditional models, among them lognormal and gamma conditionals. The $\{m_{ij}\}$ parameters must be selected in such a way that one of the marginal densities is integrable.

5 An Empirical Application with the European Community Household Panel

In this section we present an application to income data using the information contained in the European Community Household Panel (ECHP). The ECHP is a standardized annual longitudinal survey which contains data on individuals and households for the European Union countries, with eight waves available (1994-2001). The ECHP is carried out by national data collection units, and the Statistical Office of the European Communities (EUROSTAT) provides support and coordination. The first wave in 1994 included all current members of the European Union except Austria, Finland and Sweden, which were added in 1995, 1996 and 1997, respectively. The ECHP was conceived as a 3-wave panel and has reached its 8th wave. Its main advantage is that the information is homogeneous among countries, since the questionnaire is similar across them. Thus, it is possible to make comparisons across countries and over time. It also includes new information about labour force behaviour, income, education, health, housing, migration, etc. It is important to point out that this is the first fixed and harmonized panel for studying socio-economic factors of households and individuals inside the European Union.

Table 1: Fitted models to ECHP data

Parameter            Waves 1-3        Waves 1-3      Waves 3-6        Waves 3-6
                     Classic Model    Model (16)     Classic Model    Model (16)
m10                     0.1302          0.1886          0.0449          0.3300
m20                     0.5551          0.5690          1.0142          1.2610
m01                    −0.0341          0.0159         −1.1059         −0.8837
m02                     0.8319          0.8362          1.4558          1.5166
m11                    −0.7943         −0.9508         −1.2722         −1.7192
m12                       −            −0.0224            −             0.0633
m21                       −            −0.0308            −            −0.2833
m22                       −             0.0004            −             0.0159
log pseudo-likelihood  −4412.90       −4296.75         −216.20         −66.97

The total net income of each household and of each individual older than 16 years is available, and it covers the total income received from all sources. The income measure is disposable (after tax) individual income in the year prior to the interview, in constant 1992 prices. Thus, the interviews corresponding to the first eight waves of the ECHP were performed from 1994 to 2001, meaning that the corresponding incomes refer, respectively, to a period ranging from 1993 to 2000 (eight years).

In this study, we have used the Spanish microdata (approximately 10,500 individuals) in order to test the sensitivity of the results to different hypotheses. The starting point of our analysis is the existence of information for the same individuals in eight different periods. In particular, we focus on waves 1, 3 and 6 in order to analyze income mobility in Spain. We have considered two sets of data: waves 1 and 3, and waves 3 and 6. It is important to point out that we are working with a large number of bivariate observations with high variability (see Figures 3 to 6).

We have fitted to these two sets of data the classical bivariate lognormal distribution (1) (parameterized in terms of the $m_{ij}$ parameters) and the bivariate lognormal conditionals distribution (16) with $\delta_i = 0$. Both models have been fitted by maximizing the pseudo-likelihood function given in (44). The results appear in Table 1, which includes the parameter estimates and the log of the pseudo-likelihood function. Note the differences between the logs of the pseudo-likelihood functions of the two models (classical and conditional) for both sets of data: 116.15 for waves 1 and 3, and 149.23 for waves 3 and 6. This implies a very significant improvement in the fit with the bivariate lognormal conditionals distribution. Figures 3 to 6 show the data, the joint pdf and the contour plots corresponding to the fitted models.

Acknowledgments

The authors are indebted to the Spanish Ministry of Science and Technology (Project SEJ2004-02810) for partial support of this work.

References

Aitchison, J. and Brown, J.A.C. (1957). The Lognormal Distribution. Cambridge University Press, Cambridge.

Arnold, B.C., Castillo, E. and Sarabia, J.M. (1992). Conditionally Specified Distributions. Lecture Notes in Statistics, Vol. 73. Springer Verlag, New York.

Arnold, B.C., Castillo, E. and Sarabia, J.M. (1999). Conditional Specification of Statistical Models. Springer Series in Statistics, Springer Verlag, New York.

Arnold, B.C., Castillo, E. and Sarabia, J.M. (2001). Conditionally Specified Distributions: An Introduction (with discussion). Statistical Science, 16, 249-274.

Arnold, B.C., Castillo, E. and Sarabia, J.M. (2002). Conditionally Specified Multivariate Skewed Distributions. Sankhya, Ser. A, 64, 1-21.

Arnold, B.C. and Strauss, D. (1988). Pseudolikelihood Estimation. Sankhya, Ser. B, 53, 233-243.

Figure 3: Bivariate lognormal model fitted to original data (above) and fitted to data in logarithms (below): joint pdf (left) and contour plots with data (right). Data: ECHP (waves 1 and 3).

Figure 4: Bivariate lognormal conditionals model fitted to original data (above) and fitted to data in logarithms (below): joint pdf (left) and contour plots with data (right). Data: ECHP (waves 1 and 3).

Figure 5: Bivariate lognormal model fitted to original data (above) and fitted to data in logarithms (below): joint pdf (left) and contour plots with data (right). Data: ECHP (waves 3 and 6).

Figure 6: Bivariate lognormal conditionals model fitted to original data (above) and fitted to data in logarithms (below): joint pdf (left) and contour plots with data (right). Data: ECHP (waves 3 and 6).

Arnold, B.C. and Strauss, D. (1991). Bivariate Distributions with Conditionals in Prescribed Exponential Families. Journal of the Royal Statistical Society, Ser. B, 53, 365-375.

Azzalini, A., Capello, T. and Kotz, S. (2003). Log-skew-normal and Log-skew-t Distributions as Models for Family Income Data. Journal of Income Distribution, 11, 13-21.

Barlow, R.E. and Proschan, F. (1981). Statistical Theory of Reliability and Life Testing: Probability Models. Silver Springs, MD.

Box, G.E.P. and Tiao, G. (1973). Bayesian Inference in Statistical Analysis. Addison-Wesley, Reading, MA.

Creedy, J., Lye, J.N. and Martin, V.L. (1997). A Model of Income Distribution. In: J. Creedy and V.L. Martin (eds.), 29-45.

Crow, E.L. and Shimizu, K.S. (eds.) (1988). Lognormal Distributions. Marcel Dekker, New York.

Gibrat, R. (1931). Les Inégalités Économiques. Librairie du Recueil Sirey, Paris.

Holland, P.W. and Wang, Y.L. (1987). Dependence Function for Continuous Bivariate Densities. Communications in Statistics, Theory and Methods, 16, 863-876.

Johnson, N.L., Kotz, S. and Balakrishnan, N. (1994). Continuous Univariate Distributions. Volume 1. Second Edition. John Wiley, New York.

Jones, M.C. (1996). The Local Dependence Function. Biometrika, 83, 899-904.

Kalecki, M. (1945). On the Gibrat Distribution. Econometrica, 13, 161-170.

Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences. John Wiley, New York.


Kmietowicz, Z.W. (1984). The Bivariate Lognormal Model for the Distribution of Household Size and Income. The Manchester School of Economics and Social Studies, 52, 196-210.

Lye, J.N. and Martin, V.L. (1993). Robust Estimation, Nonnormalities, and Generalized Exponential Distributions. Journal of the American Statistical Association, 88, 261-267.

Nalbach-Leniewska, A. (1979). Measures of Dependence of the Multivariate Lognormal Distribution. Mathematische Operationsforschung-Serie Statistik, 10, 381-387.

Vianelli, S. (1983). The Family of Normal and Lognormal Distributions of order r. Metron, 41, 3-10.
