Bayesian Analysis of the Independent Multinormal Process. Neither Mean Nor Precision Known
Author(s): Albert Ando and G. M. Kaufman
Source: Journal of the American Statistical Association, Vol. 60, No. 309 (Mar., 1965), pp. 347-358
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2283159
Accessed: 07/04/2014 18:35




BAYESIAN ANALYSIS OF THE INDEPENDENT MULTINORMAL PROCESS-NEITHER MEAN NOR PRECISION KNOWN*

ALBERT ANDO

University of Pennsylvania

AND

G. M. KAUFMAN

Massachusetts Institute of Technology

Under the assumption that neither the mean vector nor the variance-covariance matrix is known with certainty, the natural conjugate family of prior densities for the multivariate Normal process is identified. Prior-posterior and preposterior analysis is done assuming that the prior is in the natural conjugate family. A procedure is presented for obtaining non-degenerate joint posterior and preposterior distributions of all parameters even when the number of objective sample observations is less than the number of parameters of the process.

1. INTRODUCTION

IN THIS paper we develop the distribution theory necessary to carry out Bayesian analysis of the multivariate Normal process, as defined in Section 1.1 below, when neither the mean vector nor the variance-covariance matrix of the process is known with certainty. The development here generalizes Raiffa and Schlaifer's treatment of the multivariate Normal process in Part B, Chapter 12 of reference [5], in which it is assumed that the variance-covariance matrix is known up to a particular multiplicative constant. We drop this assumption here.

In Section 1 we define the process, identify a class of natural conjugate distributions, and do prior-posterior analysis. The conditional and unconditional sampling distributions of some (sufficient) statistics are presented in Section 2. In particular, we prove that the distribution of the sample mean vector marginal with respect to the sample variance-covariance matrix, to the process mean vector, and to the process variance-covariance matrix is multivariate Student whenever the prior is in the natural conjugate family of distributions. We then use the results of Sections 1 and 2 to do preposterior analysis in Section 3.

We also show in Section 3 that Bayesian joint inference (finding joint posterior and preposterior densities of the mean vector and the variance-covariance matrix) is possible even when classical joint inference is not, i.e., when the number of objective sample observations is less than the number of distinct elements of the mean vector and variance-covariance matrix of the process.

Geisser and Cornfield [3] and Tiao and Zellner [7] analyze the multivariate Normal process and multivariate Normal Regression process respectively under identical assumptions about the state of knowledge of the parameters of the process. Their presentations differ from that given here in three respects:

* The contribution by Ando to this paper is partially supported by a grant from the National Science Foundation. The authors wish to thank the referees for helpful suggestions and Jim Martin for proofreading early drafts, and to gratefully acknowledge valuable comments and criticisms given to them by members of the Decisions Under Uncertainty Seminar, conducted by Professor Howard Raiffa at the Harvard Business School.


first, following the lead of Jeffreys [4], both sets of authors assume that the joint prior on the mean vector and variance-covariance matrix is a special (degenerate) case of a natural conjugate density; second, here we find sampling distributions unconditional as regards the parameters of the process and do preposterior analysis; third, by doing the analysis for the complete natural conjugate family, we are able to provide a procedure for deriving joint posterior and some joint preposterior distributions of all parameters under conditions mentioned in the paragraph immediately above.

1.1 Definition of the Process

As in Anderson [1], we define an r-dimensional Independent Multinormal process as one that generates independent r x 1 random vectors x^(1), ..., x^(j), ... with identical densities

$$f_N^{(r)}(x \mid \mu, h) = (2\pi)^{-r/2} \exp\left[-\tfrac{1}{2}(x-\mu)^t h (x-\mu)\right] |h|^{1/2}, \qquad -\infty < x < \infty,\ \ -\infty < \mu < \infty,\ \ h \text{ is PDS.} \tag{1}$$

1.2 Likelihood of a Sample

The likelihood that the process will generate n successive values x^(1), ..., x^(n) is

$$(2\pi)^{-rn/2} \exp\Big[-\tfrac{1}{2}\sum_{j=1}^{n}(x^{(j)}-\mu)^t h\,(x^{(j)}-\mu)\Big]\, |h|^{n/2}. \tag{2}$$

If the stopping process is non-informative, as defined in reference [5], this is the likelihood of a sample consisting of n observations x^(1), ..., x^(j), ..., x^(n).

When neither h nor \mu is known, we may compute these statistics:

$$m \equiv \frac{1}{n}\sum_{j=1}^{n} x^{(j)}, \qquad \nu \equiv n - r \ \ \text{(redundant)}, \tag{3a}$$

and

$$V \equiv \sum_{j=1}^{n} (x^{(j)} - m)(x^{(j)} - m)^t. \tag{3b}$$
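The statistics (3a) and (3b), and the sufficiency decomposition that underlies (4a) below, can be verified numerically. A minimal sketch (ours, not part of the paper; it assumes NumPy is available) follows:

```python
import numpy as np

rng = np.random.default_rng(0)
r, n = 3, 10
x = rng.normal(size=(n, r))            # n observations of dimension r

# Sufficient statistics (3a)-(3b)
m = x.mean(axis=0)                     # sample mean vector
nu = n - r                             # the redundant statistic nu
V = (x - m).T @ (x - m)                # cross-product matrix about m

# The identity behind (4a): for any mu and any PDS precision h,
#   sum_j (x_j - mu)' h (x_j - mu) = n (m - mu)' h (m - mu) + tr(h V)
mu = rng.normal(size=r)
A = rng.normal(size=(r, r))
h = A @ A.T + r * np.eye(r)            # an arbitrary PDS precision matrix

lhs = sum((xj - mu) @ h @ (xj - mu) for xj in x)
rhs = n * (m - mu) @ h @ (m - mu) + np.trace(h @ V)
assert np.isclose(lhs, rhs)
```

The assertion confirms that (m, V, n) carries all the sample information about (\mu, h) that the full likelihood (2) does.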

It is well known¹ that the kernel of the joint likelihood of (m, V) is, provided \nu > 0,

$$\exp\left[-\tfrac{1}{2}\, n(m-\mu)^t h(m-\mu)\right] |h|^{1/2}\, \exp\left[-\tfrac{1}{2}\operatorname{tr} hV\right] |h|^{\frac{1}{2}(\nu+r-1)}, \tag{4a}$$

the kernel of the marginal likelihood of m is, provided n > 0,

$$\exp\left[-\tfrac{1}{2}\, n(m-\mu)^t h(m-\mu)\right] |h|^{1/2}, \tag{4b}$$

and the kernel of the marginal likelihood of V is, provided \nu > 0 and V is PDS,

$$\exp\left[-\tfrac{1}{2}\operatorname{tr} hV\right] |h|^{\frac{1}{2}(\nu+r-1)}. \tag{4c}$$

Formula (4c) is the kernel of a Wishart distribution. A random matrix \tilde V of dimension (r x r) will be called "Wishart distributed with parameter (h, \nu)" if

$$\tilde V \sim f_W^{(r)}(V \mid h, \nu) \equiv \begin{cases} w(r,\nu)\, |V|^{\frac{1}{2}\nu-1}\, \exp\left[-\tfrac{1}{2}\operatorname{tr} hV\right] |h|^{\frac{1}{2}(\nu+r-1)} & \text{if } V \text{ is PDS and } \nu > 0, \\ 0 & \text{otherwise,} \end{cases} \tag{4d}$$

¹ See, for example, Anderson [1], Theorem 3.3.2 and pp. 154-60.

where

$$w(r, \nu) \equiv \left[\, 2^{\frac{1}{2}(\nu+r-1)r}\, \pi^{r(r-1)/4} \prod_{i=1}^{r} \Gamma\!\left(\tfrac{1}{2}(\nu + r - i)\right) \right]^{-1}. \tag{4e}$$

That (m, V, \nu) defines a set of sufficient statistics for (\mu, h) is shown in Section 3.3.3 of [1].

We will wish to express (4a) in such a fashion that it automatically reduces to (4b) when only (m, n) is available and to (4c) when only (V, \nu) is available. In addition, we will wish to treat the cases that arise when V is singular. Hence we define

$$\nu \equiv \begin{cases} n - r & \text{if } n > 0, \\ 0 & \text{if } n = 0, \end{cases} \tag{5a}$$

$$V^* \equiv \begin{cases} V & \text{if } V \text{ is non-singular,} \\ 0 & \text{otherwise,} \end{cases} \tag{5b}$$

and

$$\delta \equiv \begin{cases} 1 & \text{if } n > 0, \\ 0 & \text{if } n = 0. \end{cases} \tag{5c}$$

In terms of (5a), (5b), and (5c) we may rewrite (4a), (4b), and (4c) as

$$\exp\left[-\tfrac{1}{2}\, n(m-\mu)^t h(m-\mu)\right] |h|^{\frac{1}{2}\delta}\, \exp\left[-\tfrac{1}{2}\operatorname{tr} hV^*\right] |h|^{\frac{1}{2}(\nu+r-1)}, \tag{4a'}$$

$$\exp\left[-\tfrac{1}{2}\, n(m-\mu)^t h(m-\mu)\right] |h|^{\frac{1}{2}\delta}, \tag{4b'}$$

$$\exp\left[-\tfrac{1}{2}\operatorname{tr} hV^*\right] |h|^{\frac{1}{2}(\nu+r-1)}. \tag{4c'}$$

Notice that (4a') is now defined even when \nu < 0. By adopting the conventions that (1) V^* = 0 and \nu = 1 - r when V is unknown or irrelevant, and (2) n = 0 when m is unknown or irrelevant, the kernel (4a') reduces to (4b') in the first case and to (4c') in the second.

1.3 Conjugate Distributions of (\mu, h), \mu, and h

When both \mu and h are random variables, the natural conjugate of (4a') is the Normal-Wishart distribution f_{NW}^{(r)}(\mu, h \mid m, V, n, \nu), defined as equal to

$$k(r,\nu)\, \exp\left[-\tfrac{1}{2}\, n(\mu-m)^t h(\mu-m)\right] |h|^{\frac{1}{2}\delta}\, \exp\left[-\tfrac{1}{2}\operatorname{tr} hV^*\right] |V^*|^{\frac{1}{2}(\nu+r-1)}\, |h|^{\frac{1}{2}\nu-1}$$

$$= \begin{cases} f_N^{(r)}(\mu \mid m, hn)\, f_W^{(r)}(h \mid V, \nu) & \text{if } n > 0 \text{ and } \nu > 0, \\ 0 & \text{otherwise,} \end{cases} \tag{6a}$$


where V^* and \delta are defined as in (5), and

$$k(r, \nu) \equiv (2\pi)^{-\frac{1}{2} r}\, n^{\frac{1}{2} r}\, w(r, \nu). \tag{6b}$$

If (6a) is to be a proper density function, V must be PDS, \nu > 0, and n > 0, so that in this case \delta = 1 and V^* = V is PDS. We write the first expression in (6a) with \delta and V^* so that formulas for posterior densities will generalize automatically to the case where prior information is such that one or more of these conditions hold: V is singular, n = 0, \nu = 0.
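A draw from the prior (6a) can be simulated by composition. The sketch below (ours, not the authors'; it assumes SciPy, and uses the fact that f_W(h | V, \nu) corresponds to SciPy's Wishart with \nu + r - 1 degrees of freedom and scale matrix V^{-1}) first draws h and then \mu given h:

```python
import numpy as np
from scipy.stats import wishart

def sample_normal_wishart(m, V, n, nu, rng):
    """One draw of (mu, h) from the Normal-Wishart prior (6a).

    In this parametrization, f_W(h | V, nu) is a standard Wishart with
    nu + r - 1 degrees of freedom and scale matrix V^{-1}; given h,
    mu is Normal with mean m and variance-covariance matrix (h n)^{-1}.
    """
    r = len(m)
    h = wishart.rvs(df=nu + r - 1, scale=np.linalg.inv(V), random_state=rng)
    mu = rng.multivariate_normal(m, np.linalg.inv(n * h))
    return mu, h

rng = np.random.default_rng(1)
mu, h = sample_normal_wishart(np.zeros(2), np.eye(2), n=5, nu=4, rng=rng)
```

Composition sampling of this kind is also the natural way to Monte-Carlo-check the marginal results (6c) and (6d) below.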

We obtain the marginal prior of \mu by integrating (6a) with respect to h; if \nu > 0, n > 0, and V is PDS, and we define H \equiv \nu n V^{-1}, then

$$D(\mu \mid m, V, n, \nu) = f_S^{(r)}(\mu \mid m, H, \nu) \propto \left[\nu + (\mu-m)^t H (\mu-m)\right]^{-\frac{1}{2}(\nu+r)}. \tag{6c}$$

This distribution of \mu is the non-degenerate multivariate Student distribution defined in formula (8-26) of [5].²

Proof: We integrate over the region R_h \equiv \{h : h \text{ is PDS}\}:

$$D(\mu \mid m, V, n, \nu) = \int_{R_h} f_N^{(r)}(\mu \mid m, hn)\, f_W^{(r)}(h \mid V, \nu)\, dh \propto \int_{R_h} \exp\left[-\tfrac{1}{2}\operatorname{tr} h\{n(\mu-m)(\mu-m)^t + V\}\right] |h|^{\frac{1}{2}(\nu+\delta)-1}\, dh.$$

But as the integrand in the integral immediately above is the kernel of a Wishart density with parameter (n(\mu-m)(\mu-m)^t + V, \nu + \delta),

$$D(\mu \mid m, V, n, \nu) \propto \left[1 + (\mu-m)^t (nV^{-1})(\mu-m)\right]^{-\frac{1}{2}(\nu+\delta+r-1)}.$$

Provided that \nu > 0, V is PDS, and n > 0, H \equiv \nu n V^{-1} is PDS, \delta = 1, and we have (6c).

If \nu > 0 but V is singular, V^{*-1} does not exist, so neither does the marginal distribution of \mu. And if n = 0 the marginal distribution of \mu does not exist.

Similarly, we obtain the marginal prior on h by integrating (6a) with respect to \mu. If \nu > 0, n \geq 0, and V is PDS, then

$$D(h \mid m, V, n, \nu) = f_W^{(r)}(h \mid V, \nu) \propto \exp\left[-\tfrac{1}{2}\operatorname{tr} hV\right] |h|^{\frac{1}{2}\nu-1}. \tag{6d}$$

If a Normal-Wishart distribution with parameter (m', V', n', \nu') is assigned to (\mu, h) and if a sample then yields a statistic (m, V, n, \nu), the posterior distribution of (\mu, h) will be Normal-Wishart with parameter (m'', V^{*''}, n'', \nu''), where

$$n'' = n' + n, \qquad \delta'' = \begin{cases} 1 & \text{if } n'' > 0, \\ 0 & \text{if } n'' = 0, \end{cases} \qquad m'' = n''^{-1}(n'm' + nm), \tag{7a}$$

$$\nu'' = \nu' + \nu + r + \delta' + \delta - \delta'' - 1, \tag{7b}$$

² Geisser and Cornfield [3] prove a similar result: if the prior on (\mu, h) has a kernel |h|^{\frac{1}{2}\nu'-1}, \nu' > 0, and we observe a sample which yields a statistic (m, V, n, \nu), \nu > 0, then the marginal posterior distribution of \mu is multivariate Student.


$$V^{*''} = \begin{cases} V' + V + n'm'm'^t + nmm^t - n''m''m''^t \equiv V'' & \text{if } V'' \text{ is PDS,} \\ 0 & \text{otherwise.} \end{cases} \tag{7c}$$

Proof: When V' and V are both PDS, the prior density and the sample likelihood combine to give the posterior density in the usual manner. When either V' or V or both are singular, the prior density (6a) or the sample likelihood or both may not exist. Even in such cases, we wish to allow for the possibility that the posterior density may be well defined. For this purpose, we define the posterior density in terms of V' and V rather than V^{*'} and V^*. Thus, multiplying the kernel of the prior density by the kernel of the likelihood, we obtain

$$\exp\left[-\tfrac{1}{2}\, n'(\mu-m')^t h(\mu-m')\right] |h|^{\frac{1}{2}\delta'} \exp\left[-\tfrac{1}{2}\operatorname{tr} hV'\right] |h|^{\frac{1}{2}\nu'-1} \cdot \exp\left[-\tfrac{1}{2}\, n(m-\mu)^t h(m-\mu)\right] |h|^{\frac{1}{2}\delta} \exp\left[-\tfrac{1}{2}\operatorname{tr} hV\right] |h|^{\frac{1}{2}(\nu+r-1)}$$

$$= \exp\left[-\tfrac{1}{2} S\right] |h|^{\frac{1}{2}\delta''} \exp\left[-\tfrac{1}{2}\operatorname{tr} h(V' + V)\right] |h|^{\frac{1}{2}(\nu'+\nu+r+\delta'+\delta-\delta''-1)-1}, \tag{7d}$$

where

$$S \equiv n'(\mu-m')^t h(\mu-m') + n(m-\mu)^t h(m-\mu).$$

Since h is symmetric, by using the definitions of (7a), we may write S as

$$S = (\mu-m'')^t (hn'')(\mu-m'') - m''^t(hn'')m'' + m'^t(hn')m' + m^t(hn)m.$$

Now, since

$$m'^t(hn')m' + m^t(hn)m - m''^t(hn'')m'' = \operatorname{tr} h\left[n'(m'm'^t) + n(mm^t) - n''(m''m''^t)\right],$$

by defining \nu'' as in (7b) and V'', V^{*''} as in (7c), we may write the kernel (7d) as

$$\exp\left[-\tfrac{1}{2}(\mu-m'')^t (hn'')(\mu-m'')\right] |h|^{\frac{1}{2}\delta''} \exp\left[-\tfrac{1}{2}\operatorname{tr} hV^{*''}\right] |h|^{\frac{1}{2}\nu''-1},$$

which is the kernel of a Normal-Wishart density with parameter (m'', V'', n'', \nu'').
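The updating formulas (7a)-(7c) are simple to implement. The following sketch (ours, not the authors'; it assumes NumPy, a proper prior, and a non-degenerate sample, so that all the deltas of (5c) equal one) also checks the equivalent form (18b) derived in Section 3:

```python
import numpy as np

def nw_posterior(m1, V1, n1, nu1, x):
    """Posterior Normal-Wishart parameter via (7a)-(7c), assuming a proper
    prior (n' > 0, V' PDS) and a sample x with more rows than columns."""
    n, r = x.shape
    m = x.mean(axis=0)
    V = (x - m).T @ (x - m)
    nu = n - r
    n2 = n1 + n                                        # (7a)
    m2 = (n1 * m1 + n * m) / n2
    nu2 = nu1 + nu + r                                 # (7b), all deltas = 1
    V2 = (V1 + V + n1 * np.outer(m1, m1)               # (7c)
          + n * np.outer(m, m) - n2 * np.outer(m2, m2))
    return m2, V2, n2, nu2

rng = np.random.default_rng(2)
x = rng.normal(size=(8, 3))
m2, V2, n2, nu2 = nw_posterior(np.zeros(3), np.eye(3), 2.0, 4, x)

# Check the equivalent form (18b): V'' = V' + V + n*(m''-m')(m''-m')^t,
# with n* = n'n''/n (here m' = 0)
m = x.mean(axis=0)
V = (x - m).T @ (x - m)
nstar = 2.0 * n2 / 8
assert np.allclose(V2, np.eye(3) + V + nstar * np.outer(m2, m2))
```

The assertion exercises the algebraic identity that the proof of (7) turns on: the three rank-one terms of (7c) collapse to a single rank-one correction about the prior mean.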

We remark here that a prior of the form (6a) lacks flexibility when \nu is small because of the manner in which this functional form interrelates the distributions of \mu and h. The nature of this interrelationship is currently being examined and will be reported in a later paper.³

2. SAMPLING DISTRIBUTIONS WITH FIXED n

We assume here that a sample of size n is to be drawn from an r-dimensional Independent Multinormal process whose parameter (\mu, h) is a random variable having a Normal-Wishart distribution with parameter (m', V', n', \nu').

2.1 Conditional Joint Distribution of (m, V \mid \mu, h)

The conditional joint distribution of the statistic (m, V), given that the process parameter has value (\mu, h), is, provided \nu > 0,

³ Ando, A., and Kaufman, G., "Extended Natural Conjugate Distributions for the Multinormal Process."


$$D(m, V \mid \mu, h; n, \nu) = f_N^{(r)}(m \mid \mu, hn)\, f_W^{(r)}(V \mid h, \nu), \tag{8}$$

as shown in Section 1.

2.2 Siegel's Generalized Beta Function

Siegel [6] established a class of integral identities with matrix argument that generalize the Beta and Gamma functions. We will use these integral identities in the proofs that the unconditional sampling distributions of m, of V, and of (m, V) are as shown in Sections 2.3, 2.4, and 2.5. (In fact, the integrand in Siegel's identity for the generalized Gamma function is the kernel of a Wishart density.)

Let X be (r x r) and define

$$\Gamma_r(a) \equiv \pi^{r(r-1)/4}\, \Gamma(a)\, \Gamma\!\left(a - \tfrac{1}{2}\right) \cdots \Gamma\!\left(a - \tfrac{1}{2}(r-1)\right), \tag{9a}$$

$$B_r(a, b) \equiv \frac{\Gamma_r(a)\, \Gamma_r(b)}{\Gamma_r(a + b)}, \tag{9b}$$

where a > (r-1)/2 and b > (r-1)/2. Siegel established the following integral identities: letting R_X \equiv \{X : X \text{ is PDS}\},

$$\int_{R_X} \frac{|X|^{a-\frac{1}{2}(r+1)}}{|I + X|^{a+b}}\, dX = B_r(a, b). \tag{9c}$$

Defining Y = (I+X)^{-1}X, letting A and B be real symmetric matrices, and letting A < Y < B denote the set \{Y : B - Y \text{ and } Y - A \text{ are PDS}\},

$$\int_{R_Y} |Y|^{a-\frac{1}{2}(r+1)}\, |I - Y|^{b-\frac{1}{2}(r+1)}\, dY = B_r(a, b), \tag{9d}$$

where the domain of integration R_Y \equiv \{Y : 0 < Y < I\}. We shall define the standardized generalized Beta density function as

$$f_{\beta^*}^{(r)}(Y \mid a, b) \equiv B_r^{-1}(a, b)\, |Y|^{a-\frac{1}{2}(r+1)}\, |I - Y|^{b-\frac{1}{2}(r+1)}, \qquad a > \tfrac{1}{2}(r-1),\ b > \tfrac{1}{2}(r-1),\ Y \in R_Y. \tag{9e}$$

Similarly, the standardized inverted generalized Beta density function is defined as

$$f_{i\beta^*}^{(r)}(X \mid a, b) \equiv B_r^{-1}(a, b)\, \frac{|X|^{a-\frac{1}{2}(r+1)}}{|I + X|^{a+b}}, \qquad a > \tfrac{1}{2}(r-1),\ b > \tfrac{1}{2}(r-1),\ X \in R_X, \tag{9f}$$

and the inverted generalized Beta density function with parameter (a, b, C) as

$$f_{i\beta}^{(r)}(V \mid a, b, C) \equiv B_r^{-1}(a, b)\, |C|^{b}\, \frac{|V|^{a-\frac{1}{2}(r+1)}}{|C + V|^{a+b}}, \qquad a > \tfrac{1}{2}(r-1),\ b > \tfrac{1}{2}(r-1),\ V \in R_V \equiv \{V : V \text{ is PDS}\}. \tag{9g}$$


The functions (9e) and (9f) are related via the integrand transform Y = (I+X)^{-1}X, which has Jacobian dX = |I - Y|^{-(r+1)}\, dY. The function (9g) may be transformed into (9f) by an integrand transform TXT^t = V, where T is a nonsingular upper triangular matrix such that TT^t = C.
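Siegel's functions are easy to evaluate numerically. A short sketch (ours, not from the paper; it assumes SciPy, whose multigammaln computes the logarithm of the multivariate gamma function \Gamma_r of (9a)) follows:

```python
import numpy as np
from scipy.special import multigammaln, betaln

def log_gen_beta(a, b, r):
    """log B_r(a, b) of (9b); requires a, b > (r - 1)/2."""
    return multigammaln(a, r) + multigammaln(b, r) - multigammaln(a + b, r)

# For r = 1 the generalized Beta function reduces to the ordinary one
assert np.isclose(log_gen_beta(2.5, 3.0, 1), betaln(2.5, 3.0))
# And it is symmetric in (a, b), as (9b) requires
assert np.isclose(log_gen_beta(3.0, 4.5, 2), log_gen_beta(4.5, 3.0, 2))
```

Working in logarithms is advisable here, since \Gamma_r grows very quickly in both a and r.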

2.3 Unconditional Joint Distribution of (m, V)

The unconditional (with respect to (\mu, h)) joint distribution of (m, V) has density

$$D(m, V \mid m', V', n', \nu'; n, \nu) = \int_{R_\mu} \int_{R_h} f_N^{(r)}(m \mid \mu, hn)\, f_W^{(r)}(V \mid h, \nu)\, f_{NW}^{(r)}(\mu, h \mid m', V', n', \nu')\, d\mu\, dh, \tag{10a}$$

where the domain of integration R_\mu of \mu is (-\infty, +\infty) and R_h of h is \{h : h \text{ is PDS}\}. This reduces to

$$D(m, V \mid m', V', n', \nu'; n, \nu) \propto \frac{|V|^{\frac{1}{2}\nu-1}}{|V + C|^{\frac{1}{2}(\nu''+r-1)}}, \tag{10b}$$

where

$$n_u \equiv \frac{n'n}{n' + n}, \qquad C \equiv n_u(m - m')(m - m')^t + V'. \tag{10c}$$

Proof: Using (4a') and (8) we may write

$$D(m, V \mid m', V', n', \nu'; n, \nu) = \int_{R_h} \left[\int_{R_\mu} f_N^{(r)}(m \mid \mu, hn)\, f_N^{(r)}(\mu \mid m', hn')\, d\mu\right] f_W^{(r)}(V \mid h, \nu)\, f_W^{(r)}(h \mid V', \nu')\, dh.$$

The inner integral is f_N^{(r)}(m \mid m', hn_u). Hence the total integral may be written as

$$\int_{R_h} f_N^{(r)}(m \mid m', hn_u)\, f_W^{(r)}(V \mid h, \nu)\, f_W^{(r)}(h \mid V', \nu')\, dh,$$

which upon replacing the f_N and f_W's by their respective formulas and dropping constants becomes

$$|V|^{\frac{1}{2}\nu-1} \int_{R_h} \exp\left[-\tfrac{1}{2}\, n_u(m - m')^t h(m - m')\right] \exp\left[-\tfrac{1}{2}\operatorname{tr} h(V' + V)\right] |h|^{\frac{1}{2}(\nu+\nu'+r)-1}\, dh.$$

Using the definitions of (7), this equals

$$|V|^{\frac{1}{2}\nu-1} \int_{R_h} \exp\left[-\tfrac{1}{2}\operatorname{tr} h\{n_u(m - m')(m - m')^t + V' + V\}\right] |h|^{\frac{1}{2}\nu''-1}\, dh.$$

Letting B \equiv n_u(m - m')(m - m')^t + V + V', from (4c) we see that the integrand in the above integral is, aside from the multiplicative constant

$$w(r, \nu'')\, |B|^{\frac{1}{2}(\nu''+r-1)},$$


a Wishart density with parameter (B, \nu''). Hence, apart from a normalizing constant depending on neither V nor B,

$$D(m, V \mid m', V', n', \nu'; n, \nu) \propto \frac{|V|^{\frac{1}{2}\nu-1}}{|B|^{\frac{1}{2}(\nu''+r-1)}} = \frac{|V|^{\frac{1}{2}\nu-1}}{|V + V' + n_u(m - m')(m - m')^t|^{\frac{1}{2}(\nu''+r-1)}},$$

proving (10b).

2.4 Unconditional Distribution of m

The kernel of the unconditional (with respect to (\mu, h)) distribution of m can be found in three ways: by utilizing the fact that m - \mu and \mu are conditionally independent given h = h and then finding the distribution of m unconditional as regards h; by integrating D(m, V \mid m', V', n', \nu'; n, \nu) over the range of V when this distribution exists, i.e., over R_V \equiv \{V : V \text{ is PDS}\}; or by integrating the kernel of the marginal likelihood of m defined in (4b), multiplied by the kernel of the prior density of (\mu, h), over R_\mu and R_h. The first and third methods have the merit of in no way depending on whether or not V is singular; that is, even if \nu < 0 (n < r), the proofs go through, which of course is not the case if we proceed according to the second method. We show first by the second method that when \nu > 0, \nu' > 0, and V^{*'} = V' is PDS,

$$D(m \mid m', V', n', \nu'; n, \nu) = \int_{R_V} D(m, V \mid m', V', n', \nu'; n, \nu)\, dV \propto \left[1 + n_u(m - m')^t V'^{-1}(m - m')\right]^{-\frac{1}{2}(\nu'+r)}. \tag{11a}$$

We then show by the first method that this result holds even when \nu < 0. If we define H' \equiv \nu' n_u V'^{-1}, then (11a) may be rewritten as

$$\left[\nu' + (m - m')^t H'(m - m')\right]^{-\frac{1}{2}(\nu'+r)}. \tag{11b}$$

Provided \nu' > 0, n_u > 0, and H' is PDS, this is the kernel of the nondegenerate Student density function with parameter (m', H', \nu') as defined in formula (8-26) of [5].

Proof: In (10b) let a = \frac{1}{2}(\nu + r - 1) and b = \frac{1}{2}(\nu'' + r - 1) - a = \frac{1}{2}(\nu' + r). Then the kernel of the unconditional distribution of m is proportional to

$$\int_{R_V} \frac{|V|^{a-\frac{1}{2}(r+1)}}{|C + V|^{a+b}}\, dV, \tag{12}$$

where R_V = \{V : V \text{ is PDS}\}. Then by (9g),

$$D(m \mid m', V', n', \nu'; n, \nu) \propto |n_u(m - m')(m - m')^t + V'|^{-b} \propto \left[1 + (m - m')^t n_u V'^{-1}(m - m')\right]^{-b},$$

establishing (11a).

Alternate Proof: Another way of establishing (11) is to use the fact that the kernel of the marginal likelihood of m for n > 0 observations given the parameter (\mu, h) is, by (5) and (4b'), whether or not \nu < 0,

$$\exp\left[-\tfrac{1}{2}\, n(m - \mu)^t h(m - \mu)\right] |h|^{\frac{1}{2}\delta}. \tag{4b'}$$

Furthermore, conditional on h = h, \mu and \epsilon \equiv m - \mu are independent Normal random vectors; and so m = \mu + \epsilon is Normal with mean vector

$$E(m) = E(\mu) + E(\epsilon) = m' + 0 = m'$$

and variance-covariance matrix

$$V(m) = V(\mu) + V(\epsilon) = (hn')^{-1} + (hn)^{-1} = (hn_u)^{-1}.$$

Thus, integrating with respect to h,

$$D(m \mid m', V', n', \nu'; n, \nu) \propto \int_{R_h} \exp\left[-\tfrac{1}{2}\operatorname{tr} h\{n_u(m - m')(m - m')^t + V'\}\right] |h|^{\frac{1}{2}(\nu'+1)-1}\, dh.$$

As the integrand in the above integral is the kernel of a Wishart density with parameter (n_u(m - m')(m - m')^t + V', \nu' + 1),

$$D(m \mid m', V', n', \nu'; n, \nu) \propto |n_u(m - m')(m - m')^t + V'|^{-\frac{1}{2}(\nu'+r)},$$

and (11) follows directly.
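For r = 1 the result (11b) can be checked by direct numerical integration. The sketch below (our illustration, with arbitrary parameter values, assuming NumPy/SciPy) compounds the Normal density of m given h against the marginal prior of h, which for r = 1 is a Gamma density, and compares the result with the Student density of (11b):

```python
import numpy as np
from scipy import integrate, stats

# r = 1 instance of (11b): prior parameter (m', V', n', nu'), sample size n
m_p, V_p, n_p, nu_p, n = 0.0, 2.0, 4.0, 6.0, 3.0
n_u = n_p * n / (n_p + n)                    # n_u of (10c)

def predictive(mval):
    # Integrate f_N(m | m', (h n_u)^{-1}) against the prior of h, which
    # for r = 1 is a Gamma(nu'/2) density with rate V'/2.
    f = lambda h: (stats.norm.pdf(mval, m_p, 1.0 / np.sqrt(h * n_u))
                   * stats.gamma.pdf(h, nu_p / 2.0, scale=2.0 / V_p))
    return integrate.quad(f, 0.0, np.inf)[0]

# (11b): Student with parameter (m', H', nu'), H' = nu' n_u V'^{-1},
# i.e. a t density with nu' degrees of freedom and scale H'^{-1/2}
scale = 1.0 / np.sqrt(nu_p * n_u / V_p)
for mval in (0.0, 0.7, 2.5):
    assert np.isclose(predictive(mval), stats.t.pdf(mval, nu_p, m_p, scale))
```

This is precisely the Normal-Gamma compounding that the alternate proof carries out analytically via the Wishart integral.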

2.5 Unconditional Distribution of V When \nu > 0

The kernel of the unconditional distribution of V can be found by integrating D(m, V \mid m', V', n', \nu'; n, \nu) over the range of m, or by integrating the product of the kernel of the marginal likelihood (4c) of V and the kernel of the distribution (6a) of (\mu, h) with parameter (m', V', n', \nu') over the range of (\mu, h). We show by the former method that, for a > \frac{1}{2}(r-1) and b > \frac{1}{2}(r-1), when \nu > 0 and V' is PDS,

$$D(V \mid m', V', n', \nu'; n, \nu) \propto \frac{|V|^{a-\frac{1}{2}(r+1)}}{|V' + V|^{a+b}}, \tag{16a}$$

where a = \frac{1}{2}(\nu + r - 1) and b = \frac{1}{2}(\nu' + r - 1). Letting K be a non-singular upper triangular matrix such that KK^t = V', we may make the integrand transform KZK^t = V and write (16a) as

$$D(Z \mid m', V', n', \nu'; n, \nu) \propto \frac{|Z|^{a-\frac{1}{2}(r+1)}}{|I + Z|^{a+b}}. \tag{16b}$$

Formula (16b) is the kernel of a standardized inverted generalized Beta function with parameter (a, b).

Proof: From (10b),

$$D(m, V \mid m', V', n', \nu'; n, \nu) \propto |V|^{\frac{1}{2}\nu-1}\, |V' + V + n_u(m - m')(m - m')^t|^{-\frac{1}{2}(\nu''+r-1)}.$$

Conditional on V = V, the second determinant is the kernel of a Student density of m with parameter (m', n_u(\nu'' - 1)[V' + V]^{-1}, \nu'' - 1). That part

of the constant which normalizes this Student density and which involves V is |V' + V|^{-1/2}; integrating the kernel of m over R_m therefore contributes a factor |V' + V|^{1/2}, and (16a) follows.

Now since V' is PDS there is a non-singular triangular matrix K of order r such that KK^t = V'. If we make the integrand transform KZK^t = V, and let J(Z, V) denote the Jacobian of the transform, the transformed kernel may then be written as

$$\frac{|KZK^t|^{a-\frac{1}{2}(r+1)}}{|KK^t + KZK^t|^{a+b}}\, J(Z, V) = |KK^t|^{-b-\frac{1}{2}(r+1)}\, \frac{|Z|^{a-\frac{1}{2}(r+1)}}{|I + Z|^{a+b}}\, J(Z, V).$$

Since J(Z, V) = |K|^{r+1} = |KK^t|^{\frac{1}{2}(r+1)} = |V'|^{\frac{1}{2}(r+1)}, we may write the transformed kernel as shown in (16b).

3. PREPOSTERIOR ANALYSIS WITH FIXED n > 0

We assume that a sample of fixed size n > 0 is to be drawn from an r-dimensional Multinormal process whose mean vector \mu and precision matrix h are not known with certainty but are regarded as random variables (\mu, h) having a prior Normal-Wishart distribution with parameter (m', V', n', \nu'), where n' > 0 and \nu' > 0, but V' may or may not be singular.

3.1 Joint Distribution of (m'', V'')

The joint density of (m'', V'') is, provided n' > 0, \nu' > 0, and \nu > 0,

$$D(m'', V'' \mid m', V', n', \nu'; n, \nu) \propto \frac{|V'' - V^{*'} - n^*(m'' - m')(m'' - m')^t|^{\frac{1}{2}\nu-1}}{|V''|^{\frac{1}{2}(\nu''+r-1)}}, \tag{17a}$$

where

$$n^* = n'n''/n, \tag{17b}$$

and the range of (m'', V'') is

$$R_{(m'', V'')} \equiv \{(m'', V'') : -\infty < m'' < +\infty \text{ and } V'' - C \text{ is PDS}\}. \tag{17c}$$

Proof: Following the argument of subsections 12.1.4 and 12.6.1 of [5] we can establish that

$$n_u(m - m')^t h(m - m') = n^*(m'' - m')^t h(m'' - m'), \tag{18a}$$

where n^* = n'n''/n. The same line of argument allows us to write

$$V'' = V' + V + n'(m'm'^t) + n(mm^t) - n''(m''m''^t) = V' + V + n^*(m'' - m')(m'' - m')^t. \tag{18b}$$

From (7) we have

$$(m, V) = \left(n^{-1}(n''m'' - n'm'),\ V'' - V' - n^*(m'' - m')(m'' - m')^t\right).$$

Letting J(m'', V''; m, V) denote the Jacobian of the integrand transformation from (m, V) to (m'', V''), we make this transformation in (10b), obtaining (17a). Since J(m'', V''; m, V) = J(m'', m)\, J(V'', V) and both J(m'', m)


and J(V'', V) are constants involving neither m'' nor V'', neither does J(m'', V''; m, V). When \nu < 0 and V is singular the numerator in (17a) vanishes, so that the density exists only if \nu > 0. However, if \nu > 0, the kernel in (17a) exists even if V' is singular (V^{*'} = 0). Hence we write the kernel as shown in (17a).

That the range of (m'', V'') is R_{(m'', V'')} as defined in (17c) follows directly from the definitions of m'' and V'', of R_m and R_V, and of m' and V'.

3.2 Some Distributions of m''

The unconditional distribution of m'' is easily derived from the unconditional distribution (11b) of m: provided n > 0, n' > 0, \nu' > 0, and V' is PDS,

$$D(m'' \mid m', V', n', \nu'; n, \nu) = f_S^{(r)}\!\left(m'' \mid m', (n''/n)^2 H', \nu'\right), \tag{19a}$$

where

$$H' \equiv \nu' n_u V'^{-1}, \tag{19b}$$

and the conditional distribution of m'' given V'' = V'' is, provided \nu > 0, n' > 0, \nu' > 0,

$$D(m'' \mid m', V', n', \nu'; n, \nu, V'') = f_{iS}^{(r)}(m'' \mid m', H^*, \nu), \tag{20a}$$

where

$$H^* \equiv \nu n^*(V'' - V')^{-1}. \tag{20b}$$

The right-hand side of (20a) is the inverted Student density function with parameter (m', H^*, \nu) as defined in formula (8-34) of [5].

Proof: Since m'' = (n'')^{-1}(n'm' + nm) and since, from (11b), when n > 0, n' > 0, \nu' > 0, and V' is PDS,

$$\tilde m \sim f_S^{(r)}(m \mid m', H', \nu'),$$

by Theorem 1 of subsection 8.3.2 of [5],

$$\tilde m'' = m' + \frac{n}{n''}(\tilde m - m') \sim f_S^{(r)}\!\left(m'' \mid m', (n''/n)^2 H', \nu'\right).$$

To prove (20), observe that the kernel of the conditional distribution of m'' given V'' = V'' is proportional to (17a), and so

$$D(m'' \mid m', V', n', \nu'; n, \nu, V'') \propto |V'' - V' - n^*(m'' - m')(m'' - m')^t|^{\frac{1}{2}\nu-1}.$$

Since

$$V'' - V' = V + n^*(m'' - m')(m'' - m')^t,$$

(V'' - V') will be PDS as long as \nu > 0, and so (V'' - V')^{-1} is also PDS. Using a well-known determinantal identity and letting H^* be as defined in (20b), when V'' - V' is PDS we may write the density of m'' as

$$\left[1 - n^*(m'' - m')^t(V'' - V')^{-1}(m'' - m')\right]^{\frac{1}{2}\nu-1} = \nu^{-\frac{1}{2}\nu+1}\left[\nu - (m'' - m')^t H^*(m'' - m')\right]^{\frac{1}{2}\nu-1},$$


which, aside from a constant, is the kernel of an inverted Student density function with parameter (m', H^*, \nu).

3.3 Analysis When n < r

Even when n < r, it is possible to do Bayesian inference on (\mu, h) by appropriately structuring the prior so that the posterior of (\mu, h) is nondegenerate.

For example, if the data-generating process is Multinormal, if we assign a prior on (\mu, h) with parameter (0, 0, 0, 0), and then observe a sample x^(1), ..., x^(n) where n < r, then \nu < 0 and the posterior parameters defined in (7) assume values

$$n'' = n, \qquad \nu'' = \nu + r - 1 = n - 1, \qquad m'' = m, \qquad V^{*''} = 0.$$

The posterior distribution of (\mu, h) is degenerate under these circumstances.

If, however, we insist on assigning a very diffuse prior on (\mu, h), but are willing to introduce just enough prior information to make V'' non-singular and \nu'' > 0, then the posterior distribution is non-degenerate; e.g., assign \nu' = 1 and V' = MI with M > 0, and leave n' = 0 and m' = 0, so that \nu'' = n and V'' = V + V' = V + MI. In this case we have a bona fide non-degenerate Normal-Wishart posterior distribution of (\mu, h).

Furthermore, the unconditional distribution of the next sample observation \tilde x^{(n+1)} exists and is, by (11b), multivariate Student with parameter

$$\left(m,\ \frac{n^2}{n+1}(V + MI)^{-1},\ n\right).$$

In addition, for this example the distribution, unconditional as regards (\mu, h), of the mean \bar x of the next n_0 observations exists even though n < r, and is by (11b) multivariate Student with parameter (m, n\, n_u(V + MI)^{-1}, n), where n_u = n_0 n/(n_0 + n). This distribution is, in effect, a probabilistic forecast of \bar x. From (19a) it also follows that the distribution, unconditional as regards (\mu, h), of the posterior mean \tilde m'' prior to observing x^{(n+1)}, ..., x^{(n+n_0)} is multivariate Student with parameter

$$\left(m,\ n^2\Big(1 + \frac{n}{n_0}\Big)(V + MI)^{-1},\ n\right).$$
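The device of this section amounts to a ridge-like regularization of the singular cross-product matrix. A small numerical illustration (ours, not from the paper; it assumes NumPy) of the n < r situation:

```python
import numpy as np

rng = np.random.default_rng(3)
r, n, M = 5, 3, 0.1                   # fewer observations than dimensions
x = rng.normal(size=(n, r))
m = x.mean(axis=0)
V = (x - m).T @ (x - m)

# With n < r the cross-product matrix V is singular, so a (0, 0, 0, 0)
# prior leads to the degenerate posterior V*'' = 0 ...
assert np.linalg.matrix_rank(V) <= n - 1 < r

# ... but under the diffuse-but-proper prior (m' = 0, V' = MI, n' = 0,
# nu' = 1), the posterior V'' = V + MI is PDS, so the Normal-Wishart
# posterior of (mu, h) is non-degenerate.
V_post = V + M * np.eye(r)
assert np.all(np.linalg.eigvalsh(V_post) > 0)
```

Every eigenvalue of V + MI is at least M, so any M > 0 suffices; the choice of M governs how strongly the prior shrinks the posterior toward the identity.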

REFERENCES

[1] Anderson, T. W., An Introduction to Multivariate Statistical Analysis. New York: John Wiley and Sons, Inc., 1958.

[2] Deemer, W. L., and Olkin, I., "The Jacobians of certain matrix transformations useful in multivariate analysis," Biometrika, 40 (1953), 43-6.

[3] Geisser, Samuel, and Cornfield, Jerome, "Posterior distributions for multivariate normal parameters," Journal of the Royal Statistical Society, Series B, 25 (1963), 368-76.

[4] Jeffreys, H., Theory of Probability. London: Oxford University Press, Amen House, 1961.

[5] Raiffa, H., and Schlaifer, R., Applied Statistical Decision Theory. Boston, Massachusetts: Division of Research, Harvard Business School, 1961.

[6] Siegel, C. L., "Über die analytische Theorie der quadratischen Formen," Annals of Mathematics, 36 (1935), 527-606.

[7] Tiao, George C., and Zellner, Arnold, "On the Bayesian estimation of multivariate regression," Journal of the Royal Statistical Society, Series B, 26 (1964), 277-85.
