DEVELOPMENT OF THE NOTION OF STATISTICAL DEPENDENCE

- PART II

H. O. Lancaster

(received 5 February, 1982)

1. Introduction

We continue now the discussion of Lancaster [73], to which we will

refer as Part I, published in this Chronicle, on the development of the

notion of dependence with emphasis on distribution theory, expanding some

of the points made in the later sections of that article and bringing the

history of some developments up to the present time. Discussions of the

"multivariate analysis" type are specifically excluded from consideration

and so also are those arising from the models of no-interaction in the

"multiplicative theory" of inference for multivariate distributions.

Historical comments, especially on the orthogonal theory, are also

available in Lancaster [71, 74].

2. Joint Normality by Abstraction from Observation

Difficulties, to be mentioned in a later paragraph, were experienced

by F. Galton in his studies on the distributions of physical and mental

characters in members of families; some of these characters could be

measured and were seen to be distributed approximately in the normal

distribution. Galton [35] recognized that proper generality will only be

attained by treating the normal distribution as an existent distribution;

he says on his page 289 "Quetelet, apparently from habit rather than theory,

always adopted the binomial law of error, basing his tables on a binomial

of high power. It is absolutely necessary to the theory of the present

paper to get rid of binomial limitations and to consider the law of deviation or error in its exponential form." Thus, in one-dimensional distributions, he wishes to introduce the normal density determined by its mean and standard deviation, free of any reference to a limiting process by which it could be obtained.

Math. Chronicle 12(1983) 55-92.

Galton [35] on his page 291 introduces his notion of reversion, which, he thinks, prevents the particular character from attaining an increasingly high dispersion in each generation. "By family variability is meant the departure of the children of the same, or similarly descended families, from the ideal mean type of all of them. Reversion is the tendency of that mean filial type to depart from the parent type, 'reverting' towards what may be roughly and perhaps fairly described as the average ancestral type."

Galton [36], on his page 246, returned to the analysis of dependence, particularly emphasizing linear regression. In earlier papers he had also obtained, by smoothing the observed frequencies, ellipses of equal probability density. He showed the results to H.W. Watson, as we shall see later in the quotation from his memoirs, but Watson did not make the final step for him; however, J.D. Hamilton Dickson [20], in an appendix to the paper of Galton [38], showed essentially by method (vi) of § 8 that the probability density function was proportional to the exponential of a quadratic expression in the variables. Dickson further showed that, from the analytical form of the density, all the observed properties of regression and homoscedasticity could be

deduced. Galton [36] expresses his pleasure with the sentence, "I may

be permitted to say that I never felt such a glow of loyalty and respect

towards the sovereignty and magnificent sway of mathematical analysis as

when his answer reached me, confirming, by purely mathematical reasoning,

my various and laborious statistical conclusions with far more minuteness

than I had dared to hope, for the original data ran somewhat roughly,

and I had to smooth them with tender caution."

As a postscript, we may add an extract from Memories of my life,


written by Galton [40] in his old age:

"The mathematician who most frequently helped me later on was the

Rev. H.W. Watson, who moreover worked out for me the curious question of

the 'probability of the Extinction of Families'... It appeared in 1875

in the Proceedings of the Royal Society as a joint paper, at his desire;

but all the hard work was his: I only gave the first idea and the data.

He helped me greatly in my first struggles with certain applications of

the Gaussian Law, which, for some reasons that I could never clearly perceive, seemed for a long time to be comprehended with difficulty by

mathematicians, including himself. They were unnecessarily alarmed lest

the well-known rules of Inverse Probability should be unconsciously

violated, which they never were. I could give a striking case of this,

but abstain because it would seem depreciatory of a man whose mathematical

powers and ability were far in excess of my own. Still, he was quite

wrong. The primary objects of the Gaussian Law of Error were exactly

opposed, in one sense to those to which I applied them. They were to get

rid of, or to provide a just allowance for errors. But these errors or

deviations were the very things I wanted to preserve and to know about.

This was the reason that one eminent living mathematician gave me.

"The patience of some of my mathematical friends was tried in endeavouring to explain what I myself saw very clearly as a geometrical

problem, but could not express in the analytical forms to which they

were accustomed, and which they persisted in misapplying. It was a gain

to me when I had at last won over Mr. Watson, who put my views into a more

suitable shape. H.W. Watson was Second Wrangler of his year, and had the

reputation among his college fellows of extraordinary subtlety and insight

as a mathematician. He was perhaps a little too nice and critical about

his own work, losing time in overpolishing, so that the amount of what he

produced was lessened. He wrote on the 'Kinetic Theory of Gases'."

The doubts of Watson [107] were as follows:


"Suppose a man to attempt to cover one foot by a hop, and then the

same or another man to attempt, starting from the end of the hop, to

cover another foot by a stride, and suppose you know that the law of

error constant of the hopper was (a) and that of the strider was (b), you know by what I have already said that the law of error of the resultant of the hop and stride is $e^{-x^2/h^2}$, where $h^2 = a^2 + b^2$. Suppose then the given elements to be modified, and you do not know the law of error constant of the strider (b), but you know that of the resultant of the hop and stride (h) as well as that of the hopper, may you not infer that the error constant of the strider is b where $b^2 = h^2 - a^2$?

"In fact, may you not treat the formula $h^2 = a^2 + b^2$ as an ordinary equation, and deduce from it $b^2 = h^2 - a^2$?

"My first impulse was to say decidedly No, you must infer that the

skill constant of the strider was $\sqrt{h^2 + a^2}$ and not $\sqrt{h^2 - a^2}$.

"I said it is really a case of composite error, in which the error

constant is, as stated above, the square root of the sum of the squares of

the component error constants, equally in the case of a difference and a

sum.

"Here, I said, you have a series of observations or results, viz.,

the sum of stride and hop, whose variations by defect or excess from a

certain mean result follow the law of error with error constant (h), and you have also a series of hops deviating, according to the error law, with constant (a), then by our general principle the remainder or stride

error, the stride being the difference of the total and the hop, must

follow the same law with constant $\sqrt{h^2 + a^2}$."
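The point at issue in Watson's letter can be set out in modern notation. The following is only a sketch under assumptions not stated in the quotation: the hop error $H$ and the stride error $S$ are taken to be independent, the total is $T = H + S$, and each error follows a law of the form $e^{-x^2/c^2}$, whose variance is $c^2/2$. The symbols $H$, $S$ and $T$ are ours.

```latex
% Assumptions (ours): H and S independent, T = H + S,
% error constants a, b, h attached to H, S, T respectively.
\begin{align*}
\operatorname{Var}(T) &= \operatorname{Var}(H) + \operatorname{Var}(S)
  \quad\Longrightarrow\quad h^2 = a^2 + b^2, \\
\operatorname{Cov}(T, H) &= \operatorname{Var}(H), \\
\operatorname{Var}(T - H) &= \operatorname{Var}(T) + \operatorname{Var}(H)
  - 2\operatorname{Cov}(T, H)
  = \operatorname{Var}(T) - \operatorname{Var}(H)
  \quad\Longrightarrow\quad b^2 = h^2 - a^2 .
\end{align*}
```

On this reading the constant $\sqrt{h^2 + a^2}$ would be appropriate only if the total and the hop were independent, which they are not, since the hop is a component of the total.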

Let us review Galton's difficulties. First, he had to consider the univariate distribution free of extraneous parameters. Second, he had to set up an ensemble of distributions of a random variable Y conditional on values of another variable X. He noticed that the joint density of X, Y was constant on ellipses. H.W. Watson failed to help him, but J.D. Hamilton Dickson obtained the "surface of frequency" by a multiplication of the marginal and conditional densities. In this form, it is evident that not only is Y linearly regressed on X with constant variance but the same is true for the regression of X on Y. Galton [36,37,38] recognizes this reciprocal relation in biological terms by noting that not only is the height of a son linearly regressed on that of his father but also the height of a father is linearly regressed on that of his son. The mean and variance of heights can thereby be conserved from one generation to the next.
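The multiplication of marginal and conditional densities can be illustrated in the standardized case. This is a sketch in modern notation, not Dickson's own working: we assume zero means, unit variances and a correlation $\rho$, with $g$ the marginal density of $X$ and $h(y \mid x)$ the conditional density of $Y$.

```latex
% Assumptions (ours): X ~ N(0,1) and Y | X = x ~ N(rho x, 1 - rho^2).
\begin{align*}
g(x)\,h(y \mid x)
 &= \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\;
    \frac{1}{\sqrt{2\pi(1-\rho^2)}}\,
    \exp\!\left\{-\frac{(y-\rho x)^2}{2(1-\rho^2)}\right\} \\
 &= \frac{1}{2\pi\sqrt{1-\rho^2}}\,
    \exp\!\left\{-\frac{x^2 - 2\rho x y + y^2}{2(1-\rho^2)}\right\}.
\end{align*}
% The result is symmetric in x and y, so the same argument gives
% X | Y = y ~ N(rho y, 1 - rho^2): both regressions are linear
% with constant conditional variance 1 - rho^2.
```

The exponent is a quadratic form in $x$ and $y$, so the density is constant on ellipses, and the symmetry in $x$ and $y$ gives the reciprocal regression property.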

Weldon [108] set out the application of Galton's work as follows:

"The results recorded lead to the hope that, by expressing the

deviation of every organ from its average in Mr. Galton's system of

units, a series of constants may be determined for any species of animal

which will give a numerical measure of the average condition of any

number of organs which is associated with a known condition of any one

of them. A large series of such specific constants would give an altogether new kind of knowledge of the physiological connexion between

the various organs of animals; while a study of those relations which

remain constant through large groups of species would give an idea,

attainable at present in no other way, of the functional correlations

between various organs which have led to the establishment of the great

subdivisions of the animal kingdom."


3. Karl Pearson and Non-normal Correlation

Pearson [85] expresses dissatisfaction with the use of the joint

normal density as the paradigm of joint distributions. He remarks that

the normal curve of errors presupposes three equally important principles:

(i) an indefinitely great number of "contributory" summands

(ii) symmetry in the generating distributions

(iii) independence of the summands.

We can see now after later work on the central limit theorem that (ii)

is unnecessary and that good approximations are obtained with moderate

numbers of summands in (i) and that (iii) can be weakened. Pearson

points out that independence of the summands is violated in constructing

artificial examples with the aid of the card pack or roulette wheel and,

perhaps, often in nature too. He concludes that the introduction of these

skew curves leads to two important conclusions; first, if the material is

"heterogeneous" (i.e. non-normal) we have no right to suppose that it is

a mixture of normal distributions and second, if the material obeys a non-normal law the theory of correlation as developed by F. Galton and J.D.

Hamilton Dickson requires considerable modification.

Pearson and Heron [90] point out that the word correlation in the

statistical, as distinguished from the biological, sense was first used

by Galton [39], giving in his title "co-relations"; Galton's definition of correlation involved neither the notion of the product moment coefficient nor that of the linearity of regression; nor did the definition

of Pearson [85] contain them. Pearson and Heron further point out that

Galton had found by experience that regression in biological examples

was often linear.

4. Some Problems of Multivariate Independence

Very little was known of joint distributions in more than two variables at the time of the articles of Yule [113,114]. Yule cleared up the doubt as to whether pairwise independence of several variables implied complete independence of the variables. In particular he showed that

(4.1.) $P(X \in A, Y \in B, Z \in C) = P(X \in A)\,P(Y \in B)\,P(Z \in C)$

for fixed sets A, B and C, did not imply the complete independence of the three variables, X, Y and Z. However, it is sufficient, even in the most general case, if A, B and C are allowed to vary over all the measurable sets.

Yule [114] considers the joint distribution of the indicator variables of sets A, B, ... within a given finite universe to which the classifications into $A$ or $\bar{A}$, $B$ or $\bar{B}$, ... are relevant. He gives four propositions:

I. Complete independence can only be said to subsist for a series of

attributes ABCD ... within a given universe, when every pair of such

attributes exhibits independence not only within the universe at large but

also in every sub-universe specified by one or more of the remaining

attributes of the series, or their contraries.

II. Complete independence exists for a series of n attributes if the criterion of independence holds for all the positive-class frequencies up to that of the n-th order.

III. Independence of two variables in the universe at large does not imply

independence within sub-universes. Similarly independence within every

sub-universe does not imply independence at large.

IV. With A, B, ..., D fixed sets, and X, Y, ..., Z taking two values, namely in A, B, ..., D and their complements $\bar{A}$, $\bar{B}$, ..., $\bar{D}$ respectively,

(4.2.) $P(X \in A, Y \in B, \dots, Z \in D) = P(X \in A)\,P(Y \in B) \cdots P(Z \in D)$

does not in general give any information as to the independence or otherwise of the random variables if their number is greater than 2.

Before Yule [114], there was doubt as to whether pairwise independence of three random variables implied complete independence even in the 2 x 2 x 2 case. Bohlmann [11] defined a joint distribution by

(4.3.) [formula illegible in the source], i, j, k = 1, 2,

in which independence held for every pair of random variables but not in a triplet. Although he stated it in another manner, the example of Bernstein [9] is (4.3.) with one constant replaced by another [values illegible in the source]. Bernstein's example can also be represented by the first three Walsh functions on the unit interval, or rather on [-1,1). Lack of correlation is evident but $W_3 = W_1 W_2$, so there is no independence.
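The Walsh-function representation can be checked by direct computation. The sketch below is illustrative only: it assumes a particular ordering of the first three Walsh functions (two Rademacher functions and their product) and records each function by its constant value on the four quarters of the interval, each quarter carrying probability 1/4. The names `w1`, `w2`, `w3`, `expect` and `cov` are ours.

```python
from fractions import Fraction

# Values of the three functions on the four equal quarters of the interval.
# Assumed ordering: w1 and w2 are Rademacher functions, w3 is their product.
QUARTER = Fraction(1, 4)
w1 = [1, 1, -1, -1]
w2 = [1, -1, 1, -1]
w3 = [a * b for a, b in zip(w1, w2)]


def expect(values):
    """Expectation of a function that is constant on each quarter."""
    return sum(QUARTER * v for v in values)


def cov(f, g):
    """Covariance of two such functions under the uniform distribution."""
    return expect([a * b for a, b in zip(f, g)]) - expect(f) * expect(g)


# All three pairs are uncorrelated ...
pair_covariances = [cov(w1, w2), cov(w1, w3), cov(w2, w3)]

# ... but the triple product has expectation 1, whereas complete
# independence would force it to equal the product of the three means (0).
triple_moment = expect([a * b * c for a, b, c in zip(w1, w2, w3)])
product_of_means = expect(w1) * expect(w2) * expect(w3)
```

Since each function takes only the values +1 and -1 with probability 1/2, zero covariance here is equivalent to pairwise independence, while the third function is completely determined by the other two.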

Yule [114] also gave the theorem:

If for $Z \in C$, X is independent of Y and for $Z \in \bar{C}$, X is independent of Y, then X is not independent of Y unless either X is independent of Z or Y is independent of Z.
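Yule's theorem can be illustrated numerically in the 2 x 2 x 2 case. The sketch below is ours, not Yule's: it builds a table in which X and Y are independent within each level of Z and then computes the covariance of X and Y in the universe at large. The function names and the parametrization by conditional probabilities are assumptions made for the illustration.

```python
from itertools import product


def joint(pz, px_given_z, py_given_z):
    """2 x 2 x 2 table f[x, y, z] with X and Y independent given Z.

    pz is P(Z = 1); px_given_z[z] is P(X = 1 | Z = z), similarly for Y.
    """
    f = {}
    for x, y, z in product((0, 1), repeat=3):
        p_z = pz if z == 1 else 1 - pz
        p_x = px_given_z[z] if x == 1 else 1 - px_given_z[z]
        p_y = py_given_z[z] if y == 1 else 1 - py_given_z[z]
        f[x, y, z] = p_z * p_x * p_y
    return f


def cov_xy(f):
    """Covariance of the indicators X and Y in the universe at large."""
    pxy = sum(p for (x, y, z), p in f.items() if x == 1 and y == 1)
    px = sum(p for (x, y, z), p in f.items() if x == 1)
    py = sum(p for (x, y, z), p in f.items() if y == 1)
    return pxy - px * py


# Both X and Y depend on Z: X and Y are dependent at large.
both_depend = cov_xy(joint(0.5, (0.2, 0.8), (0.3, 0.7)))

# X independent of Z: X and Y are independent at large.
x_free_of_z = cov_xy(joint(0.5, (0.5, 0.5), (0.3, 0.7)))
```

Under these assumptions the covariance equals $P(Z=1)P(Z=0)$ times the product of the two differences of conditional probabilities, so it vanishes only when one of those differences is zero.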

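This theorem can be generalized to R x S x T tables. Darroch [19] defined as perfect a distribution for which

(4.4.) $f_{rst} = f_{rs\cdot}\, f_{r\cdot t}\, f_{\cdot st} \,/\, (f_{r\cdot\cdot}\, f_{\cdot s\cdot}\, f_{\cdot\cdot t})$.

Yule's theorem can be rewritten: the condition

(4.5.) $f_{rst}\, f_{\cdot\cdot t} = f_{r\cdot t}\, f_{\cdot st}$, r, s, t = 1, 2; $f_{rs\cdot} = f_{r\cdot\cdot}\, f_{\cdot s\cdot}$, r, s = 1, 2,

implies that the distribution is perfect.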


Necessary and sufficient conditions for a distribution on R x S x T points to be perfect have been given by Lancaster [72]; in Yule's special case of 2 x 2 x 2 distributions, this implies that the products of any two marginal correlation coefficients are zero, and so that either corr(X,Z) or corr(Y,Z) is zero or both are.

A related theorem has been proved by Blanc-Lapierre and Tortrat [10]

for Markov chains.

5. Mathematical Models

The construction of mathematical models for the mathematical analysis

of observations requires that an abstract model of the real world should

be set up retaining as many features of it as is necessary for the purpose

in hand. However, once this has been done deductions must be made without using additional assumptions from the real world, although of course

such additional assumptions can be incorporated to form a new model.

In any case, it must be shown that the features abstracted and the

assumptions made are mutually consistent. Some instances of where these

principles have been violated can be given. In an instance cited in Part

I, J.F.W. Herschel had in effect assumed that in the model errors in the

directions of any two mutually orthogonal axes were mutually independent

and from this Herschel deduced that the distribution of errors was

jointly normal. R.L. Ellis was scornful that the existence of the normal

law could be so deduced and quoted the tag, nihil ex nihilo. However,

the deduction of Herschel merely shows that his model cannot be a true

picture of reality. Nevertheless, a similar idea enabled Maxwell to

develop the kinetic theory of gases.

At a later stage, Fisher [27] stated that the t-test was exact;

E.S. Pearson [84] in his review of the book asked whether this statement

was true, in the real world or in a model. The ensuing correspondence

between Fisher and Student shows the desirability of a theory of models.

A third example of such desirability is the difficulty evidently experienced by H.W. Watson noted in § 2 above, who was required to

construct a bivariate distribution which was not a simple product

distribution. However, others were concerned; for example, Huntington

[57] and Tschuprow [102] complained that much of the statistical theory

had been published in the language of special applications and that

fundamental notions had not been clarified.

We have translated the following remarks of Bernstein [7] :

"In this work I wish to indicate a new line of attack on the general

problem of dependence between physical quantities, not functionally

related, and to classify rationally the laws of correlation between these

quantities according to the simplicity of the influence which one of them

exercises on the distribution curves of the others. It appears to me that

a theoretical scheme in which this influence is very complicated would

present little of interest. It is consequently natural to study first

the case (reducing, to fix the ideas, the number of quantities considered

to two), where the knowledge of one of the variables has the sole effect

of displacing the curve of distribution of the other without modifying

its dimensions; then consequently the case where the displacement is

accompanied by a dilation or longitudinal contraction (compensated by a

transverse deformation corresponding to the fact that the total area enclosed by the curve should remain invariant and equal to 1). We find as

a specially important limiting case the so-called normal correlation and

I think that the generalisations, which we obtain by this route, could

be used when this (hypothesis) becomes inapplicable."

Bernstein [8] also included the following remarks on dependence in his address on the current state and future of mathematical statistics (translated here from the French):

"We thus see that, in very extensive cases, the iteration of mutually linked experiments, represented by extremely varied stochastic schemes, which may be imperfect, provided that they become almost independent outside a sufficiently small domain of activity, leads to well-determined perfect stochastic schemes in which normal correlation and the law of Gauss, in particular, appear as laws of nature of a character as universal as the principle of inertia in mechanics.

"It is therefore an incontestable merit of Galton and of Mr. Pearson to have founded and developed the study of normal correlation and to have foreseen its importance for applications; but it is only thanks to propositions analogous to the one just stated that this foresight receives a satisfactory mathematical justification and that we can understand, for example, the deep reason for the approximate existence of normal correlation between the sizes of various organs in living beings, as well as for the approximate applicability of Galton's law of hereditary regression to properties that depend on several genes. Moreover, this law of heredity of Galton, which supposes that the coefficient of correlation between successive generations diminishes in geometric progression, furnishes the first example of stochastic chains, the mathematical study of which was founded by Markoff on an entirely different basis.

"Returning to Brownian motion, we could deduce from the cited proposition, applied to a single quantity, an infinite variety of microscopic schemes which would lead to the same diffusion formula. Ordinary macroscopic experiments would not permit a choice between these different interpretations but, if it were possible to effect a sort of filtration of the microscopic displacements, corresponding to durations of time short enough for the law of diffusion to be still inapplicable, one would probably be led to replace the hypothesis of complete independence by another more in conformity with reality."


6. Axiomatization of the Theory of Probability

Until the end of the 19th century, joint normal distributions

appeared in the theory of errors; few or no joint distributions appeared

in the theory of gambling; yet these two fields provided almost the only

realizations of the concepts and methods of probability theory. When

there was a need to study joint distributions because of problems arising

in natural science, difficulties in formulating the joint distributions

appeared as we have seen in the examples given already. There became

evident the need for abstraction and axiomatization; in particular, it

was necessary to separate those parts of the inference and mathematical

technique which could or should be applied in many different fields.

Some partial attempts at axiomatization were made by Broggi [12] and Bohlmann [11]. However, advances became possible after there had been

identifications of distribution functions with monotone increasing

functions, random variables with measurable functions, events with sets

and numerical probabilities with measures. The first thorough treatment

was given by S.N. Bernstein [5,9] ; his system of axioms was founded on

the qualitative identification of chance events with their outer and

inner measures. The ensemble of all events appeared as a Boolean algebra.

Later, more detailed discussions were given by B.O. Koopman and V.I.

Glivenko. However, the most influential and successful of the axiomatizations has been that of Kolmogorov [65]. Gnedenko [46] mentions that Kolmogorov's set-theoretic and measure-theoretic approach is equivalent to Bernstein's earlier approach in terms of complete normed Boolean algebras, an equivalence proved by V.I. Glivenko in 1939.

Their axiomatizations made it possible

(i) to define probability distributions in a finite or denumerable set of dimensions (or random variables),

(ii) to define stochastic processes and thus to define certain new distributions mentioned in § 11,

(iii) to validate limiting methods e.g. as in the central limit theorem,

(iv) to validate methods such as (i) to (iii) in § 8,

(v) to avoid misunderstandings such as those between Herschel and Ellis or Pearson and Fisher as mentioned in § 5,

(vi) to generalize methods so that they could have a diversity of applications,

(vii) to yield rigorous definitions of independence,

(viii) to enable the introduction of powerful mathematical methods as in § 7.

7. Technical Advances in the Mathematics of Dependence

(i) The definition of the Radon-Nikodym derivative as in Nikodym [82] has had very general applications in the theory of dependence and in the computation of conditional probabilities and expectations. In a 2-dimensional distribution, subject to existence, define the Radon-Nikodym derivative of F-measure with respect to the product G x H measure by

(7.1.) $\Omega(x,y) = \frac{dF(x,y)}{dG(x)\,dH(y)} = \frac{dF}{dG\,dH}$,

which assumes various special forms.

$\phi^2$, the functional of Pearson [88], can be generalized

(7.2.) $\phi^2 + 1 = \int \Omega^2(x,y)\,dG\,dH$.
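For a discrete distribution the Radon-Nikodym derivative reduces to the ratio of each cell probability to the product of its marginal probabilities, and the integral in (7.2.) becomes a sum. The following sketch computes $\phi^2$ for a two-way table under that reading; the function name `phi_squared` and the example tables are ours.

```python
def phi_squared(table):
    """phi^2 = sum_ij f_ij^2 / (f_i. f_.j) - 1 for a two-way table.

    `table` may hold counts or probabilities; it is normalized first.
    """
    total = sum(sum(row) for row in table)
    f = [[cell / total for cell in row] for row in table]
    row_marginals = [sum(row) for row in f]
    col_marginals = [sum(col) for col in zip(*f)]
    s = 0.0
    for i, row in enumerate(f):
        for j, p in enumerate(row):
            if p > 0:
                # Omega_ij = p / (f_i. f_.j); add Omega_ij^2 weighted by f_i. f_.j
                s += p * p / (row_marginals[i] * col_marginals[j])
    return s - 1.0


independent_table = [[1, 2], [2, 4]]      # product of its marginals
diagonal_table = [[1, 0], [0, 1]]         # complete dependence
general_table = [[10, 20], [30, 40]]
```

For a 2 x 2 table with cells a, b, c, d this agrees with the familiar expression $(ad - bc)^2 / \{(a+b)(c+d)(a+c)(b+d)\}$, that is, $\chi^2 / N$.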

(ii) Kernel theory. From the general theory of kernels, if a joint distribution, F, is $\phi^2$-bounded, that is if $\phi^2$ in (7.2.) is finite, there is a biorthogonal expansion of the Radon-Nikodym derivative, convergent in the mean square sense,

(7.3.) $\Omega(x,y) = 1 + \sum_{n=1}^{\infty} \rho_n x^{(n)} y^{(n)}$, $|\rho_n| < 1$, n = 1,2,...

where $x^{(0)} = y^{(0)} = 1$ and $\{x^{(n)}\}$ and $\{y^{(n)}\}$ are subsets of orthonormal sets complete on the respective marginal distributions. (7.3.) can be termed the canonical or biorthonormal expansion; the $\rho_n$ are the canonical correlations of R.A. Fisher [28] in some order.

Mehler [81], in solving the differential equation of heat conduction,

(7.4.) $2t\,\frac{\partial T}{\partial t} - 2x\,\frac{\partial T}{\partial x} + \frac{\partial^2 T}{\partial x^2} = 0$,

derived an expansion of the form (7.3.) with $x^{(n)}$ and $y^{(n)}$ the standardized Hermite polynomials of degree n in x and y respectively. This expansion can be interpreted as the standard bivariate normal density expanded as a product of the two marginal densities and a series.
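Mehler's expansion can be checked numerically. The sketch below assumes the standardized case (zero means, unit variances, correlation `rho`) and the probabilists' Hermite polynomials $He_n$, for which the standardized polynomial is $He_n / \sqrt{n!}$ and the coefficient of the n-th term is $\rho^n$. It compares the ratio of the joint density to the product of the marginal densities with a truncated series. The function names and the truncation at 60 terms are our choices.

```python
import math


def hermite_e(n, x):
    """Probabilists' Hermite polynomial He_n(x), by the recurrence
    He_{k+1}(x) = x He_k(x) - k He_{k-1}(x)."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, x
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h


def density_ratio(x, y, rho):
    """Standard bivariate normal density divided by the product of the
    two N(0, 1) marginal densities."""
    q = (rho * rho * (x * x + y * y) - 2 * rho * x * y) / (2 * (1 - rho * rho))
    return math.exp(-q) / math.sqrt(1 - rho * rho)


def mehler_series(x, y, rho, terms=60):
    """Truncated series sum_n rho^n He_n(x) He_n(y) / n!."""
    return sum(rho ** n * hermite_e(n, x) * hermite_e(n, y) / math.factorial(n)
               for n in range(terms))
```

For example, `mehler_series(0.7, -1.2, 0.5)` and `density_ratio(0.7, -1.2, 0.5)` should agree to many decimal places, since the terms decrease geometrically for $|\rho| < 1$.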

Mehler's expansion was the earliest example of what became known as a kernel expansion, used in the theory of integral equations. Expansions of other kernels in such a polynomial biorthogonal series are known (Watson, [103,104,105]; Askey, [3] ).

Mehler's work seems to have been unknown to Pearson [86] and L. Bramley-Moore, who derived the identity anew as follows. The standard joint normal density is proportional to

(7.5.) $U = \exp[-\tfrac12 (x^2 + y^2)]\,(1 + \sum_{n=1}^{\infty} u_n \rho^n / n!)$,

a Taylor expansion in $\rho$. After taking logarithmic differentials and then setting $\rho = 0$, there follows

(7.6.) $(1-\rho^2)^2\, \frac{dU}{d\rho} = \{xy + \rho(1 - x^2 - y^2) + \rho^2 xy - \rho^3\}\,U$,

(7.7.) $u_{n+1} = xy\,u_n + n(2n - 1 - x^2 - y^2)\,u_{n-1} + xy\,n(n-1)\,u_{n-2} - n(n-1)(n-2)^2\,u_{n-3}$.

Pearson [86] obtained the first few values of $u_n$ and then stated that $u_n$ is the product of two polynomials, respectively of x and y, each of precise degree n; this is shown to be true by induction: the recurrence relation was found and the polynomial is differentiated to obtain that of lower order. With hindsight, the biorthonormal expansion can be obtained by the examination of $E(\exp(tX - \tfrac12 t^2 + uY - \tfrac12 u^2))$, the product of the generating functions of the Hermite polynomials.
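Pearson's statement that $u_n$ factorizes can be checked against the recurrence. The sketch below assumes the recurrence in the form $u_{n+1} = xy\,u_n + n(2n-1-x^2-y^2)\,u_{n-1} + xy\,n(n-1)\,u_{n-2} - n(n-1)(n-2)^2\,u_{n-3}$ and takes $u_n = He_n(x)\,He_n(y)$, with $He_n$ the probabilists' Hermite polynomial; terms with a negative index are treated as zero. The function names are ours.

```python
def hermite_e(n, x):
    """Probabilists' Hermite polynomial He_n(x), by the recurrence
    He_{k+1}(x) = x He_k(x) - k He_{k-1}(x)."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, x
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h


def u(n, x, y):
    """Candidate coefficient u_n = He_n(x) He_n(y)."""
    return hermite_e(n, x) * hermite_e(n, y)


def recurrence_rhs(n, x, y):
    """Right-hand side of the assumed recurrence for u_{n+1}."""
    def term(k):
        # Coefficients with a negative index do not occur.
        return u(k, x, y) if k >= 0 else 0.0
    return (x * y * term(n)
            + n * (2 * n - 1 - x * x - y * y) * term(n - 1)
            + x * y * n * (n - 1) * term(n - 2)
            - n * (n - 1) * (n - 2) ** 2 * term(n - 3))
```

With n = 1 the recurrence gives $u_2 = x^2y^2 + 1 - x^2 - y^2 = (x^2-1)(y^2-1)$, the product of the two Hermite polynomials of degree 2.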

Moreover, since $\sum |\rho|^n < \infty$ if $|\rho| < 1$, the series converges absolutely a.e. and so term by term integration is justified. Pearson [86] equated the integral over the quadrant x < h, y < k to the observed frequency, presumed to have been derived from a bivariate normal distribution, and solved the resulting equation to give tetrachoric r, an estimator of $\rho$. In passing, Pearson [86] recalled an identity preferably attributed to Stieltjes [100], namely

(7.8.) $2\pi\{F(0,0) - \tfrac14\} = \sum_{k=0}^{\infty} c_k\, \rho^{2k+1} / (2k+1)! = \sin^{-1}\rho$, with $c_0 = 1$ and $c_k = \{1 \cdot 3 \cdots (2k-1)\}^2$,

obtained by setting h = k = 0.
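The identity can be checked numerically. The sketch below assumes it in the form $2\pi\{F(0,0) - \tfrac14\} = \sin^{-1}\rho$, where $F(0,0)$ is the probability of the quadrant x < 0, y < 0, with series coefficients $\{1 \cdot 3 \cdots (2k-1)\}^2 / (2k+1)!$; the function names and the number of terms are ours.

```python
import math


def arcsine_series(rho, terms=400):
    """Sum over k of c_k rho^(2k+1) / (2k+1)!, c_k = (1*3*...*(2k-1))^2."""
    coeff = 1.0    # c_0 / 1!
    power = rho    # rho^(2k+1), starting at k = 0
    total = 0.0
    for k in range(terms):
        total += coeff * power
        # Ratio of successive coefficients: (2k+1)^2 / ((2k+2)(2k+3)).
        coeff *= (2 * k + 1) ** 2 / ((2 * k + 2) * (2 * k + 3))
        power *= rho * rho
    return total


def quadrant_probability(rho):
    """P(X < 0, Y < 0) for a standard bivariate normal pair, under the
    assumed form of the identity."""
    return 0.25 + math.asin(rho) / (2 * math.pi)
```

At $\rho = 0$ the quadrant probability is 1/4, as it must be under independence, and it rises to 1/2 as $\rho$ tends to 1.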

There had been little interest among statisticians in such an approach until the development of the theories of stochastic processes, as by Bernstein [8] and McKendrick [78], and of the maximum correlation by Hirschfeld [53] and Maung [80]. Gebelein [41,42,43] used the expansions in more general distributions to obtain interpretations of the correlation ratios and $\phi^2$ and to generalize the Lexis theory; he explicitly called on the general integral equation theory.

In the kernel theory, the integrations are often carried out with respect to ordinary Lebesgue measure on a finite interval and in the statistical theory, this corresponds to rectangularly distributed marginal variables. Many concepts, such as operators, projections and integral operators, familiar in the integral equation theory, can be used in a probabilistic context, as has been admirably explained by Csáki and Fischer [15,16,17].

(iii) General orthonormal expansions on product spaces. Eyraud [22,24] seems to be the first to give a general expansion of bivariate and trivariate densities in terms of their marginal rectangular distributions and an orthogonal series. This orthogonal series contains the constant term unity but no other term which is not a product of two non-constant orthonormal functions, one from each marginal set; for otherwise, on integrating out the y functions, an orthogonal function in x, say, would remain and change the form of the marginal distribution of X.

The general expansion is of the form,

(7.9.) $\Omega(x,y) = 1 + \sum_{r=1}^{\infty} \sum_{s=1}^{\infty} \rho_{rs}\, x^{(r)} y^{(s)}$

with $\sum \rho_{rs}^2 = \phi^2$ and convergence is in the mean square sense.

With rectangular marginal distributions on the unit interval, $\Omega(x,y)$ is the joint probability density.

The history of these general expansions in product spaces in probability theory is rather more difficult to define. The use of Gram-Charlier expansions appears in Wicksell [109] for bivariate distributions not too "distant" from the joint bivariate normal distributions and Bula [13] used orthonormal polynomials. To fit a 2-dimensional distribution, Pearson [89] used the bivariate joint normal density multiplied by a series in products of terms from the two Hermite series up to a total degree of 4 to give his 15-constant surface. However, such series were little used.

(iv) Canonical form of a rectangular matrix. The problem of canonical correlations in bivariate distributions was introduced into statistics by H.O. Hirschfeld [53] and by R.A. Fisher as in Maung [80] and it may seem unnecessary to found the theory on that of kernels. A suitable approach is to prove the following:

Theorem 7.1. For every m x n real rectangular matrix A, there exists a pair of orthogonal matrices, M and N, such that

(7.10.) $C = M^T A N$

has a canonical form with $c_{ii} = c_i$ and $c_1 \ge c_2 \ge \dots \ge c_r > 0$, $r \le m \le n$, and $c_{ij} = 0$ otherwise; indeed, M is such that $M^T A A^T M$ is in diagonal canonical form and N is the completion of the first r columns of $A^T M$.

This theorem has geometrical implications, for if $(x_1, \dots, x_m, y_1, \dots, y_n)$ is a general point in (m+n)-dimensional Euclidean space, $x^T A y = c$ defines a quadric hypersurface, more readily studied in the canonical form, $c = \sum_{i=1}^{r} c_i x_i^* y_i^* = x^{*T} C y^*$, where $x^* = M^T x$ and $y^* = N^T y$. The theorem was proved by Beltrami [4], Jordan [62] and Sylvester [101].

The theorem is mentioned in MacDuffee [77] but it seems to have been generally forgotten, as the article of Lanczos [76] and the subsequent article of Schwerdtfeger [99] show.

Once this theorem has been proved, the more general theorem yielding (7.3.) can be proved by passing to the limit as suggested in Lancaster [69, 74].

(v) The characteristic function can be regarded as a special form of the Fourier transform characterized by taking the integral with respect to a monotone increasing function of total variation unity, and so with respect to a distribution function,

where z is purely imaginary. However, the older French writers considered it to be a special form of the Laplace transform. Character­istic function methods were used by A.M. Ljapunov in his proof of the central limit theorem at the turn of the century but it was long before they were used commonly in mathematical statistics in other contexts. Perhaps, we can regard papers of G. Pdlya and P. L6vy in 1922 as the beginning of their more general use. Even now, there is no thorough textbook treatment of the characteristic function in several dimensions.

The multivariate generalization of (7.10') is

(7.11.) $E \exp(z_1 X_1 + \dots + z_n X_n) = \int \exp(z_1 x_1 + \dots + z_n x_n)\, dF(x)$.

Two important theorems are the following.

A multivariate distribution determines and is determined by the totality of 1-dimensional characteristic functions, namely $E\{\exp z(a_1 X_1 + \dots + a_n X_n)\}$ for all choices of the real quantities $a_1, \dots, a_n$ (Cramér and Wold, [14]; Hačaturov, [51]; Hartman, van Kampen and Wintner, [52]; Kosteljanec and Rešetnjak, [67]; Rényi, [94]).

The homogeneous polynomial biorthogonal property of degree, $n$, i.e. (7.3.) with $\{x^{(i)}\}$ and $\{y^{(j)}\}$ polynomials, is equivalent to the condition that

(7.12.) $\left. \dfrac{\partial^n \psi(u,v)}{\partial u^n} \right|_{u=0} = \sum_{r=0}^{n} a_{nr} \dfrac{\partial^r \psi(0,v)}{\partial v^r}$,

with $\psi(u,v)$ the joint characteristic function of $X$ and $Y$ and the $a_{nr}$ constants, and a similar condition with the roles of $u$ and $v$ interchanged.

This theorem is due to Rao [93] . A special case with n = 1 has been much used; namely, the random variable X is linearly regressed on Y if and only if (7.12.) holds with n = 1 . Applications have been made by Fix, [29]; Kenney, [64]; Rao, [93]; Rothschild and Mourier, [95]; Wicksell, [110] and many others.

8. Generation of Multivariate Distributions

Karl Pearson [85] began a search for new multivariate or joint distributions; some general accounts of his methods are given in Pearson and Heron [90] and Pretorius [92]. Methods of forming joint distributions are now given in a list which cannot be either exhaustive or mutually exclusive.

(i) Empirical multivariate distributions can be generated by chance processes, e.g. tossing of coins or dice, or by observations. If the observations are $a_{xy}$ at the point where $X = x$, $Y = y$, and $\sum a_{xy} = N$, then $\{a_{xy}/N\}$ is a joint distribution.

(ii) Artificial or constructed distributions. Any finite non-negative measure in the plane becomes a probability after division by a suitable constant; e.g. suppose $g(x,y) \ge 0$ for all $x$, $y$, then subject to existence of the integral, $f(x,y) = g(x,y) / \iint g(x,y)\, dx\, dy$ is a joint density function.

(iii) Algebraic structures can be used, e.g. suppose $\{(x,y,z)\}$ is the description of a latin square, then if $f(x,y,z)$ is set equal to $n^{-2}$ when $(x,y,z)$ occurs in the description and equal to zero otherwise, $\{f(x,y,z)\}$ defines a joint distribution; when the side of the latin square, $n$, is 2, Bernstein's example is obtained. Other constructions are possible (Jamnik, [58]; Lancaster, [70]; Joffe, [59, 60]).

(iv) Pearson distributions and methods. We have already mentioned the results of Pearson, Narumi and Bernstein in Part I. See also § 11 for some results by related methods.

(v) Charlier approximations. Wicksell [109] and Pearson [89] fitted a normal density multiplied by an orthonormal series to obtain an approximation to empirical distributions.

(vi) Joint distributions as products. These have been freely used, e.g. $g(x) h(y|x)$, that is, a marginal density multiplied by a conditional density; a special case occurs when $g(\cdot)$ is the standard normal density and $h(\cdot|x)$ is the normal-$(\rho x, 1-\rho^2)$ distribution (Dickson, [20]).

(vii) Transformations of individual variables. An example of this is $X^* = \tfrac{1}{2}X^2$, $Y^* = \tfrac{1}{2}Y^2$, when $X$ and $Y$ are jointly normal with correlation, $\rho$; after transformation $X^*$ and $Y^*$ are $\Gamma$-variables with correlation, $\rho^2$.

(viii) Randomized partitions. Discrete distributions can be converted into absolutely continuous distributions as in the familiar Neyman-Pearson method in univariate distributions or its extension into two dimensions by Hoeffding [55].

(ix) Uniformization of the marginal variables. Any absolutely continuous distribution can be transformed into the uniform distribution; other distributions can be made absolutely continuous by a randomized partition. $\phi^2$ is invariant under randomized partitions. If $\phi^2$ is finite, the joint density may be expanded in a biorthonormal Legendre series (Hoeffding, [54]) or in a biorthonormal trigonometric series (Abazaliev, [1]). The coefficients in the expansions are sometimes called quasimoments.

(x) Selection and reweighting of the marginal distributions. For a given joint distribution function, $F$, the conditional distribution for given values of $X$ can be determined. These conditional distributions may then be applied to a new marginal distribution of $X$ to obtain for example $g^*(x) h(y|x)$ in place of $g(x) h(y|x)$, a process of weighting used by Pearson [87] to obtain the joint distribution of $q$ variables in joint normal correlation with $p$ other variables, to which selection was applied.

(xi) Truncation and censoring of the marginal distribution yield new distributions.

(xii) General transformations on the marginal variables generate multivariate distributions. A special case is given by $X$ and $Y$ independent and $X^* = aX + bY$, $Y^* = cX + dY$.

(xiii) Mixtures of product distributions may lead to new distributions. Every joint distribution is a mixture trivially, namely a mixture of 1-point joint distributions. If the mixing distribution has m points of increase, the resulting mixture of m distributions can have at most m-1 canonical correlations.

(xiv) Convolutions of joint distributions can be regarded as special cases of mixtures.

(xv) Distributions can be symmetrized with respect to the random variables by mixing, i.e. $F^*(x,y) = \tfrac{1}{2}[F(x,y) + F(y,x)]$, Sarmanov [96]. They can also be symmetrized by mixing the product of the distributions of $X$ and $Z$ conditional on $Y$ with respect to the marginal distribution of $Y$, where $(X, Y)$ has the same distribution as $(X, Z)$.
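The latin square construction in (iii) can be checked by direct enumeration. The sketch below is illustrative only: it assumes the cyclic latin square with entry $(x + y) \bmod n$ in row $x$ and column $y$ (any latin square would serve), and the function names are hypothetical. It verifies that the three variables are pairwise independent but not mutually independent, which for $n = 2$ is the behaviour of Bernstein's example.

```python
from fractions import Fraction
from itertools import product

def latin_square_distribution(n):
    """Joint pmf of (X, Y, Z): mass n**-2 on each cell (x, y, z) of a cyclic latin square."""
    # Assumed square: the symbol in row x, column y is (x + y) mod n.
    return {(x, y, (x + y) % n): Fraction(1, n * n)
            for x, y in product(range(n), repeat=2)}

def marginal(pmf, keep):
    """Marginal pmf on the coordinate positions listed in `keep`."""
    out = {}
    for point, p in pmf.items():
        key = tuple(point[i] for i in keep)
        out[key] = out.get(key, Fraction(0)) + p
    return out

def pairwise_independent(pmf):
    """True if every pair of coordinates has a product-form joint pmf."""
    one = [marginal(pmf, (i,)) for i in range(3)]
    for i, j in ((0, 1), (0, 2), (1, 2)):
        two = marginal(pmf, (i, j))
        for (a,), (b,) in product(one[i], one[j]):
            if two.get((a, b), Fraction(0)) != one[i][(a,)] * one[j][(b,)]:
                return False
    return True

def mutually_independent(pmf):
    """True if the joint pmf is the product of its three marginals."""
    one = [marginal(pmf, (i,)) for i in range(3)]
    for (a,), (b,), (c,) in product(*one):
        expected = one[0][(a,)] * one[1][(b,)] * one[2][(c,)]
        if pmf.get((a, b, c), Fraction(0)) != expected:
            return False
    return True

f = latin_square_distribution(2)   # four points, each of mass 1/4
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the independence checks are equalities rather than floating-point comparisons.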

9. Multivariate Normal Distribution

Suppose $X$ and $Y$ have as elements the random variables of a joint normal distribution. Hotelling [56] showed there was a linear transformation $X \to U$, and $Y \to V$ such that $\mathrm{corr}(U_i, V_j) = \delta_{ij}\rho_i$. On the other hand, we have seen above that if $X$ and $Y$ are jointly normal, $E\,H_i(X)H_j(Y) = \delta_{ij}\rho^i$, where $H_k$ is the standardized Hermite polynomial of degree $k$. Obuhov [83] showed that these two results can be combined or, as Kolmogorov [66] would say, that in a "normal correlation", the canonical variables in Hotelling's problem are linear. The two approaches are united in the solution of Obuhov [83] who shows that because the products of Hermite polynomials are complete on the subspaces of $U$ and $V$ under the Hotelling linear transformations, the only non-zero correlations are the products of powers of the form

$\rho_1^{k_1} \rho_2^{k_2} \cdots \rho_r^{k_r}$, $k_1, k_2, \dots, k_r = 0, 1, 2, \dots$,

and the canonical variables are the corresponding products of standardized Hermite polynomials, of degrees $k_1, k_2, \dots, k_r$, in the subsets of variables $\{X_i\}$ and $\{Y_j\}$.
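The relation $E\,H_i(X)H_j(Y) = \delta_{ij}\rho^i$ for a jointly normal pair can be checked by hand in the lowest non-trivial case. The sketch below assumes that $X$ and $Y$ are standardized (zero mean, unit variance), and uses the explicit forms $H_1(x) = x$ and $H_2(x) = (x^2 - 1)/\sqrt{2}$ together with the standard moment identities $E\,X^2Y^2 = 1 + 2\rho^2$ and $E\,XY^2 = 0$ for the bivariate normal; these formulas are supplied here for illustration and are not written out in the text.

```latex
\begin{align*}
E\,H_2(X)H_2(Y) &= \tfrac12\,E\,(X^2-1)(Y^2-1) \\
  &= \tfrac12\,\bigl(E\,X^2Y^2 - E\,X^2 - E\,Y^2 + 1\bigr) \\
  &= \tfrac12\,\bigl(1 + 2\rho^2 - 1 - 1 + 1\bigr) = \rho^2, \\
E\,H_1(X)H_2(Y) &= \tfrac{1}{\sqrt2}\,\bigl(E\,XY^2 - E\,X\bigr) = 0 .
\end{align*}
```

The first line gives the diagonal term of degree 2 and the second an off-diagonal term, in agreement with $\delta_{ij}\rho^i$.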

The multivariate normal distribution appears in a variety of models. The various characterizations, e.g. as in Kagan, Linnik and Rao [63], show that generalizations to obtain joint distributions in other variables can only be successful if not too many of the properties of the joint normal distribution are retained.

The multivariate central limit theorem was proved by Bernstein [6]. Fréchet [32] defined a normal correlation by the property that every linear form in the marginal variables was normal, a useful device which covers the case when the correlation matrix is not of full rank.

Since Galton's time, the joint multivariate normal distributions have been used as mathematical models in many branches of the natural sciences especially biology, anthropology and psychology. Pearson [87], for example, uses the theory to show how selection of a population according to one subset of variables affects the correlations between the members of the whole set; this line has been followed by A.C. Aitken, D.N. Lawley and others.

Fisher [26] used the general multivariate theory to determine the correlation between relatives. Fisher [25] obtained the sampling distribution of the coefficient of correlation. The extension to joint distributions of several coefficients of correlation has led to the study, somewhat misleadingly termed "multivariate analysis", which we have decided to exclude from this discussion because of its special nature and difficulties. On the other hand, the theory of the analysis of variance has not led to new problems of multivariate distributions since the assumptions of normality enable linear transformations to transform the normal variables into independent normal variables and so simple product distributions result.

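Generalizing from the normal correlation, many authors have felt that there is a unique type of joint distribution between two marginal variables of given types. Many of the articles of M. Fréchet were designed to show that this was not so. Consequently, it is now customary to term the joint distributions, with given marginal distributions, Fréchet classes. Fréchet [33] wrote: "Certains ont cru 'sauver' le coefficient de corrélation de X et Y en modifiant X et Y de sorte que les marges soient 'normales'. Ils pensaient, en effet, qu'alors la loi du couple (X, Y) deviendrait nécessairement la loi de Laplace-Bravais dite 'binormale', cas où l'emploi du coefficient de corrélation devient légitime. Nous avons déjà démontré de deux façons différentes, que c'est là une erreur." (In translation: "Some have believed they could 'save' the coefficient of correlation of X and Y by modifying X and Y so that the margins are 'normal'. They thought, in effect, that the law of the pair (X, Y) would then necessarily become the law of Laplace-Bravais called 'binormal', the case in which the use of the coefficient of correlation becomes legitimate. We have already shown, in two different ways, that this is an error.")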

10. Random Elements in Common

The random elements in common model has been used in the generation of the joint normal distribution. If the condition of linear regression is imposed, all the characteristic functions of summands must be powers, perhaps fractional, of a common characteristic function and this is a sufficient condition also for the existence of linear regressions of the sums. So some second condition is necessary to obtain a class of distributions; if the condition that the joint distribution should have the homogeneous polynomial canonical property (7.3.) is required, then the Meixner collection of random variables is characterized, as was shown by Eagleson and Lancaster [21]. The history of this problem has been indicated in Lancaster [74]. The Meixner collection contains the binomial, negative binomial, Poisson, normal, gamma and hyperbolic distributions. The associated theory simplifies the theory of almost all the joint distributions in these commonly occurring distributions.
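A small exact example may help to fix the random elements in common model. The set-up below is hypothetical and chosen only for illustration: four independent fair 0/1 summands $B_1, \dots, B_4$, with $X = B_1 + B_2 + B_3$ and $Y = B_2 + B_3 + B_4$ sharing the two elements $B_2$, $B_3$. For identically distributed summands the squared correlation is $n_{12}^2/(n_1 n_2)$, here $4/9$, and the regression of $X$ on $Y$ is linear; the sketch confirms both by enumeration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical set-up: B1..B4 independent fair 0/1 variables;
# X = B1 + B2 + B3 and Y = B2 + B3 + B4 have B2 and B3 in common.
joint = {}
for b in product((0, 1), repeat=4):
    x, y = b[0] + b[1] + b[2], b[1] + b[2] + b[3]
    joint[(x, y)] = joint.get((x, y), Fraction(0)) + Fraction(1, 16)

def expect(g):
    """Exact expectation of g(X, Y) under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

mean_x, mean_y = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mean_x) ** 2)
var_y = expect(lambda x, y: (y - mean_y) ** 2)
cov_xy = expect(lambda x, y: (x - mean_x) * (y - mean_y))

# Squared correlation, kept exact to avoid a square root.
rho_squared = cov_xy ** 2 / (var_x * var_y)

def regression_of_x_on_y(y0):
    """Conditional mean E(X | Y = y0)."""
    p_y = sum(p for (x, y), p in joint.items() if y == y0)
    return sum(p * x for (x, y), p in joint.items() if y == y0) / p_y

regression = {y0: regression_of_x_on_y(y0) for y0 in range(4)}
```

In this example the conditional mean works out to $E(X \mid Y = y) = \tfrac12 + \tfrac23 y$: the slope equals the correlation, since the two variances are equal.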

11. Stochastic Processes and Differential Equations

McKendrick [78] obtained a joint Poisson distribution by the solution of differential equations; but this was a "lone" effort and it remained for Bernstein [8] to point the way to the future, as we have noted in § 5. It is thus no accident that Sarmanov [97] showed that the normal and gamma densities exhaust the class of bivariate densities, on an infinite support, determining a continuous stationary process with the polynomial biorthogonal property and with $\rho > \rho^2 > \dots$; whereas on a finite support, the marginal densities belong to the Jacobi system with its Gegenbauer and Legendre specializations.

Further results have been obtained by Wong [111] and Wong and Thomas [112]. It is notable that joint distributions in the Pearson system of variables have thus been obtained by the solution of differential equations of the diffusion type.

12. Various Models

There are, of course, many other joint distributions, other approaches and many other authors besides those already mentioned. For them the reader may be referred to the texts of Johnson and Kotz [61], Mardia [79] and Plackett [91] and to the bibliography of Anderson, Das Gupta and Styan [2].

13. Measures of Dependence

A measure of dependence is designed to indicate, in some defined way, how closely $X$ and $Y$ or $\{X_i\}$ and $\{Y_j\}$ are related, with extremes at mutual independence and complete mutual dependence. If the measure can be expressed as a scalar, it is convenient to refer to it as an index. The conditions for an index to be useful have been stated by Pearson [88], Pearson and Heron [90], Gini [44, 45], Fréchet [30, 31, 34]. Extensive bibliographies have been given by Goodman and Kruskal [47, 48, 49, 50]. The sampling errors are often determined only with great difficulty (Goodman and Kruskal [50] and Kruskal [68]). Information-theoretical indices have been given by Csiszár [18].

However, Lancaster [75] concludes that a general index of dependence, whereby joint distributions can be arranged in order of the degree of dependence, does not exist. For some defined classes of distributions, such as the joint normal and the random elements in common models, the product moment correlation is the index of choice. In other classes, there may be indices useful for some purposes. There is inevitably loss of information in passing from, say, the matrix of correlations to a single index.

14. Summary

In Part I, the earlier development of the notion of statistical dependence has been detailed; the story is continued here. The work of Francis Galton on heredity created new problems for statistical theory; formerly, it had been the custom to avoid difficulties introduced by dependence in the theory of errors, now Galton was studying these very deviations from independence. In the 1880's, there were few known distributions to serve as marginal distributions and very little was known about joint distributions except when they were constructed by taking a product. Karl Pearson began a search for new univariate distributions and new non-normal joint distributions.

In his time, there was a growing awareness of the need to set up a mathematical model in which deductions could be made without further reference to the physical world. Examples of the confusion resulting from a lack of such conventions have been given. There was a need to study distributions apart from the applied context as E.V. Huntington and A.A. Tschuprow pointed out. Somewhat later, S.N. Bernstein gave a remarkable address dealing with the future of the study of dependence.

The use of limiting processes and of multidimensional distributions led to the well-known axiomatizations of probability by S.N. Bernstein and A.N. Kolmogorov. More powerful methods began to be used, e.g. Radon-Nikodym derivative, kernel theory, biorthonormal expansions, characteristic function. With the foundation secure and with the new methods, many new joint distributions have been constructed with the aid of a variety of methods listed in §§8 and 9.

Some progress was made in joint distributions of vector valued variables. H. Hotelling introduced his canonical distribution and A.M. Obuhov, apparently independently, gave a discussion, in which the relations between square summable functions in the two sets of variables could be examined. Many characterization theorems have shown that the multivariate normal distribution has many special properties.

Some models of joint distributions deserve special attention. Sums of independent random variables with some held in common have been long studied. If linear regression is required, each characteristic function must be a power of one of them. If it is then required that the joint distribution of the sums have the homogeneous polynomial canonical property, the summands must all belong to the same Meixner class.

Only passing reference is made to the canonical forms of joint distribution as they have been treated fully elsewhere. We conclude with a brief note on measures of dependence.

REFERENCES

1. A.K.J. Abazaliev, Characteristic coefficients of two-dimensional distributions and their applications (in Russian). Dokl. Akad. Nauk SSSR 178 (1968), 263-266. Sov. Math. 9, 52-56.

2. T.W. Anderson, S. Das Gupta and G.P.H. Styan, A bibliography of multivariate statistical analysis. Oliver and Boyd, Edinburgh (1972).

3. R. Askey, Orthogonal polynomials and positivity. In Wave Propagation and Special Functions (D. Ludwig and F.W.J. Olver, eds.), Studies in Applied Mathematics (SIAM), Vol. 6 (1970), 64-85. SIAM, Philadelphia, Pennsylvania.

4. E. Beltrami, Sulle funzioni bilineari. Giorn. Mat. Battaglini, 11 (1873), 98-106.

5. S.N. Bernstein, An essay on the axiomatic foundation of the theory of probability (in Russian). Harkov. Zapiski Matem. Obšč., (2) 15 (1917), 209-274.

6. S.N. Bernstein, Sur l'extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes. Math. Ann. 97 (1926), 1-59.

7. S.N. Bernstein, Fondements géométriques de la théorie des corrélations. Metron 7(2) (1927), 1-27.

8. S.N. Bernstein, Sur les liaisons entre les grandeurs aléatoires. Verh. Math. Kongr. Zürich 1 (1932), 288-309.

9. S.N. Bernstein, Theory of Probability (in Russian). 2nd ed. Gozizdat, Moscow (1934).

10. A. Blanc-Lapierre and A. Tortrat, Sur un problème d'indépendance. C.R. Acad. Sci. Paris A272 (1971), 328-329.

11. G. Bohlmann, Die Grundbegriffe der Wahrscheinlichkeitsrechnung in ihrer Anwendung auf die Lebensversicherung. Proc. Congr. Internat. Math., Rome, 3 (1909), 244-279.

12. U. Broggi, Die Axiome der Wahrscheinlichkeitsrechnung. Göttingen (1907).

13. C.A. Bula, Calculo de superficies de frecuencias. Rev. Union Mat. Argentina, 6 (1940), 1-107.

14. H. Cramér and H. Wold, Some theorems on distribution functions. J. Lond. Math. Soc. 11 (1936), 290-294.

15. P. Csáki and J. Fischer, On bivariate stochastic connection. Magyar Tud. Akad. Mat. Kutató Int. Közl. 5 (1960a), 311-323.

16. P. Csáki and J. Fischer, Contributions to the problem of maximal correlation. Magyar Tud. Akad. Mat. Kutató Int. Közl. 5 (1960b), 325-337.

17. P. Csáki and J. Fischer, On the general notion of maximal correlation. Publ. Math. Inst. Hungar. Acad. Sci. 8 (1963), 27-51.

18. I. Csiszár, Information-type measures of difference of probability distributions and indirect observations. Studia Sci. Math. Hungar. 2 (1967), 299-318.

19. J.N. Darroch, Interactions in multi-factor contingency tables. J. Roy. Statist. Soc. Ser. B 24 (1962), 251-263.

20. J.D. Hamilton-Dickson, Appendix. Proc. Roy. Soc. London 40 (1886), 63-66. See Galton [38].

21. G.K. Eagleson and H.O. Lancaster, The regression system of sums with random elements in common. Austral. J. Statist. 9 (1967), 119-125.

22. H. Eyraud, Sur une représentation nouvelle des corrélations continues. C.R. Acad. Sci. Paris, 199 (1934), 1356-1358.

23. H. Eyraud, Les principes de la mesure des corrélations. Ann. Univ. Lyon. Sect. A 1 (1936), 30-47.

24. H. Eyraud, Les lois d'erreurs dans deux dimensions. Ann. Univ. Lyon. Sect. A 2 (1939), 19-23.

25. R.A. Fisher, Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika 10 (1915), 507-521.

26. R.A. Fisher, The correlation between relatives on the supposition of Mendelian inheritance. Trans. Roy. Soc. Edinburgh, 52 (1918), 399-433.

27. R.A. Fisher, Letters June 19, 1929 et seq. in Letters from W.S. Gosset to R.A. Fisher 1915-1936 (foreword by L. McMullen). Private Circulation (1970).

28. R.A. Fisher, In Maung, K. [80].

29. E. Fix, Distributions which lead to linear regressions. Proc. 1st Berkeley Sympos. 1 (1949), 79-91.

30. M. Fréchet, A general method of constructing correlation indices. Proc. Math. Phys. Soc. Egypt, 3(2) (1946), 13-20.

31. M. Fréchet, Additional note on a general method of constructing correlation indices. Proc. Math. Phys. Soc. Egypt, 3 (1948), 73-74.

32. M. Fréchet, Généralisations de la loi de probabilité de Laplace. Ann. Inst. Henri Poincaré, 12 (1951), 1-29.

33. M. Fréchet, Sur les tableaux de corrélation dont les marges sont données. C.R. Acad. Sci. Paris, 242 (1956), 2426-8.

34. M. Fréchet, A note on simple correlation. Transl. from the French by C. de la Ménardière. Math. Mag. 32 (1958/59), 265-268.

35. F. Galton, Typical laws of heredity. Proc. Roy. Instit. Gt. Britain 8 (1877), 282-301.

36. F. Galton, Regression towards mediocrity in hereditary stature. J. Anthrop. Instit. 15 (1886a), 246-264.

37. F. Galton, Presidential address to the Section of Anthropology. Rep. Brit. Assoc. Adv. Sci. (1886b), 1206-1214.

38. F. Galton, Family likeness in stature. Proc. Roy. Soc. London, 40 (1886c), 42-73.

39. F. Galton, Co-relations and their measurement, chiefly from anthropometric data. Proc. Roy. Soc. London, 45 (1888/9), 135-145.

40. F. Galton, Memories of my life. Methuen Co., London, (1908).

41. H. Gebelein, Das statistische Problem der Korrelation als Variations- und Eigenwertproblem und sein Zusammenhang mit der Ausgleichsrechnung. Z. Angew. Math. Mech. 21 (1941), 364-379.

42. H. Gebelein, Bemerkung über ein von W. Hoeffding vorgeschlagenes masstabinvariantes Korrelationsmass. Z. Angew. Math. Mech. 22 (1942a), 171-173.

43. H. Gebelein, Verfahren zur Beurteilung einer sehr geringen Korrelation zwischen zwei statistischen Merkmalsreihen. Z. Angew. Math. Mech. 22 (1942b), 286-298 and 553-592.

44. C. Gini, Di una misura della dissomiglianza tra due gruppi di quantità e delle sue applicazioni allo studio delle relazioni statistiche. Atti R. Ist. Veneto Sci. Lett. Arti, (8) 74 (2), (1914a), 185-213.

45. C. Gini, Indici di omofilia e di rassomiglianza e loro relazioni col coefficiente di correlazione e con gli indici di attrazione. Atti R. Ist. Veneto Sci. Lett. Arti, (8) 74 (2) (1914b), 583-610.

46. B.V. Gnedenko, On Hilbert’s sixth problem, pp. 116-120 in Hilbert's Problems (in Russian), Izdat "Nauka", Moscow (1969).

47. L.A. Goodman and W.H. Kruskal, Measures of association for cross classifications. J. Amer. Statist. Assoc. 49 (1954), 732-764, Corrig. 52, 578.

48. L.A. Goodman and W.H. Kruskal, Measures of association for cross classifications, ii. Further discussion and references. J. Amer. Statist. Assoc. 54 (1959), 123-163.

49. L.A. Goodman and W.H. Kruskal, Measures of association for cross classifications, iii. Approximate sampling theory. J. Amer. Statist. Assoc. 58 (1963), 310-364.

50. L.A. Goodman and W.H. Kruskal, Measures of association for cross classifications, iv. Simplification of asymptotic variances. J. Amer. Statist. Assoc. 67 (1972), 415-421.

51. A.A. Hačaturov, Determination of the value of the measure for a region of n-dimensional Euclidean space from its values for all half-spaces (in Russian). Usp. Mat. Nauk Ser. 9, 61 (3), (1954), 205-212.

52. P. Hartman, E.R. van Kampen and A. Wintner, Asymptotic distributions and statistical independence. Amer. J. Math. 61 (1939), 477-486.

53. H.O. Hirschfeld, A connection between correlation and contingency. Proc. Cambridge Philos. Soc. 31 (1935), 520-524.

54. W. Hoeffding, Masstabinvariante Korrelationstheorie. Schr. Inst. Angew. Math. Univ. Berlin, 5(3), (1940), 181-233.

55. W. Hoeffding, Masstabinvariante Korrelationsmasse für diskontinuierliche Verteilungen. Arch. Math. Wirtsch. Sozialforsch. 7 (1941), 49-70.

56. H. Hotelling, Relations between two sets of variables. Biometrika 28 (1936), 321-377.

57. E.V. Huntington, Mathematics and statistics, with an elementary account of the correlation coefficient and the correlation ratio. Amer. Math. Monthly 26 (1919), 421-435.

58. R. Jamnik, Über vollständige orthonormierte Systeme von paarweise unabhängigen zufälligen Grössen. Publ. Dept. Math. (Ljubljana) 1 (1964), 23-41.

59. A. Joffe, On a sequence of almost deterministic pairwise independent random variables. Proc. Amer. Math. Soc., 29 (1971), 381-2.

60. A. Joffe, On a set of almost deterministic k-independent random variables. Ann. Probab., 2 (1974), 161-2.

61. N.L. Johnson and S. Kotz, Distributions in statistics. Continuous multivariate distributions. Wiley, New York (1972).

62. C. Jordan, Mémoire sur les formes bilinéaires. J. Math. pures appl. (2) 19 (1874), 35-54.

63. A.M. Kagan, Ju.V. Linnik and C.R. Rao, Characterization problems in mathematical statistics. Transl. by B. Ramachandran. Wiley, New York (1973).

64. J.F. Kenney, The regression systems of two sums having random elements in common. Ann. Math. Statist. 10 (1939), 70-73.

65. A.N. Kolmogorov, Grundbegriffe der Wahrscheinlichkeitsrechnung. Ergebnisse der Mathematik und ihrer Grenzgebiete (1929). Transl. and reprinted as Foundations of the Theory of Probability, Chelsea, New York (1950).

66. A.N. Kolmogorov. In Sarmanov, O.V. and Zaharov, V.K. [98].

67. P.O. Kosteljanec and Ju.G. Rešetnjak, The determination of a completely additive function by its values on half-space (in Russian). Usp. Mat. Nauk, Ser. (9), 61(3) (1954), 135-140.

68. W.H. Kruskal, Ordinal measures of association. J. Amer. Statist. Assoc. 53 (1958), 814-861.

69. H.O. Lancaster, The structure of bivariate distributions. Ann. Math. Statist. 29 (1958), 719-736; Correction 35, 1388.

70. H.O. Lancaster, Pairwise statistical independence. Ann. Math. Statist. 36 (1965), 1313-7.

71. H.O. Lancaster, The Chi-squared distribution. Wiley, New York (1969).

72. H.O. Lancaster, The multiplicative definition of interaction. Austral. J. Statist. 13 (1971), 36-44.

73. H.O. Lancaster, Development of the notion of statistical dependence. Math. Chronicle 2 (1972), 1-16.

74. H.O. Lancaster, Orthogonal models for contingency tables, pp. 99-157 in Developments in Statistics, Vol. 3 (P.R. Krishnaiah, ed.). Academic Press, New York (1980).

75. H.O. Lancaster, Measures and indices of dependence. Encyclopedia of Statistical Sciences, Vol. 2. Wiley, New York (1982).

76. C. Lanczos, Linear systems in self-adjoint form. Amer. Math. Monthly 65 (1958), 665-679.

77. C.C. MacDuffee, The theory of matrices. Chelsea, New York (1946).

78. A.G. McKendrick, Applications of mathematics to medical problems. Proc. Edin. Math. Soc. 44 (1926), 98-130.

79. K.V. Mardia, Families of bivariate distributions. Griffin, London (1970).

80. K. Maung, Measurement of association in a contingency table with special reference to the pigmentation of hair and eye colours of Scottish school children. Ann. Eugen. London, 11 (1942), 189-223.

81. F.G. Mehler, Über die Entwicklung einer Funktion von beliebig vielen Variablen nach Laplaceschen Funktionen höherer Ordnung. J. Reine Angew. Math. 66 (1866), 161-176.

82. O. Nikodym, Sur une généralisation des intégrales de M. J. Radon. Fund. Math. 15 (1930), 131-179.

83. A.M. Obuhov, Normal correlation of vectors (in Russian). Izv. Akad. Nauk SSSR Otd. Mat. Estestv. Nauk Ser. Fiz. 3 (1938), 339-370.

84. E.S. Pearson, Review of statistical methods for research workers, 2nd edition. Nature, 123 (1929), 866-867.

85. K. Pearson, Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material. Philos. Trans. Roy. Soc. London, A186 (1895), 343-414.

86. K. Pearson, Mathematical contributions to the theory of evolution. VII. On the correlation of characters not quantitatively measurable. Philos. Trans. Roy. Soc. London, A195 (1900), 1-47.

87. K. Pearson, Mathematical contributions to the theory of evolution. XI. On the influence of natural selection on the variability and correlation of organs. Philos. Trans. Roy. Soc. London, A200 (1902), 1-66.

88. K. Pearson, Mathematical contributions to the theory of evolution. XIII. On the theory of contingency and its relation to association and normal correlation. Draper's Co. Res. Mem., Biometric Ser. No. 1, 35 p. Biometrika, London (1904).

89. K. Pearson, The fifteen constant bivariate frequency surface. Biometrika 17 (1925), 268-313.

90. K. Pearson and D. Heron, On theories of association. Biometrika 9 (1913), 159-315.

91. R.L. Plackett, The analysis of categorical data. Griffin, London (1974).

92. S.J. Pretorius, Skew bivariate frequency surfaces, examined in the light of numerical illustrations. Biometrika 22 (1930), 109-223.

93. C.R. Rao, Note on a problem of Ragnar Frisch. Econometrica 15 (1947), 245-249; Correction 17 (1949), 212.

94. A. Rényi, On measures of dependence. Acta Math. Acad. Sci. Hungar. 10 (1959), 441-451.

95. C. Rothschild and E. Mourier, Sur les lois de probabilité à régression linéaire et écart type lié constant. C.R. Acad. Sci. Paris, 225 (1947), 1117-1119.

96. O.V. Sarmanov, Pseudonormal correlation and its various generalizations (in Russian). Dokl. Akad. Nauk SSSR 132 (1960), 299-302; Engl. transl. Soviet Math. Dokl. 1, 564-567.

97. O.V. Sarmanov, Investigation of stationary Markov processes by the method of eigenfunction expansions (in Russian). Trudy Mat. Inst. Steklov, 60 (1961), 238-261.

98. O.V. Sarmanov and V.K. Zaharov, Maximum coefficients of multiple correlation (in Russian). Dokl. Akad. Nauk SSSR 130 (1960), 269-271; Engl. transl. Soviet Math. Dokl. 1, 51-53.

99. H. Schwerdtfeger, Direct proof of Lanczos' decomposition theorem. Amer. Math. Monthly, 67 (1960), 856-860.

100. T.J. Stieltjes, Extrait d'une lettre adressée à M. Hermite. Bull. Sci. Math. (2) XIII (1889), 170-172.

101. J.J. Sylvester, Sur la réduction biorthogonale d'une forme linéo-linéaire à sa forme canonique. C.R. Acad. Sci. Paris, 108 (1889), 651-653.

102. A.A. Tschuprow, Grundbegriffe und Grundprobleme der Korrelationstheorie. Teubner, Leipzig. Transl. by M. Kantorowitsch. Principles of the mathematical theory of correlation. W.M. Hodge and Co., London (1939).

103. G.N. Watson, Notes on generating functions of polynomials (2) Hermite polynomials. J. London Math. Soc. 8 (1933a), 194-199.

104. G.N. Watson, Notes on generating functions of polynomials (3) Polynomials of Legendre and Gegenbauer. J. London Math. Soc. 8 (1933b), 289-292.

105. G.N. Watson, Notes on generating functions of polynomials (4) Jacobi polynomials. J. London Math. Soc. 9 (1934), 22-28.

106. G.N. Watson, A note on the polynomials of Hermite and Laguerre. J. London Math. Soc. 13 (1938), 29-32.

107. H.W. Watson, Observations on the law of facility of errors. Proc. Birmingham Philos. Soc. 7 (1891), 289-318.

108. W.F.R. Weldon, Certain correlated variations in Crangon vulgaris. Proc. Roy. Soc. London, 51 (1892), 2-21.

109. S.D. Wicksell, The construction of the curves of equal frequency in case of type A correlation. Sven. Aktuarietidskr. 4 (1917), 122-140.

110. S.D. Wicksell, Analytical theory of regression. Medd. Lunds Astronom. Obs. Ser. 2, No. 69 (1934). Reprinted in Lunds Univ. Årsskr. Afd. 2, 30, No. 1.

111. E. Wong, The construction of a class of stationary Markoff processes. Proc. Symp. Appl. Math. 16 (1964), 264-276.

112. E. Wong and J.B. Thomas, On polynomial expansions of second-order distributions. J. Soc. Indust. Appl. Math. 10 (1962), 507-516.

113. G.U. Yule, On the association of attributes in statistics with illustrations from the material of the Childhood Society. Philos. Trans. Roy. Soc. London, A194 (1900), 257-319.

114. G.U. Yule, Notes on the theory of association of attributes in statistics. Biometrika 2 (1903), 121-134.

University of Sydney
