
Quality and Quantity, 21:109-123 (1987)
© Martinus Nijhoff Publishers (Kluwer), Dordrecht - Printed in the Netherlands

Association, agreement, and equity

KLAUS KRIPPENDORFF
The Annenberg School of Communications, University of Pennsylvania, Philadelphia

Abstract The paper attempts to make a clear distinction between three broad families of statistical indices: association, agreement, and what one may call equity.

The need for this distinction arises in social research, for example, where reliability (accuracy, reproducibility, and stability) is assessed by measures of association rather than agreement. In this application, the assumptions built into an association measure conflict with the reality that gives rise to reliability data. A second motivation for this distinction is that association measures tend to express chance as the product of two potentially very different frequency distributions, agreement as the product of two identical distributions, and equity ignores such distributions altogether. A third motivation for this distinction is that the probability distribution of such measures does not depend on whether they are linear or non-linear, symmetrical or asymmetrical, or whether they express predictability or the extremality of a frequency distribution, but on their family membership.

Notions of association, agreement, and equity have inherently nothing to do with the (nominal, ordinal, interval, and ratio) ordering in data. The 2-by-2 case is therefore chosen as the basis of the proposed distinction. All statistical indices, whether they are designed to characterise multivariate data or to identify complex orderings, ought to be applicable to this most reduced case of two variables, making one distinction in each. To test a coefficient's membership in one of the three families, nothing more complex is needed.

Origin of the problem

A common problem in the social sciences is the statistical measurement of reliability. Reliability assessments typically require that two or more trained individuals (measuring instruments) independently observe, judge and record (are applied to) the same units of observation (objects) and in terms of the same descriptive categories (scales of measurement). A high level of agreement between coders, as these "calibrated" observers are often called, can be interpreted to mean that data are reproducible and may therefore be relied upon in further analyses, a low level of agreement that data are worthless as a source of evidence.

Although the notion of agreement is quite different from that of correlation - two variables agree only when $x = y$ whereas two variables correlate already when $x = ay + b$ - one finds that many researchers resort to correlation coefficients as measures of reliability. I have suggested elsewhere that this practice is unjustifiable because the assumptions built into a correlation coefficient are not consistent with the situation under which reliability data are obtained. Although striving, like myself, for what they call "operational interpretations", Goodman and Kruskal in their excellent review of the literature (1954, 1959) turn out not to be so clear about the difference between
association and agreement assumptions. And a recent review of reliability measures by Fleiss (1975) only adds to the uncertainty (Krippendorff, 1978a).

Table 1. A Comparison of Association, Agreement, and Equity Coefficients.

Notation: $a$, $b$, $c$, $d$ are the observed cell proportions of the 2-by-2 matrix with row marginals $p_1$, $q_1$ and column marginals $p_2$, $q_2$; $p = \frac{p_1 + p_2}{2} = 1 - q$; the subscript $s$ marks the variable to which an asymmetrical coefficient refers.

ASSOCIATION (data are drawn from two separate populations)
  Linear, extremality, symmetrical. Benini (1901): $\beta = \dfrac{ad - bc}{\min(p_1 q_2,\, p_2 q_1)}$
  Linear, predictability, symmetrical. Pearson: $\phi = \dfrac{ad - bc}{\sqrt{p_1 q_1 p_2 q_2}}$
  Linear, predictability, asymmetrical. Peirce (1884): $\theta_s = \dfrac{ad - bc}{p_s q_s}$
  Non-linear. Yule (1912): $Y = \dfrac{\sqrt{ad} - \sqrt{bc}}{\sqrt{ad} + \sqrt{bc}}$
  Non-linear. Yule (1912): $Q = \dfrac{ad - bc}{ad + bc}$

AGREEMENT (data are drawn from one common population), without replacement
  Linear, symmetrical. Krippendorff (1978b, 1980): $\alpha = \dfrac{ad - \frac{b+c}{2}\left(\frac{b+c}{2} - \frac{1}{n}\right)}{pq} = 1 - \dfrac{n-1}{n}\,\dfrac{\frac{b+c}{2}}{pq}$
  Linear, asymmetrical. Krippendorff (1978b): $\alpha_s = 1 - \dfrac{n-1}{n}\,\dfrac{\frac{b+c}{2}}{p_s q_s}$
  Non-linear: $\alpha_Y$, $\alpha_Q$

AGREEMENT (data are drawn from one common population), with replacement (large samples only)
  Linear, symmetrical. Scott (1955): $\pi = \dfrac{ad - \left(\frac{b+c}{2}\right)^2}{pq} = 1 - \dfrac{\frac{b+c}{2}}{pq}$
  Linear, asymmetrical. Krippendorff (1978b): $\pi_s = 1 - \dfrac{\frac{b+c}{2}}{p_s q_s}$
  Non-linear: $\pi_Y$, $\pi_Q$

EQUITY (data are drawn from uniform population(s), with replacement)
  Linear. Hamann (1961): $\eta = (a + d) - (b + c) = 1 - \dfrac{\frac{b+c}{2}}{\frac{1}{2} \cdot \frac{1}{2}}$
  Non-linear: $\eta_Y$
  Non-linear: $\eta_Q = \dfrac{(a+d)^2 - (b+c)^2}{(a+d)^2 + (b+c)^2}$

The distinctions I am proposing here transcend the reliability context in which they arose and are far more basic than previously assumed.

Scope of the paper

What I am proposing is a fairly general distinction between three broad families of statistical indices: association, agreement, and what one might call equity. To stay within reasonable boundaries, I am limiting the scope of the paper as follows:

Although my argument is intended to be entirely general, I am basing my distinctions on 2-by-2 contingency matrices. I contend that all statistical indices, whether they are designed to characterize multi-variable data or to identify complex orderings, are applicable to this most reduced case of two variables with one distinction in each. To reveal a coefficient's membership in one of the three families, nothing more complex is needed than applying it to this case.

The measures I will be concerned with all have a zero-point that is indicative of a situation governed entirely by chance, take positive and negative values and have upper and lower limits, usually $+1$ and $-1$. This excludes several familiar measures, for example, Kendall's $\tau$ because its zero-point defies a clear interpretation, and the tetrachoric correlation coefficient, the contingency coefficient, information transmission measures, chi-square measures, etc. because they assume positive values only, with the latter two having no upper limit.

I cannot survey the vast field of statistical indices, even within the above constraints. Others (e.g. Goodman and Kruskal, 1954, 1959) have done this better than I can. By selecting only representatives of a certain type, I am also ignoring the fact that a large number of different coefficients become the same when applied to the 2-by-2 case.

Finally, I am not claiming that the distinction I am proposing leads to an exhaustive classification scheme with mutually exclusive types. There are hybrid coefficients and others to which the proposed distinction is irrelevant. Such measures may well have valid operational interpretations in contexts other than those envisioned here. I believe, though, that any system of classification of statistical indices for cross-classifications will have to include the distinction between association, agreement and equity.

I am presenting my distinctions in the form of Table 1 and proceed to discuss what its rows and columns mean.

Population assumptions

I am suggesting that association, agreement and equity measures are distinguishable by their population assumptions. Such assumptions refer to the source of the data and affect the interpretation of these coefficients, especially at their zero-points. By setting a coefficient to zero, population assumptions virtually reveal themselves in the way data are regarded as the product of chance.

In the context of reliability assessment, discussed above, the notion of agreement is based on the sameness or difference between two values that are separately attributed to the same object. It follows that agreement can be measured only between variables that make the same distinctions (else the identity of the two values cannot be established) and that it applies only to a single set of objects (else differences among objects contribute to a measure that is intended to reflect only unreliabilities of coding, for example). Those two points may appear obvious but not their consequence: agreement measures assume that observed values are drawn from a single population of values.

In contrast, association and correlation measures cannot make this assumption. Very different kinds of variables may be correlated with each other, for example, sex with income. Whether a single set of objects is described in terms of two kinds of variables or whether two distinct sets of objects are compared along the same dimension, there is no reason to assume that the frequency distributions underlying the two correlates must be the same. In fact it is the very essence of correlation that it takes differences in marginal distributions into account. Hence for association, values are assumed to be drawn from two different populations of values or from one population of objects described in terms of two different variables, each with its own parameter.

The two ways of regarding data as the product of chance follow from these assumptions. With $a$, $b$, $c$, $d$, $p_1$, etc. denoting observed proportions of the sample size, the 2-by-2 contingency matrices for assessing association are:

\[
\text{observed:}\quad
\begin{array}{cc|c}
a & b & p_1 \\
c & d & q_1 \\ \hline
p_2 & q_2 & 1
\end{array}
\qquad\qquad
\text{statistical independence:}\quad
\begin{array}{cc|c}
p_1 p_2 & p_1 q_2 & p_1 \\
q_1 p_2 & q_1 q_2 & q_1 \\ \hline
p_2 & q_2 & 1
\end{array}
\]

in which $p_1$ characterizes one population of categories and $p_2$ the other. In both of these 2-by-2 matrices the individual marginal proportions thus maintain references to two separate populations of categories.

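Under the assumption of a single population, the expected proportions may be one of two kinds: with or without replacement. Thus, the expected proportions for assessing agreement are:

\[
\text{matches expected without replacement:}\quad
\begin{array}{cc|c}
p\,\dfrac{pn - 1}{n - 1} & p\,\dfrac{qn}{n - 1} & p \\[2ex]
q\,\dfrac{pn}{n - 1} & q\,\dfrac{qn - 1}{n - 1} & q \\[1ex] \hline
p & q &
\end{array}
\]

\[
\text{matches expected with replacement:}\quad
\begin{array}{cc|c}
p^2 & pq & p \\
pq & q^2 & q \\ \hline
p & q &
\end{array}
\]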

Therein the common population parameter $p$ must be either independently known or estimated from the two observed marginal proportions $p_1$ and $p_2$. Although there exist coefficients that seem to imply that such estimates could be based on the geometric mean and on the harmonic mean, I believe that the arithmetic mean is the only function that preserves the required complementarity. Assuming a weight $0 \le w \le 1$:

\[
p = w p_1 + (1 - w) p_2 = w(1 - q_1) + (1 - w)(1 - q_2) = 1 - w q_1 - (1 - w) q_2 = 1 - q
\]

Since all probabilities are based on the same sample size, $w = 1/2$ seems to be the only reasonable weight to estimate $p$ from $p_1$ and $p_2$. Hence $p = \frac{p_1 + p_2}{2}$ and, by implication, the proportion $\frac{b + c}{2}$ in the $b$ and $c$ cells of the contingency matrix of observed proportions:

\[
\text{observed:}\quad
\begin{array}{cc|c}
a & \dfrac{b + c}{2} & p \\[2ex]
\dfrac{b + c}{2} & d & q \\[1ex] \hline
p & q &
\end{array}
\]

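I have not yet said much about equity, partly because such measures are less familiar. They can be regarded as the special case of agreement in which $p = q$ and as the special case of association in which $p_1 = p_2 = \frac{1}{2}$. The observed and the expected proportions for assessing equity then reduce to:

\[
\text{observed:}\quad
\begin{array}{cc|c}
\dfrac{a + d}{2} & \dfrac{b + c}{2} & \dfrac{1}{2} \\[2ex]
\dfrac{b + c}{2} & \dfrac{a + d}{2} & \dfrac{1}{2} \\[1ex] \hline
\dfrac{1}{2} & \dfrac{1}{2} & 1
\end{array}
\qquad\qquad
\text{expected:}\quad
\begin{array}{cc|c}
\dfrac{1}{4} & \dfrac{1}{4} & \dfrac{1}{2} \\[2ex]
\dfrac{1}{4} & \dfrac{1}{4} & \dfrac{1}{2} \\[1ex] \hline
\dfrac{1}{2} & \dfrac{1}{2} & 1
\end{array}
\]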

Another way of clarifying the distinction is to say that association regards observations as ordered pairs, agreement considers observations as pairs without regard for their order (a mismatch is a mismatch regardless of kind) while, in the 2-by-2 case, equity distinguishes only between matching and non-matching observations.

I suggested that the population assumptions are recognizable by setting a coefficient to zero. A brief examination of Table 1 reveals that all association coefficients share the same numerator: $ad - bc$ or some power of the two products. But, considering association assumptions:

\[
ad - bc = a - p_1 p_2
\]

which shows that association coefficients are zero when the observed proportion $a$ equals the expected proportion $p_1 p_2$ and that this expected proportion is estimated from two separate populations, characterized by $p_1$ and $p_2$ respectively.
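The identity $ad - bc = a - p_1 p_2$ can be checked numerically. The following is only an illustrative sketch: the function name `association_numerator` and the example proportions are assumptions made for this check, not part of the paper. Exact fractions are used so that the comparison does not depend on rounding.

```python
from fractions import Fraction as F

def association_numerator(a, b, c, d):
    """Return (ad - bc, a - p1*p2) for a 2-by-2 matrix of proportions.

    Hypothetical helper: p1 = a + b is the row marginal and
    p2 = a + c the column marginal of the observed matrix.
    """
    p1 = a + b
    p2 = a + c
    return a * d - b * c, a - p1 * p2

# Illustrative proportions (assumed): a=.4, b=.1, c=.2, d=.3
lhs, rhs = association_numerator(F(4, 10), F(1, 10), F(2, 10), F(3, 10))
print(lhs, rhs)
```

Both expressions agree for any proportions that sum to one, because $a - (a+b)(a+c) = a(1 - a - b - c) - bc$.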

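Excepting the asymmetrical cases, the numerators of the agreement coefficients are also easily expressed in terms of the difference between an observed and an expected proportion. Without replacement:

\[
ad - \frac{b + c}{2}\left(\frac{b + c}{2} - \frac{1}{n}\right) = \frac{n - 1}{n}\left(a - p\,\frac{pn - 1}{n - 1}\right)
\]

and with replacement:

\[
ad - \left(\frac{b + c}{2}\right)^2 = a - p^2
\]

Thus, agreement coefficients are zero when the observed proportion $a$ equals $p\,\frac{pn - 1}{n - 1}$ or $p^2$ respectively. But both are now estimated from one common population with $p$. For the asymmetrical cases, the corresponding expressions are merely less transparent. When the variable with respect to which such a coefficient is assessed is $s = 1$ (so that $p_s = a + b$):

\[
ad - \frac{1}{2}\left[b\left(b - \frac{1}{n}\right) + (a - d)(b - c) + c\left(c - \frac{1}{n}\right)\right] = \frac{n - 1}{n}\left[\frac{a + d}{2} - \frac{1}{2}\left(p_s\,\frac{p_s n - 1}{n - 1} + q_s\,\frac{q_s n - 1}{n - 1}\right)\right]
\]

and:

\[
ad - \frac{1}{2}\left[b^2 + (a - d)(b - c) + c^2\right] = \frac{a + d}{2} - \frac{p_s^2 + q_s^2}{2}
\]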

For the equity coefficient the numerator is:

\[
\left(\frac{a + d}{2}\right)^2 - \left(\frac{b + c}{2}\right)^2 = \frac{a + d}{2} - \left(\frac{1}{2}\right)^2
\]

The coefficient is zero when $a + d$ equals $b + c$, or equivalently, when $\frac{1}{2}(a + d)$ equals $\frac{1}{4}$, which is the proportion in any one cell as estimated without knowledge of the population parameters or setting these parameters uniformly to $p = \frac{1}{2}$.

Thus, the population assumptions as revealed through the condition under which these coefficients are zero provide a distinction between four broad families of coefficients: association, agreement without replacement, agreement with replacement (for large sample sizes only) and equity.
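The zero-point conditions for agreement and equity can be verified in the same way. The sketch below is an assumption-laden illustration: the helper name `zero_point_identities`, the example proportions and the value `n = 20` are hypothetical, and the without-replacement line carries the factor $(n-1)/n$ that makes the two sides exactly equal; that factor does not move the zero-point.

```python
from fractions import Fraction as F

def zero_point_identities(a, b, c, d, n):
    """Return (numerator, observed-minus-expected) pairs per family.

    a, b, c, d: observed cell proportions summing to 1 (Fractions).
    n: number of values entering the without-replacement case
       (assumed to be an integer greater than 1).
    """
    m = (b + c) / 2   # proportion assigned to each mismatch cell
    p = a + m         # common estimate p = (p1 + p2) / 2
    u = (a + d) / 2   # half the proportion of matches
    return {
        "with_replacement": (a * d - m * m, a - p * p),
        "without_replacement": (
            a * d - m * (m - F(1, n)),
            F(n - 1, n) * (a - p * (p * n - 1) / (n - 1)),
        ),
        "equity": (u * u - m * m, u - F(1, 4)),
    }

pairs = zero_point_identities(F(4, 10), F(1, 10), F(2, 10), F(3, 10), n=20)
for family, (numerator, difference) in pairs.items():
    print(family, numerator, difference)
```

In each family the numerator vanishes exactly when the observed proportion equals the proportion expected under that family's population assumption.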

Linearity-non-linearity

Measures of association, agreement and equity cannot be anything other than a function of the relative frequencies within a contingency matrix. In the 2-by-2 case there are essentially three reference points of interest.

$a_{\max}$ = the proportion at which association, agreement or equity is maximum

$a_{\mathrm{exp}}$ = the proportion at which association, agreement or equity is the product of chance

$a_{\min}$ = the proportion at which association, agreement or equity is minimum

Obviously, unless $a_{\mathrm{exp}}$ divides the interval between $a_{\max}$ and $a_{\min}$ in two equal parts, a linear function of the relative frequencies cannot connect the three reference points and range from $+1$ to $-1$. This may be visualized by means of Figure 1. Thus, linear coefficients restrict their indicative power to two reference points or to one reference point. And coefficients that are indicative of all three must take a non-linear form.

This gives rise to the distinction between linear and non-linear coefficients with a further classification of linear coefficients into those whose values are and those whose values are not indicative of the maximum conditions. The former might be termed indices of extremality; the latter turn out to be interpretable as indices of predictability.

A linear form that ignores $a_{\min}$ has been known since Peirce (1884) and is interpretable as the degree to which the observed situation resembles the maximum condition rather than chance. From the maximum and chance conditions:

\[
\begin{array}{cc}
a_{\max} & b_{\min} \\
c_{\min} & d_{\max}
\end{array}
\qquad\qquad
\begin{array}{cc}
a_{\mathrm{exp}} & b_{\mathrm{exp}} \\
c_{\mathrm{exp}} & d_{\mathrm{exp}}
\end{array}
\]

Peirce considered the probabilities with which the two conditions prevail in data. Let $\xi$ be the probability of the maximum condition to account for the observations and $1 - \xi$ be the probability of chance to determine the data. Equating the observed situation with a mixture of the two weighted conditions:

\[
\begin{aligned}
a &= \xi\, a_{\max} + (1 - \xi)\, a_{\mathrm{exp}} & b &= \xi\, b_{\min} + (1 - \xi)\, b_{\mathrm{exp}} \\
c &= \xi\, c_{\min} + (1 - \xi)\, c_{\mathrm{exp}} & d &= \xi\, d_{\max} + (1 - \xi)\, d_{\mathrm{exp}}
\end{aligned}
\]

for two degrees of freedom the unique solution for $\xi$ turns out to be:

\[
\xi = \frac{a + d - (a_{\mathrm{exp}} + d_{\mathrm{exp}})}{a_{\max} + d_{\max} - (a_{\mathrm{exp}} + d_{\mathrm{exp}})}
\]

And for one degree of freedom it simplifies to:

\[
\xi = \frac{a - a_{\mathrm{exp}}}{a_{\max} - a_{\mathrm{exp}}}
\]

Although negative values of $\xi$ would defy its interpretation as a probability, the procedure provides a good motivation for a measure of extremality. In fact, excepting indices of predictability, all coefficients are expressible in either of these forms.
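Peirce's mixture argument can be illustrated for the association case with one degree of freedom. This is a minimal sketch under stated assumptions: the function name `peirce_mixture`, the variable `xi` for the mixture weight and the example marginals are all hypothetical. It solves for the weight from the $a$ cell and then rebuilds the whole matrix as a mixture of the maximum and chance conditions.

```python
from fractions import Fraction as F

def peirce_mixture(a, p1, p2):
    """Solve a = xi*a_max + (1 - xi)*a_exp under association assumptions.

    Returns the weight xi and the mixed cells (a, b, c, d).
    """
    q1, q2 = 1 - p1, 1 - p2
    a_max = min(p1, p2)
    # Maximum condition with the marginals held fixed.
    maximum = (a_max, p1 - a_max, p2 - a_max, 1 - p1 - p2 + a_max)
    # Chance condition: statistical independence.
    chance = (p1 * p2, p1 * q2, q1 * p2, q1 * q2)
    xi = (a - chance[0]) / (maximum[0] - chance[0])
    mixture = tuple(xi * mx + (1 - xi) * ex for mx, ex in zip(maximum, chance))
    return xi, mixture

# Illustrative values (assumed): a = .4 with marginals p1 = .5, p2 = .6
xi, cells = peirce_mixture(F(4, 10), F(1, 2), F(3, 5))
print(xi, cells)
```

With the marginals fixed, the weight found from the $a$ cell reproduces the other three cells as well, which is why the one-degree-of-freedom form suffices for association.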

Consider now the rather different population assumptions for the four families of coefficients. The values in these expressions are:

for association: $a_{\max} = \min(p_1, p_2)$; $a_{\mathrm{exp}} = p_1 p_2$

for agreement (any sample size): $a_{\max} = p$; $a_{\mathrm{exp}} = p\,\dfrac{pn - 1}{n - 1}$

for agreement in large samples: $a_{\max} = p$; $a_{\mathrm{exp}} = p^2$

for equity: $\left(\dfrac{a + d}{2}\right)_{\max} = \dfrac{1}{2}$; $\left(\dfrac{a + d}{2}\right)_{\mathrm{exp}} = \left(\dfrac{1}{2}\right)^2$

and by entering these in the expression for $\xi$, one obtains Benini's (1901) $\beta$, Krippendorff's (1978b) $\alpha$, Scott's (1955) $\pi$, and Hamann's (1961) $\eta$. Their ranges are:

\[
-\frac{\min(p_1 p_2,\, q_1 q_2)}{\min(p_1 q_2,\, p_2 q_1)} \le \beta \le 1
\]

\[
1 - \frac{n - 1}{n}\,\min\left(\frac{1}{q}, \frac{1}{p}\right) \le \alpha \le 1
\]

\[
-\min\left(\frac{p}{q}, \frac{q}{p}\right) \le \pi \le 1
\]

\[
-1 \le \eta \le 1
\]

Excepting $\eta$, the negative values of these linear coefficients are not so clearly interpretable. It is therefore advisable to restrict their use to situations in which the focus of a statistical exploration is on association or on dissociation but not on both, or on agreement or disagreement but not on both. For example, in reliability contexts, agreement is not likely to be below chance (except for the effects of sampling errors). This makes $\alpha$ and $\pi$ perfectly appropriate agreement measures despite their uninterpretable lower limits.
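The four linear coefficients can all be computed from the single form $(a - a_{exp})/(a_{max} - a_{exp})$ by swapping in the family-specific reference points. The following sketch is illustrative only: the function names, the example proportions and `n = 20` are assumptions, and `n` is taken here to be the number of values entering the without-replacement expectation.

```python
from fractions import Fraction as F

def extremality(observed, maximum, expected):
    """(observed - expected) / (maximum - expected)."""
    return (observed - expected) / (maximum - expected)

def linear_coefficients(a, b, c, d, n):
    """Benini's beta, Krippendorff's alpha, Scott's pi, Hamann's eta."""
    p1, p2 = a + b, a + c
    p = (p1 + p2) / 2
    return {
        "beta": extremality(a, min(p1, p2), p1 * p2),
        "alpha": extremality(a, p, p * (p * n - 1) / (n - 1)),
        "pi": extremality(a, p, p * p),
        "eta": extremality((a + d) / 2, F(1, 2), F(1, 4)),
    }

# Illustrative proportions (assumed): a=.4, b=.1, c=.2, d=.3
coefficients = linear_coefficients(F(4, 10), F(1, 10), F(2, 10), F(3, 10), n=20)
for name, value in coefficients.items():
    print(name, value)
```

The same data yield four different values because each family measures the distance of the observed proportion from a different chance expectation.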

A second linear form takes the expected relative frequency, $a_{\mathrm{exp}}$, as the sole reference point and, without assigning a particular value to $a_{\max}$ or to $a_{\min}$, distributes its range around the zero point according to a variance-related criterion. This form is exemplified by Pearson's Product-Moment Correlation Coefficient which, in the 2-by-2 case, reduces to the $\phi$ coefficient:

\[
-\min\left(\sqrt{\frac{p_1 p_2}{q_1 q_2}},\ \sqrt{\frac{q_1 q_2}{p_1 p_2}}\right) \le \phi \le \min\left(\sqrt{\frac{p_1 q_2}{p_2 q_1}},\ \sqrt{\frac{p_2 q_1}{p_1 q_2}}\right)
\]

or equivalently:

\[
-\min\left(\sqrt{\frac{a_{\mathrm{exp}}}{d_{\mathrm{exp}}}},\ \sqrt{\frac{d_{\mathrm{exp}}}{a_{\mathrm{exp}}}}\right) \le \phi \le \min\left(\sqrt{\frac{b_{\mathrm{exp}}}{c_{\mathrm{exp}}}},\ \sqrt{\frac{c_{\mathrm{exp}}}{b_{\mathrm{exp}}}}\right)
\]

Since the upper and lower limits of $\phi$ contain no reference to the maximum condition, it is not interpretable as an index of extremality. It assesses the degree to which the variance in one variable is explainable from the variance in the other and, like any index of predictability, it reaches unity (positive or negative) only when marginal proportions are equal, $p_1 = p_2$ or $p_1 = q_2$. If these proportions are unequal then, although there will always be a pattern of maximum association, there will always remain some uncertainty whenever one attempts to explain, account or predict the distinctions made in one variable from the distinctions made in the other. Thus the indication of extremality and the indication of predictability are based on two fundamentally different ideas.
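The difference between extremality and predictability shows up as soon as the marginals are unequal. The sketch below uses assumed example data (a matrix at its maximum condition with $p_1 = .3$ and $p_2 = .5$) and a hypothetical helper `beta_and_phi`; it is meant as an illustration, not as a definitive implementation.

```python
from fractions import Fraction as F
from math import sqrt

def beta_and_phi(a, b, c, d):
    """Return Benini's beta (exact) and Pearson's phi (float)."""
    p1, q1 = a + b, c + d
    p2, q2 = a + c, b + d
    numerator = a * d - b * c
    beta = numerator / min(p1 * q2, p2 * q1)
    phi = float(numerator) / sqrt(float(p1 * q1 * p2 * q2))
    return beta, phi

# Maximum association for unequal marginals: b = 0, so a = min(p1, p2).
beta, phi = beta_and_phi(F(3, 10), F(0), F(2, 10), F(5, 10))
print(beta, round(phi, 4))
```

Here $\beta$ reaches unity because the data are as extreme as the marginals allow, while $\phi$ stays at its upper limit $\sqrt{p_1 q_2 / (p_2 q_1)}$, which is below one.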

Fig. 1. Linear and Symmetrical Coefficients.

Fig. 2. Non-linear and Symmetrical Coefficients.

For agreement and for equity coefficients which assume data to be drawn from a single population or marginal proportions to be irrelevant respectively, the difference between extremality and predictability disappears. This is indicated in Table 1. The five coefficients so far discussed are shown in Figure 1 as a function of the relative frequency $a$ or, in the case of equity, of $\frac{a + d}{2}$.

The non-linear forms accomplish what the linear coefficients do not: they indicate all three reference points, $a_{\max}$ by $1$, $a_{\mathrm{exp}}$ by $0$, and $a_{\min}$ by $-1$. The two non-linear forms considered here are those in Yule's (1912) association coefficients, $Y$ and $Q$, and analogous expressions for agreement and equity. Table 1 reserves two rows for these coefficients. One may notice that the principles of constructing Yule's $Y$ are already contained in the equity coefficient $\eta_Y$. The coefficients in its row are thus of the same power as the other linear coefficients in Table 1. However, Yule's $Q$, $\alpha_Q$, $\pi_Q$, and $\eta_Q$ are more powerful than the others. This accounts for the S-shaped function of this group of coefficients as depicted in Figure 2.
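The behaviour of the non-linear forms at the three reference points can be sketched as follows. Yule's $Y$ and $Q$ are the standard formulas; `eta_q` is the equity analogue as read from Table 1, $((a+d)^2 - (b+c)^2)/((a+d)^2 + (b+c)^2)$. The function names and example proportions are assumptions for illustration.

```python
from fractions import Fraction as F
from math import sqrt

def yule_q(a, b, c, d):
    """Yule's Q = (ad - bc) / (ad + bc)."""
    return (a * d - b * c) / (a * d + b * c)

def yule_y(a, b, c, d):
    """Yule's Y = (sqrt(ad) - sqrt(bc)) / (sqrt(ad) + sqrt(bc)), as a float."""
    root_ad, root_bc = sqrt(a * d), sqrt(b * c)
    return (root_ad - root_bc) / (root_ad + root_bc)

def eta_q(a, b, c, d):
    """Equity analogue of Q, built from matches and mismatches only."""
    matches, mismatches = (a + d) ** 2, (b + c) ** 2
    return (matches - mismatches) / (matches + mismatches)

cells = (F(4, 10), F(1, 10), F(2, 10), F(3, 10))  # assumed example data
q_value, y_value, eta_q_value = yule_q(*cells), yule_y(*cells), eta_q(*cells)
print(q_value, round(y_value, 4), eta_q_value)

# Reference points: maximum condition gives +1, minimum condition gives -1.
q_at_max = yule_q(F(1, 2), 0, 0, F(1, 2))
q_at_min = yule_q(0, F(1, 2), F(1, 2), 0)
```

For the same data $Q$ lies further from zero than $Y$, which is the greater "power" the text attributes to the $Q$ row.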

Symmetry-asymmetry

The coefficients so far mentioned are all symmetrical in the sense that both variables equally contribute to their value. In a symmetrical coefficient, variables are interchangeable. Peirce (1884) argued the case for an asymmetrical association coefficient, $\theta_s$, which makes references to only one of the two variables. Krippendorff (1978b) proposed an agreement coefficient $\pi_s$ that allows the researcher to assess data reliability (accuracy) relative to a standard, $s$. Table 1 lists a third asymmetrical coefficient. Their respective ranges are:

\[
-\frac{\min(p_1 p_2,\, q_1 q_2)}{p_s q_s} \le \theta_s \le \frac{\min(p_1 q_2,\, p_2 q_1)}{p_s q_s}
\]

\[
1 - \frac{n - 1}{n}\,\frac{\min(p, q)}{p_s q_s} \le \alpha_s \le 1 - \frac{n - 1}{n}\,\frac{|p_1 - p_2|}{2 p_s q_s}
\]

\[
1 - \frac{\min(p, q)}{p_s q_s} \le \pi_s \le 1 - \frac{|p_1 - p_2|}{2 p_s q_s}
\]

Evidently, the inequality $p_1 \neq p_2$ or lack of agreement on the marginal proportions, $p_1 \neq p_s$ or $p_2 \neq p_s$, prevents these coefficients from reaching unity. They are thus sensitive to deviations from the standard to which an asymmetrical coefficient refers by its subscript.
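A small numerical sketch may make the dependence on the reference variable concrete. It computes Peirce's $\theta_s = (ad - bc)/(p_s q_s)$ and the large-sample agreement coefficient $\pi_s = 1 - \frac{(b+c)/2}{p_s q_s}$ together with the upper limit of $\pi_s$; the function name, the argument `s` and the example data are assumptions for illustration.

```python
from fractions import Fraction as F

def asymmetric_coefficients(a, b, c, d, s):
    """Return (theta_s, pi_s, upper limit of pi_s) for reference variable s.

    s = 1 refers to the row variable (p1 = a + b),
    s = 2 to the column variable (p2 = a + c).
    """
    p1, p2 = a + b, a + c
    ps = p1 if s == 1 else p2
    qs = 1 - ps
    theta = (a * d - b * c) / (ps * qs)
    pi_s = 1 - ((b + c) / 2) / (ps * qs)
    pi_upper = 1 - abs(p1 - p2) / (2 * ps * qs)
    return theta, pi_s, pi_upper

cells = (F(4, 10), F(1, 10), F(2, 10), F(3, 10))  # assumed example data
relative_to_1 = asymmetric_coefficients(*cells, s=1)
relative_to_2 = asymmetric_coefficients(*cells, s=2)
print(relative_to_1, relative_to_2)
```

Because $p_1 \neq p_2$ in this example, the upper limit of $\pi_s$ falls short of one for either choice of standard.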

The difference between the symmetrical and the asymmetrical coefficients lies only in their denominators. For the association coefficient this is quite obvious: when the reference to one variable is changed to the other, $\min(p_1 q_2, p_2 q_1)$ and $\sqrt{p_1 q_1 p_2 q_2}$ become either $p_1 q_1$ or $p_2 q_2$. Since this is not so readily apparent for the agreement coefficients, Table 1 gives $\alpha$, $\pi$, $\alpha_s$, and $\pi_s$ also in their form $1 - D_{obs}/D_{exp}$ (see Krippendorff, 1970) in which $D$ is a measure of disagreement. They are also seen to differ only in their denominators: the expression for the variance in the common population, $pq$, is replaced by the variance in the standard or reference population, $p_s q_s$.

Hybrid cases

I mentioned the existence of hybrid cases, i.e., coefficients that make conflicting population assumptions. This is exemplified by Yule's (1912) coefficient of colligation $\omega$ and Cohen's (1960) $\kappa$. $\omega$ was intended to measure association, $\kappa$ was proposed as an agreement coefficient. Both are expressed here in terms of $(a - a_{\mathrm{exp}})/(a_{\max} - a_{\mathrm{exp}})$ and in the terms familiar through Table 1:

\[
\omega = \frac{a - p_1 p_2}{\frac{1}{2} - \left(\frac{1}{2}\right)^2} = \frac{ad - bc}{\left(\frac{1}{2}\right)^2}
\]

\[
\kappa = \frac{a - p_1 p_2}{p - p_1 p_2} = \frac{ad - bc}{\frac{1}{2}(p_1 q_2 + q_1 p_2)}
\]

It is apparent that both numerators reveal association assumptions. It follows that $\kappa$ can hardly be interpreted as an agreement coefficient for its numerator fails to register disagreements on marginal proportions.

However, $\omega$'s denominator is seen as making equity assumptions (of no marginal constraints, $p_1 = p_2 = p = \frac{1}{2}$) which conflicts sharply with its numerator's association assumptions (of two separate marginal constraints characterized by $p_1$ and $p_2$). In the case of $\kappa$ the conflict occurs within the denominator. In the form $(a - a_{\mathrm{exp}})/(a_{\max} - a_{\mathrm{exp}})$, $a_{\max}$ makes agreement assumptions (of a common population, characterized by $p = \frac{1}{2}(p_1 + p_2)$) while $a_{\mathrm{exp}}$ makes association assumptions (of two separate populations). Thus, near its zero-point, $\kappa$ behaves like an association coefficient while near its maximum it behaves like an agreement coefficient. I therefore suggest that $\kappa$'s interpretation is unclear. Neither is it appropriate for the kind of reliability assessments for which it was proposed nor is it a good association measure.

Perhaps it is these kinds of inconsistencies that make some coefficients more successful than others. The coefficients in Table 1 may not be equally useful but their internal consistency makes them at least unambiguously interpretable.
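The hybrid character of $\kappa$ can be made visible numerically. The sketch below computes Cohen's $\kappa$ in its usual textbook form $(P_o - P_e)/(1 - P_e)$ and in the form $(a - p_1 p_2)/(p - p_1 p_2)$ discussed here, and compares it with Scott's $\pi$. The function name and the example data are assumptions for illustration.

```python
from fractions import Fraction as F

def kappa_and_pi(a, b, c, d):
    """Return (kappa in textbook form, kappa in hybrid form, Scott's pi)."""
    p1, q1 = a + b, c + d
    p2, q2 = a + c, b + d
    p = (p1 + p2) / 2
    observed_agreement = a + d
    chance_agreement = p1 * p2 + q1 * q2      # association-style expectation
    kappa_textbook = (observed_agreement - chance_agreement) / (1 - chance_agreement)
    kappa_hybrid = (a - p1 * p2) / (p - p1 * p2)
    scott_pi = (a - p * p) / (p - p * p)      # agreement-style expectation
    return kappa_textbook, kappa_hybrid, scott_pi

# Coders who disagree on the marginals (assumed data): p1 = .5, p2 = .6
kappa_textbook, kappa_hybrid, scott_pi = kappa_and_pi(
    F(4, 10), F(1, 10), F(2, 10), F(3, 10)
)
print(kappa_textbook, kappa_hybrid, scott_pi)
```

Both forms of $\kappa$ coincide, and $\kappa$ exceeds $\pi$ when the two marginal distributions differ, since its chance term does not register that difference as disagreement.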

Probability distributions

The three families of coefficients also differ in their probability distribution.

For all association coefficients, which accept the two marginal frequencies as fixed, the probability of obtaining a particular pattern of co-occurrence in a 2-by-2 matrix of frequencies:

\[
\begin{array}{cc|c}
A & B & A + B \\
C & D & C + D \\ \hline
A + C & B + D & N
\end{array}
\]

is given by the hypergeometric distribution:

\[
P_{association} = \frac{{}_{A+C}C_{A}\;\; {}_{B+D}C_{B}}{{}_{N}C_{A+B}} = \frac{(A + B)!\,(A + C)!\,(B + D)!\,(C + D)!}{N!\,A!\,B!\,C!\,D!}
\]

This probability distribution is well known through Fisher's (1934) exact probability test. The number of different configurations of frequencies within such a matrix, which is also the number of different association coefficients and the number of individual probabilities $P_{association}$ that sum to unity, is:

min(A, D) + min(B, C) + 1
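The hypergeometric distribution and the count of configurations can be enumerated directly. This is a minimal sketch: the function name `association_distribution` and the example frequencies are assumptions, and the distribution is indexed by the frequency in the upper-left cell.

```python
from fractions import Fraction
from math import comb

def association_distribution(A, B, C, D):
    """Probability of every 2-by-2 frequency matrix sharing the marginals
    of (A, B, C, D), keyed by the upper-left cell frequency."""
    N = A + B + C + D
    row1, col1 = A + B, A + C
    lowest = max(0, row1 + col1 - N)
    highest = min(row1, col1)
    return {
        x: Fraction(comb(col1, x) * comb(N - col1, row1 - x), comb(N, row1))
        for x in range(lowest, highest + 1)
    }

A, B, C, D = 4, 1, 2, 3  # assumed example frequencies, N = 10
distribution = association_distribution(A, B, C, D)
print(len(distribution), sum(distribution.values()), distribution[A])
```

The number of entries equals $\min(A, D) + \min(B, C) + 1$ and the probabilities sum to one, as the text states.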

The number of different agreement coefficients is not constrained by the two marginal frequency distributions but by the proportion $p = \frac{p_1 + p_2}{2}$ and by the sample size $N$. There are ${}_{2N}C_{p_1 N + p_2 N}$ ways $p_1 N + p_2 N$ "1"s and $q_1 N + q_2 N$ "0"s can be placed into a 2-by-$N$ matrix. Since the order in which these pairs appear in this matrix is irrelevant, ${}_{N}P_{A,\,B+C,\,D}$, i.e. the number of arrangements of $A$ matching "1"s, of $B + C$ mismatches and of $D$ matching "0"s, must be discounted. Agreement coefficients do not distinguish between $B$ (0-1) mismatches and $C$ (1-0) mismatches either. Hence the $2^{B+C}$ partitions of $B + C$ pairs into $B$ pairs and $C$ pairs must be discounted as well. Thus the probability of obtaining a particular agreement coefficient is:

\[
P_{agreement} = \frac{{}_{N}P_{A,\,B+C,\,D}\;\; 2^{B+C}}{{}_{2N}C_{2A+B+C}} = \frac{(2A + B + C)!\,(B + C + 2D)!\,N!\,2^{B+C}}{(2N)!\,A!\,(B + C)!\,D!}
\]

And, under agreement assumptions, the number of different agreement coefficients whose individual probabilities $P_{agreement}$ add up to unity is:

\[
\min(A, D) + \left\lfloor \frac{B + C}{2} \right\rfloor + 1
\]

where the inverted bracket denotes the largest integer contained in the enclosed expression.
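The agreement distribution can be enumerated in the same spirit. The sketch below lists every configuration of matching "1"s, mismatches and matching "0"s that is compatible with the total number of "1"s, $2A + B + C$, and evaluates $N!\,2^{B+C} / (A!\,(B+C)!\,D!)$ divided by ${}_{2N}C_{2A+B+C}$ for each. The function name and example frequencies are assumptions.

```python
from fractions import Fraction
from math import comb, factorial

def agreement_distribution(A, B, C, D):
    """Probabilities keyed by (matching 1s, mismatches, matching 0s)."""
    N = A + B + C + D
    ones = 2 * A + B + C                      # total number of "1"s in 2N values
    distribution = {}
    for mismatches in range(N + 1):
        if ones - mismatches < 0 or (ones - mismatches) % 2:
            continue                          # matching 1s must be a whole number
        both_one = (ones - mismatches) // 2
        both_zero = N - both_one - mismatches
        if both_zero < 0:
            continue
        arrangements = factorial(N) // (
            factorial(both_one) * factorial(mismatches) * factorial(both_zero)
        )
        distribution[(both_one, mismatches, both_zero)] = Fraction(
            arrangements * 2 ** mismatches, comb(2 * N, ones)
        )
    return distribution

A, B, C, D = 4, 1, 2, 3  # assumed example frequencies, N = 10
distribution = agreement_distribution(A, B, C, D)
print(len(distribution), sum(distribution.values()))
```

The enumeration yields $\min(A, D) + \lfloor (B + C)/2 \rfloor + 1$ configurations whose probabilities sum to one.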

The number of different equity coefficients is constrained only by the sample size $N$. There are $2^N$ possible ways $N$ units of observation can be partitioned into matching and mismatching pairs regardless of kind. Of these, ${}_{N}C_{A+D}$ are regarded equivalent by a given equity value. Hence, the probability of obtaining a particular equity coefficient is:

\[
P_{equity} = \frac{{}_{N}C_{A+D}}{2^N} = \frac{N!}{(A + D)!\,(B + C)!\,2^N}
\]

And, under equity assumptions, the number of different equity coefficients whose individual probabilities $P_{equity}$ sum to unity is:

\[
N + 1
\]

The two new probability distributions just given lend themselves to non-parametric tests of statistical hypotheses regarding agreement and equity just as Fisher's exact probability test does regarding association: the probability of a coefficient to be equal or larger than the one obtained is the sum of the probabilities of the frequency configurations equal to or more extreme than the one observed. Exactly which configurations are considered more extreme or less extreme crucially depends on the population assumptions underlying the coefficient chosen. Members of the same family of coefficients impose the same rank ordering on these configurations and thus share the same probability distribution. This is another justification for the distinction between association, agreement, and equity.
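For equity the exact test is particularly simple, since the distribution depends only on the number of matches $A + D$ among $N$ pairs. The following sketch is an illustration under assumptions: the function names and the example of 7 matches in 10 pairs are hypothetical. It sums the probabilities of all configurations with at least as many matches as observed.

```python
from fractions import Fraction
from math import comb

def equity_distribution(N):
    """P(k matches among N pairs) = C(N, k) / 2**N for k = 0..N."""
    return {k: Fraction(comb(N, k), 2 ** N) for k in range(N + 1)}

def equity_upper_tail(matches, N):
    """Probability of an equity value equal to or larger than the observed
    one, i.e. of observing at least `matches` matching pairs."""
    distribution = equity_distribution(N)
    return sum(distribution[k] for k in range(matches, N + 1))

N, matches = 10, 7  # assumed example: A + D = 7 matches in N = 10 pairs
distribution = equity_distribution(N)
tail = equity_upper_tail(matches, N)
print(len(distribution), distribution[matches], tail)
```

Because every equity coefficient is a monotone function of $A + D$ in the 2-by-2 case, all members of the family order the configurations identically and this one tail probability serves them all.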

References

Benini, Rodolfo. Principii di Demografia. No. 29 of Manuali Barbèra di Scienze Giuridiche Sociali e Politiche. Firenze: G. Barbèra, 1901.

Cohen, Jacob. A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement, 20, 1: 37-46, 1960.

Fisher, Ronald A. Statistical Methods for Research Workers (5th ed.). Edinburgh: Oliver and Boyd, 1934.

Fleiss, Joseph L. Measuring Agreement between Two Judges on the Presence or Absence of a Trait. Biometrics, 31: 651-659, 1975.

Goodman, Leo A. and Kruskal, William H. Measures of Association for Cross Classifications. Journal of the American Statistical Association, 49: 732-764, December 1954.

Goodman, Leo A. and Kruskal, William H. Measures of Association for Cross Classifications II. Further Discussion and References. Journal of the American Statistical Association, 54: 123-163, March 1959.

Hamann, U. Merkmalsbestand und Verwandtschaftsbeziehungen der Farinosae. Ein Beitrag zum System der Monokotyledonen. Willdenowia, 2: 639-768, 1961.

Krippendorff, Klaus. Bivariate Agreement Coefficients for Reliability Data. In E.F. Borgatta (Ed.) Sociological Methodology 1970. San Francisco: Jossey-Bass, 1970.

Krippendorff, Klaus. Reliability of Binary Attribute Data. Biometrics, 34, 1: 142-144, March 1978a.

Krippendorff, Klaus. Reliability, The Case of Binary Attributes. Philadelphia: The Annenberg School of Communications, University of Pennsylvania, Mimeo, 1978b.

Krippendorff, Klaus. Content Analysis, an Introduction to its Methodology. Beverly Hills CA: Sage, 1980.

Peirce, C.S. The Numerical Measure of the Success of Predictions (Letter to the Editor). Science, 4: 453-454, 1884.

Scott, William A. Reliability of Content Analysis: The Case of Nominal Scale Coding. Public Opinion Quarterly, 19: 321-325, 1955.

Yule, G. Udny. On the Methods of Measuring Association between Two Attributes. Journal of the Royal Statistical Society, 75: 579-642, 1912.