Clinical Disagreement and The Kappa
UNDERSTANDING KAPPA STATISTIC
KAPPA
• Kappa = river imp, water sprite.
• Origin: Japan (with Chinese & Hindu antecedents).
• The kappa has a beak, webbed feet and a shell on its back, and dwells under bridges, pouncing on any who attempt to cross the river.
OBJECTIVES
1. To understand the concept of clinical disagreement and observer variability.
2. To understand Kappa and related statistics.
LEARNING STRATEGIES
1. Identify the situations when clinical disagreement occurs.
2. Calculate Kappa and understand the concepts of Kappa.
Epictetus, 2nd Century:
Appearances of the mind are of 4 kinds:
1. Things are what they appear to be. (It is pneumonia, and appears like one.)
2. Things neither are, nor appear to be. (It is not pneumonia, nor does it appear like one.)
3. They are, and do not appear to be. (It is pneumonia, but does not look like one.)
4. They are not, yet appear to be. (It is not pneumonia, but looks like one.)
Sorting out all of these appearances in everyday life is the task of wise men (doctors). (Example: diagnosing pneumonia by history and physical examination.)
Why the disagreement?
Variations arise from two sources:
1. How the measurements are carried out (the instruments, the tests, the person carrying out the measurement).
2. Biological factors (variation within an individual patient, and among patients).
CASE SCENARIOS OF CLINICAL DISAGREEMENT
1. Two radiologists, A and B, disagreeing on evidence of malignancy.
2. Disagreement between two cardiologists in the interpretation of electrocardiograms to look for evidence of ischemia.
3. A psychiatrist from HUKM disagreeing on the diagnosis of a psychotic disorder of his patient diagnosed earlier as such by another colleague at HKL.
WHY KAPPA?
• In reading medical literature on diagnosis and interpretation of diagnostic tests, our attention is generally focused on items such as: sensitivity, specificity, predictive values and likelihood ratios. Those mentioned above address the validity of the tests.
• But if the people who actually interpret the test cannot agree on the interpretation of the results, the test results will be of little use!
RATIONALE OF USING KAPPA
• Kappa tries to eliminate agreement which would be expected by chance alone.
• Let’s say you and I agree 95% of the time on a specific test. Merely saying 95% agreement is not enough!
• Kappa is the observed agreement beyond chance (observed agreement minus chance agreement) divided by the potential agreement beyond chance (1 minus chance agreement).
Example: Let’s say that my diagnostic test to confirm a malaria slide is to flip a coin. I find that I agree with my colleague (Dr X) 55% of the time.
a. Is that agreement good enough?
To know, I would first have to see how much of that agreement would occur by chance alone.
Chance alone should account for 50% agreement. That leaves 55% − 50% = 5% observed agreement beyond chance, out of a possible 100% − 50% = 50%. Kappa is therefore (55 − 50)/(100 − 50) = 5/50 = 0.1.
The Answer: It is a low agreement.
The lesson: You cannot diagnose malaria by flipping coins!
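The same arithmetic can be written out in code; a minimal sketch (the 55% and 50% figures come from the example above):

```python
# Coin-flip example: 55% observed agreement, 50% expected by chance alone
po = 0.55                      # observed agreement
pe = 0.50                      # chance agreement (a fair coin)
kappa = (po - pe) / (1 - pe)   # agreement beyond chance / potential agreement beyond chance
print(kappa)                   # approximately 0.1
```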
Two clinicians look at the same 100 mammograms, and each calls 20 positive for breast cancer. Three possible scenarios:

           Scenario 1      Scenario 2      Scenario 3
           Neg   Pos       Neg   Pos       Neg   Pos
Neg         80     0        75     5        70    10
Pos          0    20         5    15        10    10

Agree:       100%            90%             80%
Kappa:       1.0             0.69            0.38
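Because both readers call 20% of the films positive in every scenario, the chance agreement is the same each time (0.8 × 0.8 + 0.2 × 0.2 = 0.68), and the three kappa values can be checked in a few lines (a sketch; the observed-agreement figures are those of the scenarios above):

```python
# Kappa for the three mammogram scenarios; chance agreement is shared
# because both readers use the same 80%/20% split.
pe = 0.8 * 0.8 + 0.2 * 0.2                              # chance agreement, 0.68
kappas = [(po - pe) / (1 - pe) for po in (1.00, 0.90, 0.80)]
print(kappas)   # approximately 1.0, 0.69, 0.38
```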
INTER-RATER RELIABILITY
• Inter-rater reliability refers to the correlation of responses from two or more raters, each evaluating the same endpoint or making the same measurements in multiple subjects.
• Inter-rater reliability is an important concept in clinical research.
• Errors may arise as a result of different interpretations, e.g. how different pathologists interpret a histopathology slide.
KAPPA COEFFICIENT & CORRELATION COEFFICIENT
• The kappa coefficient is analogous to the Pearson correlation coefficient (or the Spearman rank correlation coefficient) and has the same range of values (+1 to −1).
• However, it is better in several ways at identifying disagreement than the Pearson or Spearman rank correlation.
WHY NOT USE CORRELATION COEFFICIENT?
1. Correlation coefficient is high for any linear relationship, not just when the first measurement equals the second measurement. If the second measurement is multiplied by 3, the correlation coefficient remains the same, although the measurements no longer agree.
2. The test of significance for the correlation coefficient uses the absence of any relationship as the null hypothesis. This will invariably be rejected, since of course there is a relationship between the first and second measurements, even if they do not agree with each other very well. The null hypothesis is rarely of clinical interest (variables being correlated usually have some relationship to each other).
Kappa Formula
• Kappa takes into account the probability that some agreement will occur by chance.
K = (observed agreement − chance agreement) / (1 − chance agreement)
  = (Po − Pe) / (1 − Pe)
KAPPA COEFFICIENT (KAPPA STATISTIC)
                   Observer 1
                 Pos    Neg    Marg. Total
Observer 2  Pos   a      b        g1
            Neg   c      d        g2
Marg. Total      f1     f2         n
Reading 2 X 2 Table
a and d = No. of times observers agree.
b and c = No. of times observers disagree.
If there are no disagreements, then b and c = 0 and the observed agreement, Po, is 1.
If there are no agreements, then a and d = 0 and the observed agreement, Po, is 0.
Kappa Formula
Whereby:
Po = (a + d) / n
Pe = ((f1 × g1) / n + (f2 × g2) / n) / n
KAPPA interpretation
Range of possible values for Kappa = -1 to 1.
Poor agreement:       < 0.2
Fair agreement:       0.2 to 0.4
Moderate agreement:   0.4 to 0.6
Good agreement:       0.6 to 0.8
Very good agreement:  0.8 to 1.0
It is rare that we get a perfect or negative agreement.
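The bands above can be expressed as a simple lookup; a minimal sketch (the handling of exact boundary values such as 0.4 is my own choice, since the slide's bands overlap at the cut-points):

```python
# Qualitative interpretation of a kappa value, using the bands above.
# Boundary values are assigned to the higher band (a convention, not from the slide).
def interpret_kappa(k):
    if k < 0.2:
        return "poor"
    elif k < 0.4:
        return "fair"
    elif k < 0.6:
        return "moderate"
    elif k < 0.8:
        return "good"
    else:
        return "very good"

print(interpret_kappa(0.4))   # "moderate"
```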
Kappa
                      Observer # 1
                   Positive  Negative  Total
Obs. # 2 Positive      40        10      50
         Negative      20        30      50
Total                  60        40     100

Po = (40 + 30)/100 = 0.7
Pe = ((60 × 50)/100 + (40 × 50)/100)/100 = (30 + 20)/100 = 0.5
Kappa = (0.7 − 0.5)/(1 − 0.5) = 0.2/0.5 = 0.4
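The worked example generalizes to any 2 × 2 table; a minimal sketch using the a/b/c/d cell labels and the f/g marginal notation from the table earlier (the function name is my own):

```python
# Cohen's kappa from the four cells of a 2x2 agreement table:
#   a = both positive, b = Obs2 pos / Obs1 neg,
#   c = Obs2 neg / Obs1 pos, d = both negative
def kappa_2x2(a, b, c, d):
    n = a + b + c + d
    po = (a + d) / n                     # observed agreement
    f1, f2 = a + c, b + d                # Observer 1 marginal totals
    g1, g2 = a + b, c + d                # Observer 2 marginal totals
    pe = (f1 * g1 + f2 * g2) / (n * n)   # chance agreement
    return (po - pe) / (1 - pe)

print(round(kappa_2x2(40, 10, 20, 30), 2))   # 0.4, as in the worked example
```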
WEIGHTED KAPPA
• Used for ordinal data. Ordinal data has an inherent order, e.g. pain rated "none", "mild", "moderate" or "severe".
• A weight of 0 is given when the two raters are maximally apart, a weight of 1 when there is exact agreement, and proportionately spaced weights for the intermediate levels of agreement.
• The formula is the same as for ordinary kappa, except that observed and expected agreement are summed not just along the diagonal but over the whole table, with each cell first multiplied by the weight for that cell.
Example: weights 0, 0.5, 1.0

                   Observer # 1
                 Normal  Mild  Serious  Total
Obs. #2 Normal      7      2      1       10
        Mild        5     10      5       20
        Serious     3      3     14       20
Total              15     15     20       50
Observed Agreement

          Normal  Mild  Serious  Total
Normal       7                     10
Mild                10             20
Serious                   14       20
Total       15     15     20       50

Observed agreement = Po = (7 + 10 + 14)/50 = 0.62
Expected Agreement (diagonal cells)

          Normal  Mild  Serious
Normal       3
Mild                6
Serious                   8

(Each expected count is row total × column total / n, e.g. 10 × 15/50 = 3.)
Expected agreement = Pe = (3 + 6 + 8)/50 = 0.34
Observed weighted (partial) agreement — cells one category apart:

          Normal  Mild  Serious
Normal              2
Mild         5             5
Serious             3

Partial agreement = (2 + 5 + 5 + 3)/50 = 15/50 = 30%
But we give only ½ credit for this partial agreement, so we get 30% × 0.5 = 15%.
Expected partial agreement for weighted kappa — cells one category apart:

          Normal  Mild  Serious
Normal              3
Mild         6             8
Serious             6

Expected numbers = (3 + 6 + 8 + 6)/50 = 23/50 = 0.46
Since we are giving ½ credit: 0.46 × 0.5 = 0.23
Weighted Kappa
Total observed agreement = 0.62+0.15 = 0.77
Total expected agreement = 0.34 + 0.23 = 0.57
Weighted Kappa = (0.77-0.57)/(1-0.57) = 0.465
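The whole weighted calculation can be reproduced in a few lines; a minimal sketch (function and variable names are my own) using the 3 × 3 table and the 0/0.5/1 weights from the example:

```python
# Weighted kappa for a k x k contingency table
# (rows: Observer 2, columns: Observer 1, as in the example above)
def weighted_kappa(table, weights):
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # observed weighted agreement: each cell times its weight
    po = sum(weights[i][j] * table[i][j]
             for i in range(k) for j in range(k)) / n
    # expected weighted agreement: each expected count (row x col / n) times its weight
    pe = sum(weights[i][j] * row_tot[i] * col_tot[j] / n
             for i in range(k) for j in range(k)) / n
    return (po - pe) / (1 - pe)

table = [[7, 2, 1],
         [5, 10, 5],
         [3, 3, 14]]
weights = [[1.0, 0.5, 0.0],   # full credit on the diagonal,
           [0.5, 1.0, 0.5],   # half credit one category apart,
           [0.0, 0.5, 1.0]]   # no credit two categories apart
print(round(weighted_kappa(table, weights), 3))   # approximately 0.465
```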
Weighted values
For 3 categories:
Complete disagreement: weight 0.
Partial agreement: weight ½.
Complete agreement: weight 1.
For 4 categories, the weights are 0, 0.33, 0.67 and 1.0.
Using SPSS to Compute Kappa
In Variable View, define Doctor1 and Doctor2 as string variables and Count as numeric.
Use DATA > WEIGHT CASES to weight the cases by Count.
Select ANALYZE > DESCRIPTIVE STATISTICS > CROSSTABS from the SPSS menu. In the dialog box, click the STATISTICS button and then tick the Kappa option.
Note: make sure your data are in the right columns and rows.
Final notes
• Kappa should not be viewed as the unequivocal standard to assess rater agreement.
• Kappa value itself is influenced by chance. Confidence interval for kappa may be more informative.
• Kappa may not be reliable for rare observations. Kappa is affected by prevalence: for rare findings, a very low kappa may not necessarily reflect a low rate of overall agreement.
• Because it is affected by prevalence, it may not be appropriate to compare kappa between different studies or populations.
THANK YOU