Clinical Disagreement and The Kappa
UNDERSTANDING KAPPA STATISTIC
KAPPA
• Kappa = river imp, water sprite.
• Origin: Japan (with Chinese & Hindu antecedents).
• The kappa has a beak, webbed feet and a shell on its back, and dwells under bridges, pouncing on any who attempt to cross the river.
OBJECTIVES
1. To understand the concept of clinical disagreement and observer variability.
2. To understand Kappa and related statistics.
LEARNING STRATEGIES
1. Identify the situations when clinical disagreement occurs.
2. Calculate Kappa and understand the concepts of Kappa.
Epictetus, 2nd Century:
Appearances of the mind are of 4 kinds:
1. Things are what they appear to be. (It is pneumonia, and appears like one.)
2. Things neither are, nor appear to be. (It is not pneumonia, nor does it appear like one.)
3. They are, and do not appear to be. (It is pneumonia, but does not look like one.)
4. They are not, yet appear to be. (It is not pneumonia, but looks like one.)
Sorting out all of these appearances in everyday life is the task of wise men (doctors). (Example: diagnosing pneumonia by history and physical examination.)
Why the disagreement?
Variations arise from two sources:
1. How the measurements are carried out (the instruments, the tests, the person carrying out the measurement).
2. Biological factors (variation within an individual patient, and among patients).
CASE SCENARIOS OF CLINICAL DISAGREEMENT
1. Two radiologists, A and B, disagreeing on evidence of malignancy.
2. Disagreement between two cardiologists in the interpretation of electrocardiograms to look for evidence of ischemia.
3. A psychiatrist from HUKM disagreeing on the diagnosis of a psychotic disorder of his patient diagnosed earlier as such by another colleague at HKL.
WHY KAPPA?
• In reading medical literature on diagnosis and interpretation of diagnostic tests, our attention is generally focused on items such as: sensitivity, specificity, predictive values and likelihood ratios. Those mentioned above address the validity of the tests.
• But if the people who actually interpret the test cannot agree on the interpretation of the results, the test results will be of little use!
RATIONALE OF USING KAPPA
• Kappa tries to eliminate agreement which would be expected by chance alone.
• Let’s say you and I agree 95% of the time on a specific test. Merely saying 95% agreement is not enough!
• Kappa is the observed agreement beyond chance (observed agreement minus chance agreement) divided by the potential agreement beyond chance (1 minus chance agreement).
Example: Let’s say that my diagnostic test to confirm a malaria slide is to flip a coin. I find that I agree with my colleague (Dr X) 55% of the time.
a. Is that agreement good enough?
To know, I would first have to see how much of that agreement would occur by chance alone.
Chance alone should account for 50% agreement. That leaves 55% − 50% = 5% observed agreement beyond chance, out of a possible 100% − 50% = 50%. Kappa is therefore (55 − 50)/(100 − 50) = 5/50 = 0.1.
The Answer: It is a low agreement.
The lesson: You cannot diagnose malaria by flipping coins!
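The same arithmetic can be written out in code; a minimal sketch (the 55% and 50% figures come from the example above):

```python
# Coin-flip example: 55% observed agreement, 50% expected by chance alone
po = 0.55                      # observed agreement
pe = 0.50                      # chance agreement (a fair coin)
kappa = (po - pe) / (1 - pe)   # agreement beyond chance / potential agreement beyond chance
print(kappa)                   # approximately 0.1
```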
Two clinicians look at the same 100 mammograms, and each calls 20 positive for breast cancer. Three possible scenarios:

           Scenario 1      Scenario 2      Scenario 3
           Neg   Pos       Neg   Pos       Neg   Pos
Neg         80     0        75     5        70    10
Pos          0    20         5    15        10    10

Agree:       100%            90%             80%
Kappa:       1.0             0.69            0.38
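Because both readers call 20% of the films positive in every scenario, the chance agreement is the same each time (0.8 × 0.8 + 0.2 × 0.2 = 0.68), and the three kappa values can be checked in a few lines (a sketch; the observed-agreement figures are those of the scenarios above):

```python
# Kappa for the three mammogram scenarios; chance agreement is shared
# because both readers use the same 80%/20% split.
pe = 0.8 * 0.8 + 0.2 * 0.2                              # chance agreement, 0.68
kappas = [(po - pe) / (1 - pe) for po in (1.00, 0.90, 0.80)]
print(kappas)   # approximately 1.0, 0.69, 0.38
```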
INTER-RATER RELIABILITY
• Inter-rater reliability refers to the correlation of responses from two or more raters, each evaluating the same endpoint or making the same measurements in multiple subjects.
• Inter-rater reliability is an important concept in clinical research.
• Errors may arise as a result of different interpretations, e.g. how different pathologists interpret a histopathology slide.
KAPPA COEFFICIENT & CORRELATION COEFFICIENT
• The kappa coefficient is analogous to the Pearson correlation coefficient (or the Spearman rank correlation coefficient) and has the same range of values (+1 to −1).
• However, it is better in several ways at identifying disagreement than the Pearson or Spearman rank correlation.
WHY NOT USE CORRELATION COEFFICIENT?
1. Correlation coefficient is high for any linear relationship, not just when the first measurement equals the second measurement. If the second measurement is multiplied by 3, the correlation coefficient remains the same, although the measurements no longer agree.
2. The test of significance for the correlation coefficient uses the absence of any relationship as the null hypothesis. This will invariably be rejected, since of course there is a relationship between the first and second measurements, even if they do not agree with each other very well. The null hypothesis is rarely of clinical interest (variables being correlated usually have some relationship to each other).
Kappa Formula
• Kappa takes into account the probability that some agreement will occur by chance.
K = (observed agreement − chance agreement) / (1 − chance agreement)
  = (Po − Pe) / (1 − Pe)
KAPPA COEFFICIENT (KAPPA STATISTIC)
                   Observer 1
                 Pos    Neg    Marg. Total
Observer 2  Pos   a      b        g1
            Neg   c      d        g2
Marg. Total      f1     f2         n
Reading 2 X 2 Table
a and d = No. of times observers agree.
b and c = No. of times observers disagree.
If there are no disagreements, then b and c = 0 and the observed agreement, Po, is 1.
If there are no agreements, then a and d = 0 and the observed agreement, Po, is 0.
Kappa Formula
Whereby:
Po = (a + d) / n
Pe = ((f1 × g1) / n + (f2 × g2) / n) / n
KAPPA interpretation
Range of possible values for Kappa = -1 to 1.
Poor agreement:       < 0.2
Fair agreement:       0.2 to 0.4
Moderate agreement:   0.4 to 0.6
Good agreement:       0.6 to 0.8
Very good agreement:  0.8 to 1.0
It is rare that we get a perfect or negative agreement.
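The bands above can be expressed as a simple lookup; a minimal sketch (the handling of exact boundary values such as 0.4 is my own choice, since the slide's bands overlap at the cut-points):

```python
# Qualitative interpretation of a kappa value, using the bands above.
# Boundary values are assigned to the higher band (a convention, not from the slide).
def interpret_kappa(k):
    if k < 0.2:
        return "poor"
    elif k < 0.4:
        return "fair"
    elif k < 0.6:
        return "moderate"
    elif k < 0.8:
        return "good"
    else:
        return "very good"

print(interpret_kappa(0.4))   # "moderate"
```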
Kappa
                      Observer # 1
                   Positive  Negative  Total
Obs. # 2 Positive      40        10      50
         Negative      20        30      50
Total                  60        40     100

Po = (40 + 30)/100 = 0.7
Pe = ((60 × 50)/100 + (40 × 50)/100)/100 = (30 + 20)/100 = 0.5
Kappa = (0.7 − 0.5)/(1 − 0.5) = 0.2/0.5 = 0.4
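The worked example generalizes to any 2 × 2 table; a minimal sketch using the a/b/c/d cell labels and the f/g marginal notation from the table earlier (the function name is my own):

```python
# Cohen's kappa from the four cells of a 2x2 agreement table:
#   a = both positive, b = Obs2 pos / Obs1 neg,
#   c = Obs2 neg / Obs1 pos, d = both negative
def kappa_2x2(a, b, c, d):
    n = a + b + c + d
    po = (a + d) / n                     # observed agreement
    f1, f2 = a + c, b + d                # Observer 1 marginal totals
    g1, g2 = a + b, c + d                # Observer 2 marginal totals
    pe = (f1 * g1 + f2 * g2) / (n * n)   # chance agreement
    return (po - pe) / (1 - pe)

print(round(kappa_2x2(40, 10, 20, 30), 2))   # 0.4, as in the worked example
```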
WEIGHTED KAPPA
• Used for ordinal data. Ordinal data has an inherent order, e.g. pain rated "none", "mild", "moderate" or "severe".
• A weight of 0 is given when the two raters are maximally apart, a weight of 1 when there is exact agreement, and proportionately spaced weights for the intermediate levels of agreement.
• The formula is the same as for ordinary kappa, except that observed and expected agreement are summed not just along the diagonal but over the whole table, with each cell first multiplied by the weight for that cell.
Example: weights 0, 0.5, 1.0

                   Observer # 1
                 Normal  Mild  Serious  Total
Obs. #2 Normal      7      2      1       10
        Mild        5     10      5       20
        Serious     3      3     14       20
Total              15     15     20       50
Observed Agreement

          Normal  Mild  Serious  Total
Normal       7                     10
Mild                10             20
Serious                   14       20
Total       15     15     20       50

Observed agreement = Po = (7 + 10 + 14)/50 = 0.62
Expected Agreement (diagonal cells)

          Normal  Mild  Serious
Normal       3
Mild                6
Serious                   8

(Each expected count is row total × column total / n, e.g. 10 × 15/50 = 3.)
Expected agreement = Pe = (3 + 6 + 8)/50 = 0.34
Observed weighted (partial) agreement — cells one category apart:

          Normal  Mild  Serious
Normal              2
Mild         5             5
Serious             3

Partial agreement = (2 + 5 + 5 + 3)/50 = 15/50 = 30%
But we give only ½ credit for this partial agreement, so we get 30% × 0.5 = 15%.
Expected partial agreement for weighted kappa — cells one category apart:

          Normal  Mild  Serious
Normal              3
Mild         6             8
Serious             6

Expected numbers = (3 + 6 + 8 + 6)/50 = 23/50 = 0.46
Since we are giving ½ credit: 0.46 × 0.5 = 0.23
Weighted Kappa
Total observed agreement = 0.62+0.15 = 0.77
Total expected agreement = 0.34 + 0.23 = 0.57
Weighted Kappa = (0.77-0.57)/(1-0.57) = 0.465
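The whole weighted calculation can be reproduced in a few lines; a minimal sketch (function and variable names are my own) using the 3 × 3 table and the 0/0.5/1 weights from the example:

```python
# Weighted kappa for a k x k contingency table
# (rows: Observer 2, columns: Observer 1, as in the example above)
def weighted_kappa(table, weights):
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # observed weighted agreement: each cell times its weight
    po = sum(weights[i][j] * table[i][j]
             for i in range(k) for j in range(k)) / n
    # expected weighted agreement: each expected count (row x col / n) times its weight
    pe = sum(weights[i][j] * row_tot[i] * col_tot[j] / n
             for i in range(k) for j in range(k)) / n
    return (po - pe) / (1 - pe)

table = [[7, 2, 1],
         [5, 10, 5],
         [3, 3, 14]]
weights = [[1.0, 0.5, 0.0],   # full credit on the diagonal,
           [0.5, 1.0, 0.5],   # half credit one category apart,
           [0.0, 0.5, 1.0]]   # no credit two categories apart
print(round(weighted_kappa(table, weights), 3))   # approximately 0.465
```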
Weighted values
For 3 categories:
Complete disagreement: weight 0.
Partial agreement: weight ½.
Complete agreement: weight 1.
For 4 categories, the weights are 0, 0.33, 0.67 and 1.0.
Using SPSS to Compute Kappa
In Variable View, define Doctor1 and Doctor2 as string variables and Count as numeric.
Use DATA > WEIGHT CASES to weight the cases by Count.
Select ANALYZE > DESCRIPTIVE STATISTICS > CROSSTABS from the SPSS menu. In the dialog box, click the STATISTICS button and then tick the Kappa option.
Note: make sure your data are in the right columns and rows.
Final notes
• Kappa should not be viewed as the unequivocal standard to assess rater agreement.
• Kappa value itself is influenced by chance. Confidence interval for kappa may be more informative.
• Kappa may not be reliable for rare observations. Kappa is affected by prevalence: for rare findings, a very low kappa may not necessarily reflect a low rate of overall agreement.
• Because it is affected by prevalence, it may not be appropriate to compare kappa between different studies or populations.
THANK YOU