Cohen's Kappa Calculator (2×2 Agreement Table)
Enter the number of items in each cell. This tool computes observed agreement, expected agreement, and Cohen's kappa.
What is Cohen's Kappa?
Cohen's kappa is a statistic that measures agreement between two raters while correcting for agreement that could happen by chance. It is commonly used in psychology, medicine, machine learning labeling, and content analysis where two people classify the same items into categories.
If you only use raw percent agreement, your reliability estimate can be misleading, especially when one category is very common. Kappa addresses this by comparing observed agreement to chance-expected agreement.
Formula
κ = (Po − Pe) / (1 − Pe)
- Po = observed agreement proportion
- Pe = expected agreement by chance from marginals
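In code, the formula takes only a few lines. A minimal sketch, assuming the conventional 2×2 layout (a = both raters chose category 1, d = both chose category 2, b and c the two disagreement cells) — this is an illustration, not the calculator's own code:

```python
def cohens_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 agreement table of counts.

    a: both raters chose category 1    b: rater 1 only
    c: rater 2 only                    d: both chose category 2
    """
    n = a + b + c + d
    po = (a + d) / n  # Po: observed agreement on the diagonal
    # Pe: chance agreement computed from the row and column marginals
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (po - pe) / (1 - pe)
```

For a perfectly balanced table such as a = d = 45 and b = c = 5, this returns κ = 0.8. Note that the denominator is zero when Pe = 1 (every item in a single cell), in which case kappa is undefined.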
Interpretation basics: κ = 1 means perfect agreement, κ = 0 means chance-level agreement, and κ < 0 indicates agreement worse than chance.
How to Use This Calculator
- Build a 2×2 table from your two raters' classifications.
- Enter the four counts: a and d (the two cells where the raters agree) and b and c (the two cells where they disagree).
- Click Calculate Kappa.
- Review total sample size, observed agreement, expected agreement, kappa, and interpretation.
Example
Suppose two reviewers evaluate 100 abstracts for inclusion in a study. They agree on 86 abstracts and disagree on 14. Kappa depends on how those counts split across the four cells, not just the totals; with a reasonably balanced split, the calculator returns κ around 0.71, which is usually interpreted as substantial agreement.
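The example can be checked directly. A sketch, assuming (hypothetically) that the 86 agreements split into 50 "include/include" and 36 "exclude/exclude", with 7 disagreements in each direction:

```python
a, b, c, d = 50, 7, 7, 36  # hypothetical cell counts, n = 100
n = a + b + c + d
po = (a + d) / n                                        # 0.86 observed agreement
pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2   # ~0.51 expected by chance
kappa = (po - pe) / (1 - pe)
print(round(kappa, 2))  # → 0.71
```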
Common Interpretation Scale (Landis & Koch)
- < 0.00: Poor
- 0.00–0.20: Slight
- 0.21–0.40: Fair
- 0.41–0.60: Moderate
- 0.61–0.80: Substantial
- 0.81–1.00: Almost perfect
These cutoffs are popular but not universal. In high-stakes fields, stricter thresholds may be expected.
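Programmatically, the scale above maps to labels as in this sketch:

```python
def landis_koch(kappa):
    """Label a kappa value using the Landis & Koch cutoffs listed above."""
    if kappa < 0.00:
        return "Poor"
    if kappa <= 0.20:
        return "Slight"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Substantial"
    return "Almost perfect"
```

For instance, the κ ≈ 0.71 from the example above falls in the "Substantial" band.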
Important Notes and Pitfalls
1) Prevalence effect
When one category dominates (for example, 95% "negative"), kappa can be lower than expected even with high percent agreement.
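A quick sketch of the effect, using a hypothetical table in which one category dominates: raw agreement is 92%, yet kappa lands in the "Fair" range because chance agreement is already very high.

```python
# Hypothetical skewed table: 94 of 100 items are "positive" for each rater.
a, b, c, d = 90, 4, 4, 2
n = a + b + c + d
po = (a + d) / n                                        # 0.92 observed agreement
pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2   # ~0.89 expected by chance (!)
kappa = (po - pe) / (1 - pe)
print(round(po, 2), round(kappa, 2))                    # 0.92 agreement vs. κ ≈ 0.29
```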
2) Bias effect
If the raters use the categories at different rates (rater bias), kappa is also distorted: holding percent agreement fixed, the value shifts with the marginal imbalance, which makes kappa hard to compare across tables with different marginals.
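A sketch comparing two hypothetical tables with identical 80% raw agreement but different marginal behavior shows how the marginals alone move kappa:

```python
def kappa(a, b, c, d):
    # κ = (Po − Pe) / (1 − Pe) for a 2x2 agreement table
    n = a + b + c + d
    po = (a + d) / n
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (po - pe) / (1 - pe)

symmetric = kappa(40, 10, 10, 40)  # both raters say "yes" 50% of the time
biased = kappa(40, 20, 0, 40)      # rater 1 says "yes" 60%, rater 2 only 40%
print(round(symmetric, 3), round(biased, 3))  # 0.6 vs. ≈ 0.615
```

Both tables have Po = 0.80, yet the kappas differ, which is why reporting the marginals alongside kappa is recommended below.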
3) Use the right version of kappa
This page computes the classic Cohen's kappa for two raters and nominal categories (binary table input). For ordered categories, weighted kappa is often more appropriate.
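For the ordered-category case this page does not handle, a linearly weighted kappa can be sketched as follows. This is an illustrative implementation under the usual definition (weights grow with the distance between category indices), not this calculator's code:

```python
def weighted_kappa(table, power=1):
    """Weighted kappa for a k x k ordinal agreement table of counts.

    table[i][j] = items placed in category i by rater 1 and j by rater 2.
    power=1 gives linear weights |i - j|; power=2 gives quadratic weights.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    rows = [sum(table[i]) for i in range(k)]                       # rater 1 marginals
    cols = [sum(table[i][j] for i in range(k)) for j in range(k)]  # rater 2 marginals
    disagree_obs = disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) ** power              # penalty grows with distance
            disagree_obs += w * table[i][j] / n
            disagree_exp += w * rows[i] * cols[j] / n ** 2
    return 1 - disagree_obs / disagree_exp
```

On a 2×2 table this reduces to the unweighted Cohen's kappa, so results match the binary case.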
When to Report More Than Kappa
Best practice is to report:
- Raw percent agreement
- Cohen's kappa
- The confusion/agreement table itself
- Contextual explanation of class imbalance
Showing all these numbers helps readers judge reliability more accurately than any single metric alone.