Cohen's Kappa Calculator (2×2 Agreement Table)
Enter the number of items in each cell. This tool computes observed agreement, expected agreement, and Cohen's kappa.
What is Cohen's Kappa?
Cohen's kappa is a statistic that measures agreement between two raters while correcting for agreement that could happen by chance. It is commonly used in psychology, medicine, machine learning labeling, and content analysis where two people classify the same items into categories.
If you only use raw percent agreement, your reliability estimate can be misleading, especially when one category is very common. Kappa addresses this by comparing observed agreement to chance-expected agreement.
Formula
κ = (Po − Pe) / (1 − Pe)
- Po = observed agreement proportion
- Pe = expected agreement by chance from marginals
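In code, the formula takes only a few lines. A minimal sketch, assuming the conventional 2×2 layout (a = both raters chose category 1, d = both chose category 2, b and c the two disagreement cells) — this is an illustration, not the calculator's own code:

```python
def cohens_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 agreement table of counts.

    a: both raters chose category 1    b: rater 1 only
    c: rater 2 only                    d: both chose category 2
    """
    n = a + b + c + d
    po = (a + d) / n  # Po: observed agreement on the diagonal
    # Pe: chance agreement computed from the row and column marginals
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (po - pe) / (1 - pe)
```

For a perfectly balanced table such as a = d = 45 and b = c = 5, this returns κ = 0.8. Note that the denominator is zero when Pe = 1 (every item in a single cell), in which case kappa is undefined.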
Interpretation basics: κ = 1 means perfect agreement, κ = 0 means chance-level agreement, and κ < 0 indicates agreement worse than chance.
How to Use This Calculator
- Build a 2×2 table from your two raters' classifications.
- Enter the four counts: a and d (the two cells where the raters agree) and b and c (the two cells where they disagree).
- Click Calculate Kappa.
- Review total sample size, observed agreement, expected agreement, kappa, and interpretation.
Example
Suppose two reviewers evaluate 100 abstracts for inclusion in a study. They agree on 86 abstracts and disagree on 14. Kappa depends on how those counts split across the four cells, not just the totals; with a reasonably balanced split, the calculator returns κ around 0.71, which is usually interpreted as substantial agreement.
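The example can be checked directly. A sketch, assuming (hypothetically) that the 86 agreements split into 50 "include/include" and 36 "exclude/exclude", with 7 disagreements in each direction:

```python
a, b, c, d = 50, 7, 7, 36  # hypothetical cell counts, n = 100
n = a + b + c + d
po = (a + d) / n                                        # 0.86 observed agreement
pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2   # ~0.51 expected by chance
kappa = (po - pe) / (1 - pe)
print(round(kappa, 2))  # → 0.71
```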
Common Interpretation Scale (Landis & Koch)
- < 0.00: Poor
- 0.00–0.20: Slight
- 0.21–0.40: Fair
- 0.41–0.60: Moderate
- 0.61–0.80: Substantial
- 0.81–1.00: Almost perfect
These cutoffs are popular but not universal. In high-stakes fields, stricter thresholds may be expected.
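Programmatically, the scale above maps to labels as in this sketch:

```python
def landis_koch(kappa):
    """Label a kappa value using the Landis & Koch cutoffs listed above."""
    if kappa < 0.00:
        return "Poor"
    if kappa <= 0.20:
        return "Slight"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Substantial"
    return "Almost perfect"
```

For instance, the κ ≈ 0.71 from the example above falls in the "Substantial" band.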
Important Notes and Pitfalls
1) Prevalence effect
When one category dominates (for example, 95% "negative"), kappa can be lower than expected even with high percent agreement.
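A quick sketch of the effect, using a hypothetical table in which one category dominates: raw agreement is 92%, yet kappa lands in the "Fair" range because chance agreement is already very high.

```python
# Hypothetical skewed table: 94 of 100 items are "positive" for each rater.
a, b, c, d = 90, 4, 4, 2
n = a + b + c + d
po = (a + d) / n                                        # 0.92 observed agreement
pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2   # ~0.89 expected by chance (!)
kappa = (po - pe) / (1 - pe)
print(round(po, 2), round(kappa, 2))                    # 0.92 agreement vs. κ ≈ 0.29
```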
2) Bias effect
If the raters use the categories at different rates (rater bias), kappa is also distorted: holding percent agreement fixed, the value shifts with the marginal imbalance, which makes kappa hard to compare across tables with different marginals.
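A sketch comparing two hypothetical tables with identical 80% raw agreement but different marginal behavior shows how the marginals alone move kappa:

```python
def kappa(a, b, c, d):
    # κ = (Po − Pe) / (1 − Pe) for a 2x2 agreement table
    n = a + b + c + d
    po = (a + d) / n
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (po - pe) / (1 - pe)

symmetric = kappa(40, 10, 10, 40)  # both raters say "yes" 50% of the time
biased = kappa(40, 20, 0, 40)      # rater 1 says "yes" 60%, rater 2 only 40%
print(round(symmetric, 3), round(biased, 3))  # 0.6 vs. ≈ 0.615
```

Both tables have Po = 0.80, yet the kappas differ, which is why reporting the marginals alongside kappa is recommended below.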
3) Use the right version of kappa
This page computes the classic Cohen's kappa for two raters and nominal categories (binary table input). For ordered categories, weighted kappa is often more appropriate.
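For the ordered-category case this page does not handle, a linearly weighted kappa can be sketched as follows. This is an illustrative implementation under the usual definition (weights grow with the distance between category indices), not this calculator's code:

```python
def weighted_kappa(table, power=1):
    """Weighted kappa for a k x k ordinal agreement table of counts.

    table[i][j] = items placed in category i by rater 1 and j by rater 2.
    power=1 gives linear weights |i - j|; power=2 gives quadratic weights.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    rows = [sum(table[i]) for i in range(k)]                       # rater 1 marginals
    cols = [sum(table[i][j] for i in range(k)) for j in range(k)]  # rater 2 marginals
    disagree_obs = disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) ** power              # penalty grows with distance
            disagree_obs += w * table[i][j] / n
            disagree_exp += w * rows[i] * cols[j] / n ** 2
    return 1 - disagree_obs / disagree_exp
```

On a 2×2 table this reduces to the unweighted Cohen's kappa, so results match the binary case.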
When to Report More Than Kappa
Best practice is to report:
- Raw percent agreement
- Cohen's kappa
- The confusion/agreement table itself
- Contextual explanation of class imbalance
Showing all these numbers helps readers judge reliability more accurately than any single metric alone.