Interactive Confusion Matrix Calculator

Enter your model outcomes below (counts, not percentages). Then click Calculate Metrics.

What is a confusion matrix?

A confusion matrix is a compact table used to evaluate a classification model. It compares predicted labels against actual labels and tells you not only how often the model is correct, but also in which ways it is wrong. That distinction is crucial when mistakes have different real-world costs.

For binary classification, the matrix has four cells:

  • True Positive (TP): Model predicted positive, and the truth is positive.
  • False Positive (FP): Model predicted positive, but the truth is negative.
  • False Negative (FN): Model predicted negative, but the truth is positive.
  • True Negative (TN): Model predicted negative, and the truth is negative.
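The four cells above can be tallied directly from paired lists of actual and predicted labels. A minimal Python sketch (the function name and sample data are illustrative, not part of the calculator):

```python
# Count TP, FP, FN, TN from paired label lists.
# Labels are assumed to be 1 (positive) and 0 (negative).
def confusion_counts(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fp, fn, tn

print(confusion_counts([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))  # (2, 1, 1, 1)
```

Note that TP, FP, FN, and TN always sum to the total number of cases, which is a useful sanity check on any matrix you enter.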

Why a confusion matrix is better than accuracy alone

Accuracy can be misleading, especially with imbalanced datasets. If only 1% of cases are positive, a naive model can predict “negative” for everything and still get 99% accuracy while being useless for detecting the positive class. The confusion matrix exposes that failure immediately by showing high FN and zero TP.
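The arithmetic behind that failure mode is worth spelling out. A sketch with a hypothetical 1,000-case dataset containing 10 positives:

```python
# Hypothetical imbalanced dataset: 1,000 cases, 10 positives (1%).
# A model that always predicts "negative" produces these counts:
tp, fp = 0, 0        # it never predicts positive
fn, tn = 10, 990     # every positive is missed, every negative is "caught"

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)

print(accuracy)  # 0.99 -- looks excellent
print(recall)    # 0.0  -- useless for the positive class
```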

This is why data scientists, analysts, ML engineers, and product teams rely on derived metrics such as precision, recall, specificity, F1 score, and MCC rather than accuracy alone.

How to use this calculator

Step 1: Enter TP, FP, FN, TN

Use integer counts from your validation set, test set, A/B experiment, or production monitoring logs. All values must be zero or greater.

Step 2: Click calculate

The tool computes the key classification metrics and presents them in percentage and decimal forms where appropriate.

Step 3: Interpret metrics by business objective

  • Medical screening: prioritize high recall (low false negatives).
  • Spam filtering: prioritize precision (avoid flagging valid emails).
  • Fraud detection: optimize balance between recall and precision, often with F1 and PR curves.
  • Risk scoring: monitor specificity and false positive burden to manage operational cost.

Metric formulas used in this confusion matrix calculator

  • Accuracy = (TP + TN) / (TP + TN + FP + FN)
  • Precision = TP / (TP + FP)
  • Recall (Sensitivity, TPR) = TP / (TP + FN)
  • Specificity (TNR) = TN / (TN + FP)
  • F1 Score = 2TP / (2TP + FP + FN)
  • NPV = TN / (TN + FN)
  • FPR = FP / (FP + TN)
  • FNR = FN / (FN + TP)
  • Balanced Accuracy = (Recall + Specificity) / 2
  • MCC = (TP×TN − FP×FN) / √[(TP+FP)(TP+FN)(TN+FP)(TN+FN)]

If a denominator is zero, the corresponding metric is shown as N/A. That does not mean the model is broken; it means the metric is undefined for that specific outcome distribution.
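The formulas above, including the zero-denominator rule, can be sketched in a few lines of Python. This is an illustrative implementation, not the calculator's actual source; `None` stands in for the N/A display:

```python
import math

def safe_div(num, den):
    # A zero denominator means the metric is undefined for this
    # outcome distribution, not that the model is broken.
    return num / den if den else None

def metrics(tp, fp, fn, tn):
    recall = safe_div(tp, tp + fn)
    specificity = safe_div(tn, tn + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": safe_div(tp, tp + fp),
        "recall": recall,
        "specificity": specificity,
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
        "npv": safe_div(tn, tn + fn),
        "fpr": safe_div(fp, fp + tn),
        "fnr": safe_div(fn, fn + tp),
        "balanced_accuracy": (recall + specificity) / 2
            if recall is not None and specificity is not None else None,
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }

m = metrics(tp=40, fp=10, fn=20, tn=30)
print(m["precision"])  # 0.8
print(m["recall"])     # ~0.667
```

The always-negative model from earlier (TP=0, FP=0) yields `None` for precision, matching the N/A behavior described above.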

Practical interpretation tips

1) Precision vs Recall trade-off

Increasing recall usually lowers precision, and vice versa. Moving the classification threshold changes this trade-off. Always optimize for the mistake type that costs you more.
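A quick sketch of that threshold effect, using hypothetical model scores (the data here is made up purely to illustrate the trade-off):

```python
# Hypothetical model scores (descending) and ground-truth labels.
scores = [0.95, 0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.1]
labels = [1,    1,   1,   0,   1,    0,   0,   0]

def precision_recall(threshold):
    # Predict positive whenever the score clears the threshold.
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, a in zip(preds, labels) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(preds, labels) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(preds, labels) if p == 0 and a == 1)
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall

print(precision_recall(0.5))   # (0.8, 1.0) -- permissive: full recall
print(precision_recall(0.85))  # (1.0, 0.5) -- strict: precision up, recall down
```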

2) Use F1 when classes are imbalanced

F1 combines precision and recall into one score and is useful when positives are rare and both FP and FN matter.

3) Consider MCC for robust summary

Matthews Correlation Coefficient (MCC) is often a better single-number summary than accuracy in imbalanced settings because it uses all four cells of the confusion matrix.

4) Track prevalence and base rates

Even good models can look bad (or vice versa) if base rates shift over time. Monitor confusion matrix metrics over rolling windows in production.

Common mistakes to avoid

  • Reporting only accuracy without class balance context.
  • Mixing up rows and columns in the confusion matrix orientation.
  • Comparing metrics across different thresholds without stating the threshold.
  • Evaluating on training data instead of holdout or out-of-time data.
  • Ignoring confidence intervals and sample size effects.

Quick FAQ

Is this calculator for binary classification only?

Yes. For multi-class tasks, compute one-vs-rest confusion matrices per class and then aggregate using macro/micro averaging.
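A sketch of that one-vs-rest approach with macro averaging (class names and data are illustrative):

```python
# One-vs-rest confusion counts: treat `cls` as positive, all else negative.
def ovr_counts(actual, predicted, cls):
    tp = sum(1 for a, p in zip(actual, predicted) if a == cls and p == cls)
    fp = sum(1 for a, p in zip(actual, predicted) if a != cls and p == cls)
    fn = sum(1 for a, p in zip(actual, predicted) if a == cls and p != cls)
    tn = sum(1 for a, p in zip(actual, predicted) if a != cls and p != cls)
    return tp, fp, fn, tn

actual    = ["cat", "dog", "bird", "cat", "dog", "bird"]
predicted = ["cat", "dog", "cat",  "cat", "bird", "bird"]

# Macro averaging: compute recall per class, then take the plain mean.
recalls = []
for cls in ("cat", "dog", "bird"):
    tp, fp, fn, tn = ovr_counts(actual, predicted, cls)
    recalls.append(tp / (tp + fn))
macro_recall = sum(recalls) / len(recalls)
print(round(macro_recall, 3))
```

Micro averaging instead pools the TP/FP/FN counts across all classes before computing the metric, which weights frequent classes more heavily.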

Can I enter decimals?

Yes, though counts are usually integers. Decimal values can appear in weighted or averaged confusion matrices.

What metric should I optimize first?

Start with the metric tied most directly to your business risk. For life-critical systems, missed positives (FN) are often costly; for user experience systems, false alarms (FP) may dominate.

Final thought

A confusion matrix is one of the simplest and most powerful diagnostics in machine learning. Use it early in model development and continuously in production monitoring. The right metric at the right threshold can make a model truly useful—not just statistically impressive.
