F1 Score Calculator
Calculate the F1 score for a classification model using either confusion matrix counts (true positives, false positives, false negatives) or direct precision/recall values.
What Is the F1 Score?
The F1 score is a single metric that combines precision and recall into one number. It is the harmonic mean of the two, which means it only becomes high when both precision and recall are high.
In practical terms, the F1 score answers this question: How well does my model balance catching positives and avoiding false alarms?
Why People Use an F1 Calculator
Accuracy can be misleading when classes are imbalanced. If only 1% of records are positive, a model that predicts “negative” for everything can still be 99% accurate and yet completely useless. F1 helps you evaluate meaningful positive-class performance.
- Precision-focused domains: fraud detection, spam filtering, moderation
- Recall-focused domains: disease screening, safety alerts, risk monitoring
- Balanced objective: F1 is useful when both errors matter
F1 Formula
F1 = 2 × (Precision × Recall) / (Precision + Recall)
Core definitions
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- TP: correctly predicted positives
- FP: predicted positive but actually negative
- FN: predicted negative but actually positive
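The definitions above translate directly into code. Here is a minimal sketch of what a calculator like this computes internally; the function name `f1_from_counts` is illustrative, not part of any specific tool:

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when there are no true positives
    return 2 * precision * recall / (precision + recall)
```

Note the guard clauses: when a model predicts no positives at all (TP + FP = 0), precision is undefined, and most tools report 0 by convention rather than raising an error.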
How to Use This Calculator
Method 1: From TP, FP, and FN
Enter the three count values from your confusion matrix. The calculator automatically computes precision, recall, and F1 score.
Method 2: From Precision and Recall
If you already have precision and recall values, switch to the second mode and enter them directly as decimals between 0 and 1.
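When precision and recall are already known, the harmonic mean is a one-liner. A small sketch (the input validation mirrors what the calculator's second mode expects):

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as decimals in [0, 1]."""
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must be between 0 and 1")
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```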
Worked Example
Suppose your classifier returns:
- TP = 90
- FP = 30
- FN = 10
Then:
- Precision = 90 / (90 + 30) = 0.75
- Recall = 90 / (90 + 10) = 0.90
- F1 = 2 × (0.75 × 0.90) / (0.75 + 0.90) = 0.8182
An F1 of about 0.82 indicates a strong balance: recall is high, while precision still has room to improve.
Interpreting F1 Score Values
- 0.90 to 1.00: excellent balance
- 0.80 to 0.89: strong; often production-ready depending on context
- 0.70 to 0.79: decent baseline, often tunable
- Below 0.70: likely needs threshold tuning, feature work, or data quality improvements
There is no universal “good” threshold. What matters is the business or research context and the cost of FP vs FN.
F1 vs Accuracy, Precision, and Recall
Accuracy
Great for balanced datasets; can fail on rare-event problems.
Precision
High precision means your positive predictions are trustworthy.
Recall
High recall means you miss fewer of the actual positive cases.
F1
Best when you need one balanced metric for positive-class performance.
Common Pitfalls
- Using F1 alone without checking confusion matrix and per-class behavior
- Comparing F1 across datasets with very different class prevalence
- Ignoring threshold effects (F1 can change dramatically with decision threshold)
- Forgetting macro/micro/weighted averaging in multi-class problems
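The threshold pitfall is easy to demonstrate: the same model scores can yield noticeably different F1 values depending on where you cut. A small self-contained sketch with made-up scores and labels:

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 for binary predictions obtained by thresholding raw model scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical model scores and ground-truth labels
scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    0,    1,    0,    1,    0,    0]
```

On this toy data, cutting at 0.5 gives F1 = 0.75, while cutting at 0.25 gives F1 = 0.80: the metric moved purely because the decision threshold moved, with no change to the model.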
Beyond Standard F1
F-beta Score
F-beta lets you weight recall more (beta > 1) or precision more (beta < 1). This is useful when your domain has asymmetric error costs.
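The general formula is F-beta = (1 + beta²) × P × R / (beta² × P + R); with beta = 1 it reduces to the standard F1. A minimal sketch:

```python
def fbeta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: beta > 1 weights recall more, beta < 1 weights precision more."""
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom
```

Using the worked example's values (P = 0.75, R = 0.90): F1 is about 0.818, while F2 rises to about 0.865 because F2 rewards the higher recall more heavily.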
Macro, Micro, and Weighted F1
- Macro F1: computes F1 per class, then averages the scores equally
- Micro F1: pools TP, FP, and FN across all classes, then computes a single F1
- Weighted F1: averages per-class F1 scores weighted by class support (the number of true instances per class)
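The three averaging schemes can be sketched in a few lines of plain Python; this is an illustrative implementation, not the code of any particular library (libraries such as scikit-learn expose the same options via an `average` parameter):

```python
from collections import Counter

def _counts_for_class(y_true, y_pred, cls):
    """One-vs-rest TP, FP, FN for a single class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    return tp, fp, fn

def _f1(tp, fp, fn):
    # Equivalent identity: F1 = 2*TP / (2*TP + FP + FN)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def averaged_f1(y_true, y_pred, average="macro"):
    classes = sorted(set(y_true))
    counts = [_counts_for_class(y_true, y_pred, c) for c in classes]
    if average == "micro":
        # Pool counts across classes, then compute one F1
        tp = sum(c[0] for c in counts)
        fp = sum(c[1] for c in counts)
        fn = sum(c[2] for c in counts)
        return _f1(tp, fp, fn)
    scores = [_f1(*c) for c in counts]
    if average == "macro":
        return sum(scores) / len(scores)
    # Weighted: average per-class F1 by class support
    support = Counter(y_true)
    total = len(y_true)
    return sum(s * support[c] / total for s, c in zip(scores, classes))
```

The schemes can disagree sharply when classes are imbalanced: a rare class with F1 = 0 drags macro F1 down hard, barely moves weighted F1, and is nearly invisible to micro F1.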
Final Thoughts
An F1 calculator is a practical tool for model evaluation, especially when positive-class performance matters more than overall accuracy. Use it with confusion matrices, threshold analysis, and domain-specific costs to make smarter decisions about model quality.