F1 Score Calculator
Calculate the F1 score for a classification model using either confusion matrix counts (true positives, false positives, false negatives) or direct precision/recall values.
What Is the F1 Score?
The F1 score is a single metric that combines precision and recall into one number. It is the harmonic mean of the two, which means it only becomes high when both precision and recall are high.
In practical terms, the F1 score answers this question: How well does my model balance catching positives and avoiding false alarms?
Why People Use an F1 Calculator
Accuracy can be misleading when classes are imbalanced. If only 1% of records are positive, a model that predicts “negative” for everything can still be 99% accurate and yet completely useless. F1 helps you evaluate meaningful positive-class performance.
- Precision-focused domains: fraud detection, spam filtering, moderation
- Recall-focused domains: disease screening, safety alerts, risk monitoring
- Balanced objective: F1 is useful when both errors matter
F1 Formula
F1 = 2 × (Precision × Recall) / (Precision + Recall)
Core definitions
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- TP: correctly predicted positives
- FP: predicted positive but actually negative
- FN: predicted negative but actually positive
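The definitions above translate directly into code. Here is a minimal sketch of what a calculator like this computes internally; the function name `f1_from_counts` is illustrative, not part of any specific tool:

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when there are no true positives
    return 2 * precision * recall / (precision + recall)
```

Note the guard clauses: when a model predicts no positives at all (TP + FP = 0), precision is undefined, and most tools report 0 by convention rather than raising an error.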
How to Use This Calculator
Method 1: From TP, FP, and FN
Enter the three count values from your confusion matrix. The calculator automatically computes precision, recall, and F1 score.
Method 2: From Precision and Recall
If you already have precision and recall values, switch to the second mode and enter them directly as decimals between 0 and 1.
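When precision and recall are already known, the harmonic mean is a one-liner. A small sketch (the input validation mirrors what the calculator's second mode expects):

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as decimals in [0, 1]."""
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must be between 0 and 1")
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```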
Worked Example
Suppose your classifier returns:
- TP = 90
- FP = 30
- FN = 10
Then:
- Precision = 90 / (90 + 30) = 0.75
- Recall = 90 / (90 + 10) = 0.90
- F1 = 2 × (0.75 × 0.90) / (0.75 + 0.90) = 0.8182
An F1 of about 0.82 indicates a strong balance: recall is high, while precision still has room to improve.
Interpreting F1 Score Values
- 0.90 to 1.00: excellent balance
- 0.80 to 0.89: strong; often production-ready depending on context
- 0.70 to 0.79: decent baseline, often tunable
- Below 0.70: likely needs threshold tuning, feature work, or data quality improvements
There is no universal “good” threshold. What matters is the business or research context and the cost of FP vs FN.
F1 vs Accuracy, Precision, and Recall
Accuracy
Great for balanced datasets; can fail on rare-event problems.
Precision
High precision means your positive predictions are trustworthy.
Recall
High recall means you miss fewer of the actual positive cases.
F1
Best when you need one balanced metric for positive-class performance.
Common Pitfalls
- Using F1 alone without checking confusion matrix and per-class behavior
- Comparing F1 across datasets with very different class prevalence
- Ignoring threshold effects (F1 can change dramatically with decision threshold)
- Forgetting macro/micro/weighted averaging in multi-class problems
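The threshold pitfall is easy to demonstrate: the same model scores can yield noticeably different F1 values depending on where you cut. A small self-contained sketch with made-up scores and labels:

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 for binary predictions obtained by thresholding raw model scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical model scores and ground-truth labels
scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    0,    1,    0,    1,    0,    0]
```

On this toy data, cutting at 0.5 gives F1 = 0.75, while cutting at 0.25 gives F1 = 0.80: the metric moved purely because the decision threshold moved, with no change to the model.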
Beyond Standard F1
F-beta Score
F-beta lets you weight recall more (beta > 1) or precision more (beta < 1). This is useful when your domain has asymmetric error costs.
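The general formula is F-beta = (1 + beta²) × P × R / (beta² × P + R); with beta = 1 it reduces to the standard F1. A minimal sketch:

```python
def fbeta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: beta > 1 weights recall more, beta < 1 weights precision more."""
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom
```

Using the worked example's values (P = 0.75, R = 0.90): F1 is about 0.818, while F2 rises to about 0.865 because F2 rewards the higher recall more heavily.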
Macro, Micro, and Weighted F1
- Macro F1: computes F1 per class, then averages the scores equally
- Micro F1: pools TP, FP, and FN across all classes, then computes a single F1
- Weighted F1: averages per-class F1 scores weighted by class support (the number of true instances per class)
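The three averaging schemes can be sketched in a few lines of plain Python; this is an illustrative implementation, not the code of any particular library (libraries such as scikit-learn expose the same options via an `average` parameter):

```python
from collections import Counter

def _counts_for_class(y_true, y_pred, cls):
    """One-vs-rest TP, FP, FN for a single class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    return tp, fp, fn

def _f1(tp, fp, fn):
    # Equivalent identity: F1 = 2*TP / (2*TP + FP + FN)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def averaged_f1(y_true, y_pred, average="macro"):
    classes = sorted(set(y_true))
    counts = [_counts_for_class(y_true, y_pred, c) for c in classes]
    if average == "micro":
        # Pool counts across classes, then compute one F1
        tp = sum(c[0] for c in counts)
        fp = sum(c[1] for c in counts)
        fn = sum(c[2] for c in counts)
        return _f1(tp, fp, fn)
    scores = [_f1(*c) for c in counts]
    if average == "macro":
        return sum(scores) / len(scores)
    # Weighted: average per-class F1 by class support
    support = Counter(y_true)
    total = len(y_true)
    return sum(s * support[c] / total for s, c in zip(scores, classes))
```

The schemes can disagree sharply when classes are imbalanced: a rare class with F1 = 0 drags macro F1 down hard, barely moves weighted F1, and is nearly invisible to micro F1.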
Final Thoughts
An F1 calculator is a practical tool for model evaluation, especially when positive-class performance matters more than overall accuracy. Use it with confusion matrices, threshold analysis, and domain-specific costs to make smarter decisions about model quality.