A/B Test Results Calculator
Enter your control and variant traffic/conversions to estimate lift, p-value, and confidence interval for the conversion-rate difference.
How to use this A/B test calculator
This tool helps you evaluate whether a new page, ad, CTA, or onboarding flow actually outperforms your current version. You only need four numbers: visitors and conversions for both your control (A) and variant (B). The calculator then computes:
- Conversion rate for A and B
- Absolute lift (percentage-point difference)
- Relative lift (percent change versus control)
- Z-score and two-tailed p-value
- Confidence interval for the observed difference
What the output means
1) Conversion rate
Conversion rate is simply conversions divided by visitors. If A converts 500 out of 10,000, its conversion rate is 5%. This is the baseline for judging your experiment.
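The arithmetic above is one division, which a minimal Python sketch makes explicit (the function name is illustrative, not part of the calculator):

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Fraction of visitors who converted."""
    return conversions / visitors

# 500 conversions out of 10,000 visitors -> 0.05 (5%)
rate_a = conversion_rate(500, 10_000)
```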
2) Absolute lift vs. relative lift
Absolute lift is the direct difference between rates (for example, 5.5% minus 5.0% equals +0.5 percentage points). Relative lift expresses that same change as a percentage of control (0.5 / 5.0 = +10%). Both matter: product teams often communicate relative lift, while finance teams prefer absolute impact.
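The two lift figures can be computed side by side; this short Python sketch uses the 5.0% vs. 5.5% example from the text:

```python
rate_a, rate_b = 0.050, 0.055

abs_lift = rate_b - rate_a        # difference in rates: +0.005 (+0.5 pp)
rel_lift = abs_lift / rate_a      # change relative to control: +0.10 (+10%)
```

Reporting both avoids a common confusion: "+10% lift" sounds large, but on a 5% baseline it is only half a percentage point of absolute movement.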
3) P-value and statistical significance
The p-value estimates how likely it is to see a difference this large (or larger) if there is actually no true difference. If the p-value is below your alpha threshold (for example, 0.05 at 95% confidence), the result is considered statistically significant. Significant does not automatically mean valuable; it only means the observed difference is unlikely to be explained by random noise alone.
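A standard way to get the z-score and two-tailed p-value for two proportions is the pooled z-test, sketched below; this is a common choice for calculators like this one, though the tool's exact method is not specified here:

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-tailed p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)           # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))          # 2 * P(Z > |z|)
    return z, p_value
```

`math.erfc(|z| / sqrt(2))` is the two-tailed normal tail probability, so no external statistics library is needed.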
4) Confidence interval
The confidence interval gives a plausible range for the true conversion-rate difference. If the interval crosses zero, your result is statistically inconclusive. If the entire interval is above zero, your variant likely wins. If it is entirely below zero, your control likely wins.
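A common construction for this interval is the Wald interval on the rate difference with unpooled standard error; a sketch, assuming a 95% level (z* = 1.96):

```python
import math

def diff_confidence_interval(conv_a, n_a, conv_b, n_b, z_star=1.96):
    """Wald 95% CI for the conversion-rate difference (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Unpooled standard error of the difference
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z_star * se, diff + z_star * se
```

Reading it is mechanical: if the returned lower bound is above zero, the whole interval favors the variant; if the upper bound is below zero, it favors control; anything straddling zero is inconclusive.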
A practical interpretation framework
- Statistical significance: Is the effect likely real?
- Effect size: Is the uplift large enough to matter for revenue or activation?
- Operational risk: Could rollout break another metric (retention, support load, quality)?
- Replicability: Would this result hold across segments, seasons, and channels?
Common A/B testing mistakes
Stopping the test too early
If you repeatedly peek and stop the first time the p-value drops below 0.05, false positives become far more likely than your alpha suggests. Set a minimum sample size or test duration before launch, and stick to it.
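One way to fix the sample size up front is the textbook two-proportion formula; this sketch assumes a two-sided 5% alpha and 80% power (the z constants 1.96 and 0.84), and the function name is illustrative:

```python
import math

def sample_size_per_arm(p_base, mde_abs, z_alpha=1.96, z_beta=0.84):
    """Approximate visitors needed per arm to detect an absolute lift
    of mde_abs over baseline rate p_base (two-sided alpha, given power)."""
    p_alt = p_base + mde_abs
    variance = p_base * (1 - p_base) + p_alt * (1 - p_alt)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)

# e.g. detecting +0.5 pp on a 5% baseline needs roughly 31k visitors per arm
n_per_arm = sample_size_per_arm(0.05, 0.005)
```

Running the test until that many visitors have been collected in each arm, and only then reading the p-value, keeps the false-positive rate at its nominal level.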
Ignoring business impact
A tiny uplift can be statistically significant with very large traffic but still not worth engineering or design effort. Always translate lift into expected monthly conversions and revenue.
Running too many variants without correction
Testing many versions at once inflates Type I error rates. If you test multiple hypotheses, use proper adjustments or a framework designed for multiple comparisons.
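The simplest adjustment is Bonferroni: divide alpha by the number of comparisons. It is conservative, but it is a safe default when in doubt; a one-line sketch:

```python
def bonferroni_alpha(alpha: float, num_comparisons: int) -> float:
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / num_comparisons

# Four variants each compared against control at overall alpha = 0.05
threshold = bonferroni_alpha(0.05, 4)   # 0.0125 per comparison
```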
Uneven tracking quality
Any discrepancy in event tracking between A and B can create fake winners. Validate instrumentation before interpreting outcomes.
Example scenario
Suppose version A gets 12,000 visitors and 840 conversions (7.0%), while version B gets 11,850 visitors and 930 conversions (7.85%). That is an absolute lift of +0.85 percentage points and relative lift of about +12.1%. If your p-value is below 0.05 and the confidence interval is entirely above zero, this is a strong candidate for rollout.
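The scenario above can be checked end to end in a few lines of Python, assuming the same pooled z-test described earlier:

```python
import math

n_a, x_a = 12_000, 840     # control: visitors, conversions
n_b, x_b = 11_850, 930     # variant: visitors, conversions

p_a, p_b = x_a / n_a, x_b / n_b          # 7.0% and ~7.85%
abs_lift = p_b - p_a                      # ~ +0.0085 (+0.85 pp)
rel_lift = abs_lift / p_a                 # ~ +0.121 (+12.1%)

p_pool = (x_a + x_b) / (n_a + n_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = abs_lift / se
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-tailed, well below 0.05
```

With these inputs the p-value clears the 0.05 bar, matching the rollout recommendation in the text.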
Final takeaway
A/B testing is most useful when it is disciplined: clear hypothesis, clean instrumentation, enough sample size, and honest interpretation. Use this calculator to quickly evaluate your experiment, then pair the output with product context and business goals before deciding to ship.