A/B Test Results Calculator
Enter your control and variant traffic/conversions to estimate lift, p-value, and confidence interval for the conversion-rate difference.
How to use this A/B test calculator
This tool helps you evaluate whether a new page, ad, CTA, or onboarding flow actually outperforms your current version. You only need four numbers: visitors and conversions for both your control (A) and variant (B). The calculator then computes:
- Conversion rate for A and B
- Absolute lift (percentage-point difference)
- Relative lift (percent change versus control)
- Z-score and two-tailed p-value
- Confidence interval for the observed difference
What the output means
1) Conversion rate
Conversion rate is simply conversions divided by visitors. If A converts 500 out of 10,000, its conversion rate is 5%. This is the baseline for judging your experiment.
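The arithmetic above is one division, which a minimal Python sketch makes explicit (the function name is illustrative, not part of the calculator):

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Fraction of visitors who converted."""
    return conversions / visitors

# 500 conversions out of 10,000 visitors -> 0.05 (5%)
rate_a = conversion_rate(500, 10_000)
```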
2) Absolute lift vs. relative lift
Absolute lift is the direct difference between rates (for example, 5.5% minus 5.0% equals +0.5 percentage points). Relative lift expresses that same change as a percentage of control (0.5 / 5.0 = +10%). Both matter: product teams often communicate relative lift, while finance teams prefer absolute impact.
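The two lift figures can be computed side by side; this short Python sketch uses the 5.0% vs. 5.5% example from the text:

```python
rate_a, rate_b = 0.050, 0.055

abs_lift = rate_b - rate_a        # difference in rates: +0.005 (+0.5 pp)
rel_lift = abs_lift / rate_a      # change relative to control: +0.10 (+10%)
```

Reporting both avoids a common confusion: "+10% lift" sounds large, but on a 5% baseline it is only half a percentage point of absolute movement.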
3) P-value and statistical significance
The p-value estimates how likely it is to see a difference this large (or larger) if there is actually no true difference. If the p-value is below your alpha threshold (for example, 0.05 at 95% confidence), the result is considered statistically significant. Significant does not automatically mean valuable; it only means the observed difference is unlikely to be explained by random noise alone.
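A standard way to get the z-score and two-tailed p-value for two proportions is the pooled z-test, sketched below; this is a common choice for calculators like this one, though the tool's exact method is not specified here:

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-tailed p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)           # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))          # 2 * P(Z > |z|)
    return z, p_value
```

`math.erfc(|z| / sqrt(2))` is the two-tailed normal tail probability, so no external statistics library is needed.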
4) Confidence interval
The confidence interval gives a plausible range for the true conversion-rate difference. If the interval crosses zero, your result is statistically inconclusive. If the entire interval is above zero, your variant likely wins. If it is entirely below zero, your control likely wins.
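A common construction for this interval is the Wald interval on the rate difference with unpooled standard error; a sketch, assuming a 95% level (z* = 1.96):

```python
import math

def diff_confidence_interval(conv_a, n_a, conv_b, n_b, z_star=1.96):
    """Wald 95% CI for the conversion-rate difference (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Unpooled standard error of the difference
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z_star * se, diff + z_star * se
```

Reading it is mechanical: if the returned lower bound is above zero, the whole interval favors the variant; if the upper bound is below zero, it favors control; anything straddling zero is inconclusive.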
A practical interpretation framework
- Statistical significance: Is the effect likely real?
- Effect size: Is the uplift large enough to matter for revenue or activation?
- Operational risk: Could rollout break another metric (retention, support load, quality)?
- Replicability: Would this result hold across segments, seasons, and channels?
Common A/B testing mistakes
Stopping the test too early
If you repeatedly peek and stop the first time the p-value drops below 0.05, false positives become far more likely than your alpha suggests. Set a minimum sample size or test duration before launch, and stick to it.
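One way to fix the sample size up front is the textbook two-proportion formula; this sketch assumes a two-sided 5% alpha and 80% power (the z constants 1.96 and 0.84), and the function name is illustrative:

```python
import math

def sample_size_per_arm(p_base, mde_abs, z_alpha=1.96, z_beta=0.84):
    """Approximate visitors needed per arm to detect an absolute lift
    of mde_abs over baseline rate p_base (two-sided alpha, given power)."""
    p_alt = p_base + mde_abs
    variance = p_base * (1 - p_base) + p_alt * (1 - p_alt)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)

# e.g. detecting +0.5 pp on a 5% baseline needs roughly 31k visitors per arm
n_per_arm = sample_size_per_arm(0.05, 0.005)
```

Running the test until that many visitors have been collected in each arm, and only then reading the p-value, keeps the false-positive rate at its nominal level.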
Ignoring business impact
A tiny uplift can be statistically significant with very large traffic but still not worth engineering or design effort. Always translate lift into expected monthly conversions and revenue.
Running too many variants without correction
Testing many versions at once inflates Type I error rates. If you test multiple hypotheses, use proper adjustments or a framework designed for multiple comparisons.
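The simplest adjustment is Bonferroni: divide alpha by the number of comparisons. It is conservative, but it is a safe default when in doubt; a one-line sketch:

```python
def bonferroni_alpha(alpha: float, num_comparisons: int) -> float:
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / num_comparisons

# Four variants each compared against control at overall alpha = 0.05
threshold = bonferroni_alpha(0.05, 4)   # 0.0125 per comparison
```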
Uneven tracking quality
Any discrepancy in event tracking between A and B can create fake winners. Validate instrumentation before interpreting outcomes.
Example scenario
Suppose version A gets 12,000 visitors and 840 conversions (7.0%), while version B gets 11,850 visitors and 930 conversions (7.85%). That is an absolute lift of +0.85 percentage points and relative lift of about +12.1%. If your p-value is below 0.05 and the confidence interval is entirely above zero, this is a strong candidate for rollout.
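The scenario above can be checked end to end in a few lines of Python, assuming the same pooled z-test described earlier:

```python
import math

n_a, x_a = 12_000, 840     # control: visitors, conversions
n_b, x_b = 11_850, 930     # variant: visitors, conversions

p_a, p_b = x_a / n_a, x_b / n_b          # 7.0% and ~7.85%
abs_lift = p_b - p_a                      # ~ +0.0085 (+0.85 pp)
rel_lift = abs_lift / p_a                 # ~ +0.121 (+12.1%)

p_pool = (x_a + x_b) / (n_a + n_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = abs_lift / se
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-tailed, well below 0.05
```

With these inputs the p-value clears the 0.05 bar, matching the rollout recommendation in the text.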
Final takeaway
A/B testing is most useful when it is disciplined: clear hypothesis, clean instrumentation, enough sample size, and honest interpretation. Use this calculator to quickly evaluate your experiment, then pair the output with product context and business goals before deciding to ship.