A/B Test Statistical Significance Calculator

Compare two conversion rates using a standard two-tailed z-test for proportions.


Assumes independent samples and a fixed-horizon test (not continuous peeking).

What this calculator tells you

An A/B test can show a difference in raw conversion rates, but that difference might be random noise. This calculator helps answer the practical question: is Variant B truly different from Variant A, or could the gap have happened by chance? For each test it reports:

  • Conversion rate for each variant
  • Absolute uplift (percentage-point difference)
  • Relative lift (%)
  • Z-score and p-value
  • Whether the result is statistically significant at your chosen confidence level
  • Confidence interval for the conversion-rate difference

How the significance test works

1) Estimate conversion rates

For each group, conversion rate is:

rate = conversions / visitors
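In code this is a single division with a guard against empty groups. A minimal sketch (the input numbers are illustrative, not from any real test):

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Observed conversion rate for a single variant."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return conversions / visitors

# Illustrative: 120 conversions from 2,400 visitors
print(conversion_rate(120, 2400))  # → 0.05
```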

2) Build the null hypothesis

The null hypothesis says both variants convert at the same true underlying rate. Under that assumption, we pool the data to estimate a common conversion rate.

p_pool = (conversionsA + conversionsB) / (visitorsA + visitorsB)

3) Compute z-score

The z-score measures how far apart the observed rates are, relative to expected random variation:

z = (rateB - rateA) / sqrt(p_pool * (1 - p_pool) * (1/nA + 1/nB))
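The formula above translates directly into code. A minimal sketch using the same illustrative numbers as before (5.0% vs 6.25% conversion on 2,400 visitors each):

```python
import math

def z_score(conversions_a: int, visitors_a: int,
            conversions_b: int, visitors_b: int) -> float:
    """Two-proportion z-score using the pooled standard error."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = math.sqrt(p_pool * (1 - p_pool)
                   * (1 / visitors_a + 1 / visitors_b))
    return (rate_b - rate_a) / se

# Illustrative: 120/2400 vs 150/2400 gives z ≈ 1.88
print(z_score(120, 2400, 150, 2400))
```

Note that the pooled standard error is used here because the null hypothesis assumes a single shared rate; the confidence interval below uses a different (unpooled) standard error.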

4) Convert z-score to p-value

The p-value is the probability of seeing a difference this extreme (or more extreme) if the null hypothesis were true. A low p-value means the observed gap is unlikely to be random.
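For a two-tailed test, the conversion from z-score to p-value needs only the standard-normal tail probability, which the Python standard library exposes through `math.erfc` (no SciPy required):

```python
import math

def two_tailed_p_value(z: float) -> float:
    """Two-tailed p-value for a standard-normal z-score."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)):
    # the probability of a result at least this extreme in either tail
    return math.erfc(abs(z) / math.sqrt(2))

# Sanity check: z = 1.96 corresponds to the familiar p ≈ 0.05
print(round(two_tailed_p_value(1.96), 3))  # → 0.05
```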

How to interpret the output correctly

  • Significant + positive uplift: B is likely better than A.
  • Significant + negative uplift: B is likely worse than A.
  • Not significant: you do not have strong enough evidence yet; this does not prove A and B are equal.

Also check the confidence interval. If it includes zero, the real difference could still be zero (or opposite direction), even if point estimates look promising.
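One common way to build that interval is the Wald construction with an unpooled standard error; the calculator may use a different interval method, so treat this as a sketch. With the illustrative numbers used earlier, the interval straddles zero even though the point estimate is a +1.25pp uplift:

```python
import math

def diff_confidence_interval(conversions_a: int, visitors_a: int,
                             conversions_b: int, visitors_b: int,
                             z_crit: float = 1.96):
    """Wald confidence interval for rate_b - rate_a (unpooled SE).

    z_crit = 1.96 corresponds to 95% confidence.
    """
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    se = math.sqrt(rate_a * (1 - rate_a) / visitors_a
                   + rate_b * (1 - rate_b) / visitors_b)
    diff = rate_b - rate_a
    return diff - z_crit * se, diff + z_crit * se

# Illustrative: 120/2400 vs 150/2400 → interval includes zero,
# so the observed uplift is not yet conclusive at 95% confidence
low, high = diff_confidence_interval(120, 2400, 150, 2400)
```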

Common mistakes this page helps you avoid

Stopping too early

Early results are noisy. If you stop when numbers “look good,” false positives increase dramatically.

Calling every uplift a win

Even a 10% relative lift can be non-significant with small sample sizes. Statistical significance and practical significance are different checks—you need both.

Ignoring data quality

Tracking bugs, bot traffic, mismatched audiences, and uneven assignment can invalidate test results before statistics even begin.

Practical A/B testing checklist

  • Define one primary metric before launching.
  • Estimate required sample size before test start.
  • Run long enough to include normal traffic cycles (weekday/weekend patterns).
  • Keep assignment random and balanced.
  • Avoid changing experiment rules mid-test.
  • After reaching significance, verify business impact (revenue, retention, not just clicks).
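The sample-size item in the checklist can be estimated up front with the usual normal-approximation formula. This is a rough sketch (dedicated planning tools may use slightly different formulas), built on the stdlib `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

def sample_size_per_variant(base_rate: float, absolute_lift: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per variant (normal approximation).

    absolute_lift is the smallest percentage-point change you care
    to detect, expressed as a fraction (0.01 = 1pp).
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)   # e.g. 1.96 for 95% confidence
    z_beta = nd.inv_cdf(power)            # e.g. 0.84 for 80% power
    p_avg = base_rate + absolute_lift / 2
    variance = 2 * p_avg * (1 - p_avg)
    return math.ceil(variance * (z_alpha + z_beta) ** 2 / absolute_lift ** 2)

# Illustrative: detecting a 1pp lift over a 5% baseline takes
# roughly 8,000+ visitors per variant
n = sample_size_per_variant(base_rate=0.05, absolute_lift=0.01)
```

Running the test for its planned duration after sizing it this way is what makes the fixed-horizon assumption from the top of the page hold.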

FAQ

Is 95% confidence always required?

95% is a common default, but not universal. In low-risk UI tests you may use 90%; for high-stakes decisions you might use 99%.

Can I use this for click-through rate, signup rate, and purchase rate?

Yes—any binary conversion metric (converted / not converted) is a good fit.

Does significance mean high business value?

No. A tiny improvement can be statistically significant with large samples but still not worth implementation cost. Always evaluate effect size and expected ROI.

Bottom line

This A/B test statistical significance calculator gives you a fast, transparent way to evaluate experiment outcomes. Use it to support stronger product decisions—but pair it with good experimental design, clean data, and a realistic view of business impact.
