A/B Test Sample Size Calculator
Use this calculator to estimate how many users you need per variation before calling an A/B test. This follows the same core approach popularized by Evan Miller: baseline conversion rate + minimum detectable effect + significance + power.
What this Evan Miller sample size calculator is doing
When teams run A/B tests, the biggest mistake is often stopping too early. A result can look “promising” after a day or two, but still be mostly noise. This calculator helps you avoid that trap by estimating the minimum users needed in each variant before comparing outcomes.
In practical terms, this tool answers one question: “How much traffic do I need so I can trust my test result?”
Why sample size matters in experimentation
- Too small a sample: low power, so real effects are often missed and any lift that does reach significance is unstable and tends to be overstated.
- Right-sized sample: better confidence in real performance differences.
- Too large a sample: slower decisions and wasted opportunity cost.
A good plan balances speed with reliability. That’s exactly what significance and power are for.
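To make the trade-off concrete, here is a minimal sketch that approximates the power of a two-sided two-proportion z-test at a given sample size. The function name `approx_power` and the 10% vs. 12% rates are illustrative assumptions, not part of the calculator itself.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Uses the normal approximation and ignores the (tiny) chance of a
    significant result in the wrong direction.
    """
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    # Standard error under the null (pooled rate) and under the alternative.
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    return norm.cdf((abs(p2 - p1) - z_alpha * se_null) / se_alt)


# Hypothetical scenario: 10% baseline, 12% variant rate.
for n in (500, 2000, 8000, 32000):
    print(f"n per variant = {n:>6}: power ~ {approx_power(0.10, 0.12, n):.2f}")
```

Under these assumed rates, a few hundred users per variant leaves power far below the usual 80% target, while very large samples push power close to 100% and mostly buy extra waiting time.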
Inputs explained
1) Baseline conversion rate
Your current control performance. If your landing page converts 8 out of 100 visitors, baseline is 8%.
2) Minimum Detectable Effect (MDE)
The smallest lift that matters to your business, expressed here as a relative change from baseline. If baseline is 10% and MDE is 20%, your variant target is 12% (10% × 1.20), not 30%.
3) Significance level (alpha)
Usually 5%. This controls Type I error (false positive risk).
4) Statistical power
Usually 80% or 90%. Power is the probability of detecting a real effect at least as large as the MDE; it equals 1 minus the Type II error rate (missing a real effect).
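The four inputs above can be turned into the quantities a sample size formula needs. This is a small sketch under the assumption that MDE is a relative lift; `planning_inputs` is a hypothetical helper name, and the z-scores come from Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist


def planning_inputs(baseline, relative_mde, alpha=0.05, power=0.80):
    """Convert calculator inputs into rates and standard normal quantiles."""
    p1 = baseline                       # control conversion rate
    p2 = baseline * (1 + relative_mde)  # variant rate after the relative lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    return p1, p2, z_alpha, z_power


p1, p2, z_alpha, z_power = planning_inputs(baseline=0.10, relative_mde=0.20)
print(f"p1={p1:.3f}  p2={p2:.3f}  z_alpha={z_alpha:.3f}  z_power={z_power:.3f}")
```

With a 5% significance level and 80% power, the two quantiles are roughly 1.96 and 0.84.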
Formula used (two-sided test, two proportions)
The calculator uses the standard normal approximation for two-proportion tests:
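$$
n_{\text{per group}} = \frac{\left( z_{1-\alpha/2}\sqrt{2\bar{p}(1-\bar{p})} + z_{\text{power}}\sqrt{p_1(1-p_1) + p_2(1-p_2)} \right)^2}{(p_2 - p_1)^2}
$$
Where:
- $p_1$ = baseline conversion rate
- $p_2$ = expected variant rate after MDE lift
- $\bar{p} = (p_1 + p_2) / 2$
- $z_{1-\alpha/2}$ and $z_{\text{power}}$ = standard normal quantiles for the chosen significance level and power (about 1.96 for alpha = 5% and 0.84 for power = 80%)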
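As a sketch of how the formula above could be coded, the function below implements it directly using only the Python standard library. The name `sample_size_per_group` and the input validation are assumptions made for illustration; a production calculator may differ in details such as rounding or whether it pools the two rates.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Users needed per variant for a two-sided two-proportion z-test."""
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must be strictly between 0 and 1")
    if p1 == p2:
        raise ValueError("p1 and p2 must differ (MDE cannot be zero)")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    # Round up: a fractional user is not possible, and rounding down loses power.
    return ceil(numerator / (p2 - p1) ** 2)


# 10% baseline with a 20% relative MDE, i.e. a 12% target rate.
print(sample_size_per_group(0.10, 0.12))
```

Halving the MDE roughly quadruples the required sample, because the squared difference sits in the denominator.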
How to use this in the real world
- Pick a realistic MDE based on business value, not wishful thinking.
- Run your test until each variant reaches the estimated sample size.
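- Avoid peeking at interim results and stopping as soon as they look significant; repeated checks inflate the false positive rate. Also avoid stopping and restarting the test.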
- Check secondary metrics (revenue per visitor, retention, quality).
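To illustrate why peeking is risky, the sketch below simulates A/A tests (both variants share the same true rate) and compares two stopping rules: test once at the planned sample size, or declare a winner at the first interim look that appears significant. All parameters (10% rate, 1,000 users per variant, 5 looks, 500 simulations, fixed seed) are hypothetical choices to keep the run short.

```python
import random
from math import sqrt
from statistics import NormalDist


def is_significant(conv_a, conv_b, n, z_crit):
    """Two-sided pooled z-test for two groups of n users each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return False  # no variation observed yet, nothing to test
    return abs(conv_a - conv_b) / n / se > z_crit


def false_positive_rates(rate=0.10, n_final=1000, looks=5, sims=500,
                         alpha=0.05, seed=1):
    """Return (single-look rate, any-look rate) of false positives in A/A tests."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    checkpoints = {n_final * k // looks for k in range(1, looks + 1)}
    fixed_hits = peeking_hits = 0
    for _ in range(sims):
        conv_a = conv_b = 0
        significant = flagged_at_any_look = False
        for i in range(1, n_final + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if i in checkpoints:
                significant = is_significant(conv_a, conv_b, i, z_crit)
                flagged_at_any_look = flagged_at_any_look or significant
        fixed_hits += significant          # result at the final look only
        peeking_hits += flagged_at_any_look
    return fixed_hits / sims, peeking_hits / sims


fixed, peeking = false_positive_rates()
print(f"single look: {fixed:.3f}   stop at first significant look: {peeking:.3f}")
```

Because there is no real difference in these simulations, every "winner" is a false positive. The single-look rate should sit near the nominal 5%, while the any-look rate can only be equal or higher and is usually noticeably higher.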
Example
Suppose baseline conversion is 10%, MDE is 15% (relative), alpha is 5%, and power is 80%. The variant target is then 11.5%, and the formula above gives roughly 6,700 users per variant before a reliable read. If your site gets limited traffic, either accept a larger MDE or run longer.
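The arithmetic behind this example can be worked step by step with the formula above. The intermediate values are rounded, and the 15% MDE is treated as a relative lift, as in the inputs section.

```latex
\begin{aligned}
p_1 &= 0.10, \qquad p_2 = 0.10 \times 1.15 = 0.115, \qquad \bar{p} = \tfrac{0.10 + 0.115}{2} = 0.1075 \\
z_{1-\alpha/2} &= z_{0.975} \approx 1.95996, \qquad z_{\text{power}} = z_{0.80} \approx 0.84162 \\
\sqrt{2\bar{p}(1-\bar{p})} &= \sqrt{2 \times 0.1075 \times 0.8925} = \sqrt{0.1918875} \approx 0.43805 \\
\sqrt{p_1(1-p_1) + p_2(1-p_2)} &= \sqrt{0.09 + 0.101775} = \sqrt{0.191775} \approx 0.43792 \\
n_{\text{per group}} &\approx \frac{(1.95996 \times 0.43805 + 0.84162 \times 0.43792)^2}{(0.115 - 0.10)^2}
  \approx \frac{(0.85856 + 0.36856)^2}{0.000225}
  \approx \frac{1.50583}{0.000225} \approx 6693
\end{aligned}
```

So each variant needs about 6,693 users, or roughly 13,400 in total for a two-variant test.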
Common pitfalls
- Declaring winners after only a few hundred sessions.
- Changing targeting rules mid-test.
- Slicing results into post-hoc segments without correcting for multiple comparisons.
- Ignoring practical significance (small but statistically significant lift).
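On the post-hoc segments pitfall: the sketch below shows how quickly the chance of at least one false positive grows with the number of segments tested, and what a Bonferroni-adjusted threshold looks like. It assumes the segment tests are independent, which real segments rarely are, and Bonferroni is only one (conservative) correction among several; the function names are illustrative.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_alpha(alpha, num_tests):
    """Per-test significance threshold that caps the overall rate at alpha."""
    return alpha / num_tests


# Hypothetical: one overall comparison vs. 5 or 10 post-hoc segments.
for k in (1, 5, 10):
    print(
        f"{k:>2} tests: uncorrected risk ~ {familywise_error_rate(0.05, k):.2f}, "
        f"Bonferroni threshold = {bonferroni_alpha(0.05, k):.4f}"
    )
```

Under these assumptions, testing ten segments at the usual 5% level carries about a 40% chance of at least one spurious "win".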
Final note
This calculator is a planning tool, not a guarantee. Real experiments can have seasonality, novelty effects, and tracking issues. Use this estimate to set expectations, then run disciplined tests and validate implementation quality before shipping changes.