A/B test duration calculator

Assumes a two-sided test for conversion rate with independent users and stable traffic quality.

Why an A/B test duration calculator matters

One of the biggest mistakes in experimentation is stopping a test too early. A short test can make random noise look like a real win, and that leads teams to ship changes that do not actually improve performance. A proper duration estimate helps you avoid false positives and gives your team confidence when results are finally called.

This calculator estimates how long your test should run by combining six key inputs: baseline conversion rate, minimum detectable effect (MDE), daily traffic, allocation split, confidence level, and statistical power. The output includes required sample size by group and estimated days to completion.

How this calculator works

1) Define the baseline and target lift

Start from your current conversion rate. Then choose the smallest relative improvement worth detecting. For example, if your baseline is 5% and your MDE is 10% relative, your variant target is 5.5% (5% × 1.10).
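In code, the target rate is just the baseline scaled by the relative MDE (a minimal sketch; the variable names are illustrative):

```python
baseline = 0.05  # current conversion rate (5%)
mde_rel = 0.10   # minimum detectable effect, relative (10%)

# The variant target is the baseline lifted by the relative MDE.
target = baseline * (1 + mde_rel)
print(f"variant target: {target:.1%}")  # variant target: 5.5%
```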

2) Choose confidence and power

  • Confidence level controls Type I error (false positives). 95% is a common default.
  • Power controls Type II error (missed real effects). 80% is common, 90% is stricter.
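These two settings map directly to critical values of the standard normal distribution, which is what the sample-size math below consumes. A sketch using only Python's standard library:

```python
from statistics import NormalDist

confidence = 0.95  # two-sided test, so alpha is split across both tails
power = 0.80

alpha = 1 - confidence
z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for 95% confidence
z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power

print(round(z_alpha, 2), round(z_beta, 2))
```

Raising confidence to 99% or power to 90% increases these z-values, which increases the required sample size quadratically.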

3) Convert sample size into time

Once the per-arm sample size is estimated, we divide it by the daily traffic each arm actually receives to estimate runtime. If your split is not 50/50, test duration usually increases because the smaller arm fills more slowly and statistical efficiency drops.
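The three steps above can be sketched end to end. This is a minimal sketch assuming the standard two-proportion normal-approximation formula; the calculator's exact implementation may differ, and the function names are illustrative:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(p_base, mde_rel, confidence=0.95, power=0.80):
    """Per-arm sample size for a two-sided two-proportion z-test
    (normal approximation)."""
    p_var = p_base * (1 + mde_rel)
    z_a = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_b = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_var * (1 - p_var)
    return ceil((z_a + z_b) ** 2 * variance / (p_var - p_base) ** 2)

def days_to_complete(n_per_arm, daily_traffic, split=0.5):
    """Runtime in days; an uneven split is bottlenecked by the
    arm that receives the smaller share of traffic."""
    smaller_share = min(split, 1 - split)
    return ceil(n_per_arm / (daily_traffic * smaller_share))

n = sample_size_per_arm(0.05, 0.10)          # ~31,000 users per arm
print(n, days_to_complete(n, 5000))          # ~13 days at a 50/50 split
print(days_to_complete(n, 5000, split=0.3))  # a 30/70 split takes longer
```

Note how the 30/70 split lengthens the test even though total traffic is unchanged: the 30% arm is the bottleneck.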

Input guidance

  • Baseline conversion rate: Use recent stable data, not an unusually good or bad week.
  • MDE: Smaller MDE means longer tests. Pick a business-meaningful lift, not a vanity target.
  • Daily visitors: Use users who can actually be randomized and tracked correctly.
  • Variant allocation: 50/50 is fastest for two variants; imbalanced splits slow things down.
  • Confidence/power: More stringent settings increase required sample size.

Practical interpretation of results

Treat the calculator output as a planning baseline, not a rigid promise. Real tests face variability: day-of-week patterns, campaign spikes, technical outages, attribution delays, and novelty effects. A good rule is to run for at least two full business cycles (often 14 days) and complete whole weeks.
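That rule of thumb is easy to encode on top of any raw estimate (`planned_runtime_days` is a hypothetical helper, not part of the calculator):

```python
from math import ceil

def planned_runtime_days(raw_days, min_days=14):
    """Round a raw duration estimate up to whole weeks, with a
    two-week floor (the rule of thumb described above)."""
    days = max(raw_days, min_days)
    return ceil(days / 7) * 7

print(planned_runtime_days(13))  # -> 14 (two-week floor applies)
print(planned_runtime_days(17))  # -> 21 (rounded up to whole weeks)
```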

A quick example

Suppose baseline conversion is 5%, MDE is 10% relative, confidence is 95%, power is 80%, and daily traffic is 5,000 with a 50/50 split. You may need several tens of thousands of users per arm, which often translates to a few weeks of runtime. If you reduce MDE to 5%, required duration can grow dramatically.
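Under the same two-proportion normal approximation, the example works out roughly as follows (these numbers come from the textbook formula, not necessarily from this calculator's exact implementation):

```python
from math import ceil
from statistics import NormalDist

# z-values for 95% confidence (two-sided) and 80% power
z = NormalDist().inv_cdf(0.975) + NormalDist().inv_cdf(0.80)

def n_per_arm(p1, p2):
    # hypothetical helper: two-proportion normal-approximation sample size
    return ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2)

n_mde10 = n_per_arm(0.05, 0.055)   # 10% relative MDE -> ~31,000 per arm
n_mde5 = n_per_arm(0.05, 0.0525)   # 5% relative MDE -> ~122,000 per arm

days10 = ceil(2 * n_mde10 / 5000)  # 5,000 visitors/day, 50/50 split
days5 = ceil(2 * n_mde5 / 5000)
print(n_mde10, days10, n_mde5, days5)
```

Halving the MDE from 10% to 5% roughly quadruples the required sample, turning a test of about two weeks into one of about seven.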

Common A/B testing mistakes to avoid

  • Peeking daily and stopping when p-value first dips under threshold.
  • Changing targeting rules mid-test without restarting analysis.
  • Ignoring sample ratio mismatch or tracking quality issues.
  • Running many metrics without a clear primary success metric.
  • Underestimating seasonality and day-of-week effects.

Final checklist before launch

  • Primary metric and guardrails clearly defined.
  • MDE tied to business value (revenue, leads, retention, etc.).
  • Expected runtime acceptable to stakeholders.
  • Randomization and analytics QA completed.
  • Decision rules pre-committed before data collection starts.

Use this calculator to set realistic expectations early, align teams, and avoid rushed decisions. Better planning means cleaner evidence—and better product decisions.
