power calculation in statistics - Aaron Graves, PhDude Replica

Statistical Power Calculator (Two-Group Mean Test)

Use Cohen’s d, per-group sample size, significance level, and tail type to estimate achieved power. You can also solve for the required sample size.

Calculator inputs: effect size (Cohen's d), sample size per group (n), significance level (α), test type (one-sided or two-sided), and target power (used when solving for the required n).

What is statistical power?

In statistics, power is the probability that a test rejects the null hypothesis when a real effect of a given size exists. It is written as 1 − β, where β (beta) is the Type II error (false negative) rate. A study with 80% power has an 80% chance of producing a statistically significant result if the true effect is as large as the one you planned for.
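One way to make this definition concrete is to simulate many studies with a known true effect and count how often the test rejects. The sketch below is illustrative only: it assumes normally distributed outcomes with a known standard deviation of 1 and uses a two-sided z-test. The function name `simulated_power` and its parameters are hypothetical and do not describe how the calculator itself works.

```python
import random
from statistics import NormalDist, fmean

def simulated_power(d, n, alpha=0.05, n_sims=2000, seed=1):
    """Estimate two-sided power by simulation (assumes known SD = 1, z-test)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n) ** 0.5  # standard error of the mean difference when SD = 1
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(d, 1.0) for _ in range(n)]  # true effect of size d
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    # Fraction of simulated studies that reached significance
    return rejections / n_sims

print(simulated_power(d=0.5, n=64))
```

The returned fraction is an estimate of 1 − β for the assumed effect. Setting d = 0 instead gives an estimate of the Type I error rate, which should land near alpha.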

Why power calculation matters

A power calculation is not a bureaucratic step. It directly affects study quality. Too little power means you can miss real effects. Too much power can waste money, time, and participant effort. Good power planning helps you design efficient studies and interpret null results more responsibly.

Underpowered study: high risk of false negatives (you miss real differences).
Overpowered study: may detect tiny differences that are statistically significant but practically unimportant.
Balanced design: appropriate sample size for your scientific question.

The four ingredients in a power calculation

1) Effect size

Effect size quantifies how large the phenomenon is. In this calculator, the effect size is Cohen’s d for a two-group mean comparison: the difference in group means divided by the pooled standard deviation. Cohen's conventional benchmarks are:

Small effect: around 0.2
Medium effect: around 0.5
Large effect: around 0.8
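If you have pilot data, Cohen's d as defined above can be computed directly. The following is a minimal sketch using the usual pooled-variance formula; the function name `cohens_d` and the sample data are made up for illustration.

```python
from statistics import fmean, variance

def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (fmean(group_a) - fmean(group_b)) / pooled_var ** 0.5

# Illustrative data: means 4 and 3, both sample variances 4, so d = 1 / 2
print(cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]))  # 0.5
```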

2) Significance level (α)

Alpha is the Type I error (false positive) rate you are willing to accept, often set to 0.05. Lowering alpha reduces false positives but requires a larger sample size to maintain the same power.

3) Sample size (n)

More observations provide more information and reduce uncertainty. For a fixed nonzero effect size and alpha, power increases as sample size increases.

4) Tail type (one-sided vs two-sided)

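Two-sided tests split alpha across both tails, so each tail gets α/2 and the critical value is larger. One-sided tests place all of alpha in one tail and have higher power for an effect in the hypothesized direction, but they cannot detect an effect in the opposite direction. Use them only when the directional assumption is justified.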
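The difference between the two tail options shows up in the critical value the test statistic must exceed. The sketch below assumes a standard normal test statistic; `critical_z` is a hypothetical helper name.

```python
from statistics import NormalDist

def critical_z(alpha, two_sided=True):
    """Critical value of a standard normal test statistic."""
    # A two-sided test puts alpha / 2 in each tail; a one-sided test puts all of alpha in one
    tail_area = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail_area)

print(round(critical_z(0.05, two_sided=True), 3))   # 1.96
print(round(critical_z(0.05, two_sided=False), 3))  # 1.645
```

The lower one-sided threshold is what produces the power gain for an effect in the hypothesized direction.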

How this calculator works

This page uses a normal-approximation framework for a two-group comparison with equal group sizes. Internally, it computes a noncentrality-like quantity:

δ = d × √(n / 2)

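Power is then the probability that a standard normal test statistic shifted by δ falls in the rejection region defined by alpha and the selected tail option. Because the normal approximation ignores the extra uncertainty from estimating the standard deviation, it slightly overstates power relative to an exact t-test calculation, most noticeably at small n. It remains a practical planning approximation commonly used at the design stage.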
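The approach described above can be sketched in a few lines. This is an illustration of the normal approximation under the stated assumptions (equal group sizes, δ = d × √(n / 2)), not the page's actual source code; the function name `approx_power` is hypothetical.

```python
from statistics import NormalDist

def approx_power(d, n, alpha=0.05, two_sided=True):
    """Normal-approximation power for a two-group mean comparison, n per group."""
    std_normal = NormalDist()
    delta = d * (n / 2) ** 0.5  # noncentrality-like shift of the test statistic
    if two_sided:
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        # Probability of landing in either tail of the rejection region
        return std_normal.cdf(delta - z_crit) + std_normal.cdf(-delta - z_crit)
    z_crit = std_normal.inv_cdf(1 - alpha)
    # One-sided: only the tail in the hypothesized (positive) direction counts
    return std_normal.cdf(delta - z_crit)

print(approx_power(d=0.5, n=64))
print(approx_power(d=0.5, n=64, two_sided=False))
```

For positive d the second two-sided term is usually negligible, but keeping it makes the function return alpha when d = 0, as it should.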

Example: planning a study

Suppose you expect a medium effect size of d = 0.5, use α = 0.05, and plan a two-sided test. If you enter n = 64 per group, the calculator returns power near 0.80. That means the experiment has about an 80% chance of detecting the effect if the true effect really is d = 0.5.

If your budget only allows n = 25 per group, power drops to roughly 0.42 under the same normal approximation. At that point, consider broadening recruitment, reducing measurement noise, or refining your intervention to increase the effect size.
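The planning question can also be inverted: given a target power, how many participants per group are needed? A minimal sketch using the standard closed-form normal approximation follows. The name `required_n_per_group` is hypothetical, and the formula ignores the negligible far tail of a two-sided test.

```python
import math
from statistics import NormalDist

def required_n_per_group(d, alpha=0.05, power=0.80, two_sided=True):
    """Approximate per-group n: n = 2 * ((z_alpha + z_power) / d) ** 2, rounded up."""
    if d <= 0:
        raise ValueError("d must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - (alpha / 2 if two_sided else alpha))
    z_power = std_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

print(required_n_per_group(d=0.5))              # 63
print(required_n_per_group(d=0.5, power=0.90))  # 85
print(required_n_per_group(d=0.2))              # 393
```

Exact t-based calculations typically return a slightly larger n than this approximation, so treat the result as a lower bound for planning.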

Interpreting the output correctly

Power is conditional: it depends on the effect size you assumed.
Power is not the chance your hypothesis is true: it is about test sensitivity under a specified true effect.
Post-hoc “observed power” is often misleading: it is determined by the observed p-value and adds no new information, so use prospective planning whenever possible.

Common mistakes in power analysis

Using unrealistic effect sizes

If you assume an overly optimistic effect, you will underestimate the required sample size. Base the effect size on prior studies, pilot data, or the smallest effect that would be clinically or practically important.

Ignoring attrition or exclusions

If dropout is expected, inflate the sample size before recruitment. For example, with 20% expected attrition, divide your required final sample by 0.8: a required 64 per group becomes 64 / 0.8 = 80 recruited per group.
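The attrition adjustment is simple arithmetic, but rounding up matters because you cannot recruit a fraction of a participant. A small sketch, with `inflate_for_attrition` as a hypothetical helper name:

```python
import math

def inflate_for_attrition(n_required, attrition_rate):
    """Number to recruit so that about n_required remain after dropout."""
    if not 0 <= attrition_rate < 1:
        raise ValueError("attrition_rate must be in [0, 1)")
    # The small tolerance guards against floating-point noise pushing an
    # exact quotient (such as 64 / 0.8) just above the integer before rounding up
    return math.ceil(n_required / (1 - attrition_rate) - 1e-9)

print(inflate_for_attrition(64, 0.20))  # 80
print(inflate_for_attrition(63, 0.20))  # 79
```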

Confusing statistical and practical significance

Even tiny effects can be significant in very large samples. Always interpret p-values together with effect size and confidence intervals.

Quick rules of thumb

80% power is a common default; 90% may be preferred for confirmatory work.
Smaller alpha or smaller effects require larger sample sizes.
Two-sided tests usually need more participants than one-sided tests for the same target power.
Plan power before data collection, not after.

Final takeaway

Power calculation in statistics is about making your study credible and efficient. It helps you answer: “How many observations do I need to reliably detect an effect that matters?” Use the calculator above to run quick planning scenarios, then refine assumptions based on subject-matter expertise, pilot evidence, and study constraints.