False Discovery Rate Calculator (Benjamini-Hochberg / Benjamini-Yekutieli)
Paste p-values (comma, space, or newline separated), choose a target FDR level, and calculate significant tests while controlling the expected false discovery rate.
| Rank | Test # | p-value | Critical Value | Adjusted p-value | Significant? |
|---|---|---|---|---|---|
What this FDR calculator does
This tool helps you control the false discovery rate (FDR) when you run many hypothesis tests at once. Instead of treating every p-value in isolation, FDR procedures evaluate the full set of tests together and estimate how many “discoveries” may be false positives on average.
In practical terms, if you set the FDR to 0.05, then among the results you call significant, about 5% are expected to be false discoveries, averaged over repeated studies.
False discovery rate in plain English
Multiple testing is tricky. If you run a single test at a 5% significance level, the chance of a false positive is capped at 5%. If you run 1,000 tests, you can get dozens of “significant” findings just by chance. FDR methods are designed for exactly this setting.
- Family-wise error rate (FWER) methods (like Bonferroni) focus on preventing even one false positive.
- FDR methods allow more power and focus on controlling the proportion of false positives among discoveries.
- This balance is especially useful in genomics, A/B testing at scale, neuroimaging, and high-dimensional research.
How the Benjamini-Hochberg method works
Step 1: Sort p-values from smallest to largest
Suppose you have m tests. Rank them so that p(1) ≤ p(2) ≤ ... ≤ p(m).
Step 2: Build the critical line
For each rank i, compute a critical value: (i / m) × q, where q is your chosen FDR level. For Benjamini-Yekutieli, the critical value is additionally divided by the harmonic sum c(m) = 1 + 1/2 + ... + 1/m, making the threshold stricter.
Step 3: Find the largest passing rank
Identify the largest rank k such that p(k) ≤ critical(k). Then declare ranks 1 through k significant. The calculator also reports adjusted p-values (often called q-values in software outputs): the smallest FDR level at which each test would still be declared significant.
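The three steps above can be sketched in a few lines of Python. This is an illustrative implementation, not the calculator's internal code; the function name and defaults are our own.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans: True where the test is significant at FDR level q."""
    m = len(pvalues)
    # Step 1: sort p-values from smallest to largest, remembering original positions.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Steps 2-3: find the largest rank k with p(k) <= (k/m) * q.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:  # critical value at this rank
            k = rank
    # Declare ranks 1 through k significant, mapped back to input order.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k:
            significant[idx] = True
    return significant
```

Note the step-up logic: a p-value above its own critical value can still be significant if a larger p-value passes at a higher rank, which is why the loop keeps the largest passing rank rather than stopping at the first failure.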
Benjamini-Hochberg vs Benjamini-Yekutieli
You can choose between two popular procedures:
- Benjamini-Hochberg (BH): more powerful, widely used, valid under independence or positive dependence among tests (technically, PRDS).
- Benjamini-Yekutieli (BY): robust under arbitrary dependence, but more conservative.
If your tests are strongly dependent and you want a safer upper bound, BY may be appropriate. If not, BH is often the practical default.
How to use this calculator effectively
- Enter all p-values from a coherent family of hypotheses.
- Set your FDR target (common choices are 0.10, 0.05, or 0.01).
- Choose BH or BY based on your dependence assumptions.
- Review the ranked table and the “Significant?” column.
Worked example
Imagine 40 simultaneous tests in an experiment. At q = 0.05, BH might mark 9 tests significant. That does not mean exactly 0.45 false findings in your current dataset; it means the expected proportion of false discoveries among selected results is controlled in repeated sampling under method assumptions.
If you switch to BY, you may see fewer significant results because the threshold is stricter. This tradeoff is normal: stronger error protection usually costs statistical power.
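The tradeoff is easy to see on concrete numbers. With the hypothetical p-values below (chosen for illustration only), BH declares five tests significant at q = 0.05 while BY declares only two:

```python
# Hypothetical p-values illustrating the BH vs BY power tradeoff.
pvals = [0.001, 0.004, 0.012, 0.025, 0.03, 0.21]  # already sorted
m, q = len(pvals), 0.05
c_m = sum(1.0 / i for i in range(1, m + 1))  # BY harmonic correction factor

def largest_passing_rank(critical):
    """Largest rank k whose sorted p-value is at or below its critical value."""
    k = 0
    for i, p in enumerate(pvals, start=1):
        if p <= critical(i):
            k = i
    return k

k_bh = largest_passing_rank(lambda i: i / m * q)        # BH threshold line
k_by = largest_passing_rank(lambda i: i / m * q / c_m)  # stricter BY line
```

Here k_bh = 5 and k_by = 2: dividing every critical value by c(m) ≈ 2.45 drops three borderline discoveries.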
Choosing a good q-value
Common defaults
- q = 0.05: balanced in many scientific settings.
- q = 0.10: exploratory studies where missing true effects is costly.
- q = 0.01: high-stakes confirmatory analyses.
Decision context matters
There is no universally correct q. Choose it based on domain risk: clinical decisions and safety systems often require stricter thresholds, while discovery-phase screening may use a more permissive q with follow-up validation.
Common mistakes to avoid
- Interpreting FDR as the probability that one specific significant result is false.
- Applying correction separately to arbitrary subgroups just to increase significance count.
- Mixing one-sided and two-sided p-values inconsistently within the same family.
- Ignoring study design problems and relying on correction as a fix-all.
Quick FAQ
Do adjusted p-values replace raw p-values?
They complement raw p-values. Raw p-values reflect single-test evidence; adjusted p-values reflect that evidence under multiplicity control.
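For the curious, BH-adjusted p-values can be computed by walking the sorted list from the largest p-value down and enforcing monotonicity. This is a sketch of the standard adjustment, with an illustrative function name:

```python
def bh_adjusted_pvalues(pvalues):
    """BH-adjusted p-value: smallest FDR level q at which each test is significant."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, taking a running minimum of p * m / rank
    # so adjusted values never decrease out of rank order.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A test is significant at FDR level q exactly when its adjusted p-value is at or below q, which makes the adjusted column in the table above directly comparable to your chosen threshold.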
Can I use this for thousands of tests?
Yes. The method scales well. For very large lists, consider exporting from your analysis pipeline and pasting directly into the calculator.
Is this a substitute for preregistration or robust design?
No. FDR correction helps with multiplicity but does not solve bias, poor measurement, or model misspecification.
Bottom line
A good FDR calculator makes multiple-testing decisions transparent and reproducible. Use it as part of a full analytical workflow: clear hypotheses, quality data, appropriate models, and explicit reporting of correction method and q threshold.