Wilson Score (Evan Miller) Calculator
Use this tool to rank items by a conservative estimate of quality. It takes upvotes and downvotes, then returns the lower-bound score at your chosen confidence level.
Tip: A higher confidence level gives a lower, more conservative score for the same votes.
What is the Evan Miller calculator?
The “Evan Miller calculator” typically refers to the Wilson score lower bound method popularized by Evan Miller for ranking items with positive and negative votes. Instead of sorting by simple average (upvotes / total votes), it computes a cautious score that protects against small sample sizes.
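This matters because a new item with 5/5 upvotes should not instantly outrank an older item with 870/1000 upvotes. The Wilson lower bound solves that by asking: “Given the votes seen so far, what is the lowest upvote fraction this item plausibly has?” At 95% confidence, the 5/5 item scores about 0.57 and the 870/1000 item about 0.85, so the established item ranks first.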
Why simple averages are misleading
- Small sample inflation: 100% from 2 votes is weak evidence.
- No uncertainty penalty: Basic averages treat 2 votes and 2,000 votes too similarly.
- Ranking instability: New items jump around dramatically without enough data.
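The problems listed above can be reproduced in a few lines. This is a minimal sketch with made-up item names and vote counts; `raw_average` is a hypothetical helper, not part of the calculator.

```python
# Hypothetical items: (name, upvotes, downvotes).
items = [
    ("established item", 870, 130),
    ("new item", 2, 0),
    ("mixed item", 40, 10),
]


def raw_average(upvotes: int, downvotes: int) -> float:
    """Plain share of upvotes; ignores how many votes were cast."""
    n = upvotes + downvotes
    return upvotes / n if n else 0.0


# Sorting by raw average puts the 2-vote item ahead of the 1000-vote item.
ranked = sorted(items, key=lambda it: raw_average(it[1], it[2]), reverse=True)
for name, up, down in ranked:
    print(f"{name}: {raw_average(up, down):.2f} from {up + down} votes")

# The average alone cannot tell 2 votes from 2,000 votes.
print(raw_average(2, 0) == raw_average(2000, 0))
```

The two-vote item lands on top, and its score is identical to what a 2,000-vote unanimous item would get, which is the behaviour the Wilson lower bound is meant to correct.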
How this calculator works
Inputs
- Upvotes: Count of positive ratings.
- Downvotes: Count of negative ratings.
- Confidence level: Usually 90%, 95%, or 99%.
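The confidence level has to be turned into a z value before it can be used in the formula. The sketch below assumes the two-sided convention (95% maps to roughly 1.96); the page does not state which convention the calculator uses, and `z_for_confidence` is a hypothetical name.

```python
from statistics import NormalDist


def z_for_confidence(confidence: float) -> float:
    """Return the standard normal critical value for a confidence level.

    Assumes the two-sided convention: z is the (1 - alpha/2) quantile,
    where alpha = 1 - confidence.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_for_confidence(level):.3f}")
```

Under this convention the three usual levels give z values of about 1.645, 1.960, and 2.576.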
Output
The tool returns the Wilson lower-bound score, a conservative probability estimate (from 0 to 1, and shown as a percentage). Higher values are better and safer for ranking.
Formula used
score = (p̂ + z²/(2n) - z * √((p̂(1-p̂) + z²/(4n))/n)) / (1 + z²/n)
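Where p̂ = upvotes / n, n = upvotes + downvotes, and z is the standard normal critical value for your confidence level (about 1.96 for 95% under the common two-sided convention). The formula is defined only when n > 0, that is, once at least one vote has been cast.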
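The formula above translates directly into code. This is a minimal sketch, not the calculator's actual source: the function name, the default `z = 1.96`, and the choice to return 0.0 when there are no votes are all assumptions made for illustration.

```python
import math


def wilson_lower_bound(upvotes: int, downvotes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote fraction."""
    n = upvotes + downvotes
    if n == 0:
        return 0.0  # assumption: unrated items get the lowest score
    p_hat = upvotes / n
    z2 = z * z
    centre = p_hat + z2 / (2 * n)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z2 / (4 * n)) / n)
    # Clamp at 0 to absorb tiny negative values from floating-point rounding.
    return max(0.0, (centre - margin) / (1 + z2 / n))


print(f"5/5      -> {wilson_lower_bound(5, 0):.3f}")
print(f"870/1000 -> {wilson_lower_bound(870, 130):.3f}")
```

With z = 1.96 the 5/5 item scores about 0.57 and the 870/1000 item about 0.85, so the established item ranks higher even though its raw average is lower.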
Practical interpretation
If two posts both show a raw 90% approval, the one with more total votes gets a higher Wilson lower bound because we are more certain about its quality. In production systems, this keeps items with only a handful of votes from jumping to the top, so rankings feel fairer and less noisy.
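To see the effect of sample size, the sketch below holds the raw approval at 90% and varies only the number of votes. The vote counts are illustrative, and z = 1.96 (95%, two-sided) is an assumed setting.

```python
import math


def wilson_lower_bound(upvotes: int, downvotes: int, z: float = 1.96) -> float:
    """Wilson lower bound; returns 0.0 for items without votes (assumption)."""
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    p_hat = upvotes / n
    z2 = z * z
    centre = p_hat + z2 / (2 * n)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z2 / (4 * n)) / n)
    return max(0.0, (centre - margin) / (1 + z2 / n))


# Same 90% raw approval, increasing total votes.
scores = []
for total in (10, 100, 1000, 10000):
    up = total * 9 // 10
    score = wilson_lower_bound(up, total - up)
    scores.append(score)
    print(f"{up}/{total}: {score:.3f}")
```

The score climbs from roughly 0.60 at 10 votes to roughly 0.88 at 1,000 votes, approaching but never reaching the raw 0.90.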
When to use this method
- Product reviews with thumbs up/down
- Forum post ranking
- Comment moderation queues
- Internal quality voting systems
- Any binary feedback dataset
Best practices
1) Pick a confidence level and keep it consistent
95% is common for general ranking. 99% is stricter and penalizes low-volume items more heavily.
2) Show both score and vote counts
Users trust rankings more when they can see sample size alongside the quality metric.
3) Recompute after each vote event
The score is a closed-form function of two counters, so it can be recomputed in constant time whenever a vote arrives.
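Practices 1 and 3 can be sketched together: one site-wide z value, and a tally that refreshes the score on every vote. Everything here is hypothetical (the `SITE_Z` constant, the `VoteTally` class, the `record` method); a real system would persist the counters in its own storage.

```python
import math
from dataclasses import dataclass

# Hypothetical site-wide setting: z for 95% confidence (two-sided convention).
SITE_Z = 1.96


def wilson_lower_bound(upvotes: int, downvotes: int, z: float = SITE_Z) -> float:
    """Wilson lower bound; returns 0.0 for items without votes (assumption)."""
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    p_hat = upvotes / n
    z2 = z * z
    centre = p_hat + z2 / (2 * n)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z2 / (4 * n)) / n)
    return max(0.0, (centre - margin) / (1 + z2 / n))


@dataclass
class VoteTally:
    """Two counters per item; the score is derived from them on demand."""
    upvotes: int = 0
    downvotes: int = 0

    def record(self, is_upvote: bool) -> float:
        """Apply one vote and return the refreshed score."""
        if is_upvote:
            self.upvotes += 1
        else:
            self.downvotes += 1
        return wilson_lower_bound(self.upvotes, self.downvotes)


tally = VoteTally()
history = [tally.record(vote) for vote in (True, True, False, True)]
print([round(s, 3) for s in history])

# A stricter level (z of about 2.576 for 99%) lowers a low-volume item's score.
print(round(wilson_lower_bound(5, 0, z=1.96), 3))
print(round(wilson_lower_bound(5, 0, z=2.576), 3))
```

Each call to `record` costs a handful of arithmetic operations, which is why per-vote recomputation is practical. The last two lines show why the level should stay fixed: the same 5/5 item scores about 0.57 at 95% and about 0.43 at 99%, so mixing levels would make scores incomparable.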
Limitations to know
- It assumes binary outcomes (positive/negative).
- It does not model time decay by itself.
- It is a ranking metric, not a measure of true quality: it summarizes the votes observed and says nothing about why they were cast.
Final takeaway
The Evan Miller approach gives you a ranking score that respects both quality and certainty. If you manage a review site, social feed, or voting tool, this is one of the simplest high-impact upgrades you can make.