
Expected Reciprocal Rank (ERR) Calculator

Calculate ERR from a ranked list of graded relevance labels. This is commonly used in search quality evaluation and learning-to-rank experiments.


What is an ERR calculator?

An ERR calculator computes Expected Reciprocal Rank, a ranking metric designed to estimate user satisfaction as they scan a list of results from top to bottom. ERR gives more credit when highly relevant results appear earlier, and less credit when users likely need to keep searching.

Unlike simple top-1 accuracy, ERR works with graded relevance labels (for example 0, 1, 2, 3). That makes it useful for search engines, recommendation systems, marketplace ranking, and any model where some results are “good,” others are “okay,” and others are irrelevant.

ERR formula (intuitive view)

R(g) = (2^g - 1) / 2^gmax
P(user reaches rank r) = Π[i=1..r-1] (1 - R(g_i))
ERR = Σ[r=1..k] (1 / r) × P(user reaches rank r) × R(g_r)

Here g_r is the relevance grade at rank r, gmax is the top of the label scale (for example 3 on a 0–3 scale), and R(g) is the probability that a result with grade g satisfies the user.

In plain language: at each rank, the user has a chance of being satisfied by that result. If they are not satisfied, they continue. ERR combines those probabilities and discounts lower-ranked positions by 1/r.
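In code, the formulas above reduce to a single pass over the ranked list. Here is a minimal Python sketch; the function name `err` and its signature are illustrative, not part of the tool:

```python
def err(grades, gmax, k=None):
    """Expected Reciprocal Rank for a ranked list of relevance grades.

    grades: relevance labels in rank order (rank 1 first).
    gmax:   maximum grade on the label scale.
    k:      optional cutoff for ERR@k (defaults to the full list).
    """
    score = 0.0
    p_reach = 1.0  # probability the user reaches the current rank
    for r, g in enumerate(grades[:k], start=1):
        sat = (2 ** g - 1) / 2 ** gmax  # R(g): chance this result satisfies
        score += p_reach * sat / r      # discounted contribution at rank r
        p_reach *= 1 - sat              # user continues only if unsatisfied
    return score
```

Note that `p_reach` shrinks fastest when highly relevant results appear early, which is exactly why a strong top result dominates the score.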

How to use this ERR calculator

  • Enter your maximum relevance grade (the top of your label scale).
  • Paste your ranked relevance labels in order from rank 1 onward.
  • Optionally set a cutoff rank to compute ERR@k.
  • Click Calculate ERR to see the score and rank-by-rank contribution breakdown.

Input example

If your ranked labels are 3,2,3,0,1,2 on a 0–3 scale, the tool computes the satisfaction probability at each rank and returns the total ERR along with each rank's contribution.
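A short standalone script (using the formulas above; variable names are illustrative) reproduces the kind of per-rank breakdown the tool shows for this example:

```python
gmax = 3
grades = [3, 2, 3, 0, 1, 2]          # ranked labels, rank 1 first

err, p_reach = 0.0, 1.0
for rank, g in enumerate(grades, start=1):
    sat = (2 ** g - 1) / 2 ** gmax   # R(g): chance this result satisfies
    contribution = p_reach * sat / rank
    err += contribution
    print(f"rank {rank}: grade={g}  R(g)={sat:.3f}  contribution={contribution:.4f}")
    p_reach *= 1 - sat               # chance the user keeps scanning

print(f"ERR = {err:.4f}")            # prints ERR = 0.9220
```

Rank 1 alone contributes 0.875 here: a grade-3 result at the top satisfies most users immediately, so later ranks can add only a little.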

Interpreting ERR scores

ERR ranges from 0 to 1. Higher is better.

  • Near 1.0: Users usually find very relevant content early.
  • Mid-range: Some useful content appears, but often not early enough.
  • Near 0: Top ranks fail to satisfy users.

Because ERR is probability-based and rank-sensitive, even small improvements at the top can matter a lot.

ERR vs. MRR vs. NDCG

ERR

Best when you want a user-behavior-inspired metric with graded relevance and diminishing continuation probability.

MRR

Focuses only on the rank of the first relevant result. Great for single-answer tasks, but it ignores relevance grades and everything ranked after that first hit.

NDCG

Also supports graded relevance and position discounting, and is widely used in ranking competitions and production systems. Unlike ERR, its position discount is a fixed logarithmic decay rather than a model of whether the user keeps scanning.
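To make the contrast concrete, here is a sketch that computes all three metrics on the same ranked list. The helper names are illustrative, and the NDCG variant shown uses the common 2^g - 1 gain with a log2 discount; other conventions exist:

```python
import math

def err(grades, gmax):
    score, p_reach = 0.0, 1.0
    for r, g in enumerate(grades, start=1):
        sat = (2 ** g - 1) / 2 ** gmax
        score += p_reach * sat / r
        p_reach *= 1 - sat
    return score

def mrr(grades):
    # reciprocal rank of the first relevant (grade > 0) result
    for r, g in enumerate(grades, start=1):
        if g > 0:
            return 1 / r
    return 0.0

def ndcg(grades):
    def dcg(gs):
        return sum((2 ** g - 1) / math.log2(r + 1) for r, g in enumerate(gs, start=1))
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal else 0.0

grades = [3, 2, 3, 0, 1, 2]
print(f"ERR={err(grades, 3):.4f}  MRR={mrr(grades):.4f}  NDCG={ndcg(grades):.4f}")
```

On this list MRR saturates at 1.0 as soon as rank 1 is relevant at all, while ERR and NDCG still respond to how the remaining grades are ordered.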

Common ERR mistakes

  • Mixing label scales across experiments (for example, using 0–3 in one run and 0–4 in another without adjustment).
  • Comparing ERR@10 and ERR@20 as if they are directly equivalent.
  • Passing unsorted labels (ERR assumes labels are in ranked order).
  • Confusing “ERR” with “Error Rate Reduction” in classification contexts.

Practical tips for better ranking evaluation

  • Track ERR alongside NDCG and business metrics (CTR, conversion, retention).
  • Evaluate by query segment (head, torso, tail) to reveal model weaknesses.
  • Use confidence intervals or repeated sampling for stable comparisons.
  • Inspect per-rank contributions to debug whether your model fails early or late.
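As one way to get the confidence intervals mentioned above, a percentile bootstrap over per-query scores works for any mean ranking metric. This is a sketch; the function name and the sample scores are hypothetical:

```python
import random

def bootstrap_ci(per_query_scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a mean metric across queries."""
    rng = random.Random(seed)
    n = len(per_query_scores)
    means = sorted(
        sum(rng.choices(per_query_scores, k=n)) / n  # resample queries with replacement
        for _ in range(n_resamples)
    )
    low = means[int(alpha / 2 * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high

# hypothetical per-query ERR scores from one evaluation run
scores = [0.92, 0.41, 0.77, 0.15, 0.88, 0.63, 0.50, 0.71]
low, high = bootstrap_ci(scores)
print(f"mean={sum(scores) / len(scores):.3f}  95% CI=({low:.3f}, {high:.3f})")
```

When comparing two rankers, resample the paired per-query differences instead of the raw scores, so that query difficulty cancels out.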

Final takeaway

If you care about how quickly users find satisfying results, ERR is a strong metric. Use the calculator above during offline evaluation, experiment reviews, and model iteration to spot gains where they matter most: the top of the list.
