Ceph Erasure Coding Capacity Calculator

Estimate usable capacity, overhead, and fault tolerance for an erasure-coded Ceph pool.

Assumes equal-sized OSDs and simple EC math (raw × k/(k+m)). Real-world usable may be lower due to metadata, PG balancing, and operational reserve.

What this Ceph erasure coding calculator does

Erasure coding (EC) in Ceph improves storage efficiency compared to triple replication, but capacity planning can get confusing fast. This calculator gives you a quick, practical estimate of:

  • Total raw cluster capacity
  • Erasure coding efficiency for your chosen profile (k+m)
  • Usable capacity at 100% fill and at your target fill level
  • How much raw capacity you need for a specific usable target
  • How many chunk failures a stripe can tolerate (equal to m)

Ceph EC basics in plain English

What do k and m mean?

In an EC profile, Ceph splits data into k data chunks and adds m coding chunks. So a profile of 8+3 means each stripe has 11 chunks total: 8 hold original data and 3 hold parity-like recovery information.

  • Storage efficiency = k / (k + m)
  • Raw overhead factor = (k + m) / k
  • Failure tolerance per stripe = m chunk losses

Bigger k generally improves efficiency, but recovery and small-write behavior can become more expensive.
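These definitions translate directly into code. A minimal sketch in Python (the 8+3 example values are illustrative):

```python
def ec_profile_stats(k: int, m: int) -> dict:
    """Per-stripe efficiency, raw overhead, and failure tolerance for a k+m EC profile."""
    if k < 1 or m < 1:
        raise ValueError("k and m must both be at least 1")
    return {
        "efficiency": k / (k + m),     # fraction of raw bytes that store actual data
        "raw_overhead": (k + m) / k,   # raw bytes needed per usable byte
        "failure_tolerance": m,        # chunk losses a stripe can survive
    }

# Example: an 8+3 profile
stats = ec_profile_stats(8, 3)
print(f"{stats['efficiency']:.1%}")     # 72.7%
print(f"{stats['raw_overhead']:.2f}x")  # 1.38x
```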

How to use the calculator

  1. Enter OSD count and per-OSD capacity in TB.
  2. Enter your EC profile values (k and m).
  3. Pick a safe max fill level (many operators use 70–85%).
  4. Optionally enter a target usable capacity to size the cluster.
  5. Click Calculate.

The output gives you both “math maximum” capacity and “operational” capacity at your selected fill threshold.
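The whole calculation amounts to a few multiplications. A sketch in Python, following the simple EC math stated above (the input values in the example are illustrative, not recommendations):

```python
def ec_capacity(osd_count: int, osd_tb: float, k: int, m: int, fill: float = 0.80):
    """Simple EC capacity model: raw * k/(k+m), then a fill-level haircut."""
    raw = osd_count * osd_tb              # total raw capacity, TB
    usable_max = raw * k / (k + m)        # "math maximum" usable capacity, TB
    usable_at_fill = usable_max * fill    # operational capacity at the fill level
    return raw, usable_max, usable_at_fill

def raw_needed(target_usable_tb: float, k: int, m: int, fill: float = 0.80) -> float:
    """Raw capacity required to hit a usable target at the chosen fill level."""
    return target_usable_tb * (k + m) / k / fill

# Example: 24 OSDs x 16 TB each, 8+3 profile, 80% max fill
raw, usable, operational = ec_capacity(24, 16, 8, 3, fill=0.80)
```

Note that the two functions are inverses: feeding the operational capacity back into `raw_needed` with the same profile and fill level returns the original raw total.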

Common EC profiles and efficiency

  EC Profile   Efficiency (k/(k+m))   Raw Overhead   Stripe Failure Tolerance
  2+1          66.7%                  1.50x          1 chunk
  4+2          66.7%                  1.50x          2 chunks
  6+2          75.0%                  1.33x          2 chunks
  8+3          72.7%                  1.38x          3 chunks
  10+4         71.4%                  1.40x          4 chunks

Operational caveats you should not ignore

1) Failure domain matters

Chunk tolerance is not automatically node tolerance unless your CRUSH rule and failure domain are configured correctly (host, rack, etc.). Always model expected failures at the domain level you care about.
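In practice the failure domain is set on the erasure-code profile itself: a host-level domain places each chunk on a different host, so m chunk tolerance becomes m host tolerance. A sketch of the commands (profile and pool names are made up for illustration; adjust k, m, and the domain to your design):

```shell
# Create an 8+3 profile that places each chunk on a different host
ceph osd erasure-code-profile set bulk_ec \
    k=8 m=3 crush-failure-domain=host

# Create an erasure-coded pool backed by that profile
ceph osd pool create ecpool erasure bulk_ec
```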

2) Don’t plan to run at 100%

Ceph needs free space for rebalancing and recovery. Use this calculator’s fill-level field to keep realistic headroom. Most production environments reserve substantial slack.

3) Small I/O can cost more with EC

EC is highly space-efficient but can increase read/modify/write overhead for small random writes. For metadata-heavy or very latency-sensitive workloads, replicated pools are often still preferred.

4) Recovery impact grows with large drives

As OSD size grows, rebuild/recovery windows can become longer. Capacity math might look great, but recovery-time objectives can still fail.
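A rough back-of-envelope check is to divide the data to rebuild by a conservative sustained recovery rate. The 200 MB/s figure below is an assumption for illustration, not a measured number; real recovery throughput depends on cluster load, network bandwidth, and recovery throttle settings:

```python
def rebuild_hours(drive_tb: float, recovery_mb_s: float = 200.0) -> float:
    """Naive rebuild-window estimate for one failed OSD at an assumed recovery rate."""
    seconds = (drive_tb * 1e12) / (recovery_mb_s * 1e6)  # TB -> bytes, MB/s -> bytes/s
    return seconds / 3600

# A single 20 TB OSD at an assumed 200 MB/s sustained recovery
print(f"{rebuild_hours(20):.1f} hours")  # ≈ 27.8 hours
```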

EC vs replicated pools (quick comparison)

  • Replicated (size=3): simpler, often faster for small writes, ~33% efficiency.
  • Erasure coded: better capacity efficiency, but more CPU/network overhead and potentially slower small random writes.

Many production clusters use both: replicated pools for metadata/hot paths and EC pools for bulk object or block data tiers.

Planning checklist

  • Choose EC profile based on workload behavior, not just efficiency numbers.
  • Set realistic fill threshold (and keep emergency free space).
  • Validate CRUSH failure domain design before production.
  • Test degraded-mode performance and recovery speed.
  • Re-check capacity when hardware generations or drive sizes change.

Bottom line

A good Ceph erasure coding calculator saves time and reduces planning mistakes, but it is a first-pass model. Use it to compare scenarios quickly, then validate with your topology, CRUSH map, failure testing, and workload benchmarks.