ceph calculator - Aaron Graves, PhDude Replica

Ceph Capacity Calculator

Estimate raw capacity, protected usable capacity, and safe operating headroom for a Ceph cluster.

Number of OSDs

OSD Size (TB)

System/Operational Reserve (%)

Reserved for metadata, rebalancing, and operational overhead.

Target Max Utilization (%)

Ceph clusters need free space for recovery and rebalancing, so set this target well below 100%.

Data Protection Type

Replication Size

Common values: 3 copies (the Ceph default) or 2 copies, which saves space but tolerates fewer failures.

EC Data Chunks (k)

EC Coding Chunks (m)

Current Used Data (TB)

Monthly Growth (TB)

What this Ceph calculator helps you estimate

A Ceph cluster can look large on paper, but its practical usable capacity is always lower than the raw disk total. This Ceph calculator gives you a realistic planning view by accounting for the protection strategy (replication or erasure coding), reserved overhead, and a safe utilization ceiling.

In other words, the calculator answers a more useful question than “How many TB do I own?” It answers: “How many TB can I safely store while keeping the cluster stable during failure and recovery events?”

How the calculation works

1) Raw capacity

Raw capacity is straightforward:

Raw TB = OSD count × OSD size (TB)

2) Operational reserve

Real environments keep some space reserved for metadata growth, balancing, temporary amplification, and normal operational headroom. The calculator subtracts a reserve percentage from raw capacity.

Post-reserve TB = Raw TB × (1 − reserve%)

3) Protection efficiency

Protection settings directly change usable storage:

Replication with size n: efficiency is 1/n (size 3 gives 1/3, about 33.3%; size 2 gives 1/2, or 50%).
Erasure coding k+m: efficiency is k/(k+m) (for example, 4+2 gives 4/6, about 66.7%).

Theoretical usable TB = Post-reserve TB × efficiency
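
The efficiency rules above can be sketched in a few lines of Python. This is an illustrative sketch rather than the calculator's actual code: the function name protection_efficiency and its parameters are assumptions chosen for this example.

```python
def protection_efficiency(kind: str, size: int = 3, k: int = 4, m: int = 2) -> float:
    """Return the fraction of stored bytes that is user data.

    kind is "replication" or "ec" (hypothetical labels for this sketch).
    """
    if kind == "replication":
        if size < 1:
            raise ValueError("replication size must be at least 1")
        return 1.0 / size
    if kind == "ec":
        if k < 1 or m < 1:
            raise ValueError("k and m must both be at least 1")
        return k / (k + m)
    raise ValueError(f"unknown protection kind: {kind!r}")


print(round(protection_efficiency("replication", size=3), 3))  # 0.333
print(round(protection_efficiency("ec", k=4, m=2), 3))         # 0.667
print(round(protection_efficiency("ec", k=8, m=3), 3))         # 0.727
```

The comparison shows why erasure coding is attractive for bulk data: an 8+3 profile keeps roughly 73% of post-reserve space for user data, against about 33% for three-way replication.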

4) Safe target utilization

Running near 100% full is dangerous for distributed systems, because recovery and rebalance operations need free room to move data. The calculator applies your maximum safe utilization target (often 70–85%, depending on workload and risk tolerance).

Safe usable TB = Theoretical usable TB × target utilization%
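
The four steps can be chained into a single function. The sketch below follows the formulas in this section directly; the names estimate_capacity and CapacityEstimate, and the example cluster of 24 OSDs of 8 TB each, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CapacityEstimate:
    raw_tb: float
    post_reserve_tb: float
    theoretical_usable_tb: float
    safe_usable_tb: float


def estimate_capacity(osd_count: int, osd_size_tb: float, reserve_pct: float,
                      efficiency: float, target_util_pct: float) -> CapacityEstimate:
    """Apply the four steps: raw, reserve, protection efficiency, safe target."""
    raw = osd_count * osd_size_tb                     # 1) raw capacity
    post_reserve = raw * (1 - reserve_pct / 100)      # 2) operational reserve
    theoretical = post_reserve * efficiency           # 3) protection efficiency
    safe = theoretical * target_util_pct / 100        # 4) safe target utilization
    return CapacityEstimate(raw, post_reserve, theoretical, safe)


# Hypothetical cluster: 24 OSDs x 8 TB, 10% reserve, replication size 3, 80% target.
est = estimate_capacity(24, 8, 10, 1 / 3, 80)
print(f"{est.raw_tb:.2f}")                 # 192.00
print(f"{est.post_reserve_tb:.2f}")        # 172.80
print(f"{est.theoretical_usable_tb:.2f}")  # 57.60
print(f"{est.safe_usable_tb:.2f}")         # 46.08
```

In this example, 192 TB of raw disk yields about 46 TB of safely usable capacity, which is less than a quarter of the raw total.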

Replication vs erasure coding: quick guidance

When replication is usually better

Latency-sensitive workloads.
Small object and random-write heavy patterns.
Simpler operational behavior and troubleshooting.

When erasure coding is usually better

Capacity efficiency is a high priority.
Large object or archive-style data.
You can tolerate higher CPU usage and more complex recovery.

Capacity planning tips for Ceph

Plan for failures, not just steady-state: ensure spare room during OSD/node loss and backfill.
Separate hot and cold data: use replicated pools for hot paths, EC pools for bulk capacity.
Track growth trends: monthly ingest rates are more reliable than one-time snapshots.
Use alerts early: warning thresholds should trigger before Ceph's own nearfull and full states, which by default are reached at 85% and 95% of an OSD's capacity.
Review CRUSH and failure domains: resilience assumptions depend on proper placement rules.
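
The first tip can be checked with simple arithmetic. The sketch below estimates raw utilization after some OSDs are lost and their data is re-protected onto the survivors. It assumes data is spread evenly across OSDs and ignores the operational reserve; the function name, the example figures, and the 0.85 comparison threshold are illustrative assumptions, not part of the calculator.

```python
def raw_utilization_after_loss(used_data_tb: float, efficiency: float,
                               osd_count: int, osd_size_tb: float,
                               failed_osds: int) -> float:
    """Raw utilization of surviving OSDs once all data is fully re-protected.

    Assumes an even data distribution (a simplification for this sketch).
    """
    remaining = osd_count - failed_osds
    if remaining <= 0:
        raise ValueError("at least one OSD must survive")
    raw_used_tb = used_data_tb / efficiency   # user data plus protection overhead
    return raw_used_tb / (remaining * osd_size_tb)


# Hypothetical cluster: 24 OSDs x 8 TB, replication size 3, 45 TB of user data.
THRESHOLD = 0.85  # assumed alert threshold for this example
for failed in (0, 4, 5):
    util = raw_utilization_after_loss(45, 1 / 3, 24, 8, failed)
    status = "ok" if util < THRESHOLD else "over threshold"
    print(f"{failed} OSDs lost: {util:.1%} ({status})")
# 0 OSDs lost: 70.3% (ok)
# 4 OSDs lost: 84.4% (ok)
# 5 OSDs lost: 88.8% (over threshold)
```

A cluster that looks comfortable at 70% raw utilization can land close to the threshold after losing a single four-OSD node, which is the reason to size for the failure case rather than the steady state.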

Common mistakes this calculator helps avoid

Assuming raw TB equals application-usable TB.
Ignoring overhead from protection settings.
Operating too close to 100% and then struggling during recovery.
Missing expansion timelines until a nearfull alert appears.
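
To avoid the last mistake, the Current Used Data and Monthly Growth inputs can be turned into a rough expansion timeline. The sketch below assumes linear growth, which real clusters rarely follow exactly; the function name months_until_limit and the example numbers are hypothetical.

```python
import math
from typing import Optional


def months_until_limit(safe_usable_tb: float, current_used_tb: float,
                       monthly_growth_tb: float) -> Optional[int]:
    """Whole months of growth that fit before safe usable capacity is exceeded.

    Returns 0 if the limit is already reached, and None if there is no growth.
    Assumes constant (linear) monthly growth.
    """
    if current_used_tb >= safe_usable_tb:
        return 0
    if monthly_growth_tb <= 0:
        return None
    return math.floor((safe_usable_tb - current_used_tb) / monthly_growth_tb)


# Hypothetical figures: 46.08 TB safe usable, 30 TB used, growing 2 TB per month.
print(months_until_limit(46.08, 30, 2))  # 8
```

With eight months of runway in this example, hardware ordering and installation lead times should be subtracted to find the date by which an expansion needs to start.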

FAQ

Is this calculator a full replacement for Ceph benchmarking?

No. It is a planning calculator for capacity and growth. Performance still depends on media, network, CPU, placement groups, workload shape, and tuning choices.

What utilization target should I choose?

Many teams choose around 75–80% for safer operations. Aggressive environments may go higher, but risk rises when recovery events hit a cluster with limited free space.

Can I use one set of numbers for the entire cluster?

It is best to calculate by pool class (for example SSD hot pool vs HDD archive pool), because protection methods and growth rates can differ significantly.
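
A per-class calculation might look like the following sketch, which applies the same formulas to each pool class and then sums the results. The pool names, sizes, and shared 10% reserve and 80% target are made-up values for illustration only.

```python
def safe_usable_tb(osds: int, osd_tb: float, efficiency: float,
                   reserve_pct: float = 10, target_util_pct: float = 80) -> float:
    """Safe usable TB for one pool class, using the formulas from this page."""
    raw = osds * osd_tb
    return raw * (1 - reserve_pct / 100) * efficiency * target_util_pct / 100


# Hypothetical pool classes with different media and protection settings.
pools = [
    {"name": "ssd-hot", "osds": 12, "osd_tb": 4, "efficiency": 1 / 3},       # replication size 3
    {"name": "hdd-archive", "osds": 36, "osd_tb": 16, "efficiency": 4 / 6},  # EC 4+2
]

total = 0.0
for pool in pools:
    usable = safe_usable_tb(pool["osds"], pool["osd_tb"], pool["efficiency"])
    total += usable
    print(f"{pool['name']}: {usable:.2f} TB")
print(f"total: {total:.2f} TB")
# ssd-hot: 11.52 TB
# hdd-archive: 276.48 TB
# total: 288.00 TB
```

Keeping the classes separate also makes it clear which one will run out first, since the hot pool here is far smaller and may grow at a different rate than the archive pool.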