Quanta
A budget-aware load generator that finds the request rate where your p99 latency falls off a cliff.
- Rust
- tokio
- hyper
- HDR histograms
Quanta is a small load generator with one opinion: averages lie, so it only reports the tail. It exists to answer a single question — at what request rate does p99 latency stop being linear and start being a cliff?
The problem
Most load tools either flood a target with open-loop traffic that ignores coordinated omission, or bury the tail under a mean that looks fine right up until production doesn't. I wanted honest tail numbers at a controlled rate.
What I built
A tokio + hyper client that drives a fixed arrival rate (closed-loop, with
correction for coordinated omission) and records latencies into HDR histograms.
It sweeps load upward and flags the inflection point where p99 detaches from the
median, so you get a number to design around instead of a vibe.
Outcome
A tool I reach for before shipping anything that takes traffic — including this site's API. It's deliberately tiny; the value is in what it refuses to average away.