Quanta — Robert Kirsammer

Quanta is a small load generator with one opinion: averages lie, so it only reports the tail. It exists to answer a single question — at what request rate does p99 latency stop being linear and start being a cliff?

The problem

Most load tools either flood a target with open-loop traffic that ignores coordinated omission, or bury the tail under a mean that looks fine right up until production doesn't. I wanted honest tail numbers at a controlled rate.

What I built

A tokio + hyper client that drives a fixed arrival rate (closed-loop, with correction for coordinated omission) and records latencies into HDR histograms. It sweeps load upward and flags the inflection point where p99 detaches from the median, so you get a number to design around instead of a vibe.

Outcome

A tool I reach for before shipping anything that takes traffic — including this site's API. It's deliberately tiny; the value is in what it refuses to average away.