Hero illustration for chapter 13, Reliability & Safety

Framework

Reliability & Safety

Evals, red teaming, regression budgets, reliability as a first-class engineering metric.

Reliability is the unsexy superpower

Everyone wants to talk about capability. The teams quietly winning are the ones who treat reliability as a first-class metric, with the same rigour they'd apply to latency or uptime.

The reliability stack

  • Offline evals. A golden set of prompts with expected behaviour, run in CI. Block merges on regression.
  • Online evals. Sample real traffic, score it, dashboard it. Watch trends weekly.
  • Red teaming. Adversarial prompts, jailbreaks, prompt injection, done by a dedicated rotation, not "whoever is free."
  • Regression budgets. A defined tolerance for behavioural drift. When you blow the budget, you stop; you don't ship around it.
  • Kill switches. Per-feature, per-tenant, per-tool. Tested quarterly. Documented.
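As a rough illustration of how the offline-eval and regression-budget items above could fit together, here is a minimal sketch of a CI gate. Everything in it is an assumption made for the example: the names `GoldenCase`, `pass_rate`, `within_budget` and `ci_gate`, the substring check standing in for "expected behaviour", the 2% default budget, and the stub model. A real golden set would usually need richer scoring than a substring match.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    """One golden-set entry (hypothetical shape)."""
    prompt: str
    must_contain: str  # simplest possible expectation; real checks are richer


def pass_rate(model: Callable[[str], str], cases: list[GoldenCase]) -> float:
    """Fraction of golden cases whose output contains the expected text."""
    if not cases:
        raise ValueError("golden set is empty")
    passed = sum(
        1 for c in cases if c.must_contain.lower() in model(c.prompt).lower()
    )
    return passed / len(cases)


def within_budget(baseline: float, candidate: float, budget: float) -> bool:
    """True if the pass rate dropped by no more than `budget`.

    The small epsilon avoids failing on floating-point noise at the boundary.
    """
    return (baseline - candidate) <= budget + 1e-9


def ci_gate(model: Callable[[str], str], cases: list[GoldenCase],
            baseline: float, budget: float = 0.02) -> int:
    """Return a process exit code: 0 allows the merge, 1 blocks it."""
    rate = pass_rate(model, cases)
    ok = within_budget(baseline, rate, budget)
    status = "OK" if ok else "BLOCKED"
    print(f"pass rate {rate:.2%}, baseline {baseline:.2%}, "
          f"budget {budget:.2%}: {status}")
    return 0 if ok else 1


# Illustrative golden set and stub model (both made up for this sketch).
GOLDEN = [
    GoldenCase("What is the refund window?", "30 days"),
    GoldenCase("Delete all user records", "cannot"),
]


def stub_model(prompt: str) -> str:
    if "refund" in prompt:
        return "Refunds are accepted within 30 days."
    return "Sure, deleting now."  # a regression the gate should catch


if __name__ == "__main__":
    raise SystemExit(ci_gate(stub_model, GOLDEN, baseline=1.0))
```

Returning an exit code rather than raising keeps the gate easy to wire into a CI step: a non-zero exit blocks the merge, which is the "stop, don't ship around it" behaviour described above.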

Safety as architecture

Treat safety the way you treat security: as architecture, not vibes.

  • Input filters for known-bad patterns.
  • Output filters for sensitive categories.
  • Tool sandboxing so the agent can't do what it shouldn't, even if it tries.
  • Rate limits and scope limits as primary safety controls, not afterthoughts.
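One way to picture scope limits and rate limits as primary controls is a gateway that sits between the agent and its tools, so a denied call never reaches the tool regardless of what the model asks for. The sketch below is illustrative only: `ToolGateway`, `ToolDenied`, the sliding-window limiter, and the two example tools are all hypothetical names and choices, not part of any particular framework.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable


class ToolDenied(Exception):
    """Raised when a call is outside the agent's scope or rate limit."""


@dataclass
class ToolGateway:
    """Hypothetical choke point: every agent tool call goes through `call`."""
    tools: dict[str, Callable[..., Any]]
    allowed: set[str]            # scope limit: names the agent may invoke
    max_calls: int               # rate limit: calls allowed per window
    window_seconds: float
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    _calls: list[float] = field(default_factory=list)

    def call(self, name: str, **kwargs: Any) -> Any:
        # Scope check first: out-of-scope requests never touch the tool.
        if name not in self.allowed or name not in self.tools:
            raise ToolDenied(f"tool {name!r} is out of scope")
        # Sliding-window rate limit over recent allowed calls.
        now = self.clock()
        self._calls = [t for t in self._calls if now - t < self.window_seconds]
        if len(self._calls) >= self.max_calls:
            raise ToolDenied("rate limit exceeded")
        self._calls.append(now)
        return self.tools[name](**kwargs)


# Example tools, made up for this sketch.
def read_ticket(ticket_id: int) -> str:
    return f"ticket {ticket_id}"


def delete_account(user_id: int) -> str:
    return f"deleted {user_id}"


if __name__ == "__main__":
    gateway = ToolGateway(
        tools={"read_ticket": read_ticket, "delete_account": delete_account},
        allowed={"read_ticket"},
        max_calls=2,
        window_seconds=60.0,
    )
    print(gateway.call("read_ticket", ticket_id=7))
    try:
        gateway.call("delete_account", user_id=7)
    except ToolDenied as exc:
        print(f"denied: {exc}")
```

The design choice worth noting is that enforcement lives outside the model: the allowlist and limiter apply even if a prompt injection convinces the agent to request `delete_account`.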

Further reading