Jay drew the diagram on a napkin at lunch. Not because he had to—he could have pulled out his laptop—but because some ideas need to be born in ink on paper, sketched fast before the shape of them fades.
"SRE error budgets," he said, tapping the napkin with his pen. "You know how it works. You define an SLO—say, 99.9% availability. That gives you an error budget of 0.1%. You're allowed that much downtime per month. If you burn through your error budget, you freeze deployments and focus on reliability. If you have budget remaining, you can take risks. Ship faster. Experiment."
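The arithmetic behind the SLO Jay quotes is worth making concrete. This is a back-of-the-envelope sketch only—plain numbers, nothing specific to any real system:

```python
# A 99.9% availability SLO leaves a 0.1% error budget.
# Over a 30-day month, that budget translates to minutes of allowed downtime.
SLO = 0.999                   # availability target from the napkin
error_budget = 1 - SLO        # fraction of time allowed to be down
month_minutes = 30 * 24 * 60  # minutes in a 30-day month

allowed_downtime = error_budget * month_minutes
print(round(allowed_downtime, 1))  # roughly 43.2 minutes per month
```

So "you're allowed that much downtime per month" means, in practice, about three-quarters of an hour.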
Justin and Navan were listening. The lunch crowd at the Thai place buzzed around them.
"I want to apply the same framework to the agents." Jay drew a horizontal line on the napkin. Above it, he wrote 1.0. Below the line: 0.95. "Our satisfaction target is 0.95. Anything above that, the system is healthy. The gap between the target and 1.0 is the error budget—0.05. Right now we're at 0.97, which means we've burned 0.03 of the budget and still have 0.02 in reserve."
"Five percent of trajectories are allowed to fail," Navan said, leaning in. "And three percent already are."
"Exactly. And here's the key insight." Jay drew an arrow. "When we have error budget remaining, the agents should be allowed to take bigger swings. Larger refactors. Riskier architectural changes. Experiments that might temporarily tank the satisfaction metric but could lead to major improvements."
"And when the budget is exhausted?" Justin asked.
"When satisfaction drops below 0.95, the agents switch modes. No more experiments. No risky refactors. They focus exclusively on reliability. Small, safe changes. Bug fixes. Incremental improvements. They earn back the budget through conservative, correct work."
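The two-mode switch Jay describes can be sketched in a few lines. Everything here—the names `choose_mode`, `remaining_budget`, the `Mode` enum—is illustrative, not the team's actual implementation; only the 0.95 threshold comes from the napkin:

```python
from enum import Enum

SLO_TARGET = 0.95  # the satisfaction SLO from the napkin

class Mode(Enum):
    EXPLORE = "explore"            # budget remaining: risky refactors allowed
    CONSERVATIVE = "conservative"  # budget exhausted: safe, incremental fixes only

def choose_mode(satisfaction: float) -> Mode:
    """Pick the agent's operating mode from the current satisfaction metric."""
    if satisfaction >= SLO_TARGET:
        return Mode.EXPLORE
    return Mode.CONSERVATIVE

def remaining_budget(satisfaction: float) -> float:
    """Headroom above the SLO; zero or negative means the budget is spent."""
    return satisfaction - SLO_TARGET
```

At 0.97, `choose_mode` returns `EXPLORE`; the moment satisfaction dips under 0.95, every subsequent decision routes through the conservative path until the metric recovers.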
Navan picked up the napkin and studied it. "You're describing two operating modes for the agents. Explore and exploit. When things are stable, explore. When things are fragile, exploit."
"It's not just explore-exploit. It's about making the tradeoff explicit and automatic. Right now, we make that decision manually. We look at the metric, we make a judgment call about whether it's safe to let the agents try something ambitious. I'm saying we encode that judgment into the system."
Justin was quiet for a long moment. Jay knew that silence. It was the silence of Justin running scenarios in his head, looking for the flaw, the edge case, the way it breaks.
"What about cascading failures?" Justin said finally. "An agent takes a big swing, tanks satisfaction to 0.91. Now it's in conservative mode. But the big swing left architectural damage that conservative fixes can't easily unwind. The budget stays exhausted."
"Rollback threshold," Jay said. He'd already thought about this. "If satisfaction drops more than five points in a single iteration, we automatically roll back to the last known-good state. The budget system governs incremental risk. Catastrophic risk gets a different mechanism."
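The rollback guard Jay proposes is a separate, simpler check layered on top of the budget system. A minimal sketch, assuming "five points" means a 0.05 drop on the 0-to-1 satisfaction scale (the function and constant names are hypothetical):

```python
MAX_SINGLE_DROP = 0.05  # "five points in a single iteration"

def should_rollback(prev_satisfaction: float, curr_satisfaction: float) -> bool:
    """Catastrophic-risk check: revert to the last known-good state
    if one iteration drops satisfaction by more than the threshold."""
    return (prev_satisfaction - curr_satisfaction) > MAX_SINGLE_DROP
```

This keeps the two mechanisms cleanly separated: the budget governs incremental risk iteration by iteration, while the rollback check catches the single catastrophic swing that conservative mode can't unwind.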
Navan set down the napkin. "I want to build this."
"You want to describe this," Justin corrected gently. "And let the agents build it."
Navan grinned. "Right. Old habits." He pulled out his phone and started typing. A scenario. When the satisfaction metric exceeds the target SLO, agents should be permitted to attempt high-risk optimizations. When satisfaction falls below the SLO threshold, agents should restrict changes to conservative, reliability-focused patches...
Jay watched him type and felt something shift. This was the thing about the factory that still surprised him six months in. You didn't build the thing. You described the thing. And then the thing built itself.
The error budget system was operational by the following Tuesday. The first time an agent burned through its budget—a bold attempt at parallelizing the Okta twin's event pipeline that dropped satisfaction to 0.93—it switched to conservative mode automatically, restored the metric to 0.97 within four hours, and then, with budget restored, tried the parallelization again with a more conservative approach.
It worked on the second try. Satisfaction held at 0.96. The Okta twin was thirty percent faster.
Jay pinned the napkin to the wall above his desk. It stayed there for the rest of the year, the ink fading slowly, the diagram still legible, a reminder that the best ideas start as fast sketches on cheap paper.