The Okta API allows 600 requests per minute per organization. The Jira API allows roughly one request per second for most endpoints, though Atlassian's documentation is characteristically vague about the exact numbers. The Slack API uses a tiered system: Tier 1 gets one request per minute, Tier 4 gets a hundred per minute. Google's APIs use a quota system measured in units per day, with different operations costing different numbers of units.
Navan had modeled all of them. Every twin tracked its own rate limit budget, decremented it with each request, and returned 429 Too Many Requests when the budget was exhausted. The response headers included Retry-After with the correct backoff interval, because rate limit handling wasn't just about the error code—it was about the recovery protocol.
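The budget-tracking behavior described above can be sketched as a fixed-window limiter. This is a minimal illustration, not Navan's actual implementation; the class name and window logic are assumptions, but the contract matches the text: decrement a per-window budget, and when it runs out, return 429 with a Retry-After header giving the correct backoff interval.

```python
import time

class TwinRateLimiter:
    """Fixed-window rate limiter for a service twin (illustrative sketch)."""

    def __init__(self, limit_per_minute, enabled=True):
        self.limit = limit_per_minute
        self.enabled = enabled
        self.window_start = time.monotonic()
        self.budget = limit_per_minute

    def check(self):
        """Return (allowed, headers). When denied, headers carry Retry-After."""
        if not self.enabled:
            return True, {}          # limits off: every request is answered
        now = time.monotonic()
        if now - self.window_start >= 60:
            # New one-minute window: refill the budget.
            self.window_start = now
            self.budget = self.limit
        if self.budget > 0:
            self.budget -= 1
            return True, {}
        # Budget exhausted: 429, with seconds remaining in the window.
        retry_after = int(60 - (now - self.window_start)) + 1
        return False, {"Retry-After": str(retry_after)}
```

A twin configured like the Okta model would use `TwinRateLimiter(600)`; raising the limit or passing `enabled=False` gives the 10x and unlimited modes discussed later.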
"But here's the thing," Navan said, pulling up the twin configuration panel. "The limits are configurable."
He changed the Okta twin's rate limit from 600 per minute to 6,000 per minute. Then he changed it to 60,000. Then he turned rate limiting off entirely.
"With rate limits off, the twins respond as fast as the hardware allows," Navan explained. "No throttling. No backoff. No 429s. The agents can make ten thousand API calls in ten seconds and the twins just answer every one."
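The knob Navan is turning could look something like this. The constant and function names are hypothetical; the point is that one setting scales a production-equivalent budget by a multiplier, or removes it entirely.

```python
# Production-equivalent budget for the Okta twin, from the text:
# 600 requests per minute per organization.
OKTA_PRODUCTION_LIMIT = 600

def effective_limit(multiplier=None):
    """Requests-per-minute budget for a twin.

    multiplier=1  -> production-equivalent limits
    multiplier=10 -> 10x headroom for concurrent scenarios
    multiplier=None -> rate limiting off entirely (no 429s, no backoff)
    """
    if multiplier is None:
        return None
    return OKTA_PRODUCTION_LIMIT * multiplier
```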
Jay watched the scenario runner's throughput counter. With production-equivalent rate limits, the Full Orbit scenario ran about sixty times per minute. With limits set to 10x production, it ran about five hundred times per minute. With limits turned off entirely, it ran 2,100 times per minute.
"That's the difference between 'testing' and 'stress testing,'" Justin observed. He was leaning against the doorframe, arms crossed, watching the numbers climb.
"We normally run at 10x," Navan said. "It gives us headroom to test concurrent scenarios without hitting synthetic limits, but it still exercises the rate limit handling code in the agents. The agents need to see 429s occasionally. They need to practice backing off. If we turn limits off entirely, the agents never learn to handle throttling."
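The throttling-recovery behavior the agents are meant to practice is standard: on a 429, honor the Retry-After header if present, otherwise fall back to exponential backoff. A minimal agent-side sketch, assuming `send_request` returns a `(status, headers, body)` tuple:

```python
import time

def call_with_backoff(send_request, max_attempts=5):
    """Retry a request on 429 Too Many Requests.

    Honors the server's Retry-After header when present; otherwise
    falls back to capped exponential backoff. `send_request` is any
    zero-argument callable returning (status_code, headers, body).
    """
    delay = 1.0
    for _ in range(max_attempts):
        status, headers, body = send_request()
        if status != 429:
            return status, body
        # Prefer the server's stated backoff interval.
        wait = float(headers.get("Retry-After", delay))
        time.sleep(wait)
        delay = min(delay * 2, 60.0)  # cap the fallback delay
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Running at 10x limits means this path still gets exercised occasionally, which is exactly the point: agents that never see a 429 never learn this protocol.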
"So you're training the agents against configurable resistance."
"Exactly. Production limits teach patience. 10x limits teach throughput. No limits teach us where the agents break when there's no external governor." Navan pulled up a chart from last week's unlimited run. "Without rate limits, Agent Seven tried to create four thousand Jira tickets in ninety seconds. It didn't crash, but its error rate on field validation went from 0.1% to 3.8%. Speed without limits exposed sloppiness."
Jay leaned forward. "So the rate limits in production aren't just protecting the service. They're protecting the agent from itself."
"In a way. The limits force the agent to slow down and be careful. Without them, the agent goes fast and gets careless." Navan reset the limits to 10x. "That's why 10x is the sweet spot. Fast enough to be useful. Slow enough to be careful."
The scenario runner settled into its new rhythm. Five hundred iterations per minute. The twins didn't charge. The twins didn't tire. They just kept answering.