Jay configured the Jira twin to return a 500 Internal Server Error on every third request. Not randomly—deterministically. Every third request, without fail. He wanted to see what Agent Four would do when the world became predictably hostile.
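A deterministic twin like Jay's can be sketched as a counter-based wrapper around the real handler. This is a minimal illustration, not the actual twin code; the class name and handler interface are hypothetical.

```python
# Sketch of a deterministic fault-injecting "twin": every third
# request gets a 500, all others pass through. Names and the
# handler interface are hypothetical.
class FaultInjectingTwin:
    def __init__(self, real_handler, every_n=3, status=500):
        self.real_handler = real_handler
        self.every_n = every_n
        self.status = status
        self.count = 0

    def handle(self, request):
        self.count += 1
        if self.count % self.every_n == 0:
            # Deterministic failure: no randomness, no flakiness.
            return {"status": self.status, "body": "Internal Server Error"}
        return self.real_handler(request)
```

Determinism is the point: a random 1-in-3 failure rate produces unreproducible runs, while a strict every-third-request rule makes every failure replayable.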
Agent Four was responsible for creating and updating Jira tickets as part of the sprint management scenario. Under normal conditions, it created tickets, set priorities, assigned them, and transitioned them through the workflow. It was reliable, efficient, and utterly unprepared for systematic failure.
The first 500 hit on the third API call: a ticket creation request. Agent Four received the error, waited two seconds, retried. The retry succeeded because it was the fourth request. The agent continued. Good.
The sixth request was a ticket update. 500 again. Agent Four retried after two seconds. Success. The agent continued. Still good.
The ninth request was a status transition. 500. Agent Four retried. But the retry was the tenth request, which succeeded. However, Agent Four didn't check whether the transition had actually been applied on the first attempt. It had. The retry attempted the same transition again, and the Jira twin returned a 409 Conflict because the ticket was already in the target status.
Agent Four did not handle 409s.
"There it is," Jay said. The agent had logged the 409 as an unexpected error and halted the scenario. "The agent retries on 500, but it doesn't check for idempotency. If the 500 was a response-level failure but the operation actually succeeded server-side, the retry creates a duplicate operation."
Navan was already writing the spec for the fix. Not code—a description of the expected behavior. Before retrying a state-changing operation after a 500 error, the agent must verify whether the operation was actually applied by querying the current state of the resource.
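Navan's spec translates into a check-then-retry loop: on a 500, query the resource before replaying the write, and treat a 409 as a prompt to verify rather than halt. A sketch under those assumptions; `get_status` and `apply_transition` are hypothetical injected callables, not a real Jira client.

```python
import time

def safe_transition(ticket_id, target_status, get_status, apply_transition,
                    max_attempts=3, backoff_s=2.0):
    """Apply a status transition, verifying state before any retry so a
    500 whose operation actually landed server-side is not replayed.
    get_status/apply_transition are hypothetical injected callables."""
    for attempt in range(max_attempts):
        status_code = apply_transition(ticket_id, target_status)
        if status_code < 400:
            return True
        if status_code == 409:
            # Conflict: the ticket may already be in the target status,
            # possibly because our own earlier attempt succeeded.
            return get_status(ticket_id) == target_status
        if 500 <= status_code < 600:
            # The response failed, but the write may have landed anyway.
            if get_status(ticket_id) == target_status:
                return True
            time.sleep(backoff_s)
            continue
        return False
    return False
```

This is exactly the failure Agent Four hit: the first transition succeeded server-side, so a state check before the retry would have ended the operation cleanly instead of triggering the 409.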
"Now try timeouts," Justin suggested.
Jay reconfigured the twin. Instead of 500s, the Okta twin would introduce a thirty-second delay on every token refresh request. Not an error—a stall. The request would eventually complete, but only after the agent's patience ran out.
Agent Two hit the timeout. It logged a connection timeout error and retried. The retry also stalled. Agent Two retried again. And again. Four pending requests, all stalled, all consuming connections from the agent's HTTP client pool.
"Connection pool exhaustion," Navan observed. "The agent doesn't have a circuit breaker. It keeps opening new connections until the pool is full, and then every subsequent API call to any service fails because there are no connections available."
"A slow Okta takes down Jira, Slack, Drive, Docs, and Sheets," Jay said. "One misbehaving service poisons the entire agent through resource exhaustion."
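The circuit breaker Navan flagged as missing would cap that blast radius: after a few consecutive failures against one service, it fails fast instead of tying up another connection from the shared pool. A minimal per-service sketch, with hypothetical names and thresholds:

```python
import time

class CircuitBreaker:
    """Per-service breaker: after `threshold` consecutive failures,
    refuse calls for `cooldown_s` seconds instead of consuming another
    connection. A sketch; names and defaults are illustrative."""
    def __init__(self, threshold=3, cooldown_s=30.0):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_s:
                # Open: fail fast without touching the connection pool.
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one probe through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success closes the circuit
        return result
```

With one breaker per downstream service, a stalled Okta trips only the Okta breaker; calls to Jira, Slack, Drive, Docs, and Sheets keep their connections.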
Justin sat down at the table. "Write a malformed response scenario next. Valid HTTP 200, but the JSON body is truncated. See if the agent validates response structure or just parses and hopes."
"Parses and hopes," Jay predicted.
He was right. Agent Three received a truncated JSON response from the Slack twin and passed the partial data downstream without validation. The Sheets twin received a row of data with three missing fields and wrote it to the spreadsheet. The spreadsheet now contained a row that was technically present but structurally incomplete.
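The step Agent Three skipped is validate-before-forward: parse the body, then check required fields before anything moves downstream. A sketch of that guard; the field names are hypothetical, not the actual Slack-to-Sheets schema.

```python
import json

# Hypothetical required schema for an illustrative Slack-style record.
REQUIRED_FIELDS = ("user", "channel", "text")

def parse_response_row(raw_body):
    """Reject truncated or structurally incomplete responses instead of
    passing partial data downstream. Field names are illustrative."""
    try:
        data = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        # A truncated body fails to parse; surface it as a hard error.
        raise ValueError(f"truncated or malformed JSON: {exc}") from exc
    missing = [f for f in REQUIRED_FIELDS if f not in data]
    if missing:
        # HTTP 200 with missing fields is still a failed response.
        raise ValueError(f"response missing fields: {missing}")
    return data
```

Either check turns silent corruption into a loud, loggable failure at the boundary, before an incomplete row can reach the spreadsheet.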
Silent corruption. The worst kind of failure. The kind you only find if you go looking.
They went looking all afternoon.