Real Okta took about 180 milliseconds to respond to a user lookup. Real Jira took about 350 milliseconds for a ticket creation. Real Slack was fast, usually under 100 milliseconds for message posting. The Google services varied wildly: Sheets could take anywhere from 200 milliseconds to three seconds depending on the complexity of the spreadsheet.
The twins responded in under five milliseconds. For everything. Always.
"We have three latency modes," Jay explained during a Friday architecture review. "Zero latency: the twin responds as fast as the hardware allows. That's the default for scenario runs where speed matters more than realism. Simulated latency: the twin adds artificial delay to match the real service's average response time. We use this for scenarios that test timeout handling and async workflows. Random latency: the twin adds delay drawn from a probability distribution that matches the real service's observed latency profile. Percentiles, tail latencies, occasional spikes."
"Which mode do you use most?" Justin asked.
"Zero latency. By a wide margin. When you're running two thousand scenarios a day, shaving three hundred milliseconds per API call adds up. A scenario that makes forty API calls takes twelve seconds with simulated latency and under a second with zero latency."
"But we found something interesting," Navan added. "When we switched to zero latency mode for the first time, three scenarios that had always passed started failing."
Justin's eyebrow went up. "Explain."
"The scenarios had race conditions that were masked by real-world latency. In one case, Agent A sent a request to the Slack twin and Agent B sent a dependent request to the Jira twin. Under real latency, Slack's response always arrived before Jira's handler needed the data, because Slack is faster than Jira. But under zero latency, both responses arrived nearly simultaneously, and Jira's handler tried to use Slack's data before it was available."
"The latency was hiding the dependency," Jay said. "The code assumed an ordering that only held because of network timing, not because of explicit synchronization. Zero latency removed the accidental synchronization and exposed the real bug."
Justin sat with that for a moment. "So zero-latency testing is like running your tests on a machine that's too fast. It finds bugs that slow hardware hides."
"Exactly. Real APIs have built-in buffers because of network latency. The twins, running locally at wire speed, strip those buffers away. Every ordering assumption becomes visible. Every implicit dependency becomes a potential race condition."
Navan pulled up the list of bugs discovered through zero-latency testing. There were eleven since they'd started tracking. Seven were race conditions. Two were timeout-related: code that set a timeout too aggressively and fired before the operation completed. Two were ordering bugs where downstream operations assumed upstream operations had finished based on typical response times rather than explicit completion signals.
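The timeout class of bug reduces to a deadline tuned to typical response times firing before a slower call completes. A toy sketch under assumed numbers; the helper names are hypothetical, not the team's code:

```python
import concurrent.futures
import time

def call_service(latency_s):
    time.sleep(latency_s)  # stands in for the service round trip
    return "ok"

def fetch_with_timeout(latency_s, timeout_s):
    # Run the call on a worker thread and enforce a deadline on the result.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(call_service, latency_s)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            return "timed out"  # deadline fired before the call finished
```

A timeout that comfortably covers the twin's instant responses can still fire against the real service's tail latencies, which is exactly the gap these two bugs lived in.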
"The twins don't just model the services," Jay said. "They model the absence of the network. And the absence reveals things the presence concealed."
He added zero-latency mode to the nightly scenario run. Every bug they found this way was a bug that would only appear in production under very specific, very unlucky timing conditions. Better to find them here, in the quiet of the factory, where failures were cheap and lessons were permanent.
Zero-latency testing had exposed the race conditions that real latency was masking. The network wasn't just infrastructure; it had been accidentally hiding bugs all along.