The Okta twin had four versions. Each one was a snapshot of how Okta's API had behaved at a specific point in time.
Version 1 was built from December 2024 traffic captures. It reflected Okta's API as it existed before the February SCIM patch, before the April rate limit changes, before the June session management update. It was historically inaccurate by current standards. It was also irreplaceable, because scenarios written in January needed a twin that behaved like January's Okta.
"Think of it as time travel," Navan said. He was showing Jay the version management system, a part of the DTU infrastructure that had grown from a simple file naming convention into a proper versioning framework. "Version 1 is December Okta. Version 2 is March Okta. Version 3 is July Okta. Version 4 is current. When you write a scenario, you specify which twin version it runs against."
"Why would you run against an old version?"
"Regression testing. If you have a scenario that was written and validated in February, and you want to verify that your agents still handle the February-era API correctly, you pin the scenario to Version 1. The scenario runs the same way it always has, against the same behavioral model."
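In code, pinning a scenario to a twin version might look something like the following minimal sketch. `Scenario`, `TwinRegistry`, and the endpoint URLs are invented for illustration, not taken from any real DTU codebase:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scenario:
    name: str
    twin: str          # which service twin the scenario targets
    twin_version: int  # pinned behavioral snapshot, e.g. 1 = "December Okta"

class TwinRegistry:
    """Maps (service, version) pairs to concrete twin endpoints."""
    def __init__(self):
        self._endpoints = {}

    def register(self, service, version, endpoint):
        self._endpoints[(service, version)] = endpoint

    def resolve(self, scenario):
        # A pinned scenario always resolves to the same behavioral model,
        # no matter how far the live API has drifted since.
        return self._endpoints[(scenario.twin, scenario.twin_version)]

registry = TwinRegistry()
registry.register("okta", 1, "http://localhost:8001")  # December-era behavior
registry.register("okta", 4, "http://localhost:8004")  # current behavior

feb_scenario = Scenario("scim-deprovision-feb", twin="okta", twin_version=1)
print(registry.resolve(feb_scenario))  # → http://localhost:8001
```

The point of the `frozen=True` dataclass is that the pin is part of the scenario's identity: a February scenario stays a February scenario.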
Jay considered this. "But the agents are current. They've been updated to handle the new API behaviors. Running a current agent against an old twin version could surface compatibility issues that don't exist in production."
"Yes. And sometimes those compatibility issues reveal assumptions in the agent that should be explicit. If your agent assumes the SCIM payload format is the new format, and the old twin sends the old format, the agent's error handling should still work. Forward-compatible code should handle backward-compatible APIs."
Justin walked past the conversation, stopped, and leaned in. "The versions also help us measure progress. If a scenario fails against Version 1 but passes against Version 4, we can see exactly which behavioral change made the difference. We can bisect across twin versions the way you'd bisect across commits."
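The bisection Justin describes can be sketched as a binary search over chronologically ordered twin versions, under the assumption that the scenario fails on early versions and passes from some version onward. The function name and the toy pass/fail predicate below are hypothetical:

```python
def bisect_twin_versions(versions, scenario_passes):
    """Find the first twin version on which the scenario passes.

    Assumes `versions` is sorted oldest-to-newest and there is a single
    transition point: the scenario fails on early versions and passes
    from some version onward (the behavioral change we want to locate).
    """
    lo, hi = 0, len(versions) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if scenario_passes(versions[mid]):
            hi = mid       # the change happened at mid or earlier
        else:
            lo = mid + 1   # the change happened after mid
    return versions[lo]

# Toy stand-in: pretend the relevant behavioral change landed in
# Version 2, so the scenario passes from Version 2 onward.
first_passing = bisect_twin_versions([1, 2, 3, 4], lambda v: v >= 2)
print(first_passing)  # → 2
```

Exactly like `git bisect`, each probe halves the candidate range, except the thing being probed is a behavioral model of the external world rather than a commit.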
"Twin bisection," Jay said. "That's a new one."
The Jira twin had three versions. The Slack twin had five—Slack updated their API more frequently than any other service in the DTU, which meant more versions but also more drift detection alerts. Google Docs had two versions, Drive had two, and Sheets had three.
"Each version is a self-contained binary," Navan explained. "You can run four versions of the Okta twin simultaneously on different ports. Scenario A runs against Version 2 on port 8001. Scenario B runs against Version 4 on port 8002. They don't interfere with each other."
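A launcher for that arrangement might look like the sketch below; the binary paths, port assignments, and `--port` flag are all assumptions made for illustration:

```python
# Hypothetical port map: each (service, version) pair owns a port, so
# multiple behavioral snapshots of the same twin run side by side.
TWIN_PORTS = {
    ("okta", 2): 8001,  # Scenario A's pinned version
    ("okta", 4): 8002,  # Scenario B's pinned version
}

def twin_command(service, version):
    """Build the launch command for one self-contained twin binary."""
    port = TWIN_PORTS[(service, version)]
    return [f"./twins/{service}-v{version}", "--port", str(port)]

print(twin_command("okta", 2))  # → ['./twins/okta-v2', '--port', '8001']
```

Starting each with something like `subprocess.Popen(twin_command("okta", 2))` would yield isolated processes; because every binary owns its port, pinned scenarios can run concurrently without interfering.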
"What about storage?"
"Each twin binary is about forty megabytes. Six twins, three or four versions each, that's under a gigabyte total. Storage isn't the constraint. The constraint is cognitive. Every version we maintain is a version we have to remember exists. When someone reports a scenario failure, the first question is always: which twin version?"
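Navan's back-of-the-envelope math checks out against the per-service version counts quoted in this chapter (the forty-megabyte figure is his):

```python
# Tally the twin versions listed for each service, at ~40 MB per binary.
versions = {"okta": 4, "jira": 3, "slack": 5, "docs": 2, "drive": 2, "sheets": 3}
binary_mb = 40

total_versions = sum(versions.values())
total_mb = total_versions * binary_mb
print(total_versions, total_mb)  # → 19 760
```

Well under a gigabyte, as he says. The cognitive cost is the real one: nineteen behavioral snapshots is nineteen things someone has to remember exist.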
Jay looked at the version list. Nineteen twin versions across six services. Nineteen slightly different models of reality, each one correct for its moment in time. The factory didn't just model the world. It modeled the world's history.