The realization arrived the way most important realizations do: sideways, during an unrelated conversation, while everyone was thinking about something else.
Jay was explaining a scenario failure to Navan. An agent had called the Jira twin with a request that should have returned a paginated list of issues. The twin returned the first page correctly but included a nextPage token that didn't work. The agent followed the token, got a 400 error, and the scenario failed.
"The pagination token is wrong," Jay said. "The twin is generating it but not storing it properly. When the agent comes back with the token, the twin doesn't recognize it."
"So the twin has a bug," Navan said.
"Yes. But here's the thing—I went to write up the bug, and I realized I wasn't writing a bug report. I was writing a specification. The description of how pagination should work, the expected token lifecycle, the correct behavior when a token is reused or expired—it read exactly like a spec document."
Justin looked up from across the room. "Say that again."
"The twin bug report reads like a specification. Because the twin IS a specification. It's an executable specification of how Jira's API should behave. When the twin has a bug, what's actually happening is that the specification is incomplete or incorrect. Fixing the bug is the same as refining the spec."
The room went quiet for a moment. Not the uncomfortable quiet of disagreement but the productive quiet of an idea settling into place.
"We've been thinking of the twins as test doubles," Justin said. "Stand-ins for real services. But they're more than that. A test double only needs to return the right answers for the tests you've written. The twins try to return the right answer for every possible input. That's not a test double. That's a specification."
"An executable specification," Navan added. "You can run it. You can query it. You can ask it how the service should behave in any situation, and it will show you."
"Better than a written specification," Jay said. "A written spec has ambiguities. You read a paragraph about pagination and two engineers interpret it two different ways. The twin has no ambiguity. You send a request, you get a response. That response IS the specification. There's no interpretation needed."
Justin stood up and walked to the window. He did this when he was thinking about something that changed the shape of the project. "If the twins are specifications, then improving twin fidelity isn't a testing activity. It's a specification activity. We're not building better test infrastructure. We're building a more complete and accurate description of how these services work."
"And the scenarios," Jay continued, following the thread, "aren't test cases. They're validation probes against the specification. They verify that the spec—the twin—correctly describes the world."
"Which makes the factory's methodology something like executable specification-driven development," Navan said. He reached for his notebook and started writing. "The NLSpec drives the agents. The twins specify the environment. The scenarios validate the specification. Everything is specification, all the way down."
Justin turned from the window. "Write that up. Not as code. Not as a spec. As a blog post. I think other people need to hear this."
Jay opened his laptop. Navan kept writing in his notebook. They worked in parallel, each capturing the same idea in their own way, which felt appropriately specification-like.
The twins as executable specifications rather than test doubles. This is the kind of conceptual shift that reframes everything. The whole DTU series has been building to this insight.