Navan found the first one on a Tuesday. He'd been reviewing the scenario logs—not because anyone asked him to, but because Navan reviewed things the way other people scrolled social media, compulsively and with genuine interest. And there, in the middle of a routine Okta provisioning scenario, was a test case he didn't write.
The agent had generated a sub-scenario: what happens when an Okta provisioning event fires while the Jira twin is mid-restart? A timing edge case. The user gets created in Okta, the SCIM event propagates, but Jira is momentarily unavailable. Does the system retry? Does it drop the event? Does it queue it?
Nobody on the team had thought of this case. The scenario set didn't include it. The NLSpec didn't mention it. The agent had imagined a failure mode that the humans hadn't and had generated a test for it unprompted.
"Jay," Navan said. "The agents are writing test cases."
Jay came over. Read the log. Read it again.
"This isn't in the scenario set," Jay confirmed. "The agent... inferred this edge case from the scenario context and tested for it on its own?"
"It's not just testing for it. Look at the implementation." Navan scrolled down. "It handles the case. Retry with backoff. Event queue with deduplication. The agent imagined the problem and solved it in the same pass."
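The mechanism the agent implemented, as Navan describes it, is retry with backoff feeding a deduplicating event queue. A minimal sketch of that pattern might look like this (all names here are hypothetical illustrations, not the team's actual code):

```python
import time

class ProvisioningRelay:
    """Sketch: queue SCIM events with dedup, retry delivery with backoff."""

    def __init__(self, deliver, max_retries=3, base_delay=0.1):
        self.deliver = deliver          # callable(event) -> bool; True on success
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.queue = []
        self.seen = set()               # dedup on event id

    def enqueue(self, event):
        # Drop duplicate deliveries of the same provisioning event.
        if event["id"] in self.seen:
            return False
        self.seen.add(event["id"])
        self.queue.append(event)
        return True

    def drain(self):
        """Attempt delivery; retry with exponential backoff, requeue on failure."""
        delivered, pending = [], []
        for event in self.queue:
            for attempt in range(self.max_retries):
                if self.deliver(event):
                    delivered.append(event)
                    break
                # Target (e.g. a restarting Jira) is down; back off and retry.
                time.sleep(self.base_delay * (2 ** attempt))
            else:
                pending.append(event)   # still down; keep for the next drain
        self.queue = pending
        return delivered
```

With a delivery target that fails twice before recovering, as in the mid-restart scenario, the event survives both failures and lands on the third attempt instead of being dropped.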
Over the next week, they found more. Dozens of emergent test cases. An agent testing what happens when the Google Drive twin runs out of storage quota during a bulk provisioning run. Another agent testing concurrent SCIM updates to the same user from different identity providers. A third agent testing what happens when a Slack channel creation fails silently—no error code, just a timeout.
Edge cases that nobody imagined. Failure modes that hadn't appeared in any specification. The agents were exploring the possibility space around the scenarios, finding the dark corners, and handling what they found there.
Justin's reaction was the quietest form of satisfaction. He nodded. He said "good." He went back to his laptop. For Justin, this was equivalent to a standing ovation.
"The scenarios are growing organically," Navan said later, writing in his notebook. "We plant the seeds—the core scenarios, the user stories, the happy paths. And the agents grow branches we never planned. Edge cases that extend from the trunk like new growth."
"That's a very botanical metaphor for someone who writes Swift," Jay said.
"I contain multitudes."
Jay laughed, which he didn't do often enough in the office. Then he got serious. "We should catalog these. The emergent test cases. If the agents are finding edge cases we missed, those edge cases should feed back into the scenario set. The scenarios should grow from what the agents discover."
"A feedback loop," Navan said.
"A feedback loop. The scenarios teach the agents what to build. The agents teach the scenarios what to test."
Navan wrote that in his notebook. Underlined it. Put a star next to it. Then added a small drawing of a tree with branches spreading in all directions, each branch labeled with a failure mode that nobody had planned for and the agents had found anyway.
Beneath the drawing, he added a note: "The Jira-mid-restart timing case is the kind of bug that would only surface in production at 2 AM on a Friday and would take three days to debug. The agent found it and fixed it in one pass. I am both amazed and slightly threatened."