The conference room in Munich held two hundred people, and nearly every seat was taken. ICCID 2025—the International Conference on Cybersecurity, Identity, and Digital Trust. The attendees were European security professionals, CISOs, identity architects, compliance officers. People whose careers were built on the premise that you trust nothing until it proves itself trustworthy.
Justin stood at the podium with a single slide behind him. The slide showed two numbers: a satisfaction percentage and a date. He let the audience look at the numbers before he started speaking.
He began by describing the factory. Non-interactive development. Specifications written by humans. Code written by agents. Validation through scenarios. No human code review. He delivered each statement clearly, watching the audience. The skepticism was visible. Arms crossed. Heads tilted. A few people typing notes that Justin suspected were objections rather than summaries.
When he said "no human code review," a man in the third row shook his head. Not rudely. Reflexively. The way you shake your head when someone tells you they drove cross-country without checking the oil.
Justin acknowledged the reaction without calling it out. "I know what that sounds like to this audience," he said. "This is a room full of people whose job is to verify, to audit, to prove compliance. The idea that code goes into production without a human reading it feels like a violation of everything you stand for."
He advanced to the next slide. A graph. The X-axis was time. The Y-axis was satisfaction. The curve started low, climbed steadily, plateaued briefly, then jumped in November. He pointed to the jump.
"This is the satisfaction metric over six months," he said. "Every data point represents the fraction of observed trajectories through all scenarios that likely satisfy the user. This isn't a test suite. It's not pass-fail. It's a probabilistic measurement of behavioral correctness."
The audience was listening now. Skepticism hadn't gone away, but it had shifted from rejection to interrogation. They wanted to challenge the metric, not dismiss it.
A woman near the back raised her hand. "What's the denominator? How many scenarios?"
"Thousands," Justin said. "Running continuously. Against digital twins that replicate the behavior of the production services we integrate with. We test against behavioral clones of Okta, Jira, Slack, Google Docs. No rate limits. No API costs. Thousands of scenario runs per hour."
"And the agents are constrained?" another attendee asked.
"Containerized. Every filesystem access logged. Every network connection monitored. Cedar policies governing what they can and cannot do. Full syscall monitoring." He paused. "The agents operate with less unsupervised freedom than most human developers."
That landed. Justin saw it land. The arms uncrossed. A few people in the audience exchanged glances—the kind of glance that means he has a point.
The Q&A ran over by fifteen minutes. The conference moderator let it. The questions were sharp, technical, and respectful. Nobody in the room was convinced that the factory was the future. But several people left no longer convinced that it was impossible.
For a European security audience, that was a win.