The question came from an engineer at KuppingerCole, during the Q&A. He was polite about it. He was also clearly horrified.
"If no human reads the code," the man said, adjusting his lanyard, "how do you trust it?"
Justin didn't pause. Didn't take a breath. He'd been waiting for this question the way a chess player waits for the obvious fork.
"I want to invert the question," Justin said. "How did you trust it before?"
The room got quiet. Not the polite conference-quiet where people are checking email. Actually quiet.
"In the old model," Justin continued, "a human writes code. A different human reviews it. Maybe they catch the bug. Maybe they're tired. Maybe the PR is four thousand lines and the reviewer skims the last third. Maybe the reviewer and the author are friends and the review is a rubber stamp. You call this trust?"
Back in San Mateo, Jay was watching the livestream on his phone. He'd propped it against his monitor while the agents ran a batch of scenarios against the Okta twin. He could hear the audience shifting in their seats.
"Trust in Software 1.0," Justin said, "was social. It was a feeling. You trusted the code because you trusted the person who wrote it, or the person who reviewed it, or the person who signed off on the deploy. But trust-as-feeling doesn't scale. Trust-as-feeling has a terrible track record."
Navan, sitting across from Jay, looked up from his notebook. He'd been sketching something—a diagram of satisfaction curves, pencil on graph paper, because Navan still used paper for thinking. "He's doing the thing," Navan said.
"He's doing the thing," Jay confirmed.
The thing was Justin's rhetorical move where he dismantled the premise before answering the question. Jay had seen it three times now. It was always effective.
"In the factory," Justin told the audience, "trust is empirical. We run scenarios. Thousands of them. End-to-end user stories executed against behavioral clones of every service the software touches. And we don't ask 'does it pass?' We ask: 'What fraction of observed trajectories through all scenarios likely satisfy the user?' That's the satisfaction metric. It's probabilistic. It's measurable. It's reproducible."
He let that land.
"So when you ask me how I trust code that no human has read—I trust it more. Because the old trust was a human squinting at a diff. The new trust is a thousand scenarios running against digital twins, producing a satisfaction metric I can put a number on."
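The metric Justin describes reduces to a simple fraction over scenario runs. A minimal sketch of the idea in Python (the `Trajectory` type and all names here are hypothetical illustrations, not anything from an actual factory codebase):

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    """One observed end-to-end run of a user story against the digital twins."""
    scenario_id: str
    satisfied: bool  # did this run likely satisfy the user?

def satisfaction_metric(trajectories: list[Trajectory]) -> float:
    """Fraction of observed trajectories that likely satisfy the user."""
    if not trajectories:
        return 0.0
    satisfied = sum(1 for t in trajectories if t.satisfied)
    return satisfied / len(trajectories)

# 973 satisfying runs out of 1000 gives the kind of number
# Jay sees later in the scene: a satisfaction of 0.973.
runs = [Trajectory("okta-login", i < 973) for i in range(1000)]
print(satisfaction_metric(runs))
```

The point of the sketch is only that the metric is an empirical count, not a judgment call: rerun the same scenarios and you get the same number.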
The KuppingerCole engineer was writing something down. He hadn't expected this answer. He'd expected defensiveness, maybe a concession. Instead he got an inversion.
Jay muted the stream and turned back to his terminal. The Okta scenario batch had finished. Satisfaction was at 97.3 percent. He flagged the failing trajectories for examination and leaned back.
"You know what's funny?" he said to Navan. "I used to review PRs for three hours a day. I thought I was thorough. I thought I was building trust."
"Were you?"
Jay thought about it. Really thought about it. All those diffs, all those comment threads, all those approvals.
"I was building the feeling of trust," he said. "Not the same thing."
Navan went back to his notebook. The agents kept working. Outside, it was getting dark, but nobody in the factory noticed. The monitors were bright enough.
The line about rubber-stamping PRs when the reviewer and the author are friends hit me personally. I have done this. I am not proud of this.