Software Factory Archive

Stanford Law

Rating:
General Audiences
Fandom:
StrongDM Software Factory
Characters:
Justin McCarthy, Jay Taylor, Navan Chauhan
Tags:
Stanford Law CodeX, Legal Analysis, Trust, AI Accountability
Words:
487
Published:
2026-02-09

The paper arrived on a Thursday. Stanford Law CodeX—the Center for Legal Informatics—had published it through their AI governance working group. The title sat in Justin's inbox like a stone dropped into still water.

Built by Agents, Tested by Agents, Trusted by Whom?

Justin read the title aloud in the office. Jay looked up from his screen. Navan stopped typing. Nobody laughed.

The paper was twenty-three pages. Justin read it in one sitting, which he rarely did with academic papers. He usually skimmed, flagged sections, returned to them later. This one he read straight through because the authors had done something unusual: they'd actually understood the technical architecture before attempting the legal analysis.

They'd dissected the factory's trust model. Code written by agents. Tested by agents against scenarios. Validated by satisfaction metrics. Deployed without human code review. They'd identified the precise point where traditional software liability frameworks broke down: the absence of a human in the authorship chain. If an agent writes code that causes harm, who bears responsibility? The human who wrote the spec? The company that deployed the agent? The company that trained the model?

Jay read it next, more slowly. He lingered on the section about scenario coverage. The authors had grasped something subtle: that the factory's scenarios functioned as a form of due diligence. Not human review of code, but systematic validation of behavior. The paper argued that this might actually provide stronger evidence of due care than traditional code review, because scenarios were reproducible, quantifiable, and didn't depend on reviewer fatigue or expertise variation.
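To make the paper's claim concrete: it turns on scenarios being replayable artifacts rather than one-off human judgments. A minimal sketch, in Python, of what a reproducible, quantifiable scenario record could look like. The schema, field names, and example scenarios here are invented for illustration; neither the story nor the paper specifies the factory's actual format.

```python
from dataclasses import dataclass
import hashlib
import json

@dataclass(frozen=True)
class ScenarioResult:
    """One reproducible unit of behavioral evidence: a named scenario,
    the inputs it ran against, and whether the agent-written code passed."""
    scenario_id: str
    inputs: dict
    passed: bool

def evidence_digest(results: list[ScenarioResult]) -> dict:
    """Reduce a scenario run to quantifiable artifacts: a coverage count,
    a pass rate, and a content hash that lets anyone re-verify later
    that they are looking at the exact same run."""
    payload = json.dumps(
        [(r.scenario_id, sorted(r.inputs.items()), r.passed) for r in results],
        default=str,
    )
    return {
        "scenarios_run": len(results),
        "pass_rate": sum(r.passed for r in results) / max(len(results), 1),
        "run_hash": hashlib.sha256(payload.encode()).hexdigest(),
    }

# A run over three invented scenarios yields evidence that doesn't
# depend on reviewer fatigue: anyone can replay it and compare hashes.
run = [
    ScenarioResult("login-happy-path", {"user": "alice"}, True),
    ScenarioResult("login-bad-password", {"user": "alice"}, True),
    ScenarioResult("login-locked-account", {"user": "mallory"}, False),
]
print(evidence_digest(run))
```

The hash is the legally interesting part: two parties can rerun the same scenarios independently and confirm they hold identical evidence, which is the reproducibility the authors contrast with reviewer-dependent code review.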

"They're saying our process might be more defensible than the traditional one," Jay said, sounding surprised.

"They're saying it could be," Justin corrected. "They're also saying the legal frameworks haven't caught up. We're in a gap. The technology has outpaced the law."

Navan found the section about Cedar policies and Leash. The authors had recognized that the factory didn't operate without guardrails. Agents were constrained. Their filesystem access was monitored. Their network connections were logged. Their tool calls were inspected. The paper framed this as a form of algorithmic governance—not human oversight of code, but systematic oversight of agent behavior.
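Neither the story nor the paper spells out Leash's interface, but the oversight being described is mechanically simple: every tool call passes through a gate that logs it and checks it against a policy. A hedged sketch of that kind of gate follows; the policy shape, paths, and hostnames are all invented, and nothing here should be read as Leash's or Cedar's real API.

```python
import fnmatch
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("leash-sketch")

# Hypothetical policy: which paths an agent may touch and which hosts
# it may reach. Everything else is denied, and the denial is recorded.
POLICY = {
    "fs_allow": ["/workspace/*", "/tmp/*"],
    "net_allow": ["api.internal.example", "pypi.org"],
}

def inspect_tool_call(tool: str, target: str) -> bool:
    """Gate one agent tool call: log it, check it against the policy,
    and return whether it is allowed to proceed."""
    if tool == "fs":
        allowed = any(fnmatch.fnmatch(target, p) for p in POLICY["fs_allow"])
    elif tool == "net":
        allowed = target in POLICY["net_allow"]
    else:
        allowed = False  # unknown tools are denied by default
    log.info("tool=%s target=%s allowed=%s", tool, target, allowed)
    return allowed

inspect_tool_call("fs", "/workspace/app/main.py")  # allowed
inspect_tool_call("fs", "/etc/shadow")             # denied, logged
inspect_tool_call("net", "evil.example")           # denied, logged
```

The design choice the paper praises is visible even at this toy scale: the record of what agents were permitted and refused is produced by the system itself, which is what makes "systematic oversight of agent behavior" auditable after the fact.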

"They called Leash 'a regulatory sandbox for autonomous software development,'" Navan read aloud.

Justin considered this. "That's not wrong."

The paper concluded with questions, not answers. That was honest. The legal landscape for AI-authored software was genuinely uncharted. The authors hadn't tried to force premature conclusions. They'd mapped the terrain and planted flags where the hard questions lived.

Justin forwarded the paper to StrongDM's legal team with a two-line note. Then he sat quietly for a while, looking at the title again. Built by Agents, Tested by Agents, Trusted by Whom?

The question wasn't rhetorical. It was the question. And the factory's answer—trusted by the scenarios, trusted by the metrics, trusted by the constraints—was either the beginning of a new framework or a footnote in a cautionary tale. Justin believed it was the former. The Stanford paper suggested the law hadn't decided yet.

Kudos: 118

law_and_code 2026-02-10

The insight that scenario-based validation might be MORE defensible than code review is genuinely provocative. Code review depends on the reviewer. Scenarios are reproducible evidence of due care.

governance_nerd 2026-02-10

"Regulatory sandbox for autonomous software development." That's Leash in six words. The Stanford folks understood the architecture better than most tech commentators do. Academic rigor applied to the right problem.
