The numbers were counterintuitive and Jay said so.
He'd run the analysis because nobody asked him to, because he was the kind of engineer who ran analyses when he was curious. He'd compared satisfaction metrics across four months of factory output. Then he'd set those metrics against historical data from StrongDM's pre-factory codebase, applying the same scenarios retroactively to the old code.
Code quality had improved when human review was removed.
Not by a small amount. The satisfaction scores for agent-produced code were consistently higher than the scores for human-written, human-reviewed code evaluated against the same scenarios. The agent code was more consistent. More idiomatic. More resilient to edge cases. Better.
"This can't be right," Jay said to Navan, who was sitting across the table with his notebook open to a page full of diagrams. "Human code review is supposed to be the quality gate. That's the whole point. Two pairs of eyes are better than one."
"Two pairs of tired eyes," Navan said, not looking up from his drawing. "At three in the afternoon. After a two-hour meeting. Reading a diff that's nine hundred lines because someone combined three PRs to hit a deadline."
Jay opened his mouth. Closed it. Opened it again.
"The agents don't cut corners," Navan continued. "They don't have deadlines. There's no sprint ending on Friday. There's no PM asking if this can ship by end of quarter. They just run scenarios until they converge. If convergence takes an hour, it takes an hour. If it takes twelve hours, it takes twelve hours. They don't care."
"But review catches things that tests don't."
"Review catches things that tests don't. Scenarios aren't tests. Scenarios catch things that review doesn't."
Jay brought the numbers to Justin. Justin looked at them the way he looked at everything: carefully, completely, without visible emotion until he'd formed an opinion.
"This is what I expected," Justin said.
"You expected quality to improve without review?"
"I expected quality to improve without human review because human review was never the quality mechanism. Human review was the trust mechanism. People reviewed code to feel confident about it. But confidence and quality are different things."
Justin pulled up the satisfaction dashboard. Ninety-seven point six percent across the board. Higher than at any point in the company's history.
"Code review was a compromise," Justin said. "It was the best quality tool we had when humans were writing the code. But it was always bottlenecked by human attention, which is limited, and human judgment, which is inconsistent. The agents replaced both the writing and the review with a tighter loop: write, run scenarios, evaluate satisfaction, iterate. The loop is the review."
Jay went back to his desk and stared at his analysis for a long time. He'd spent years getting good at code review. He'd prided himself on it. Thorough comments. Thoughtful suggestions. Catching the subtle bugs.
He'd been very good at a thing that turned out to be unnecessary. It was a strange feeling. Not bad, exactly. Just strange.