The November Inflection

Justin had been watching the capability curves for months. He'd plotted them himself, on a spreadsheet that nobody else had access to, tracking the factory's satisfaction metrics against the release dates of new model versions. The first inflection had been October 2024—Claude 3.5's second revision, the moment when long-horizon agentic workflows started compounding correctness instead of error. That inflection had birthed the factory.

The second inflection arrived in November 2025.

Claude Opus 4.5 and GPT 5.2 landed within days of each other. Neither announcement was dramatic. No keynote theater, no live demos with planted questions. Just release notes, API updates, and new model endpoints. The drama was in the results.

Jay noticed it first in the CXDB scenarios. A batch of edge-case scenarios that had been hovering at 87% satisfaction for weeks suddenly jumped to 96%. He hadn't changed the specs. He hadn't changed the scenarios. The agents had gotten better overnight, and the improvement was visible in the metrics like a step function.

"Look at this," he said, pointing at the satisfaction dashboard. The curve had a visible discontinuity. A cliff edge going up.

Navan saw the same thing in the Leash scenarios. A set of complex Cedar policy tests that agents had been struggling with—policies with nested conditions and exception paths—were suddenly passing cleanly. The agent's ability to reason about policy logic had improved qualitatively, not just quantitatively. It wasn't making fewer mistakes. It was understanding the policies differently.

Justin pulled up his private spreadsheet and added two data points. The satisfaction averages across all projects, before and after the model updates. The gap was the largest single-event improvement the factory had ever recorded.

"I saw it coming," Justin said, and he wasn't boasting. He was reporting. "The October 2024 inflection told us what these models could do when they crossed a capability threshold. The question was always: when would they cross the next one? Not if. When."

The factory had been designed for this. The non-interactive model, the spec-driven workflow, the satisfaction metrics—all of it was built on the assumption that AI capabilities would continue to improve and that the factory should capture those improvements automatically. When a better model arrived, you didn't rebuild your process. You plugged the new model into the existing pipeline and measured what happened.

What happened was a step change in quality. Scenarios that had been borderline started passing cleanly. Agents that had needed three iterations to converge on a correct implementation started converging in one. The Agate loops shortened. The sprint cycles compressed. The satisfaction metrics climbed.

"The architecture amplifies capability improvements," Jay said, understanding something he'd sensed but hadn't articulated. "We built a system that gets better when the models get better, without us doing anything."

"That was the design," Justin said. He closed his spreadsheet. "That was always the design."

The November inflection wasn't a surprise. It was a confirmation. The factory wasn't built for one model generation. It was built for the curve.

Software Factory Archive

Kudos: 141