The codergen handler was the beating heart of Attractor, and Jay was the one who wrote the spec for it. Not the code. The spec. Seventy-three lines of precise English describing what should happen when a node of type codergen was reached in the pipeline.
The handler's job was simple in concept and staggering in implication: call a language model and ask it to write code. Take the context from upstream nodes—the requirements, the architecture decisions, the previous iteration's test results—and feed it all to an LLM with a prompt that said, in essence, build this.
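The shape of that job can be sketched in a few lines. This is a guess at how such a handler might assemble its prompt from upstream context; the field names (`requirements`, `architecture`, `test_results`) are hypothetical, not taken from Jay's spec.

```python
# Hypothetical sketch: assemble the codergen prompt from upstream
# pipeline context. Field names are assumptions, not from the spec.
def build_prompt(context: dict) -> str:
    parts = [
        "Build this.",
        f"Requirements:\n{context.get('requirements', '')}",
        f"Architecture decisions:\n{context.get('architecture', '')}",
        f"Previous test results:\n{context.get('test_results', '')}",
    ]
    return "\n\n".join(parts)
```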
"The critical thing," Jay told Navan, pointing at a section of his spec, "is the CodergenBackend interface. The handler doesn't know or care which LLM it's talking to. Claude, Codex, Gemini—the interface is the same. You just swap the backend."
"Like a database driver," Navan said.
"Exactly like a database driver. The handler speaks a protocol. Any backend that implements the protocol can be plugged in."
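The database-driver analogy maps naturally onto a structural interface. A minimal sketch, assuming a Python-style protocol: the story names `CodergenBackend` but not its methods, so `generate_code` and the stub backends here are illustrative inventions.

```python
# Sketch of the backend-swap idea. CodergenBackend comes from the story;
# generate_code() and the stub backends are hypothetical.
from typing import Protocol


class CodergenBackend(Protocol):
    """Any LLM backend the codergen handler can talk to."""

    def generate_code(self, prompt: str, context: dict) -> str:
        """Return generated source code for the given prompt and context."""
        ...


class ClaudeBackend:
    def generate_code(self, prompt: str, context: dict) -> str:
        return f"# code from Claude for: {prompt}"


class GeminiBackend:
    def generate_code(self, prompt: str, context: dict) -> str:
        return f"# code from Gemini for: {prompt}"


def run_codergen(backend: CodergenBackend, prompt: str, context: dict) -> str:
    # The handler only speaks the protocol; swapping LLMs is one argument.
    return backend.generate_code(prompt, context)
```

Because the protocol is structural, neither stub has to inherit from `CodergenBackend`; anything with a matching `generate_code` signature plugs in.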
They watched the first codergen run in the late afternoon. An agent traversed the pipeline graph, reached the codergen node, and invoked Claude. The terminal filled with the rhythmic cadence of a streaming API response—tokens arriving in bursts, like a telegraph machine transcribing a message from somewhere far away.
Jay watched the output. He caught himself reading the generated code, evaluating it, and then stopped. Deliberate naivete. He closed the terminal window showing the raw output and opened the scenario results instead.
"Did it pass?" Navan asked.
"Satisfaction at 0.73 on the first run. Below threshold."
"So it loops?"
"The pipeline routes back to the codergen node with the assessment results attached. The agent sees what went wrong and tries again."
Second run: 0.81. Third run: 0.89. Fourth run: 0.94. The pipeline crossed the threshold and advanced to the next node. Each iteration, the agent had taken the previous failure, understood it, and corrected its approach. Not by being told how to fix the code, but by being told what the code was failing to do.
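The loop the pipeline just ran can be sketched as code. All names here are hypothetical; the threshold of 0.90 is an assumption consistent with the story (0.89 loops again, 0.94 advances), and the scripted scores stand in for a real assessment step.

```python
# Sketch of the codergen feedback loop: generate, assess, attach the
# assessment to the context, retry until satisfaction crosses threshold.
# The 0.90 threshold and scripted scores are assumptions for illustration.
SATISFACTION_THRESHOLD = 0.90


def assess(code: str, attempt: int) -> tuple[float, str]:
    """Stand-in for scenario assessment; returns (score, feedback)."""
    scripted = [0.73, 0.81, 0.89, 0.94]
    score = scripted[min(attempt, len(scripted) - 1)]
    return score, f"attempt {attempt}: satisfaction {score:.2f}"


def codergen_node(context: dict, max_iters: int = 10) -> dict:
    """Loop until the generated code satisfies the assessment threshold."""
    history: list[str] = []
    for attempt in range(max_iters):
        # The agent sees what went wrong, not how to fix it:
        # prior assessment results travel back in via the history.
        code = f"# generated code, given feedback: {history}"
        score, feedback = assess(code, attempt)
        history.append(feedback)
        if score >= SATISFACTION_THRESHOLD:
            return {"code": code, "score": score, "iterations": attempt + 1}
    raise RuntimeError("codergen did not converge within max_iters")
```

Run against the scripted scores, the loop converges on the fourth iteration, matching the run in the story.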
"It's compounding correctness," Justin said from across the room. He'd been watching from his own monitor. "Each iteration, the agent gets closer. It doesn't diverge. It converges."
Jay stared at the satisfaction curve on his dashboard—the new dashboard, the one he'd specced himself. Four data points, rising. A trajectory toward something that worked.
"You know what this feels like?" Jay said.
"What?"
"Teaching. It feels like teaching. You don't give the student the answer. You describe the world accurately and let them figure it out."
Navan nodded slowly. "Except the student has read the entire internet."
"Yeah," Jay said. "Except that."
0.73, 0.81, 0.89, 0.94. The convergence curve. Each number was the agent getting smarter about the problem; this was what "compounding correctness" looked like as data.