
Software Factory Archive


The Assessment

Rating:
General Audiences
Fandom:
StrongDM Software Factory
Characters:
Jay Taylor, Justin McCarthy
Tags:
Agate, Assessment, Honesty, Feedback
Words:
459
Published:
2025-10-22

The assessment didn't care about your feelings.

Jay learned this on his third Agate project, a Go tool for validating Kubernetes manifests against a custom policy schema. Sprint one had gone well. The parser worked. The policy engine worked. The CLI accepted a manifest file and a policy file and printed violations in a clean table format. Jay had watched the implementation with the quiet satisfaction of someone who had seen the system produce good work before.

Then the assessment ran.

[assessment] Goal check: INCOMPLETE

He blinked. The tests had passed. All of them. Green across the board. How could the goal be incomplete?

He opened the assessment report in .ai/sprints/sprint-01/assessment.md and read.

The GOAL.md specifies that the tool should "validate Kubernetes manifests against a custom policy schema." The current implementation validates individual manifest files. However, the goal description uses the plural "manifests," and the interview response to question 5 confirms that the tool should accept a directory of manifests and validate all of them recursively. The current implementation does not support directory input. Goal status: incomplete. Recommended action: plan sprint 2 to add directory traversal and recursive validation.

Jay read it twice. The assessment had caught a discrepancy between the goal and the implementation that the tests had not caught, because the tests had been generated to match the implementation, not the goal. The tests were correct for what had been built. But what had been built wasn't complete.

"Ruthless," he muttered.

"What's ruthless?" Justin asked from across the room.

"The assessment. It's... it won't round up. I thought sprint one was done. Tests pass. Code works. But the assessment checked the output against the original goal description and found a gap."

Justin walked over and read the assessment. He nodded slowly. "This is the mechanism. This is why the loop converges. The assessment doesn't ask 'did the sprint succeed?' It asks 'is the goal met?' And those are different questions."

Jay leaned back. In his previous life, code review served this function—or tried to. A human reviewer would read the pull request, compare it to the ticket, check for gaps. But human reviewers were tired, overworked, sometimes just rubber-stamping PRs at the end of a long week. The assessment agent was none of those things. It had no Friday fatigue. It had no backlog of other PRs to review. It read the goal, checked the implementation, and said exactly what was true.

Sprint two added directory traversal. Its assessment said: Goal status: complete.

Two words. No qualifications. No caveats. No "looks good to me." Just the binary truth that the goal was met.

"Ruthlessly honest," Jay told Justin later. "That's the right adjective for the assessment phase."

"Good," Justin said. "Honesty is the only thing that makes a feedback loop work."

Kudos: 88

friday_fatigue 2025-10-24

"Human reviewers were tired, overworked, sometimes just rubber-stamping PRs at the end of a long week." I have never felt so seen and so attacked at the same time.

goal_vs_sprint 2025-10-25

The distinction between "did the sprint succeed" and "is the goal met" is subtle but it's EVERYTHING. That's why most agile teams ship features that technically work but don't actually solve the problem.
