The sprint plan didn't just list tasks. It labeled them.
Jay noticed this on his fourth project, when he finally stopped skimming the sprint plans and started reading them carefully. Each task had a field called "skill," and the skill wasn't a generic tag like "backend" or "frontend." It was specific: code-generation, test-generation, documentation, refactoring, debugging.
"What are skills?" he asked Justin.
"Agent capabilities. Each skill maps to a specific mode of agent interaction. Code generation means the agent is writing new source files from a design spec. Test generation means it's producing test cases from implementation code. Documentation means it's generating README files, API docs, usage examples. Each skill has its own prompt structure, its own context requirements, its own evaluation criteria."
"So the sprint planner knows not just what needs to happen, but what kind of work each task represents."
"Exactly. And that determines which agent gets assigned, how the agent is prompted, and what success looks like. A code generation task succeeds when the code compiles and integrates with the existing codebase. A test generation task succeeds when the tests pass and achieve the coverage targets from the design. A documentation task succeeds when the output matches the structure and completeness criteria."
Jay looked at the current sprint plan. Five tasks. Three were code generation: the main service, the configuration loader, the health check endpoint. One was test generation: integration tests for the full service. One was documentation: the README with setup instructions and API reference.
The skill labels determined the execution order, too. Code generation tasks could run in parallel if they were independent. Test generation depended on the code it was testing. Documentation depended on the code it was documenting. The skill labels fed into the dependency analysis.
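The ordering rules Jay observes — independent code-generation tasks in parallel, tests after the code they test, docs after the code they document — could be sketched as a small wave scheduler. The task names mirror the sprint plan above; the scheduler itself and its field names are assumptions for illustration.

```python
# Hypothetical sketch: deriving execution waves from skill labels and
# explicit dependencies. Tasks in the same wave are independent and
# can run in parallel; later waves wait on earlier ones.

tasks = {
    "main-service":  {"skill": "code-generation", "deps": []},
    "config-loader": {"skill": "code-generation", "deps": []},
    "health-check":  {"skill": "code-generation", "deps": []},
    "integration-tests": {"skill": "test-generation",
                          "deps": ["main-service", "config-loader", "health-check"]},
    "readme": {"skill": "documentation",
               "deps": ["main-service", "config-loader", "health-check"]},
}

def execution_waves(tasks):
    """Group tasks into waves: a task joins a wave once all its deps
    have run in an earlier wave."""
    done, waves = set(), []
    while len(done) < len(tasks):
        wave = sorted(name for name, t in tasks.items()
                      if name not in done and all(d in done for d in t["deps"]))
        if not wave:
            raise ValueError("dependency cycle in sprint plan")
        waves.append(wave)
        done.update(wave)
    return waves
```

Run on the plan above, this yields two waves: the three code-generation tasks first, then the test-generation and documentation tasks together — the same order Jay watches the sprint execute in.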
"It's like a job board," Navan said. He had been listening from his desk, notebook open. "Each task is a posting with a required skillset. The agents are the applicants. The scheduler is the hiring manager."
Justin tilted his head. "That's not a bad analogy. Except the agents don't apply. They get assigned. And they never turn down work."
"The ideal employee," Jay said dryly.
He watched the sprint execute. The code generation tasks ran first, Claude and Codex working in parallel on independent components. Then the test generation task ran, producing table-driven tests that covered the interfaces between the components. Finally, the documentation task ran, generating a README that described the service's purpose, setup process, and API endpoints.
Each skill produced a different kind of artifact. Each artifact was evaluated by different criteria. But they all served the same goal, stated in the same markdown file, measured by the same assessment at the end.
Skills were the grammar of the sprint. The goal was the sentence. The agents just filled in the words.
Jay turned the idea over afterward. Each skill having its own prompt structure and evaluation criteria was a clean abstraction: not all agent work was the same, and pretending it was led to mediocre output across the board.