Navan found it at 2:47 AM on a Tuesday, because of course it was 2:47 AM on a Tuesday. The important discoveries at the factory never happened during business hours.
He was running a batch validation on the Google Docs behavioral clone—a routine check, the kind of thing he did every few days to make sure the twin's API responses still matched the behavioral profile they'd built from production observations. Create a document, insert some text, share it with a test user, verify the collaboration cursor shows up, verify the revision history populates correctly. Boring. Methodical. The kind of work that was important precisely because it was boring.
The test document was supposed to contain placeholder text. The scenario definition specified "Lorem ipsum dolor sit amet" as the initial document body. The agents would create the document, populate it, and then run a series of collaboration scenarios against it.
But when Navan opened the test document's content through the twin's API, the body didn't say "Lorem ipsum."
It said: "Satisfaction metric convergence rate by scenario class, observed weekly deltas."
He read it three times. Then he checked the scenario definition. Lorem ipsum. He checked the agent's execution log. The agent had created the document and inserted—according to its own logs—the Lorem ipsum text. But the document body, as returned by the Google Docs twin, contained a phrase that sounded like it had been pulled from their internal metrics dashboard.
Navan's first thought was a data leak. Maybe the twin was somehow reading from a shared data source, contaminating test documents with real operational data. He checked the twin's data isolation layer. Clean. No shared state. No cross-contamination paths.
His second thought was more unsettling. The Google Docs twin generated realistic document content as part of its behavioral model. When a scenario required a document to exist, the twin didn't just store raw bytes—it generated contextually appropriate content to make the behavioral simulation more realistic. It was supposed to generate generic business content. Meeting notes. Project plans. Status updates.
But the twin's content generation model had been trained on their own usage patterns. And their own usage patterns were dominated by one thing: the factory's internal metrics and processes.
The twin wasn't leaking data. It was generating data that looked like their data because their behavioral patterns were the strongest signal in its training distribution.
Navan scrolled through more test documents. A spreadsheet labeled "Token expenditure forecast." A shared doc titled "Scenario coverage gaps, Q4." A slide deck with the heading "DTU Architecture Review."
The Google Docs twin had learned to write like them. Not their code, not their scenarios, but their documents. Their way of organizing thoughts. Their vocabulary. Their concerns.
He sat in the blue light of his monitor and tried to decide if this was a bug or a feature. The twin was supposed to behave like Google Docs. And Google Docs, when used by the factory, would contain exactly this kind of content.
It was correct. That was what made it creepy.
Navan filed the observation in the incident log, tagged it "behavioral fidelity — unexpectedly high," and went to make tea. He did not look at the test documents again that night.
"behavioral fidelity -- unexpectedly high" is the most Navan way to describe something deeply unsettling and I love him for it.