Building a twin started with watching. Not reading documentation, not studying API specs. Watching. Recording. Capturing the actual behavior of the real service as it responded to real requests.
Navan called it behavioral capture. The process was methodical. Set up a proxy between the client and the real service. Route all API traffic through the proxy. Record every request and every response. Headers, body, timing, status codes. Everything. Then analyze the recordings.
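The capture record itself can be sketched simply. This is a minimal illustration of what a proxy might log per exchange; the field names and the `record_exchange` helper are hypothetical, not the team's actual schema.

```python
import json
import time

def record_exchange(log, request, response, started_at):
    """Append one request/response pair to the capture log.
    Field names are illustrative, not an actual capture schema."""
    log.append({
        "method": request["method"],
        "path": request["path"],
        "request_headers": request.get("headers", {}),
        "request_body": request.get("body"),
        "status": response["status"],
        "response_headers": response.get("headers", {}),
        "response_body": response.get("body"),
        "latency_ms": round((time.monotonic() - started_at) * 1000, 1),
    })

# Example: one captured exchange, like the one Jay finds below
log = []
t0 = time.monotonic()
record_exchange(
    log,
    {"method": "GET", "path": "/sheets/abc-123"},
    {"status": 400, "body": {"error": "invalid sheet ID format"}},
    t0,
)
print(json.dumps(log[0], indent=2))
```

The point of recording everything, including headers and timing, is that the team did not yet know which details would matter; analysis came later.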
"The documentation tells you what the service is supposed to do," Navan explained to Jay during one of their early capture sessions. They were building the behavioral model for the Sheets twin, and the proxy had been recording traffic for three days. "The recordings tell you what it actually does. Those are different things."
Jay was scrolling through recordings. "The documentation says this endpoint returns a 404 when you request a non-existent sheet. But the recording shows a 400 with an error message about invalid sheet ID format."
"Because the ID we sent contained a hyphen, and Sheets doesn't use hyphens in sheet IDs. The documentation says 'not found.' The actual service says 'bad request.' Our twin has to match the actual behavior."
The capture phase typically lasted two to three weeks per service. During that time, the team exercised as many API paths as they could. They created resources, modified them, queried them, deleted them. They tried valid operations and invalid ones. They tested boundary conditions: empty strings, maximum-length fields, special characters, Unicode, null values.
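A boundary sweep like the one described can be generated mechanically. This is a minimal sketch; the `boundary_values` helper and the field limit are hypothetical, standing in for whatever limits a given API actually enforces.

```python
def boundary_values(max_len=255):
    """Illustrative boundary cases for a single string field.
    The max_len limit is a placeholder, not any real service's limit."""
    return [
        "",                    # empty string
        "a" * max_len,         # exactly at the maximum length
        "a" * (max_len + 1),   # one past the limit
        "!@#$%^&*()<>",        # special characters
        "café データ",          # Unicode
        None,                  # null value
    ]

# Each value would be sent through the proxy so the response is recorded
for value in boundary_values(10):
    print(repr(value))
```

Each probe produces one more data point for the behavioral model, whether the service accepts it, rejects it, or does something undocumented.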
"We're not trying to break the service," Navan said. "We're trying to map its behavior space. Every response we record is a data point. The more data points, the more accurate the twin."
After capture came analysis. An agent processed the recordings and built a behavioral model: a structured representation of how the service responded to each type of request under each set of conditions. The model captured patterns. A user creation request with a valid payload always returned a 201. A user creation request with a duplicate email returned a 409, but only if the email was already active. If the existing user was deactivated, the service returned a 201 and reactivated the user. That conditional behavior wasn't in any documentation.
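The reactivation pattern is the kind of rule the behavioral model had to encode. Here is a minimal sketch of that one captured behavior; the function name and the in-memory user store are illustrative, not the twin's actual implementation.

```python
def create_user_response(email, users):
    """Replicate the captured pattern for user creation:
    new email -> 201; duplicate active email -> 409;
    duplicate deactivated email -> 201 and reactivate."""
    existing = users.get(email)
    if existing is None:
        users[email] = {"email": email, "active": True}
        return 201
    if existing["active"]:
        return 409
    existing["active"] = True  # undocumented: service reactivates the user
    return 201

users = {}
print(create_user_response("ana@example.com", users))  # fresh email
print(create_user_response("ana@example.com", users))  # duplicate, active
users["ana@example.com"]["active"] = False
print(create_user_response("ana@example.com", users))  # duplicate, deactivated
```

The branch for deactivated users is exactly the conditional behavior that no documentation mentioned; it exists in the model only because the recordings showed it.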
"Edge cases are the whole game," Justin observed during a review of the Okta twin's behavioral model. "The happy path is fifteen percent of the model. The other eighty-five percent is what happens when things aren't happy."
The final phase was gap filling. The captures couldn't cover every possible interaction. Some behaviors were too rare to trigger during a three-week window. Some required specific configurations that the capture environment didn't have. For those, the team relied on documentation, community forums, Stack Overflow posts, and occasionally direct experimentation with creative API calls.
"We found seven undocumented behaviors in the Jira API through creative experimentation," Navan reported. "One of them is that Jira silently truncates custom field values at 32,767 characters. No error. No warning. It just cuts off the data."
"And the twin replicates that?"
"The twin replicates that. Silent truncation at 32,767 characters. Because that's what the real service does, and the twin's job is to be real."
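Replicating the truncation is almost trivial once the limit is known. A minimal sketch, using the 32,767-character limit the team observed; the `set_custom_field` helper and record shape are hypothetical.

```python
FIELD_LIMIT = 32767  # the observed Jira limit for custom field values

def set_custom_field(record, field, value):
    """Replicate the real service's behavior: silently truncate the
    value at the limit. No error, no warning, no return code."""
    record[field] = value[:FIELD_LIMIT]
    return record

record = set_custom_field({}, "notes", "x" * 40000)
print(len(record["notes"]))  # the data past the limit is simply gone
```

A twin that raised an error here would be more polite than the real service, and therefore wrong: clients tested against it would be unprepared for the silent data loss in production.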
The gap between what the documentation says and what the service actually does is the central truth of API development, and the twins existed to live inside it.