Inside the lab

Inside the lab: the breakthrough that lets us swap models without breaking anything

Written by Marcus Shepherd
Published April 24, 2026
5 min read

This quarter we hit an engineering milestone: our agent runtime is now fully model-agnostic, which means everything we have ever shipped can move to a better model overnight.

This would normally be an internal note that stayed in our Slack. We are putting it on the blog because the breakthrough behind it changed how we ship at Sync, and the shape of the story is worth sharing.

The breakthrough, in plain language: as of this quarter, our agent runtime is fully model-agnostic. Every AI Employee, every agentic workflow, every internal tool we have built now runs on a runtime where the underlying model is a swappable component. Not a refactor. Not a migration project. A swap.

Why this was hard

When we started building AI Employees, every system had its model wired into it. The prompts were tuned to that model's quirks. The output schemas matched that model's strengths. The retry logic worked around that model's failure modes. Sounds reasonable, looks fine on a system diagram, and it absolutely paints you into a corner.

The corner shows up the first time a new flagship model lands and we want to upgrade. Suddenly "swap the model" means re-tuning prompts, re-validating evals, re-running QA against every system. Multiply that across a book of AI Employees and the upgrade cost rivals the cost of the next build.

We knew the corner was coming. We started designing out of it about eighteen months ago.

What we built

The runtime now sits between our AI Employees and the model. Every model interaction passes through three layers:

  1. A capability adapter. Our systems describe what they need from the model in capability terms: "structured tool-call with retry," "long-context summarization with citation," "stateful agent loop with checkpointing." The adapter translates that into whatever the current best model exposes.
  2. A prompt-shape normalizer. Each model has its own preferred prompt shape. The normalizer holds the correct shape per model and lets the AI Employee write its prompts once, model-neutral.
  3. An eval harness that runs on every swap. Before a model is allowed to take over for an AI Employee in production, the harness replays a curated set of historical decisions through the new model and checks the answers against ground truth. A failure on the harness blocks the swap. A pass routes it.
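The three layers can be sketched in a few dozen lines. Everything here is illustrative, not Sync's actual code: the class names, the capability strings, and the per-model prompt shapes are all assumptions made for the sake of the sketch.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class CapabilityRequest:
    """What an AI Employee needs, stated in capability terms (layer 1)."""
    capability: str  # e.g. "structured-tool-call-with-retry" (hypothetical name)
    payload: dict

class CapabilityAdapter:
    """Maps capability names onto whatever the current model exposes."""
    def __init__(self, model_name: str, handlers: dict):
        self.model_name = model_name
        self.handlers = handlers  # capability name -> callable for this model

    def run(self, request: CapabilityRequest) -> dict:
        handler = self.handlers.get(request.capability)
        if handler is None:
            raise ValueError(
                f"{self.model_name} has no handler for {request.capability}"
            )
        return handler(request.payload)

def normalize_prompt(model_name: str, neutral_prompt: dict) -> str:
    """Prompt-shape normalizer (layer 2): the AI Employee writes one
    model-neutral prompt; the normalizer renders it per model."""
    shapes = {  # illustrative shapes, not real model formats
        "model-a": "{system}\n\n{task}",
        "model-b": "[INST] {system} [/INST]\n{task}",
    }
    return shapes[model_name].format(**neutral_prompt)

def eval_harness(candidate: Callable[[str], str],
                 golden_set: list,
                 threshold: float = 1.0) -> bool:
    """Eval harness (layer 3): replay curated historical decisions through
    the candidate model; a pass routes the swap, a failure blocks it."""
    passed = sum(1 for inp, expected in golden_set if candidate(inp) == expected)
    return passed / len(golden_set) >= threshold
```

A swap, under this sketch, is just wiring a new `CapabilityAdapter` and running `eval_harness` before routing production traffic to it; nothing upstream of the adapter changes.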

The first AI Employee we migrated this way moved between three models in a single weekend without a production regression.

Why it matters

Two reasons we are leaning into this hard.

The first is internal: it cuts the cost of staying current to nearly zero. The AI lab landscape moves fast and we are not going to bet our work on any one provider. Every flagship that ships with better reasoning, lower cost, or new capabilities is a free upgrade for the systems we already have running.

The second is what it lets us promise the people we work with: the work compounds. Everything we build at Sync sits on a foundation that gets stronger over time, not one that ages out of relevance the next time the model landscape shifts. The integrations, the training data, the trust thresholds, the audit trails, all of it survives. Only the model in the middle changes, and it usually changes for the better.

That is the kind of foundation we wanted to build from day one. As of this quarter, it is the foundation we are running on.

