Den Delimarsky, the lead maintainer of the Model Context Protocol at Anthropic, was asked in a recent interview what people get wrong when they build MCP servers. His answer, paraphrased: don't translate your entire API into an MCP server. Take your API, turn every endpoint into a tool, hand the agent dozens of near-identical things to pick from, and you've built something that burns tokens, guesses wrong, and falls over at launch.
I see this constantly right now. Here is what actually goes wrong, why the failure hides from your dashboards, and what we build instead.
Twenty tools is not a workflow
Here is the pattern. A team stands up a sales MCP server with twenty-plus tools, one per Salesforce object, and points their reps at it. Technically, it works. The agent can read opportunities, update contacts, pull cases. But that's the back end. It's not a workflow. The rep still has to know what to ask and stitch the context together themselves, so every rep reinvents the wheel, every time.
The engineering evidence on this is unusually consistent. Anthropic's own engineering team documented the mechanics: every tool definition is loaded into the model's context before any work happens, and every intermediate result makes a round trip through the model. Their guidance on writing effective tools for agents opens with the principle most teams skip: deciding which tools not to build, and consolidating the rest around real tasks. Independent testing by Jentic found that tool-selection accuracy starts degrading after just a handful of tools, and that adding more detail to compensate makes it worse.
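To make the context cost concrete, here is a rough back-of-envelope sketch. It is an illustration, not a measurement: the tool names, descriptions, and schema are hypothetical, and the four-characters-per-token ratio is a crude heuristic rather than a real tokenizer. The definitions follow the general shape MCP uses for a tool (name, description, inputSchema).

```python
import json

CHARS_PER_TOKEN = 4  # assumption: crude heuristic, real tokenizers vary


def tool_definition(name: str, description: str) -> dict:
    """Build a tool definition in the general MCP shape (hypothetical schema)."""
    return {
        "name": name,
        "description": description,
        "inputSchema": {
            "type": "object",
            "properties": {
                "record_id": {"type": "string", "description": "Record ID"},
                "fields": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["record_id"],
        },
    }


def estimated_tokens(tools: list) -> int:
    """Approximate context cost from the serialized size of the tool list."""
    return len(json.dumps(tools)) // CHARS_PER_TOKEN


# Hypothetical raw server: a read tool and an update tool per object.
OBJECTS = ["opportunity", "contact", "case", "account", "lead",
           "task", "event", "campaign", "quote", "contract"]
raw_tools = [
    tool_definition(f"{verb}_{obj}", f"{verb.capitalize()} a Salesforce {obj} record.")
    for obj in OBJECTS
    for verb in ("get", "update")
]

# Hypothetical wrapper: four task-level skills.
skills = [
    tool_definition("prep_qbr", "Build a structured QBR brief for an account."),
    tool_definition("pre_call_brief", "Summarize what matters before a call."),
    tool_definition("post_call_summary", "Turn call notes into next steps."),
    tool_definition("deal_risk_check", "Flag why a deal may be slipping."),
]

raw_cost = estimated_tokens(raw_tools)
skill_cost = estimated_tokens(skills)
print(f"{len(raw_tools)} raw tools: ~{raw_cost} tokens before any work happens")
print(f"{len(skills)} skills: ~{skill_cost} tokens before any work happens")
```

The absolute numbers will differ with a real tokenizer and real schemas. The point is the shape: the overhead scales with the number of definitions, and it is paid on every conversation.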
"It sort of works" is the part that should scare you
If raw-tool MCP servers simply failed, this would be a short post. They don't fail. They sort of work, and that's worse.
Because the model's output is nondeterministic and depends on how a request is worded, some reps find the magic phrasing and get real value. They're the ones in the demo. The rest hit friction twice, get a wrong answer once, and quietly go back to their old way of working. Nothing errors. Nothing pages anyone. The login dashboard still shows activity, because the power users are active. The tool looks adopted in a demo and dies in the field.
Deterministic software never failed like this. When a workflow tool broke, it broke loudly: a ticket, an error, an angry Slack message, a fix. Agent rollouts fail by attrition, and attrition doesn't file tickets. By the time the renewal conversation happens, the only evidence is a usage curve that flattened in week three, and nobody can say exactly why.
This is the part of the MCP conversation almost nobody is writing about. The engineering blogs cover token budgets. The vendors cover governance. The thing that actually kills these projects is sitting in the gap between "the demo worked" and "my team uses this every day."
The wrapper is the product
The fix is the wrapper: the layer between raw system access and the person doing the work. It has three parts, and they're design work, not plumbing.
Skills. Collapse the raw tools into a few purpose-built, persona-specific operations. Not "query opportunity" and "list contacts" but "prep me for my QBR with this account." Behind that one skill: five queries (opportunity history, open support cases, product usage, recent call summaries, email threads) and one structured brief, the same format every time, for every rep. The phrasing a rep uses stops mattering, because the workflow lives in the skill, not in the prompt.
A knowledge layer. The agent needs the data model explained: what a "tier-1 account" means here, which fields are trustworthy, which pipeline stage names changed last year. Without it, every answer is technically grounded and contextually naive.
Playbooks. Written around what the user actually needs in the moments that matter: pre-call, post-call, QBR prep, deal slipping. The playbook decides which skills exist at all. If you can't name the moment a skill serves, it shouldn't be in the menu.
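A minimal sketch of the first and third parts, assuming nothing about any particular platform. Every name here (QBRBrief, prep_qbr, PLAYBOOK, the stub sources) is hypothetical and chosen for illustration. It shows one skill fanning out to the five queries and always returning the same structured brief, and a playbook that decides which skills make the menu.

```python
from dataclasses import dataclass
from typing import Callable

# The five queries behind the QBR skill, in a fixed order.
QBR_SECTIONS = (
    "opportunity_history",
    "open_support_cases",
    "product_usage",
    "recent_call_summaries",
    "email_threads",
)

# A source takes an account name and returns a list of records.
Source = Callable[[str], list]


@dataclass(frozen=True)
class QBRBrief:
    """Same fields, same order, for every rep and every account."""
    account: str
    opportunity_history: list
    open_support_cases: list
    product_usage: list
    recent_call_summaries: list
    email_threads: list


def prep_qbr(account: str, sources: dict) -> QBRBrief:
    """Run all five queries and assemble one brief.

    Raises KeyError if a source is missing, so a partial brief is never returned.
    """
    sections = {name: sources[name](account) for name in QBR_SECTIONS}
    return QBRBrief(account=account, **sections)


# Playbook: the moment that matters -> the skill that serves it (hypothetical).
PLAYBOOK = {
    "pre-call": "pre_call_brief",
    "post-call": "post_call_summary",
    "QBR prep": "prep_qbr",
    "deal slipping": "deal_risk_check",
}


def skill_menu(candidates: list, playbook: dict) -> list:
    """Keep only the skills that serve a named moment in the playbook."""
    served = set(playbook.values())
    return [skill for skill in candidates if skill in served]


# Usage with stub sources standing in for real system calls.
stub_sources = {
    name: (lambda account, n=name: [f"{n} for {account}"])
    for name in QBR_SECTIONS
}
brief = prep_qbr("Acme", stub_sources)
menu = skill_menu(["prep_qbr", "query_opportunity", "list_contacts"], PLAYBOOK)
print(brief.account, menu)
```

The rep's wording never reaches the five queries. The skill fixes what gets fetched and what comes back, and raw operations like "query opportunity" drop off the menu because no moment in the playbook names them.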
We've shipped this pattern. Sales Genie, the sales copilot we built for a global enterprise AI company, is exactly this wrapper: a handful of persona-specific skills living in Slack, grounded in the customer's sales motion, returning the same structured output every time. The result was about four hours back per rep, per week, and the detail that matters for this post is why it got used: no rep ever had to learn which tools to chain. They asked for the thing they needed, in the place they already worked.
The skills behind it are governed in Workato, and that has a consequence worth underlining: the same skills can be served to an agent in Slack today and published as MCP servers to Claude, ChatGPT, or Cursor tomorrow, without a rewrite. The wrapper outlives your protocol and agent choices. That's what makes it the layer worth investing in.
Why we build the wrapper on Workato
Wrapper design is iterative by nature. You will get the first skill menu wrong. A playbook that made sense in week one gets rewritten in week four once you watch real usage. If every iteration is a code rewrite and a redeploy, the wrapper never converges, and the project dies in the gap.
This is why we build on Workato: skills are governed platform objects, and collapsing twenty tools into four, or rewriting a playbook, is configuration, not a deployment. The same skills publish as composite MCP servers when you want them in front of Claude, ChatGPT, or any MCP-compatible agent. Workato's own numbers make the iteration-speed case for me: in their internal hackathon this spring, 46 teams produced 180 working builds on Enterprise MCP in 30 days, and roughly three in four builders were in non-technical roles. That's what "the wrapper is config" looks like in practice.
Five questions before you point reps at an MCP server
If you're evaluating an agent rollout right now, these five questions will tell you whether you're about to ship a back end or a product.
1. Is the unit of value a task or an endpoint? Can a rep say "prep me for my QBR" and get one structured brief? Or do they need to know which of twenty tools to invoke, in what order?
2. Who owns the playbook? Someone should be able to show you, in writing, which moments the skills serve and what the output format is. If the answer is "each rep figures out what works," that's the back end talking.
3. What happens when a rep phrases it wrong? If the answer quality depends on prompt phrasing, the workflow is living in the prompt instead of the skill, and only your power users will ever see value.
4. How will you measure adoption after the demo? Logins lie. Per-skill usage in week six, across the whole team rather than the three power users, is the number that predicts renewal.
5. What does changing a playbook cost? If the honest answer involves a sprint, the wrapper can't iterate at the speed the field will demand, and the field always demands it.
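For question 4, here is one way the week-six number could be computed from a usage log. This is a sketch under stated assumptions: the event format (user, skill, date), the sample data, and the function names are all hypothetical, and "reach" here means the fraction of the whole team that used a skill at least once in the given week.

```python
from collections import defaultdict
from datetime import date


def week_number(launch: date, day: date) -> int:
    """1-indexed week since launch (launch day falls in week 1)."""
    return (day - launch).days // 7 + 1


def skill_reach(events, launch: date, team_size: int, week: int = 6) -> dict:
    """Fraction of the team that used each skill at least once in `week`.

    Counts distinct users, so three power users hammering one skill
    cannot make it look adopted across the team.
    """
    users_by_skill = defaultdict(set)
    for user, skill, day in events:
        if week_number(launch, day) == week:
            users_by_skill[skill].add(user)
    return {skill: len(users) / team_size for skill, users in users_by_skill.items()}


# Hypothetical usage log: (user, skill, date).
launch = date(2025, 1, 6)
events = [
    ("cy", "prep_qbr", date(2025, 1, 7)),             # week 1, ignored
    ("ana", "prep_qbr", date(2025, 2, 10)),           # week 6
    ("ana", "prep_qbr", date(2025, 2, 11)),           # same user, counted once
    ("ben", "prep_qbr", date(2025, 2, 12)),           # week 6
    ("ana", "post_call_summary", date(2025, 2, 13)),  # week 6
]
reach = skill_reach(events, launch, team_size=10)
print(reach)
```

Dividing by team size rather than by active users is the design choice that matters. It keeps the reps who quietly stopped using the tool in the denominator.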
The teams getting this right aren't the ones with the most tools exposed. They're the ones who treated the wrapper as the product: a small skill menu, a knowledge layer, playbooks somebody owns, on a platform where iteration is configuration. The MCP server is the back end. The wrapper is the difference between shipped and used.