Gartner has predicted that around 40 percent of enterprise AI agent projects will be cancelled by 2027. From our seat building agent systems across DACH and the EU, the cancellations are not mysterious. They cluster into 8 failure modes. Each one has a tell, each kills the project in a different way, and most have a cheap fix if you catch them in the first month instead of the sixth. This post is the post-mortem we wish someone had written before our first agent build.
The evidence base. Wavect engagements on agent and AI products, including Twinsoft AI, PromptID, Quivr, and Hyperstate AI (which shipped successfully and ran out of funding after launch, a fundraising failure rather than a product or tech failure).
Failure mode 1: Hallucination
The tell. Two confident wrong answers in a demo. The team patches the prompt. Next demo, two more wrong answers in a different shape.
How it kills the project. Trust does not decay linearly. One demonstrably wrong answer in front of an exec outweighs ten silent successes. The agent becomes "that AI thing that lies" and budget walks.
The cheap fix if caught early. Constrain the action space. An agent that says "I cannot answer this from the provided sources, here is the closest human in the loop" beats an agent that confabulates. Build the refusal path before the happy path.
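A minimal sketch of building the refusal path first, assuming a retriever that returns scored passages. The names `Passage`, `Reply`, `answer_or_refuse`, and the `MIN_SCORE` threshold are hypothetical, and `generate` stands in for whatever LLM call you use.

```python
from dataclasses import dataclass
from typing import Callable

MIN_SCORE = 0.75  # hypothetical relevance threshold; tune it against your eval set


@dataclass
class Passage:
    text: str
    score: float  # retriever relevance score, assumed to lie in [0, 1]


@dataclass
class Reply:
    answered: bool
    text: str


def answer_or_refuse(
    question: str,
    passages: list[Passage],
    generate: Callable[[str, list[Passage]], str],
    escalation_contact: str,
) -> Reply:
    """Answer only from sufficiently relevant sources; otherwise refuse and hand off."""
    grounded = [p for p in passages if p.score >= MIN_SCORE]
    if not grounded:
        # Refusal path: no confident source, so the model is never called.
        return Reply(
            False,
            "I cannot answer this from the provided sources. "
            f"Closest human contact: {escalation_contact}",
        )
    # Happy path: the model only sees passages that cleared the threshold.
    return Reply(True, generate(question, grounded))
```

The refusal branch runs before any model call, so a weak retrieval result can never turn into a confident answer. Whether a score threshold is the right grounding check for your stack is something to confirm against your own golden set.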
Failure mode 2: Latency
The tell. P50 latency looks fine in isolation. P95 user-facing latency on multi-step tasks is 25 to 45 seconds.
How it kills the project. Users abandon the agent for the manual flow they were trying to replace. Adoption flatlines. The CFO asks why we are paying for tokens nobody uses.
The cheap fix. Measure tail latency per tool call from week one. Parallelize tool calls where order is not load-bearing. Cache idempotent reads. Pick an LLM tier per step, not per agent. The cheapest model that satisfies the task wins.
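A minimal sketch of two of these moves, per-tool latency tracking and parallel tool calls, using only the Python standard library. The tool names and `fake_tool` are hypothetical stand-ins for real tool calls, and the percentile uses the simple nearest-rank method.

```python
import asyncio
import time
from collections import defaultdict

# Per-tool latency samples in milliseconds, keyed by tool name.
latencies_ms = defaultdict(list)


async def timed(name, coro):
    """Await one tool call and record its wall-clock latency under `name`."""
    start = time.perf_counter()
    try:
        return await coro
    finally:
        latencies_ms[name].append((time.perf_counter() - start) * 1000)


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


async def fake_tool(delay_s, value):
    """Stand-in for a real tool call (hypothetical)."""
    await asyncio.sleep(delay_s)
    return value


async def run_step():
    # The two lookups do not depend on each other, so they run concurrently.
    return await asyncio.gather(
        timed("crm_lookup", fake_tool(0.02, "crm")),
        timed("kb_search", fake_tool(0.02, "kb")),
    )
```

Calling `asyncio.run(run_step())` returns the results in the order the calls were passed to `gather`, and `p95(latencies_ms["kb_search"])` then gives the tail latency for that tool once enough samples have accumulated.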
Failure mode 3: Eval-debt
The tell. The team ships a prompt change. Nobody knows if it improved anything. Vibes-based regression testing on a Slack thread.
How it kills the project. Without evals, every change is a bet. The system drifts. After eight sprints, nobody trusts the agent enough to expose it to real users. The project quietly stops getting prioritized.
The cheap fix. TDD for agents. Build the eval harness in sprint one. Golden-set tests for the top 20 user intents. Pass-rate as a deployment gate. We have written about this in our broader QA practice and it applies double for agents.
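A minimal sketch of a golden-set harness with a pass-rate gate. `GoldenCase`, `pass_rate`, `deployment_gate`, and the 0.9 threshold are illustrative assumptions, and the substring check is the simplest possible grader; real graders are usually richer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    intent: str        # one of the top user intents
    prompt: str        # what the user asks
    must_contain: str  # text a correct answer has to include


def pass_rate(agent: Callable[[str], str], cases: list[GoldenCase]) -> float:
    """Fraction of golden cases whose answer contains the expected text."""
    if not cases:
        raise ValueError("golden set is empty")
    passed = sum(
        1 for c in cases if c.must_contain.lower() in agent(c.prompt).lower()
    )
    return passed / len(cases)


def deployment_gate(rate: float, threshold: float = 0.9) -> bool:
    """Allow a deploy only when the pass rate meets the threshold."""
    return rate >= threshold
```

Wired into CI, a prompt change that drops `pass_rate` below the threshold fails the build instead of starting a Slack thread.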
Failure mode 4: Cost runaway
The tell. The first invoice from the LLM provider is fine. The third invoice is 12x the first.
How it kills the project. The CFO asks for the unit economics. Cost per resolved ticket exceeds gross margin. The agent is technically successful and commercially dead.
The cheap fix. Track cost-per-action from day one. Model selection per step. Aggressive prompt-shortening. Caching of static context. RAG with smaller embeddings beats stuffing 200k tokens of context into the prompt. We have seen 4 to 8x cost reductions from architecture choices that took a week to implement.
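A minimal sketch of cost-per-action tracking with a model tier chosen per step. The prices in `PRICES` are made-up placeholders, not any provider's real rates, and `Step`, `step_cost`, and `cost_per_resolved` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per 1M tokens; substitute your provider's actual rates.
PRICES = {
    "small": {"in": 0.20, "out": 0.80},
    "large": {"in": 3.00, "out": 15.00},
}


@dataclass
class Step:
    name: str
    model: str  # key into PRICES, chosen per step rather than per agent
    tokens_in: int
    tokens_out: int


def step_cost(step: Step) -> float:
    """LLM cost in USD for a single agent step."""
    price = PRICES[step.model]
    return (step.tokens_in * price["in"] + step.tokens_out * price["out"]) / 1_000_000


def cost_per_resolved(steps: list[Step], resolution_rate: float) -> float:
    """Average LLM cost per resolved ticket; unresolved attempts still cost money."""
    if not 0 < resolution_rate <= 1:
        raise ValueError("resolution_rate must be in (0, 1]")
    return sum(step_cost(s) for s in steps) / resolution_rate
```

Dividing by the resolution rate is the design choice that matters here: it is the number the CFO will compare against gross margin per ticket, not the cost of a single attempt.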
Failure mode 5: Missing handoff
The tell. The agent works for 80 percent of cases. The other 20 percent have no escape hatch. Users complain to support. Support cannot see what the agent did.
How it kills the project. Customer-facing teams build a parallel workaround. The agent becomes a Tier-0 they route around. The cost of operating both flows kills the case for either.
The cheap fix. Design the handoff before the autonomy. Every agent action logged with full context. One-click escalation to a human with the conversation history attached. Clear policy on what the agent must defer.
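A minimal sketch of a handoff decision that carries the conversation with it. The `MUST_DEFER` policy set, the confidence threshold, and the `Turn` and `Escalation` types are hypothetical; how you obtain an intent and a confidence value depends on your stack.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical deferral policy: intents the agent must never handle alone.
MUST_DEFER = {"refund", "legal", "account_closure"}


@dataclass
class Turn:
    role: str  # "user" or "agent"
    content: str


@dataclass
class Escalation:
    reason: str
    history: list[Turn]  # full conversation, attached for the human
    created_at: str      # ISO 8601 timestamp in UTC


def maybe_escalate(
    intent: str,
    confidence: float,
    history: list[Turn],
    min_confidence: float = 0.7,
) -> Optional[Escalation]:
    """Return an Escalation when policy or low confidence requires a human."""
    if intent in MUST_DEFER:
        reason = f"policy: '{intent}' must be handled by a human"
    elif confidence < min_confidence:
        reason = f"low confidence ({confidence:.2f})"
    else:
        return None
    # Copy the history so later turns do not mutate the escalation record.
    return Escalation(reason, list(history), datetime.now(timezone.utc).isoformat())
```

The policy check runs before the confidence check, so a deferred intent is escalated even when the agent is confident.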
Failure mode 6: Dirty data
The tell. The agent returns wrong answers from the knowledge base. The team tunes the prompt. Nothing improves.
How it kills the project. The team is fixing the wrong layer. The source data is stale, contradictory, or wrong. No prompt fixes that. Months disappear into prompt engineering on rotten foundations.
The cheap fix. Audit the source corpus before scaling the agent. Owner per document, refresh cadence, contradiction detection. The fastest path to a useful agent is often a cleaner data pipeline, not a smarter model.
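A minimal sketch of the ownership and refresh-cadence part of a corpus audit. The `Doc` fields and the `audit` function are hypothetical, and contradiction detection is deliberately left out because it needs semantic comparison between documents rather than metadata checks.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Doc:
    doc_id: str
    owner: Optional[str]  # person or team accountable for the content
    last_reviewed: date
    refresh_days: int     # agreed refresh cadence for this document


def audit(docs: list[Doc], today: date) -> dict[str, list[str]]:
    """Map each problematic document id to the list of issues found."""
    findings: dict[str, list[str]] = {}
    for doc in docs:
        issues = []
        if not doc.owner:
            issues.append("no owner")
        if today - doc.last_reviewed > timedelta(days=doc.refresh_days):
            issues.append("stale")
        if issues:
            findings[doc.doc_id] = issues
    return findings
```

Documents that come back from `audit` are candidates to fix or exclude from retrieval before anyone spends another sprint on the prompt.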
Failure mode 7: Scope greed
The tell. The roadmap reads "the agent will handle support, sales qualification, internal knowledge lookup, scheduling, and contract review."
How it kills the project. Each capability competes for prompt budget, tool budget, eval budget. None of them gets good. The team optimizes for the demo and ships an agent that is mediocre at five things.
The cheap fix. One agent, one job, one eval. Ship narrow. Add capabilities only after the previous one passes its eval at the production bar. Composition over conflation.
Failure mode 8: Audit gaps
The tell. The agent ships. Two weeks later, legal asks "where is the audit log?" and "how do we handle a GDPR Art. 22 objection?"
How it kills the project. The agent gets pulled from production until the gap closes. The team retrofits compliance for six weeks. Momentum dies.
The cheap fix. Audit log as a first-class data structure, not a console.log. MCP tool calls logged with input, output, model version, timestamp, operator. Human-override surface that records who overrode what and why. We covered the artifact layer in our companion post on stacking GDPR and AI Act compliance.
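A minimal sketch of an audit log as a data structure, with the fields listed above plus override tracking. `ToolCallRecord` and `AuditLog` are hypothetical names, the log lives in memory for brevity, and the JSON Lines export is one possible storage format rather than a requirement of MCP or of any regulation.

```python
import json
from dataclasses import asdict, dataclass, replace
from datetime import datetime, timezone
from typing import Optional


def _now() -> str:
    """Current time as an ISO 8601 string in UTC."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class ToolCallRecord:
    tool: str
    input: dict
    output: dict
    model_version: str
    operator: str
    timestamp: str
    overridden_by: Optional[str] = None
    override_reason: Optional[str] = None


class AuditLog:
    """Append-only log: records are never edited, overrides are new entries."""

    def __init__(self) -> None:
        self._records: list[ToolCallRecord] = []

    def record(self, tool, input, output, model_version, operator) -> ToolCallRecord:
        entry = ToolCallRecord(tool, input, output, model_version, operator, _now())
        self._records.append(entry)
        return entry

    def record_override(self, original: ToolCallRecord, who: str, why: str) -> ToolCallRecord:
        # Copy the original call and stamp who overrode it and why.
        entry = replace(original, timestamp=_now(), overridden_by=who, override_reason=why)
        self._records.append(entry)
        return entry

    def to_jsonl(self) -> str:
        """Serialize the log as one JSON object per line."""
        return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in self._records)
```

Keeping the log append-only means the original tool call and the human override both survive, which is what lets you answer "who overrode what and why" later.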

"Evals are the only honest measure of an agent. Everything else is a demo with cherry-picked queries."
In our experience, the failure modes do not appear in isolation. They cluster. The most common combinations we see in stuck projects:
| Cluster | Failure modes that travel together | What it looks like |
|---|---|---|
| The Demo-to-Production Cliff | 1, 3, 7 | Great demo, no evals, agent scope kept growing, production launch reveals hallucinations on real queries |
| The Silent Cost Death | 2, 4 | Latency tolerable, costs invisible until the third monthly invoice, unit economics never modeled |
| The Operations Reject | 5, 8 | No handoff, no audit trail, ops team refuses to take ownership, agent stays in pilot forever |
| The Data-Layer Mirage | 3, 6 | Months of prompt tuning on a broken corpus, team blames the model, the data is the problem |
Three discipline moves we have seen consistently. None are exotic.
Hyperstate AI shipped. Then the company ran out of funding after launch, which is a fundraising failure, not a product or tech failure. The point. Even a clean technical execution does not save a project from external causes. But sloppy execution guarantees cancellation regardless of capital.
Agent projects fail in predictable ways. Hallucination, latency, eval-debt, cost runaway, missing handoff, dirty data, scope greed, audit gaps. None of these are exotic problems. All of them have cheap fixes if caught in the first month and expensive ones if caught in the sixth.
If you are building an agent in DACH or EU in 2026, run your current project against the 8 modes above. The honest answer to which ones you are exposed to is also the highest-leverage backlog for the next sprint. The 40 percent cancellation number is not destiny. It is what happens when teams skip the eval harness, ignore the cost dashboard, and design autonomy before handoff.