Agent Orchestration Anti-Patterns: What Not to Do
Mentiko Team
We've reviewed hundreds of agent chain configurations from early Mentiko users, teams building on other platforms, and our own internal projects. Some patterns look clever during development and then detonate in production. Here are the anti-patterns we see most often and how to fix them.
Anti-pattern 1: The God Agent
A single agent with a 2,000-word prompt that handles extraction, analysis, formatting, quality checking, and output delivery. It does everything. It does nothing well.
{
  "name": "do-everything",
  "agents": [
    {
      "name": "god-agent",
      "prompt": "Read the input document. Extract key information. Analyze the data for trends. Format the results as a report. Check the report for accuracy. If accuracy is below 90%, revise. Then email the report to the stakeholders and log the results to the database.",
      "triggers": ["chain:start"],
      "emits": ["chain:complete"]
    }
  ]
}
Why it fails: LLMs perform worse as prompt complexity increases. When you ask a model to do six things in one prompt, it prioritizes the first few instructions and rushes the rest. Error handling becomes impossible because you can't isolate which step went wrong. Retries re-run everything from scratch.
The fix: one agent, one job. Split the god agent into a pipeline where each agent handles a single responsibility. Yes, this means more agents and more event handoffs. That's the point. Each agent is testable, debuggable, and replaceable independently.
{
  "name": "proper-pipeline",
  "agents": [
    { "name": "extractor", "prompt": "Extract key information from the input document...", "triggers": ["chain:start"], "emits": ["extraction:complete"] },
    { "name": "analyzer", "prompt": "Analyze the extracted data for trends...", "triggers": ["extraction:complete"], "emits": ["analysis:complete"] },
    { "name": "formatter", "prompt": "Format the analysis as a structured report...", "triggers": ["analysis:complete"], "emits": ["report:ready"] },
    { "name": "reviewer", "prompt": "Check the report for accuracy...", "triggers": ["report:ready"], "emits": ["review:approved", "review:revision-needed"] },
    { "name": "deliverer", "prompt": "Email the report and log to database...", "triggers": ["review:approved"], "emits": ["chain:complete"] }
  ]
}
More agents, more clarity, fewer production incidents.
Anti-pattern 2: Over-chaining
The opposite extreme. Every trivial operation gets its own agent. An agent to read a file. An agent to count words. An agent to format a date. An agent to append a string. You end up with a 15-agent chain that does what a single script could do in 10 lines.
Why it fails: each agent invocation costs money (LLM API call), adds latency (model inference time), and introduces a potential failure point (API timeout, rate limit, malformed event). When you chain 15 agents for a task that doesn't need AI reasoning at each step, you're paying for intelligence where none is required.
The fix: use agents for tasks that need language understanding, reasoning, or generation. Use regular code for everything else. Mentiko agents can execute bash scripts -- put your deterministic logic in scripts and reserve agents for the steps that actually need a model.
A rule of thumb: if you can write the step as a bash one-liner or a 20-line Python script with zero ambiguity about the expected output, it doesn't need to be an agent. If the step requires understanding unstructured text, making a judgment call, or generating natural language, it's an agent.
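To make the rule of thumb concrete, here is what two of those "trivial operation" agents look like as plain code. This is a generic sketch, not Mentiko-specific -- the function names are illustrative:

```python
from datetime import date
from pathlib import Path

def word_count(path: str) -> int:
    """Count words in a file -- deterministic, no model required."""
    return len(Path(path).read_text().split())

def format_report_date(d: date) -> str:
    """Format a date -- again, zero ambiguity about the expected output."""
    return d.strftime("%B %d, %Y")
```

Each of these would have cost an API call, inference latency, and a failure point as an agent. As code, they cost microseconds and fail only if the input is genuinely broken.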
Anti-pattern 3: The Unbounded Loop
A writer agent and a reviewer agent in an iterative loop with no iteration limit:
{
  "name": "infinite-polish",
  "agents": [
    {
      "name": "writer",
      "prompt": "Write or revise the content based on reviewer feedback.",
      "triggers": ["chain:start", "review:needs-revision"],
      "emits": ["draft:ready"]
    },
    {
      "name": "reviewer",
      "prompt": "Review the draft. If it's perfect, approve. Otherwise request revision.",
      "triggers": ["draft:ready"],
      "emits": ["review:approved", "review:needs-revision"]
    }
  ]
}
Why it fails: the reviewer will almost always find something to improve. LLMs are good at finding flaws. Without an iteration cap, this loop runs until you hit your API spend limit or the context window overflows. We've seen loops hit 40+ iterations before someone noticed.
The fix: always set max_iterations on the reviewing agent. Three iterations is a good default for most content workflows. After three passes, if the content isn't good enough, the problem is in the prompt engineering, not the iteration count.
{
  "name": "reviewer",
  "prompt": "Review the draft...",
  "triggers": ["draft:ready"],
  "emits": ["review:approved", "review:needs-revision"],
  "max_iterations": 3
}
Also consider adding a "good enough" threshold rather than a "perfect" standard. Tell the reviewer to approve if the draft meets 8/10 quality criteria, not 10/10. Perfection is the enemy of shipping, even for AI.
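Concretely, the threshold lives in the reviewer's prompt. A sketch of what that config might look like (the criteria wording is illustrative, not a Mentiko requirement):

```json
{
  "name": "reviewer",
  "prompt": "Score the draft against the 10 quality criteria below. Approve if it meets at least 8 of 10. Otherwise request revision and list only the failing criteria.",
  "triggers": ["draft:ready"],
  "emits": ["review:approved", "review:needs-revision"],
  "max_iterations": 3
}
```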
Anti-pattern 4: Ignoring costs until the bill arrives
You build a chain during development using GPT-5.4 or Claude Opus for every agent. It works beautifully. You deploy to production where it runs 500 times a day. Your first monthly API bill is $3,200.
Why it fails: development volume and production volume are different by orders of magnitude. The model that makes sense for 10 test runs doesn't necessarily make sense for 15,000 production runs.
The fix: right-size models per agent. Not every agent in a chain needs the most capable model. A classifier agent that routes tickets into three categories works fine on a smaller model. A summarizer that extracts key points from a document doesn't need the most expensive option. Reserve your most capable model for agents that require complex reasoning -- the analyzer, the risk scorer, the synthesizer.
In Mentiko, you can set the model per agent:
{
  "name": "classifier",
  "model": "gpt-5.4-mini",
  "prompt": "Classify this ticket as: bug, feature, or question."
},
{
  "name": "analyzer",
  "model": "claude-sonnet-4-20250514",
  "prompt": "Analyze the classified ticket and determine impact..."
}
A chain with mixed models can cut costs by 60-80% compared to running every agent on the most expensive model, with negligible quality impact on the simpler agents.
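The arithmetic behind that range is simple. A back-of-envelope sketch, where every price is hypothetical and the 1-of-5 split is an assumption:

```python
# Back-of-envelope cost comparison. All per-call prices are hypothetical.
runs_per_day = 500
agents_per_run = 5
premium_cost_per_call = 0.02   # assumed cost of the most capable model
cheap_cost_per_call = 0.002    # assumed cost of a smaller model

# Every agent on the premium model:
all_premium = runs_per_day * agents_per_run * premium_cost_per_call

# Mixed: only 1 of 5 agents (the analyzer) needs the premium model.
mixed = runs_per_day * (1 * premium_cost_per_call + 4 * cheap_cost_per_call)

savings = 1 - mixed / all_premium
print(f"${all_premium:.0f}/day vs ${mixed:.0f}/day -- {savings:.0%} saved")
```

With these assumed prices the mixed chain lands at a 72% saving -- inside the 60-80% range, driven almost entirely by how many agents genuinely need the expensive model.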
Anti-pattern 5: No error handling
The chain runs perfectly in development. In production, the third agent hits a rate limit and the whole chain crashes. No fallback. No retry. No useful error message.
Why it fails: every external dependency can fail. In a 5-agent chain with 100 runs per day, you'll hit failures regularly.
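The expected failure rate is easy to estimate. A quick sketch, assuming a 99.9% per-call success rate (the exact number depends on your provider):

```python
# Expected failure frequency for a multi-agent chain.
# Assumed: 5 agents, 100 runs/day, 99.9% per-call success rate.
agents = 5
runs_per_day = 100
per_call_success = 0.999

calls_per_day = agents * runs_per_day            # 500 model calls daily
p_run_ok = per_call_success ** agents            # chance a whole run succeeds
failed_runs_per_day = runs_per_day * (1 - p_run_ok)

print(f"{calls_per_day} calls/day, ~{failed_runs_per_day:.2f} failed runs/day")
```

Roughly half a failed run per day -- about one failure every two days, even at 99.9% reliability. Without error handling, each one is a crashed chain.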
The fix: use on_error events to wire up fallback agents:
{
  "name": "primary-summarizer",
  "model": "claude-sonnet-4-20250514",
  "prompt": "Summarize this document.",
  "triggers": ["chain:start"],
  "emits": ["summary:complete"],
  "on_error": "summary:failed"
},
{
  "name": "fallback-summarizer",
  "model": "gpt-5.4-mini",
  "prompt": "Summarize this document using a simpler approach.",
  "triggers": ["summary:failed"],
  "emits": ["summary:complete"]
}
At minimum, every chain should have error events that produce meaningful log output. Critical agents should have fallback paths that keep the chain running with degraded quality rather than total failure.
Anti-pattern 6: Prompt coupling
Agent B's prompt references the exact format of Agent A's output: "Parse the JSON from the previous agent where the key 'findings' contains an array of objects with 'title' and 'body' fields."
Why it fails: when you change Agent A's output format, Agent B breaks silently. It doesn't crash -- it just produces garbage because it's parsing the wrong structure. In a 5-agent chain, format changes cascade and create debugging nightmares.
The fix: define explicit schemas for event payloads. Document what each event contains. When agents reference upstream output, they reference the schema, not the specific agent.
Better yet, add a validation step between agents that checks the event payload against the expected schema before the downstream agent starts. Schema violations become clear error messages instead of silent corruption.
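A validation step can be a few lines of ordinary code. A minimal sketch (the schema and field names are illustrative; a real deployment might use a schema library instead):

```python
# Minimal payload validator to run between agents. Checks top-level keys
# and types so schema drift fails loudly instead of corrupting silently.
EXTRACTION_SCHEMA = {
    "findings": list,   # expected: [{"title": ..., "body": ...}, ...]
}

def validate_payload(payload: dict, schema: dict) -> list[str]:
    """Return human-readable schema violations (empty list if valid)."""
    errors = []
    for key, expected_type in schema.items():
        if key not in payload:
            errors.append(f"missing key: {key!r}")
        elif not isinstance(payload[key], expected_type):
            errors.append(f"{key!r} should be {expected_type.__name__}, "
                          f"got {type(payload[key]).__name__}")
    return errors
```

Wire this between the upstream and downstream agents: if the list is non-empty, emit an error event with the violations instead of letting the downstream agent parse garbage.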
Anti-pattern 7: Testing in production
You build the chain, deploy it, and discover problems through customer complaints.
Why it fails: agent chains are non-deterministic systems interacting with external services. Testing only in production means every bug is a production incident.
The fix: build a testing pipeline. Run each agent in isolation with recorded inputs and verify output. Run the full chain with test data before deploying. Record production inputs as regression tests. Mentiko supports workspace isolation for this -- create a test workspace, validate there, then promote to production.
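The "recorded inputs" step can be a small harness. A sketch, where `run_agent` stands in for however your platform invokes a single agent and the fixture format is an assumption -- note it checks output *shape*, not exact wording, because agent output is non-deterministic:

```python
import json
from pathlib import Path

def check_agent(run_agent, fixtures_dir: str) -> list[str]:
    """Replay recorded inputs through one agent; report structural failures.

    Each fixture file holds {"input": ..., "expected_keys": [...]}. We only
    assert the expected keys are present, since exact text varies per run.
    """
    failures = []
    for case in sorted(Path(fixtures_dir).glob("*.json")):
        fixture = json.loads(case.read_text())
        output = run_agent(fixture["input"])
        missing = [k for k in fixture["expected_keys"] if k not in output]
        if missing:
            failures.append(f"{case.name}: missing keys {missing}")
    return failures
```

Run this against every agent before promoting a chain, and append each interesting production input to the fixtures directory so it becomes a regression test.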
The meta-lesson
Every anti-pattern on this list comes from applying the wrong mental model. God agents come from thinking of agents like functions. Over-chaining comes from thinking of agents like microservices. Unbounded loops come from thinking of agents like compilers that converge to a correct answer.
Agents are more like junior employees. Give them clear, scoped instructions. Check their work. Don't ask them to do six things at once. Don't assume they'll get it right the first time. And always have a plan for when they mess up.
Build your chains with these anti-patterns in mind and you'll avoid the most common production failures we see across the platform.
Want the positive patterns? Read the 5 chain patterns guide or the design patterns overview.