AI Agent Cost Optimization: How to Cut LLM Spend Without Cutting Quality
Mentiko Team
LLM API costs are the variable cost of agent orchestration. Your platform fee is fixed ($29/month with Mentiko). Your API spend scales with usage. Here's how to optimize it.
Where the money goes
In a typical 4-agent chain, costs break down roughly as:
- Input tokens (60-70%): The context you send to the model
- Output tokens (25-35%): The response the model generates
- Overhead (5%): retries, failed calls, per-request fixed costs
The largest cost driver is input tokens, not output. This means the most effective optimization is reducing what you send to the model, not what it produces.
Strategy 1: Right-size your models
Not every agent needs the most capable model. Match model capability to task complexity:
| Task type | Model tier | Example |
|---|---|---|
| Classification | Fast/cheap | Ticket priority, sentiment, category |
| Extraction | Fast/cheap | Pull fields from structured data |
| Summarization | Mid-tier | Condense research, meeting notes |
| Analysis | Capable | Trend identification, root cause |
| Writing | Capable | Blog posts, reports, responses |
| Reasoning | Top-tier | Architecture decisions, complex review |
A chain that uses Claude Haiku for classification, Claude Sonnet for summarization, and Claude Opus only for final analysis can cost 60-80% less than using Opus for everything.
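In code, right-sizing can be as simple as a lookup table. A minimal sketch, assuming illustrative model names and per-token prices (the dollar figures below are placeholders, not real rates):

```python
# Map task types to model tiers, following the table above.
TIER_FOR_TASK = {
    "classification": "claude-haiku",
    "extraction": "claude-haiku",
    "summarization": "claude-sonnet",
    "analysis": "claude-opus",
    "reasoning": "claude-opus",
}

# Hypothetical input prices in dollars per million tokens.
PRICE_PER_MTOK = {"claude-haiku": 1.0, "claude-sonnet": 3.0, "claude-opus": 15.0}

def input_cost(task_type: str, input_tokens: int) -> float:
    """Estimate input-token cost for a task routed to its model tier."""
    model = TIER_FOR_TASK[task_type]
    return input_tokens / 1_000_000 * PRICE_PER_MTOK[model]
```

With these placeholder rates, routing a million classification tokens to the cheap tier costs 15x less than sending them to the top tier.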
Strategy 2: Compress your prompts
Long prompts are expensive. Most agent prompts contain unnecessary context.
Before (expensive):
You are an expert content editor with 20 years of experience in
digital marketing. Your job is to review blog posts for quality,
ensuring they meet our high standards for readability, accuracy,
and engagement. Please carefully review the following article and
provide detailed feedback on grammar, style, tone, factual accuracy,
and overall quality. The article should target a Flesch-Kincaid
grade level of 8-10 and maintain a professional but approachable tone.
After (cheaper, same results):
Edit this article. Check: grammar, clarity, factual accuracy.
Target: Flesch-Kincaid grade 8-10, professional tone.
Output: edited article + list of changes.
The model doesn't need your backstory. It needs clear instructions. Shorter prompts = fewer input tokens = lower cost.
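To put a rough number on the saving, here is a sketch that compares the two prompts above using a crude words-to-tokens heuristic (the 0.75 words-per-token ratio is a common rule of thumb, not an exact tokenizer):

```python
def rough_tokens(text: str) -> int:
    # Crude heuristic: English averages roughly 0.75 words per token,
    # so tokens ~= words / 0.75. Use a real tokenizer for exact counts.
    return round(len(text.split()) / 0.75)

verbose = (
    "You are an expert content editor with 20 years of experience in "
    "digital marketing. Your job is to review blog posts for quality, "
    "ensuring they meet our high standards for readability, accuracy, "
    "and engagement. Please carefully review the following article and "
    "provide detailed feedback on grammar, style, tone, factual accuracy, "
    "and overall quality. The article should target a Flesch-Kincaid "
    "grade level of 8-10 and maintain a professional but approachable tone."
)
compact = (
    "Edit this article. Check: grammar, clarity, factual accuracy. "
    "Target: Flesch-Kincaid grade 8-10, professional tone. "
    "Output: edited article + list of changes."
)

# Fraction of input tokens saved by the compressed prompt.
saving = 1 - rough_tokens(compact) / rough_tokens(verbose)
```

By this estimate the compressed prompt cuts input tokens by roughly two-thirds, and the saving applies on every single run.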
Strategy 3: Summarize between agents
When agent A produces a 5,000-word research report and agent B needs to use it, don't pass the full report. Add a summarization step:
Agent A: Researcher (produces 5,000 words)
Agent A.5: Summarizer (condenses to 500 words)
Agent B: Writer (works from 500-word brief)
Agent B's input is 90% smaller. The summarization step costs a few cents but saves dollars on every subsequent agent.
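A sketch of that handoff, with a stub standing in for real model calls (`call_model` is hypothetical; a real implementation would call your LLM API, and the word counts are illustrative):

```python
def call_model(prompt: str, max_words: int) -> str:
    # Hypothetical stub: truncates to simulate a word-bounded model response.
    return " ".join(prompt.split()[:max_words])

def run_chain(topic: str) -> str:
    # Agent A: long-form research (filler text simulates a 5,000-word report).
    report = call_model(f"Research {topic}. " + "finding " * 5000, 5000)
    # Agent A.5: compress the report before the handoff.
    brief = call_model(f"Summarize in 500 words:\n{report}", 500)
    # Agent B: writes from the 500-word brief, not the 5,000-word report.
    return call_model(f"Write an article from this brief:\n{brief}", 800)
```

The structural point survives the stub: only the summarizer ever sees the full report, so every downstream agent pays for a tenth of the context.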
Strategy 4: Cache repeated work
If you run the same chain daily, some agent outputs don't change:
- Company descriptions (stable for weeks)
- Style guides (stable for months)
- Historical data analysis (stable until new data arrives)
Cache these outputs and inject them as context instead of regenerating them every run. Mentiko's workspace file system is perfect for this: agents read from cached files and only regenerate when the cache is stale.
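A minimal file-cache sketch of that pattern (the path layout and staleness window are assumptions to adapt to your own workspace):

```python
import json
import time
from pathlib import Path
from typing import Callable

def cached(path: Path, max_age_s: float, regenerate: Callable[[], dict]) -> dict:
    """Return the cached value if the file is fresh, else regenerate and store it."""
    if path.exists() and time.time() - path.stat().st_mtime < max_age_s:
        return json.loads(path.read_text())
    value = regenerate()  # the expensive LLM call happens only on a stale cache
    path.write_text(json.dumps(value))
    return value
```

For example, `cached(Path("company.json"), 14 * 86400, fetch_company_description)` (where `fetch_company_description` is a hypothetical agent call) would regenerate the company description at most once every two weeks.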
Strategy 5: Limit output length
Agents that produce more output than needed waste tokens. Constrain them:
"Summarize in exactly 3 bullet points, max 50 words each."
vs.
"Summarize the findings."
The constrained prompt produces predictable-length output that costs roughly the same on every run. The unconstrained prompt might produce 100 words or 1,000.
Strategy 6: Use structured output
JSON output is typically shorter than prose for the same information:
Prose: "The sentiment of this review is positive, with a confidence score of 87%. The main topics discussed are product quality and customer service. The reviewer recommends the product."
JSON: {"sentiment":"positive","confidence":0.87,"topics":["quality","service"],"recommends":true}
The JSON version carries the same information in fewer tokens. And it's easier for the next agent to parse.
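A quick sketch showing both benefits at once, using the example above:

```python
import json

prose = (
    "The sentiment of this review is positive, with a confidence score "
    "of 87%. The main topics discussed are product quality and customer "
    "service. The reviewer recommends the product."
)
structured = (
    '{"sentiment":"positive","confidence":0.87,'
    '"topics":["quality","service"],"recommends":true}'
)

# The JSON payload is about half the length of the prose, and the next
# agent gets typed fields instead of having to parse free text.
result = json.loads(structured)
```

Downstream agents can branch on `result["sentiment"]` directly rather than running another extraction pass over prose.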
The optimization workflow
1. Measure first. Track per-agent costs before optimizing. You can't improve what you don't measure.
2. Optimize the most expensive agent first. Usually the one with the largest input context.
3. Downgrade models where possible. Test cheaper models on each agent. Keep quality gates to catch regressions.
4. Compress prompts. Remove backstory, examples, and redundant instructions.
5. Add summarization steps. Cheaper than passing full context between agents.
6. Cache stable data. Don't regenerate what hasn't changed.
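"Measure first" can be as simple as a per-agent ledger fed by the token counts your API returns. A sketch, with illustrative prices (the class and method names are this post's inventions, not a Mentiko API):

```python
from collections import defaultdict

class CostTracker:
    """Accumulate per-agent spend from reported token counts."""

    def __init__(self, input_price: float, output_price: float):
        # Dollars per million tokens; plug in your provider's real rates.
        self.input_price = input_price
        self.output_price = output_price
        self.spend = defaultdict(float)

    def record(self, agent: str, input_tokens: int, output_tokens: int) -> None:
        self.spend[agent] += (input_tokens * self.input_price
                              + output_tokens * self.output_price) / 1_000_000

    def most_expensive(self) -> str:
        # The agent to optimize first.
        return max(self.spend, key=self.spend.get)
```

Record every call in the chain, then start optimization work on whatever `most_expensive()` returns.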
Real numbers
A 4-agent content chain optimized with these strategies:
| Optimization | Before | After | Savings |
|---|---|---|---|
| Model right-sizing | $0.45/run | $0.18/run | 60% |
| Prompt compression | $0.18/run | $0.14/run | 22% |
| Summarization step | $0.14/run | $0.10/run | 29% |
| Combined | $0.45/run | $0.10/run | 78% |
At 30 runs/month: from $13.50 to $3.00. At 1,000 runs/month: from $450 to $100.
The platform cost ($29/month) stays the same regardless. Optimization only affects the variable API spend.
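The fixed-plus-variable arithmetic, as a one-liner (the $29 fee and per-run costs come from this post):

```python
def monthly_total(cost_per_run: float, runs: int, platform_fee: float = 29.0) -> float:
    """Fixed platform fee plus variable API spend."""
    return platform_fee + cost_per_run * runs
```

At 1,000 runs/month, `monthly_total(0.45, 1000)` is roughly $479 before optimizing and `monthly_total(0.10, 1000)` roughly $129 after, so at scale nearly all of the bill is the part you can optimize.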
Want to run cost-optimized agent chains? Get started or see the pricing math.