Cost Allocation for Agent Chains: Charging Back by Team
Mentiko Team
Your agent platform is running. Multiple teams are using it. The monthly LLM bill arrives and nobody knows who spent what. Engineering says it's the data team. The data team points at marketing. Finance wants a breakdown you can't produce.
This is the cost allocation problem, and it gets worse the more successful your agent platform becomes. Here's how to solve it before it becomes political.
Why cost allocation matters for agent chains
Agent chains consume real money every time they run. LLM tokens, compute time, external API calls -- these costs accumulate across teams, and without attribution, you get two failure modes:
- No accountability. Teams build expensive chains because they don't see the bill. A 6-agent chain using Claude Sonnet for every step runs at $0.40+ per execution. Run it 500 times a day and you're burning $6,000/month on a single chain.
- Platform defunding. Finance can't justify the spend because they can't tie it to business outcomes. The AI platform becomes a line item nobody wants to own.
Cost allocation solves both. Teams see their spend, optimize their chains, and finance can tie costs to the teams generating value.
The cost components to track
Every chain run produces costs across four dimensions:
LLM token costs
The largest variable cost. Track per-agent, per-run:
- Input tokens consumed
- Output tokens produced
- Model used (pricing varies 10-100x between models)
- Total cost = (input_tokens * input_price) + (output_tokens * output_price)
Most LLM providers return token counts in the API response. Capture these in your run metadata. If you're self-hosting models, calculate equivalent costs based on your GPU amortization rate.
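The formula above is simple enough to sketch directly. A minimal version in Python, with an illustrative price table (the per-million-token rates here are placeholders, not current provider pricing -- substitute your own):

```python
# (input, output) USD per million tokens -- assumed illustrative rates
PRICES_PER_MTOK = {
    "claude-haiku": (0.80, 4.00),
    "claude-sonnet": (3.00, 15.00),
}

def llm_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total cost = (input_tokens * input_price) + (output_tokens * output_price)."""
    in_price, out_price = PRICES_PER_MTOK[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# One agent step from a run's metadata
cost = llm_cost("claude-sonnet", input_tokens=12_400, output_tokens=4_200)
```

The same function works for self-hosted models: replace the price table with rates derived from your GPU amortization.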
Compute costs
The infrastructure your agents run on. This is harder to attribute because it's shared:
- Agent execution time (seconds of CPU/memory)
- Workspace provisioning (Docker containers, SSH sessions)
- Storage for event files, logs, and artifacts
For shared infrastructure, allocate proportionally by execution time. If Team A consumed 60% of total agent-seconds this month, they get 60% of the compute bill.
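The proportional split is a one-liner per team. A sketch, assuming you've already summed agent-seconds per team for the billing period:

```python
def allocate_compute(total_bill: float, agent_seconds: dict[str, float]) -> dict[str, float]:
    """Split a shared compute bill proportionally by agent-seconds consumed."""
    total = sum(agent_seconds.values())
    return {team: total_bill * secs / total for team, secs in agent_seconds.items()}

# Team A consumed 60% of agent-seconds, so it gets 60% of the bill
shares = allocate_compute(1000.0, {"team-a": 600.0, "team-b": 250.0, "team-c": 150.0})
```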
External API costs
Agents that call third-party APIs incur additional costs:
- Search APIs (Google, Bing, Brave)
- Data APIs (financial data, weather, etc.)
- SaaS APIs (Slack, Jira, GitHub)
- Storage APIs (S3, GCS)
Track these per-chain. The agent making the API call should log the request and any associated cost.
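A minimal per-call logger might look like the following. The field names and file path are illustrative, not a Mentiko schema:

```python
import json
import time

def log_api_cost(chain: str, api: str, cost: float, path: str = "costs/api-calls.jsonl"):
    """Append one JSON line per external API call so costs roll up per chain."""
    record = {"chain": chain, "api": api, "cost": cost, "ts": time.time()}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Usage: log_api_cost("content-research", "brave-search", 0.005)
```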
Platform costs
The orchestration platform itself. For flat-rate platforms like Mentiko ($29/month), this is straightforward to split. For per-execution platforms, attribute the per-run fee to the team that owns the chain.
Building the tracking layer
You need three things: tagging, metering, and reporting.
Tagging
Every chain needs ownership metadata:
{
  "chain": "content-research-pipeline",
  "team": "marketing",
  "department": "growth",
  "cost_center": "CC-4200",
  "environment": "production"
}
Enforce tagging at chain creation time. If a chain doesn't have a team tag, it shouldn't deploy. This is a policy decision, not a technical one -- but it's the most important one.
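Enforcement can be a small check in your deploy pipeline. A sketch, assuming a chain definition parsed into a dict; the required-tag set is a policy choice, not a fixed schema:

```python
REQUIRED_TAGS = {"team", "cost_center"}  # adjust to your org's policy

def validate_tags(chain_def: dict) -> list[str]:
    """Return the list of missing required tags; deploy only if empty."""
    tags = chain_def.get("tags", {})
    return sorted(REQUIRED_TAGS - tags.keys())

missing = validate_tags({"name": "content-research-pipeline",
                         "tags": {"team": "marketing"}})
# missing == ["cost_center"] -> block the deploy
```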
In Mentiko, chain metadata is stored in the chain definition file. Add your cost allocation tags there:
name: content-research-pipeline
tags:
  team: marketing
  cost_center: CC-4200
agents:
  - name: researcher
    model: claude-haiku
  - name: synthesizer
    model: claude-sonnet
Metering
Capture cost data on every run. The minimum viable meter records:
- Run ID
- Chain name
- Team tag
- Timestamp
- Per-agent token counts and model used
- Per-agent execution time
- Total computed cost
Store this as structured data. A simple approach: write a JSON line to a cost log after each run completes.
# After run completes, append cost record
echo '{"run":"run-abc123","chain":"content-research","team":"marketing","cost":0.14,"tokens":{"input":12400,"output":4200},"duration_s":18,"timestamp":"2026-03-19T14:22:00Z"}' >> costs/2026-03.jsonl
File-based cost logs work well at moderate scale. At high volume, stream to a database or analytics platform.
Reporting
Build three views:
Daily team summary. Each team sees their total spend, broken down by chain. This is the accountability layer -- teams can see which chains are expensive and optimize them.
Monthly department rollup. Aggregate team costs to department level for finance. This is the chargeback layer -- the number that goes on the internal invoice.
Chain-level drill-down. Per-chain cost over time, with per-agent breakdown. This is the optimization layer -- engineers use it to identify expensive agents and swap models.
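The daily team summary falls out of the JSONL cost log directly. A sketch that aggregates the records from the metering step into a team-to-chain cost map:

```python
import json
from collections import defaultdict

def team_summary(jsonl_lines):
    """Aggregate per-run cost records into a team -> chain -> total-cost map."""
    totals = defaultdict(lambda: defaultdict(float))
    for line in jsonl_lines:
        rec = json.loads(line)
        totals[rec["team"]][rec["chain"]] += rec["cost"]
    return {team: dict(chains) for team, chains in totals.items()}

# Usage: with open("costs/2026-03.jsonl") as f: report = team_summary(f)
```

The department rollup and chain drill-down are the same aggregation keyed by different fields.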
Chargeback models
There are three common approaches to actually charging teams for their usage:
Direct allocation
Each team pays exactly what they consumed. Simple, fair, transparent.
Works when: Teams have direct budget control and can absorb variable costs.
Fails when: A team has a spike month (ran a big backfill, tested a new chain heavily) and blows their budget. Direct allocation can discourage experimentation.
Tiered allocation
Set usage tiers with fixed monthly rates. Team gets X runs per month for $Y. Overages charged at a per-run rate.
Works when: Teams need predictable budgets but you still want usage-based fairness.
Fails when: Tiers are set wrong and teams consistently over- or under-use their allocation.
Shared pool with proportional split
Total platform cost is split proportionally by usage. If the total bill is $2,000 and marketing used 35% of runs, they pay $700.
Works when: The total spend is manageable and you want simplicity.
Fails when: One team's heavy usage raises costs for everyone. The data team running 10,000 daily chains makes the marketing team's 50 daily chains more expensive per-unit.
Our recommendation
Start with direct allocation. It's the most transparent and creates the right incentives. Teams that use more, pay more. Teams that optimize, save money. The feedback loop is immediate and clear.
Budget alerts and guardrails
Cost tracking without limits is just accounting. You need guardrails:
Per-team budget caps
Set a monthly budget per team. When a team hits 80% of their budget, alert the team lead. At 100%, you have two options: hard stop (chains stop running) or soft stop (chains continue but alerts escalate).
Hard stops are safer for cost control but risky for production chains. A reasonable middle ground: hard stop on development chains, soft stop on production chains with escalation to the team lead and platform admin.
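That policy reduces to a small decision function. A sketch with the thresholds described above (80% alert, 100% stop, environment-dependent stop type); the action names are illustrative:

```python
def budget_action(spend: float, budget: float, environment: str) -> str:
    """Map a team's month-to-date spend to a guardrail action."""
    ratio = spend / budget
    if ratio < 0.8:
        return "ok"
    if ratio < 1.0:
        return "alert-team-lead"
    # Over budget: hard stop for dev chains, escalating soft stop in production
    return "hard-stop" if environment == "development" else "soft-stop-escalate"
```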
Per-chain cost limits
Set a maximum cost per run. If a chain exceeds $X per execution, kill it. This catches runaway chains -- an agent stuck in a loop burning tokens, or a chain processing unexpectedly large input.
name: data-enrichment-pipeline
cost_limit:
  per_run: 2.00
  daily: 100.00
  monthly: 2000.00
Anomaly detection
Track the rolling average cost per chain. If a run costs 3x the average, flag it. This catches gradual cost drift (prompts getting longer, models being swapped) and sudden spikes (bad input causing retries).
A simple implementation: compare each run's cost to the 30-day moving average for that chain. If it exceeds 2 standard deviations, log a warning. If it exceeds 3, alert the team.
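In code, that check is a z-score against the chain's recent cost history. A minimal sketch, assuming you can pull the window of recent per-run costs for the chain:

```python
from statistics import mean, stdev

def anomaly_level(run_cost: float, recent_costs: list[float]) -> str:
    """Compare a run's cost to the chain's recent history (e.g. a 30-day window).
    More than 2 standard deviations above the mean -> warn, more than 3 -> alert."""
    mu, sigma = mean(recent_costs), stdev(recent_costs)
    if sigma == 0:
        return "alert" if run_cost > mu else "ok"
    z = (run_cost - mu) / sigma
    if z > 3:
        return "alert"
    if z > 2:
        return "warn"
    return "ok"
```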
Optimization feedback loop
Cost allocation only works if teams can act on the data. Give them levers:
Model selection per agent
The biggest cost lever. A chain using Haiku for classification and Sonnet only for synthesis costs 50-70% less than all-Sonnet. Show teams which agents use which models and what the cost difference would be with alternatives.
Caching
If the same input produces the same output, cache it. Semantic caching (similar inputs map to cached outputs) can reduce token costs by 30-60% for repetitive workloads. Track cache hit rates alongside costs so teams can see the savings.
Chain consolidation
Teams often build multiple chains that do similar things. A cost report that shows three teams each running their own "summarize document" chain is a signal to build one shared chain.
Run frequency review
Some chains run on cron schedules that were set once and never revisited. A daily chain that only needs to run weekly is burning 6x more than necessary. Monthly cost reports make this visible.
Implementation timeline
Start with visibility, then add control, then incentives:
Weeks 1-2: Add team tags to all chains. Implement per-run cost metering.
Weeks 3-4: Build daily team summary reports. Set initial budget caps and alerts (generous at first, tighten over time).
Month 2-3: Add per-chain drill-downs. Implement the chargeback model.
The hardest part isn't technical -- it's getting teams to care. Make costs visible by default (on the run detail page, not buried in logs). Celebrate optimization wins. And don't punish experimentation -- dev environments should have generous budgets so teams keep building new chains.
Cost allocation turns your agent platform from a shared expense into an investment with clear returns per team. Without it, the platform is always one budget review away from being cut.
Want to understand the full cost picture first? Read The Real Cost of Running AI Agent Chains in 2026 or learn about flat-rate vs per-execution pricing.