Version Control for Agent Chains: Git Workflows That Work
Mentiko Team
Agent chains break in production. Prompts get tweaked. Agents get added, removed, reordered. Without version control, you're flying blind -- no history, no rollback, no way to know what changed when things went wrong.
If your chain definitions live in JSON files (and they should), Git gives you everything you need. Here's how to set up workflows that actually work.
Why agent chains need version control
Traditional software has clear reasons for version control: code changes, team coordination, deployment safety. Agent chains have the same reasons plus a few unique ones.
Reproducibility. When a chain produces unexpected output, you need to know the exact chain definition that ran. Not "whatever was on the server." The exact JSON, pinned to a commit hash.
Audit trail. In regulated industries, you need to prove what workflow was running at a specific time. Git gives you tamper-evident history with timestamps and authors.
Rollback. A prompt change that seemed harmless in testing causes failures in production. With version control, you revert one commit and you're back to the working state.
Collaboration. Multiple engineers working on the same chain without stepping on each other. Branching and merging handle this natively.
Without version control, you end up with chains named content-pipeline-v2-final-FINAL-marco-edit.json. We've all been there.
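To make the reproducibility point concrete, a runner can log a fingerprint of the exact chain file it loaded next to each run. The sketch below is a minimal Python illustration, assuming your runner reads chain files from a Git checkout. `git_blob_id` and `chain_fingerprint` are hypothetical helper names. The code computes the same object ID Git stores for the file's bytes (SHA-1 repositories, with no line-ending or clean filters applied), so an ID from a log line can be traced back to the commits containing that exact file with `git log --find-object=<id>`.

```python
import hashlib
from pathlib import Path


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git assigns to a file with these bytes.

    Git hashes a "blob <size>" header, a NUL byte, and then the raw content.
    This matches `git hash-object` in SHA-1 repositories when no filters apply.
    """
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def chain_fingerprint(path: str) -> str:
    # Hash the file exactly as stored on disk; reformatting the JSON changes the ID.
    return git_blob_id(Path(path).read_bytes())
```

A run log line might then record both the chain name and `chain_fingerprint("chains/production/content-pipeline.json")`.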
JSON chains as first-class Git citizens
Mentiko defines chains in JSON. This is deliberate. JSON diffs cleanly. It's human-readable. It doesn't require a runtime to understand.
Your repository structure should treat chains as a first-class concern:
chains/
  production/
    content-pipeline.json
    support-triage.json
    daily-report.json
  staging/
    content-pipeline.json
    support-triage.json
  templates/
    research-and-summarize.json
    review-loop.json
agents/
  researcher.json
  writer.json
  reviewer.json
Each environment gets its own directory. Templates live separately. Agent definitions (prompts, models, parameters) are versioned alongside the chains that use them.
Branching strategy
Don't overthink this. Two patterns cover 90% of cases.
Feature branches for new chains
Building a new chain? Branch off main.
main
└── feature/invoice-processing-chain
Develop the chain, test it in a dev environment, open a PR when it's ready. The PR is where the team reviews the chain structure before it runs in production.
Hotfix branches for broken chains
A chain is failing in production. You need to fix it now.
main
└── hotfix/fix-content-pipeline-timeout
Fix the chain definition, test it against the production scenario that broke, merge directly to main with expedited review. Deploy immediately.
Avoid long-lived branches for chain development. Chains are configuration, not application code. They should merge fast and deploy fast.
PR reviews for chain changes
Code review for chains is different from code review for application logic. Here's what reviewers should actually check.
Agent ordering. Does the sequence make sense? Does Agent B actually need Agent A's output? Could any agents run in parallel instead of sequentially?
Trigger and emit events. Every agent should emit an event that the next agent triggers on. A broken event chain means agents that never execute. Check for typos in event names.
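A typo check can be automated. Here is a minimal Python sketch, assuming each agent entry has `name`, `triggers`, and `emits` fields holding either one event name or a list of names. The field names follow this post's wording, but the exact schema is an assumption. `suggest_event_typos` is a hypothetical helper that uses the standard library's `difflib.get_close_matches` to pair each unmatched trigger with the most similar emitted event.

```python
import difflib


def as_list(value):
    # Assumption: an event field holds a single event name or a list of names.
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def suggest_event_typos(chain: dict, cutoff: float = 0.8) -> list[str]:
    """Flag trigger events that nobody emits but that look like a misspelled emitted event."""
    emitted = {e for a in chain["agents"] for e in as_list(a.get("emits"))}
    warnings = []
    for agent in chain["agents"]:
        for event in as_list(agent.get("triggers")):
            if event in emitted:
                continue
            close = difflib.get_close_matches(event, sorted(emitted), n=1, cutoff=cutoff)
            if close:
                warnings.append(
                    f"{agent['name']} triggers on '{event}'; did you mean '{close[0]}'?"
                )
    return warnings


chain = {
    "agents": [
        {"name": "researcher", "triggers": "chain:start", "emits": "research:done"},
        {"name": "writer", "triggers": "reserch:done", "emits": "draft:ready"},
    ]
}
print(suggest_event_typos(chain))
```

Here `writer` is flagged because `reserch:done` is one letter away from `research:done`. `chain:start` is not flagged because nothing similar is emitted. The 0.8 cutoff is a starting guess to tune against your own event names.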
Prompt changes. The most common chain change is a prompt tweak. Reviewers should ask: does this change the agent's behavior scope? A prompt that goes from "summarize this document" to "summarize and critique this document" is a scope change, not a tweak.
Model selection. Someone swapped GPT-5.4 for GPT-5.4 Mini to save money. Does the task require the stronger model? Cost optimization is fine, but quality regression isn't.
Timeout and retry settings. Agent timeout changed from 30s to 120s. Why? Is the agent doing more work, or is it hanging?
New agents added. What permissions does the new agent need? What secrets does it access? Every new agent is a new security surface.
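Part of this checklist can be pre-computed for the reviewer. This is a minimal Python sketch that diffs two versions of a chain and turns the changes into review notes. It assumes agents have unique `name`s and optional `prompt`, `model`, `timeout`, `retries`, `triggers`, and `emits` fields, all of which are assumed field names. `review_notes` is hypothetical, and CI could post its output as a PR comment. It only points at changes. Judging whether a model swap or a longer timeout is justified stays with the reviewer.

```python
def review_notes(old: dict, new: dict) -> list[str]:
    """Summarize what changed between two versions of a chain for a PR reviewer."""
    old_agents = {a["name"]: a for a in old["agents"]}
    new_agents = {a["name"]: a for a in new["agents"]}
    notes = []

    # New agents are a new security surface; permissions and secrets need a human look.
    for name in sorted(new_agents.keys() - old_agents.keys()):
        notes.append(f"new agent '{name}': check its permissions and secrets")
    for name in sorted(old_agents.keys() - new_agents.keys()):
        notes.append(f"removed agent '{name}': check that nothing still triggers on its events")

    for name in sorted(old_agents.keys() & new_agents.keys()):
        before, after = old_agents[name], new_agents[name]
        if before.get("prompt") != after.get("prompt"):
            notes.append(f"{name}.prompt changed: tweak or scope change?")
        for field in ("model", "timeout", "retries", "triggers", "emits"):
            if before.get(field) != after.get(field):
                notes.append(f"{name}.{field}: {before.get(field)!r} -> {after.get(field)!r}")

    # Compare the relative order of agents present in both versions.
    old_order = [a["name"] for a in old["agents"] if a["name"] in new_agents]
    new_order = [a["name"] for a in new["agents"] if a["name"] in old_agents]
    if old_order != new_order:
        notes.append("agent order changed: does each agent still get the input it needs?")
    return notes
```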
A good PR template for chain changes:
## What changed
[Which chain, what was modified]
## Why
[The problem this solves or the improvement this makes]
## Testing
[How you verified the chain works with this change]
## Rollback
[How to revert if this causes issues]
Automated validation in CI
Chain definitions are structured data. You can validate them automatically.
Schema validation. Every chain JSON should conform to a schema. CI runs a JSON Schema validator on every PR. Catches structural errors before they hit production.
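# .github/workflows/validate-chains.yml (excerpt from the job's steps)
- name: Install jsonschema CLI
  run: pip install jsonschema
- name: Validate chain schemas
  run: |
    shopt -s globstar  # without globstar, bash treats ** like *, matching only one directory level
    for chain in chains/**/*.json; do
      jsonschema -i "$chain" schemas/chain-schema.json
    done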
Event connectivity check. Write a script that parses every chain and verifies that every event an agent triggers on is emitted by another agent. Disconnected events mean dead agents.
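Here is a minimal Python sketch of that script. It assumes agents carry `name`, `triggers`, and `emits` fields (an event name or a list of names), and that external events such as a hypothetical `chain:start` are passed in as `entry_events`. `dead_agents` and `check_repo` are hypothetical helpers. An agent's own `emits` does not count toward its own triggers, matching the rule that the event must come from another agent.

```python
import json
from pathlib import Path


def as_list(value):
    # Assumption: an event field holds a single event name or a list of names.
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def dead_agents(chain: dict, entry_events: set[str]) -> list[str]:
    """Return one message per agent whose trigger events nothing else emits."""
    agents = chain.get("agents", [])
    problems = []
    for agent in agents:
        # Only entry events or events emitted by *other* agents can wake this agent.
        available = set(entry_events)
        for other in agents:
            if other is not agent:
                available.update(as_list(other.get("emits")))
        missing = [e for e in as_list(agent.get("triggers")) if e not in available]
        if missing:
            problems.append(f"{agent['name']}: nothing emits {', '.join(missing)}")
    return problems


def check_repo(root: Path, entry_events: set[str]) -> list[str]:
    """Run dead_agents over every chain JSON file below root."""
    problems = []
    for path in sorted(root.glob("**/*.json")):
        chain = json.loads(path.read_text())
        problems += [f"{path}: {p}" for p in dead_agents(chain, entry_events)]
    return problems
```

In CI, call `check_repo(Path("chains"), {"chain:start"})` and fail the job if the returned list is non-empty.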
Duplicate detection. Flag any event that two chains listen for with conflicting behavior. CI catches this before it causes race conditions in production.
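Spotting the overlap is mechanical, even if judging the conflict is not. This is a minimal Python sketch using the same assumed `name`, `agents`, and `triggers` fields. `shared_triggers` (hypothetical) maps each event claimed by more than one chain to those chain names, and a human then decides whether the behaviors actually conflict. Run it once per environment directory, since only chains deployed together can race.

```python
from collections import defaultdict


def as_list(value):
    # Assumption: an event field holds a single event name or a list of names.
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def shared_triggers(chains: list[dict]) -> dict[str, list[str]]:
    """Map each event to the chains that trigger on it, keeping only shared events."""
    listeners = defaultdict(set)
    for chain in chains:
        for agent in chain.get("agents", []):
            for event in as_list(agent.get("triggers")):
                listeners[event].add(chain["name"])
    # Overlap is not automatically a bug; it marks where a reviewer should look.
    return {event: sorted(names) for event, names in sorted(listeners.items()) if len(names) > 1}
```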
Prompt linting. Optional but useful. Check that prompts don't contain hardcoded values that should be variables, don't reference agents that aren't in the chain, and don't exceed token limits.
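Here is a minimal Python sketch of these three checks. Every convention in it is an assumption for illustration: prompts live in a `prompt` field, other agents are mentioned as `@agent-name`, the regexes are examples of values that usually belong in variables, and the token count is a crude four-characters-per-token estimate rather than a real tokenizer. `lint_prompts` is a hypothetical helper.

```python
import re

# Hypothetical convention: prompts mention other agents as "@agent-name".
# The lookbehind skips the "@" inside email addresses such as ops@example.com.
AGENT_MENTION = re.compile(r"(?<![\w.])@([a-z0-9][a-z0-9-]*)")

# Example patterns for values that usually belong in variables, not prompt text.
HARDCODED = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "URL": re.compile(r"https?://\S+"),
    "ISO date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def rough_token_count(text: str) -> int:
    # Crude estimate (about 4 characters per token); use the provider's tokenizer for real limits.
    return (len(text) + 3) // 4


def lint_prompts(chain: dict, max_tokens: int = 2000) -> list[str]:
    agents = chain.get("agents", [])
    names = {a["name"] for a in agents}
    issues = []
    for agent in agents:
        prompt = agent.get("prompt", "")
        for label, pattern in HARDCODED.items():
            for match in pattern.findall(prompt):
                issues.append(f"{agent['name']}: hardcoded {label} {match!r}")
        for mention in AGENT_MENTION.findall(prompt):
            if mention not in names:
                issues.append(f"{agent['name']}: mentions unknown agent @{mention}")
        tokens = rough_token_count(prompt)
        if tokens > max_tokens:
            issues.append(f"{agent['name']}: prompt is roughly {tokens} tokens")
    return issues
```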
These checks run in seconds. The cost of catching a broken chain in CI is near zero. The cost of catching it in production is your team's afternoon.
Rollback patterns
Two approaches to rolling back chain changes. Use both.
Git revert
The simplest rollback. A chain change broke production. Revert the commit.
git revert abc123
git push origin main
The chain definition returns to its previous state. The platform picks up the new (old) definition on the next run. Total rollback time: under a minute.
This works for any change: agent additions, prompt tweaks, event rewiring. Git revert is always available, though if later commits touched the same lines you'll need to resolve the conflict it reports before pushing.
Chain versioning
For more granular control, maintain explicit versions alongside git history.
{
  "name": "content-pipeline",
  "version": "2.3.1",
  "agents": [...]
}
The version field lets you track chain iterations independently from git commits. A single commit might update the version from 2.3.0 to 2.3.1 along with the actual change. This is useful when you need to reference chain versions in logs, monitoring dashboards, or compliance reports.
Semantic versioning works here: major version for structural changes (agents added/removed), minor for behavior changes (prompt updates), patch for parameter tweaks (timeouts, retries).
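These rules are mechanical enough to suggest the bump automatically. This is a minimal Python sketch that assumes agents have unique `name`s. Treating changes to `triggers`/`emits` as structural and a `model` swap as a behavior change are this sketch's own choices, not rules from the post. `suggest_bump` and `bump` are hypothetical helpers.

```python
STRUCTURAL_FIELDS = ("triggers", "emits")  # assumption: event rewiring counts as structural
BEHAVIOR_FIELDS = ("prompt", "model")      # assumption: a model swap counts as behavior


def suggest_bump(old: dict, new: dict) -> str:
    """Return "major", "minor", "patch", or "none" for the change between two versions."""
    old_agents = {a["name"]: a for a in old["agents"]}
    new_agents = {a["name"]: a for a in new["agents"]}
    if old_agents.keys() != new_agents.keys():
        return "major"  # agents added or removed

    def differs(fields):
        return any(
            old_agents[n].get(f) != new_agents[n].get(f) for n in old_agents for f in fields
        )

    if differs(STRUCTURAL_FIELDS):
        return "major"
    if differs(BEHAVIOR_FIELDS):
        return "minor"  # e.g. prompt updates
    if old_agents != new_agents:
        return "patch"  # parameter tweaks such as timeouts or retries
    return "none"


def bump(version: str, level: str) -> str:
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version
```

For the 2.3.0 to 2.3.1 case above, a change that only touches `timeout` yields `suggest_bump(...) == "patch"`, and `bump("2.3.0", "patch")` returns `"2.3.1"`.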
Environment promotion
Chains should promote through environments, not deploy directly to production.
dev -> staging -> production
Dev. Engineers iterate here. Chains run against test data with cheaper models. Fast feedback loops. No review required for changes.
Staging. Mirrors production configuration. Uses the same models and similar data. Changes require PR approval before promotion to staging.
Production. Only accepts changes that have been validated in staging. Promotion is a merge from the staging branch to main, or a copy from the staging directory to the production directory.
The promotion workflow:
- Engineer builds/modifies chain on a feature branch
- Tests pass in dev environment
- PR opened, team reviews the chain change
- Merged to staging branch, chain runs in staging environment
- After validation period (24-48 hours for critical chains), promoted to production
- Monitoring confirms the chain performs as expected
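The copy-based promotion and the validation window can be scripted. This is a minimal Python sketch that assumes the `chains/staging` and `chains/production` layout shown earlier, and that your tooling can tell when a chain was deployed to staging (`staged_at`). `ready_to_promote` and `promote` are hypothetical helpers, and the window is yours to pick, for example 48 hours for a critical chain.

```python
import shutil
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional


def ready_to_promote(staged_at: datetime, window: timedelta,
                     now: Optional[datetime] = None) -> bool:
    """True once a chain has been running in staging for at least `window`.

    `staged_at` must be timezone-aware; how it is recorded is up to your tooling.
    """
    now = now or datetime.now(timezone.utc)
    return now - staged_at >= window


def promote(repo: Path, chain: str) -> Path:
    """Copy chains/staging/<chain>.json over chains/production/<chain>.json."""
    source = repo / "chains" / "staging" / f"{chain}.json"
    target = repo / "chains" / "production" / f"{chain}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)  # raises FileNotFoundError if staging lacks the chain
    return target
```

Commit the copied file through a normal PR so the promotion itself is reviewed and can be reverted like any other change.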
For urgent fixes, the hotfix branch pattern bypasses staging with an expedited review. But this should be the exception, not the rule.
The practical minimum
If all of this feels like too much process, here's the minimum that every team should do:
- Chain definitions in a git repository (not on a server somewhere)
- Changes go through pull requests (at least one reviewer)
- A way to revert to the previous version quickly (git revert)
Everything else -- CI validation, environment promotion, semantic versioning -- is useful but optional. Start with the basics. Add process when you need it.
The teams that skip version control for their chains always regret it. Usually after a Friday afternoon prompt change takes down a production workflow and nobody can remember what it looked like before.
Already version-controlling your chains? See the JSON chain definition guide or learn the design patterns.