Version Control for Agent Chains: Git Workflows That Work

Agent chains break in production. Prompts get tweaked. Agents get added, removed, reordered. Without version control, you're flying blind -- no history, no rollback, no way to know what changed when things went wrong.

If your chain definitions live in JSON files (and they should), Git gives you everything you need. Here's how to set up workflows that actually work.

Why agent chains need version control

Traditional software has clear reasons for version control: code changes, team coordination, deployment safety. Agent chains have the same reasons plus a few unique ones.

Reproducibility. When a chain produces unexpected output, you need to know the exact chain definition that ran. Not "whatever was on the server." The exact JSON, pinned to a commit hash.

Audit trail. In regulated industries, you need to prove what workflow was running at a specific time. Git gives you a tamper-evident history with timestamps and authors, provided the main branch is protected against force-pushes and history rewrites.

Rollback. A prompt change that seemed harmless in testing causes failures in production. With version control, you revert one commit and you're back to the working state.

Collaboration. Multiple engineers working on the same chain without stepping on each other. Branching and merging handle this natively.

Without version control, you end up with chains named content-pipeline-v2-final-FINAL-marco-edit.json. We've all been there.

JSON chains as first-class Git citizens

Mentiko defines chains in JSON. This is deliberate. JSON diffs cleanly. It's human-readable. It doesn't require a runtime to understand.

Your repository structure should treat chains as a first-class concern:

chains/
  production/
    content-pipeline.json
    support-triage.json
    daily-report.json
  staging/
    content-pipeline.json
    support-triage.json
  templates/
    research-and-summarize.json
    review-loop.json
agents/
  researcher.json
  writer.json
  reviewer.json

Each environment gets its own directory. Templates live separately. Agent definitions (prompts, models, parameters) are versioned alongside the chains that use them.

Branching strategy

Don't overthink this. Two patterns cover most cases.

Feature branches for new chains

Building a new chain? Branch off main.

main
  └── feature/invoice-processing-chain

Develop the chain, test it in a dev environment, open a PR when it's ready. The PR is where the team reviews the chain structure before it runs in production.

Hotfix branches for broken chains

A chain is failing in production. You need to fix it now.

main
  └── hotfix/fix-content-pipeline-timeout

Fix the chain definition, test it against the production scenario that broke, merge directly to main with expedited review. Deploy immediately.

Avoid long-lived branches for chain development. Chains are configuration, not application code. They should merge fast and deploy fast.

PR reviews for chain changes

Code review for chains is different from code review for application logic. Here's what reviewers should actually check.

Agent ordering. Does the sequence make sense? Does Agent B actually need Agent A's output? Could any agents run in parallel instead of sequentially?

Trigger and emit events. Every agent should emit an event that the next agent triggers on. A broken event chain means agents that never execute. Check for typos in event names.

Prompt changes. The most common chain change is a prompt tweak. Reviewers should ask: does this change the agent's behavior scope? A prompt that goes from "summarize this document" to "summarize and critique this document" is a scope change, not a tweak.

Model selection. Someone swapped GPT-5.4 for GPT-5.4 Mini to save money. Does the task require the stronger model? Cost optimization is fine, but quality regression isn't.

Timeout and retry settings. Agent timeout changed from 30s to 120s. Why? Is the agent doing more work, or is it hanging?

New agents added. What permissions does the new agent need? What secrets does it access? Every new agent is a new security surface.
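
Much of this checklist can be surfaced mechanically before a human reads the raw diff. Here is a minimal sketch, assuming a hypothetical chain layout in which agents is a list of objects with name, model, prompt, timeout, and retries fields. The real Mentiko schema may differ, and summarize_chain_diff is an illustrative name, not a platform API.

```python
# Fields a reviewer usually wants called out. These names are assumptions
# about the chain schema, not confirmed field names.
WATCHED_FIELDS = ("model", "prompt", "timeout", "retries")


def summarize_chain_diff(old: dict, new: dict) -> list[str]:
    """Return human-readable notes on what changed between two chain definitions."""
    old_agents = {a["name"]: a for a in old.get("agents", [])}
    new_agents = {a["name"]: a for a in new.get("agents", [])}
    notes = []

    # New agents are a new security surface; removed agents may break events.
    for name in sorted(new_agents.keys() - old_agents.keys()):
        notes.append(f"agent added: {name} (check permissions and secrets)")
    for name in sorted(old_agents.keys() - new_agents.keys()):
        notes.append(f"agent removed: {name}")

    # Compare the relative order of agents present in both versions.
    old_order = [a["name"] for a in old.get("agents", []) if a["name"] in new_agents]
    new_order = [a["name"] for a in new.get("agents", []) if a["name"] in old_agents]
    if old_order != new_order:
        notes.append("agent order changed")

    # Field-level changes on agents that exist in both versions.
    for name in sorted(old_agents.keys() & new_agents.keys()):
        for field in WATCHED_FIELDS:
            before = old_agents[name].get(field)
            after = new_agents[name].get(field)
            if before != after:
                notes.append(f"{name}: {field} changed from {before!r} to {after!r}")
    return notes
```

Posting the output as a PR comment gives reviewers the "why did the timeout go from 30 to 120?" question up front. It doesn't replace reading the prompt diff itself.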

A good PR template for chain changes:

## What changed
[Which chain, what was modified]

## Why
[The problem this solves or the improvement this makes]

## Testing
[How you verified the chain works with this change]

## Rollback
[How to revert if this causes issues]

Automated validation in CI

Chain definitions are structured data. You can validate them automatically.

Schema validation. Every chain JSON should conform to a schema. CI runs a JSON Schema validator on every PR. Catches structural errors before they hit production.

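# .github/workflows/validate-chains.yml (step excerpt)
- name: Validate chain schemas
  run: |
    # Bash leaves recursive ** globbing off by default; enable it explicitly
    shopt -s globstar
    for chain in chains/**/*.json; do
      jsonschema -i "$chain" schemas/chain-schema.json
    done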

Event connectivity check. Write a script that parses every chain and verifies that every triggers event has a corresponding emits event from another agent. Disconnected events mean dead agents.
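
A connectivity check along these lines fits in a few lines of Python. This is a sketch under assumptions: each agent is taken to have a name, a single triggers event, and a single emits event, and external_events is a hypothetical allowlist for events that come from outside the chain (the one that starts it, for example).

```python
def find_dead_triggers(chain: dict, external_events: frozenset = frozenset()) -> list[str]:
    """List agents whose trigger event is never emitted by another agent.

    Assumes each agent has 'name', 'triggers', and 'emits' keys holding
    single event names; adapt this if the real schema allows lists.
    """
    agents = chain.get("agents", [])
    problems = []
    for agent in agents:
        trigger = agent.get("triggers")
        if trigger is None or trigger in external_events:
            continue
        # Only events emitted by *other* agents count as a valid source.
        emitted_by_others = {
            other.get("emits") for other in agents if other is not agent
        }
        if trigger not in emitted_by_others:
            problems.append(f"{agent['name']}: nothing emits '{trigger}'")
    return problems
```

In CI, you would load every file under chains/ with json.loads, call the function, and fail the job if any list comes back non-empty. A typo like draft.complete versus draft.done shows up as a one-line error instead of an agent that silently never runs.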

Duplicate detection. Flag cases where two chains listen for the same event with conflicting behavior, so CI catches them before they cause race conditions in production.
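
A script can't judge whether two behaviors conflict, but it can list every event that more than one chain listens for and leave the judgment to a reviewer. Here is a rough sketch, again assuming each agent has a hypothetical triggers field holding one event name:

```python
from collections import defaultdict


def find_shared_triggers(chains: dict[str, dict]) -> dict[str, list[str]]:
    """Map each event to the chains listening for it, keeping only events
    that two or more chains trigger on.

    `chains` maps a chain name to its parsed JSON definition.
    """
    listeners = defaultdict(set)
    for chain_name, chain in chains.items():
        for agent in chain.get("agents", []):
            trigger = agent.get("triggers")
            if trigger is not None:
                listeners[trigger].add(chain_name)
    return {
        event: sorted(names)
        for event, names in listeners.items()
        if len(names) > 1
    }
```

Some overlaps are intentional, so a small allowlist of known shared events is probably more practical than failing the build on every hit.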

Prompt linting. Optional but useful. Check that prompts don't contain hardcoded values that should be variables, don't reference agents by name that aren't in the chain, and don't exceed token limits.
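
A prompt linter can start as a handful of heuristics. The sketch below assumes each agent carries its prompt in a prompt field. It treats URLs and email addresses as the "hardcoded values" worth flagging, and it uses character count as a crude stand-in for a token limit. All three are simplifications to tune for your own chains.

```python
import re

# Heuristic patterns for values that usually belong in variables.
HARDCODED_PATTERNS = {
    "URL": re.compile(r"https?://\S+"),
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}


def lint_prompts(chain: dict, known_agents: set[str], max_chars: int = 8000) -> list[str]:
    """Return warnings for every agent prompt in the chain.

    `known_agents` is the full roster of agent names (for example, the file
    stems under agents/); `max_chars` approximates a token budget.
    """
    in_chain = {a["name"] for a in chain.get("agents", [])}
    warnings = []
    for agent in chain.get("agents", []):
        prompt = agent.get("prompt", "")
        for label, pattern in HARDCODED_PATTERNS.items():
            if pattern.search(prompt):
                warnings.append(f"{agent['name']}: hardcoded {label}; consider a variable")
        # Mentions of agents that exist elsewhere but not in this chain.
        for other in sorted(known_agents - in_chain):
            if re.search(rf"\b{re.escape(other)}\b", prompt):
                warnings.append(f"{agent['name']}: mentions '{other}', which is not in this chain")
        if len(prompt) > max_chars:
            warnings.append(f"{agent['name']}: prompt is {len(prompt)} chars (limit {max_chars})")
    return warnings
```

Treat the output as warnings rather than hard failures at first. Heuristics like these produce false positives until they are tuned.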

These checks run in seconds. The cost of catching a broken chain in CI is near zero. The cost of catching it in production is your team's afternoon.

Rollback patterns

Two approaches to rolling back chain changes. Use both.

Git revert

The simplest rollback. A chain change broke production. Revert the commit.

git revert abc123
git push origin main

The chain definition returns to its previous state. The platform picks up the new (old) definition on the next run. Total rollback time: under a minute.

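This works for any change -- agent additions, prompt tweaks, event rewiring. Two caveats: if later commits touched the same lines, the revert can conflict and needs manual resolution, and reverting a merge commit requires naming the parent to keep (git revert -m 1 <merge-commit>).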

Chain versioning

For more granular control, maintain explicit versions alongside git history.

{
  "name": "content-pipeline",
  "version": "2.3.1",
  "agents": [...]
}

The version field lets you track chain iterations independently from git commits. A single commit might update the version from 2.3.0 to 2.3.1 along with the actual change. This is useful when you need to reference chain versions in logs, monitoring dashboards, or compliance reports.
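
One way to make that reference concrete is to stamp every run with the chain name, its version, the commit, and a hash of the exact definition. This is a sketch only: run_record, current_commit, and the definition_sha256 key are illustrative names, and how the record reaches your logs depends on your platform.

```python
import hashlib
import json
import subprocess


def current_commit() -> str:
    """Return the hash of HEAD; assumes the process runs inside the chains repo."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def run_record(chain: dict, commit: str) -> dict:
    """Build a log-friendly record identifying exactly which definition ran."""
    # Canonical serialization so key order and whitespace don't change the hash.
    canonical = json.dumps(chain, sort_keys=True, separators=(",", ":"))
    return {
        "chain": chain["name"],
        "version": chain.get("version", "unversioned"),
        "commit": commit,
        "definition_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }
```

Calling run_record(chain, current_commit()) at the start of a run gives you something to grep for when the question is "which definition produced this output?"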

Semantic versioning works here: major version for structural changes (agents added/removed), minor for behavior changes (prompt updates), patch for parameter tweaks (timeouts, retries).
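
The bump rule is mechanical enough to check in CI. Here is a minimal sketch, assuming agents are identified by a name field and prompts live in a prompt field. Both are assumptions about the schema, and required_bump and bump are illustrative helpers.

```python
def required_bump(old: dict, new: dict) -> str:
    """Classify a chain change as 'major', 'minor', 'patch', or 'none'."""
    old_agents = {a["name"]: a for a in old.get("agents", [])}
    new_agents = {a["name"]: a for a in new.get("agents", [])}
    # Structural change: an agent was added or removed.
    if old_agents.keys() != new_agents.keys():
        return "major"
    # Behavior change: any prompt differs.
    if any(old_agents[n].get("prompt") != new_agents[n].get("prompt") for n in old_agents):
        return "minor"
    # Anything else that differs is treated as a parameter tweak.
    if old_agents != new_agents:
        return "patch"
    return "none"


def bump(version: str, level: str) -> str:
    """Apply a bump level to a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version
```

This sketch files a model swap under patch and ignores reordering. Whether those deserve a minor or major bump is a team decision to encode explicitly.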

Environment promotion

Chains should promote through environments, not deploy directly to production.

dev -> staging -> production

Dev. Engineers iterate here. Chains run against test data with cheaper models. Fast feedback loops. No review required for changes.

Staging. Mirrors production configuration. Uses the same models and similar data. Changes require PR approval before promotion to staging.

Production. Only accepts changes that have been validated in staging. Promotion is a merge from the staging branch to main, or a copy from the staging directory to the production directory.
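
For the directory-copy variant, promotion can be a small script, which keeps the step repeatable and reviewable. This sketch assumes the chains/staging and chains/production layout shown earlier. The promote helper is hypothetical, and the resulting change should still go through a commit and PR.

```python
import json
import shutil
from pathlib import Path


def promote(chain_file: str, root: Path = Path("chains")) -> Path:
    """Copy a chain definition from staging/ to production/ under `root`."""
    source = root / "staging" / chain_file
    target = root / "production" / chain_file
    # Fail before touching production if the staging file is not valid JSON.
    json.loads(source.read_text(encoding="utf-8"))
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    return target
```

After promote("content-pipeline.json"), git diff shows exactly what production is about to receive, and the commit that lands it is the one you would revert.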

The promotion workflow:

1. Engineer builds/modifies chain on a feature branch
2. Tests pass in dev environment
3. PR opened, team reviews the chain change
4. Merged to staging branch, chain runs in staging environment
5. After validation period (24-48 hours for critical chains), promoted to production
6. Monitoring confirms the chain performs as expected

For urgent fixes, the hotfix branch pattern bypasses staging with an expedited review. But this should be the exception, not the rule.

The practical minimum

If all of this feels like too much process, here's the minimum that every team should do:

Chain definitions in a git repository (not on a server somewhere)
Changes go through pull requests (at least one reviewer)
A way to revert to the previous version quickly (git revert)

Everything else -- CI validation, environment promotion, semantic versioning -- is useful but optional. Start with the basics. Add process when you need it.

The teams that skip version control for their chains always regret it. Usually after a Friday afternoon prompt change takes down a production workflow and nobody can remember what it looked like before.

Already version-controlling your chains? See the JSON chain definition guide or learn the design patterns.