File-Based Events vs Message Queues for Agent Orchestration
Mentiko Team
Every orchestration system needs a way for agents to communicate. "Agent A finished, Agent B should start." The obvious choice is a message queue -- Redis, RabbitMQ, Kafka, NATS. That's what every distributed systems textbook recommends.
Mentiko uses files instead. Here's why.
How file-based events work
When a Mentiko agent completes its work, it writes an event file:
{"event": "research:complete", "status": "success", "output": "/workspace/research.md"}
The orchestration layer watches the events directory. When a new .event file appears, it reads the event name, finds which agent triggers on that event, and launches it.
That's the entire event system. Write a file. Watch a directory. Launch a process.
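That loop is small enough to sketch in full. The following is an illustrative Python version, not Mentiko's actual implementation -- the function names (`scan_new_events`, `watch`) and the polling approach are assumptions; a real watcher might use inotify instead of polling:

```python
import json
import time
from pathlib import Path


def scan_new_events(events_dir: Path, seen: set) -> list[dict]:
    """Return payloads of .event files not seen before, oldest first."""
    payloads = []
    for path in sorted(events_dir.glob("*.event"), key=lambda p: p.stat().st_mtime):
        if path.name in seen:
            continue
        seen.add(path.name)
        payloads.append(json.loads(path.read_text()))
    return payloads


def watch(events_dir: str, handlers: dict, poll_interval: float = 1.0):
    """Poll the directory and dispatch each new event to its handler.

    `handlers` maps an event name (e.g. "research:complete") to a callable
    that launches the next agent.
    """
    seen: set = set()
    events = Path(events_dir)
    while True:
        for payload in scan_new_events(events, seen):
            handler = handlers.get(payload["event"])
            if handler:
                handler(payload)  # e.g. spawn the triggered agent's process
        time.sleep(poll_interval)
```

The `seen` set is the only state, and it can be rebuilt from the directory listing after a restart.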
Why not a message queue?
Message queues solve real problems in distributed systems: guaranteed delivery, ordering, fan-out, dead letter queues, backpressure. They're essential when you have hundreds of microservices communicating across a network.
Agent orchestration isn't that system.
Agents run on one machine
Mentiko gives each customer their own isolated instance. Agents run on the same machine. There's no network partition to worry about. No distributed coordination problem. The filesystem is the most reliable communication channel on a single machine.
The event volume is low
A 4-agent chain produces 4 events per run. Even running 100 chains per day, that's 400 events. Redis handles millions per second. You don't need a Formula 1 car to drive to the grocery store.
Debugging is everything
When a chain fails at 3am, you need to understand what happened. With file-based events:
ls events/ # see all events
cat events/research-complete.event # read the event
grep "status" events/*.event # find failures
git diff events/ # compare with last run
With a message queue, you need a queue management UI, or you're writing consumer scripts to peek at messages, or you're reading logs that show message IDs you can't correlate without another tool.
Files are transparent. Message queues are opaque.
No infrastructure dependency
File events need: a filesystem. Every computer has one.
Redis needs: a Redis server, connection configuration, monitoring for the Redis server itself, restart handling, memory management, persistence configuration.
RabbitMQ needs: a RabbitMQ server, exchange/queue topology, connection pooling, message serialization, dead letter configuration.
Kafka needs: a Kafka cluster, ZooKeeper (or KRaft), topic configuration, consumer groups, offset management.
For each of these, you're adding infrastructure that can fail independently of your agents. More infrastructure = more failure modes.
Format flexibility
Mentiko's event parser is intentionally forgiving. It tries JSON, then YAML, then markdown frontmatter. If all fail, the filename itself is the event.
This means agents can write events however they want. A Python agent writes JSON. A bash agent writes YAML. A Claude Code agent writes markdown. They all work.
Try writing YAML to a Redis pub/sub channel whose consumer expects JSON. Message queues enforce format. Files don't care.
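The fallback chain is simple to sketch. This is an illustrative stand-in, not Mentiko's parser: the naive "key: value" scanner below approximates what a real YAML library (e.g. PyYAML's `safe_load`) would do, kept stdlib-only here:

```python
import json
from pathlib import Path


def parse_event(path: Path) -> dict:
    """Best-effort parsing: JSON, then markdown frontmatter, then naive
    key: value lines (a stand-in for a real YAML parser). If everything
    fails, the filename stem becomes the event name."""
    text = path.read_text().strip()

    # 1. Try JSON first
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # 2. If there's markdown frontmatter, parse only the --- block
    body = text
    if text.startswith("---"):
        parts = text.split("---", 2)
        if len(parts) >= 3:
            body = parts[1]

    # 3. Naive "key: value" lines (real version: yaml.safe_load)
    fields = {}
    for line in body.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    if "event" in fields:
        return fields

    # 4. Fall back to the filename itself as the event name
    return {"event": path.stem}
```

Each agent writes whatever format is natural to it, and the parser meets them where they are.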
The tradeoffs
Being honest about what we give up:
No guaranteed ordering across agents
If two agents emit events simultaneously, the order they're processed depends on filesystem timing. For sequential chains, this doesn't matter (each agent waits for the previous one). For fan-in patterns, we use timestamp-based ordering.
Message queues provide strict ordering guarantees. Files provide eventual consistency. For agent orchestration, eventual consistency is fine.
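Timestamp-based ordering for fan-in amounts to sorting by file modification time before processing. A minimal sketch (the function name and tie-breaking rule are assumptions, not Mentiko's code):

```python
from pathlib import Path


def events_in_order(events_dir: str) -> list[Path]:
    """Process fan-in events oldest-first by file modification time.
    Ties fall back to filename so the order is at least deterministic."""
    return sorted(
        Path(events_dir).glob("*.event"),
        key=lambda p: (p.stat().st_mtime, p.name),
    )
```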
No built-in retry semantics
Message queues can redeliver failed messages. With files, retry logic is in the chain runner -- if an agent fails, the runner decides whether to retry, route to a fallback, or alert a human.
This is actually a feature: the retry logic is explicit in your chain definition, not hidden in queue configuration.
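"Explicit in the chain definition" can look like this -- a hypothetical runner step, not Mentiko's API, where retries, backoff, and fallback are all plain parameters you can read in the code:

```python
import time


def run_step(agent, max_retries: int = 2, backoff: float = 1.0, fallback=None):
    """Run one chain step with an explicit, visible retry policy.

    `agent` is a zero-arg callable that raises on failure; `fallback`
    is an optional alternate agent to route to after retries run out.
    """
    last_err = None
    for attempt in range(max_retries + 1):
        try:
            return agent()
        except Exception as err:
            last_err = err
            time.sleep(backoff * (2 ** attempt))  # simple exponential backoff
    if fallback is not None:
        return fallback()
    raise last_err  # surface to a human / alerting
```

Compare that with digging through a broker's redelivery settings to answer "how many times will this retry?"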
No cross-machine communication
Files only work on one machine. If you need agents running on different servers, you need a network transport.
For Mentiko's model (isolated instances, agents on one machine), this isn't a limitation. For distributed agent swarms, you'd need something else.
No backpressure
A message queue can slow down producers when consumers can't keep up. With files, events accumulate in the directory. The watchdog detects stalled chains, but there's no automatic flow control.
In practice, agent chains are I/O bound (waiting for LLM API responses), not throughput bound. Backpressure hasn't been a real problem.
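The watchdog check mentioned above reduces to a staleness test on the directory. A minimal sketch under assumed names and timeouts (not Mentiko's actual watchdog):

```python
import time
from pathlib import Path


def chain_is_stalled(events_dir: str, timeout_s: float = 600) -> bool:
    """Consider the chain stalled if no event file has appeared or been
    touched within timeout_s. An empty directory means the chain has not
    started emitting yet, which is not treated as a stall here."""
    paths = list(Path(events_dir).glob("*.event"))
    if not paths:
        return False
    newest = max(p.stat().st_mtime for p in paths)
    return (time.time() - newest) > timeout_s
```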
When to use message queues
If you're building a system where:
- Agents run on different machines across a network
- Event volume is thousands per second
- You need strict ordering guarantees
- You need durable message storage across restarts
Then yes, use a message queue. That's what they're for.
But if your agents run on one machine, produce a handful of events per chain, and you value debuggability over theoretical guarantees -- files are the better tool.
The simplicity argument
The strongest argument for file-based events isn't any individual technical advantage. It's the aggregate simplicity.
No infrastructure to deploy. No connection to manage. No serialization to configure. No monitoring for the event system itself. No failure modes beyond "the disk is full" (which breaks everything anyway).
The event system should be invisible. It should never be the thing that breaks. Files achieve this by being the simplest possible implementation.
Want to see file-based events in action? Read the complete events guide or build your first chain.