
Concurrency Patterns for Agent Chains: Fan-Out, Fan-In, and Race

Mentiko Team

Running agents one at a time is easy to reason about and terrible for performance. A five-agent chain where each agent takes 30 seconds means your pipeline takes two and a half minutes -- even though four of those agents have no dependency on each other and could run simultaneously.

Concurrency patterns fix this. They let you express which agents can run in parallel, how their outputs combine, and what happens when parallel agents compete for the same resource. Here's how to think about concurrency in agent chains, and the specific patterns Mentiko supports for making it work.

Why Sequential Is the Wrong Default

Most agent orchestration tools default to sequential execution because it's simple. Agent A finishes, Agent B starts. But real-world workflows are rarely purely linear. Consider an onboarding pipeline: you need to provision a cloud workspace, generate a welcome email, create accounts in three SaaS tools, and seed a project template. None of those depend on each other. Running those four tasks sequentially means the user waits roughly four times longer than necessary.

Sequential execution is appropriate when there's a true data dependency -- Agent B literally cannot start until Agent A produces its output. For everything else, you're leaving performance on the table.

Fan-Out: One Event, Multiple Agents

Fan-out is the simplest concurrency pattern. One agent emits an event, and multiple downstream agents trigger on that same event simultaneously.

{
  "name": "customer-onboarding",
  "agents": [
    {
      "name": "intake",
      "prompt": "Validate the new customer record and prepare onboarding context.",
      "triggers": ["chain:start"],
      "emits": ["customer:validated"]
    },
    {
      "name": "provision-workspace",
      "prompt": "Create a cloud workspace for the customer using the provisioning API.",
      "triggers": ["customer:validated"],
      "emits": ["task:complete"]
    },
    {
      "name": "generate-welcome-email",
      "prompt": "Draft a personalized welcome email based on the customer's plan and industry.",
      "triggers": ["customer:validated"],
      "emits": ["task:complete"]
    },
    {
      "name": "seed-project",
      "prompt": "Create a starter project in the customer workspace with sample data.",
      "triggers": ["customer:validated"],
      "emits": ["task:complete"]
    },
    {
      "name": "create-saas-accounts",
      "prompt": "Provision accounts in Slack, Linear, and GitHub for the customer.",
      "triggers": ["customer:validated"],
      "emits": ["task:complete"]
    }
  ]
}

When intake emits customer:validated, all four downstream agents start at the same time. They each get their own execution context, their own event file, and they run independently. If provision-workspace takes 45 seconds and generate-welcome-email takes 3 seconds, that's fine -- they don't block each other.

The key constraint: fan-out agents must be genuinely independent. If seed-project needs the workspace to exist first, it can't run in parallel with provision-workspace. Move it downstream.
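
The scheduling mechanics are easy to picture outside any orchestrator. As a rough sketch of what a runtime does with those four triggers -- in plain Python asyncio, not Mentiko's actual implementation; agent names and durations are illustrative -- fan-out is just launching independent coroutines at once:

```python
import asyncio

async def agent(name: str, seconds: float) -> str:
    # Stand-in for a real agent run; the sleep models its latency.
    await asyncio.sleep(seconds)
    return f"{name}: done"

async def fan_out() -> list[str]:
    # All four downstream agents start the moment customer:validated
    # fires; total wall time is the slowest agent, not the sum.
    return await asyncio.gather(
        agent("provision-workspace", 0.045),
        agent("generate-welcome-email", 0.003),
        agent("seed-project", 0.020),
        agent("create-saas-accounts", 0.010),
    )

results = asyncio.run(fan_out())
```

The total runtime here is about 45ms (the slowest agent), not 78ms (the sum) -- the same shape of win the chain config above gets you.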

Fan-In: Collecting Parallel Results

Fan-out without fan-in is just scattering work into the wind. Fan-in is how you gather the results. A collector agent triggers on the shared event type and waits until it has received a specified number of those events before executing.

{
  "name": "multi-model-analysis",
  "agents": [
    {
      "name": "dispatcher",
      "prompt": "Prepare the dataset for analysis. Normalize the format.",
      "triggers": ["chain:start"],
      "emits": ["data:ready"]
    },
    {
      "name": "analyst-gpt",
      "prompt": "Analyze the dataset using GPT. Focus on trend detection.",
      "triggers": ["data:ready"],
      "emits": ["analysis:complete"]
    },
    {
      "name": "analyst-claude",
      "prompt": "Analyze the dataset using Claude. Focus on anomaly detection.",
      "triggers": ["data:ready"],
      "emits": ["analysis:complete"]
    },
    {
      "name": "analyst-gemini",
      "prompt": "Analyze the dataset using Gemini. Focus on correlation mapping.",
      "triggers": ["data:ready"],
      "emits": ["analysis:complete"]
    },
    {
      "name": "synthesizer",
      "prompt": "Compare analyses from all three models. Identify consensus findings and disagreements.",
      "triggers": ["analysis:complete"],
      "collect": 3,
      "emits": ["chain:complete"]
    }
  ]
}

The "collect": 3 on the synthesizer is the fan-in mechanism. It buffers incoming analysis:complete events until it has all three, then fires with access to all their outputs. Without collect, the synthesizer would trigger on the first event and miss the other two.

Set collect to an explicit number whenever the producer count is known up front. "Wait for all" sounds convenient, but if one analyst agent silently fails, the synthesizer waits forever. An explicit count combined with a timeout gives you a failure mode you can handle.
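
Mechanically, a collector is just a buffer with a threshold. Here's a minimal sketch of "collect": 3 semantics in plain asyncio -- the queue stands in for Mentiko's event files, and the names are illustrative, not real APIs:

```python
import asyncio

async def collector(queue: asyncio.Queue, collect: int) -> list[str]:
    # Buffer incoming events until the expected count arrives,
    # then fire once with access to all of them.
    events: list[str] = []
    while len(events) < collect:
        events.append(await queue.get())
    return events

async def fan_in() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()

    async def analyst(name: str, delay: float) -> None:
        await asyncio.sleep(delay)
        await queue.put(f"{name}: analysis:complete")

    # Three analysts fan out; the collector fans their events back in.
    tasks = [
        asyncio.create_task(analyst("analyst-gpt", 0.003)),
        asyncio.create_task(analyst("analyst-claude", 0.001)),
        asyncio.create_task(analyst("analyst-gemini", 0.002)),
    ]
    events = await collector(queue, collect=3)
    await asyncio.gather(*tasks)
    return events

collected = asyncio.run(fan_in())
```

Note that the collector fires exactly once, with all three results in hand -- which is the whole point of the threshold.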

Fan-In with Timeouts

What happens when one of your parallel agents hangs? Without a timeout, the collector waits indefinitely. Production chains need a backstop.

{
  "name": "synthesizer",
  "prompt": "Combine available analyses. Note any missing sources.",
  "triggers": ["analysis:complete"],
  "collect": 3,
  "collect_timeout_ms": 60000,
  "collect_min": 2,
  "emits": ["chain:complete"]
}

This collector waits up to 60 seconds for all 3 events. If only 2 arrive within the timeout, it proceeds anyway because collect_min is set to 2. If fewer than 2 arrive, it transitions to its on_error event. The prompt tells the agent to note missing sources, so the output reflects the degraded state.

The collect_min field is the difference between "we need every source" and "we need enough sources." For most workflows, partial results with a note about what's missing beats a total failure.
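
Under the hood this is a deadline loop: keep pulling events until you have enough or time runs out, then decide whether the partial batch clears the bar. A sketch of collect_timeout_ms plus collect_min semantics in plain asyncio -- an illustration of the behavior, not Mentiko's implementation:

```python
import asyncio

async def collect_with_timeout(queue: asyncio.Queue, collect: int,
                               collect_min: int, timeout_s: float) -> list[str]:
    # Wait up to timeout_s for `collect` events; accept a partial
    # batch of at least `collect_min` once the deadline passes.
    deadline = asyncio.get_running_loop().time() + timeout_s
    events: list[str] = []
    while len(events) < collect:
        remaining = deadline - asyncio.get_running_loop().time()
        if remaining <= 0:
            break
        try:
            events.append(await asyncio.wait_for(queue.get(), remaining))
        except asyncio.TimeoutError:
            break
    if len(events) < collect_min:
        raise RuntimeError("degraded below collect_min; route to on_error")
    return events

async def demo() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    await queue.put("analyst-gpt: analysis:complete")
    await queue.put("analyst-claude: analysis:complete")
    # analyst-gemini hangs and never emits its event.
    return await collect_with_timeout(queue, collect=3,
                                      collect_min=2, timeout_s=0.05)

partial = asyncio.run(demo())
```

With two of three events in hand, the collector proceeds; drop collect_min to below 2 arrivals and it raises instead, which is your handle for the error path.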

Race: First Result Wins

Sometimes you don't want all parallel results -- you want the fastest one. The race pattern launches multiple agents and takes the first output, discarding the rest.

{
  "name": "fastest-response",
  "agents": [
    {
      "name": "dispatcher",
      "prompt": "Prepare the query for multiple providers.",
      "triggers": ["chain:start"],
      "emits": ["query:ready"]
    },
    {
      "name": "provider-a",
      "prompt": "Query Provider A for the answer.",
      "triggers": ["query:ready"],
      "emits": ["response:ready"]
    },
    {
      "name": "provider-b",
      "prompt": "Query Provider B for the answer.",
      "triggers": ["query:ready"],
      "emits": ["response:ready"]
    },
    {
      "name": "provider-c",
      "prompt": "Query Provider C for the answer.",
      "triggers": ["query:ready"],
      "emits": ["response:ready"]
    },
    {
      "name": "consumer",
      "prompt": "Use the first available response.",
      "triggers": ["response:ready"],
      "collect": 1,
      "emits": ["chain:complete"]
    }
  ]
}

The consumer has "collect": 1. It triggers as soon as any provider responds. The other providers continue running to completion (they don't know they lost), but their events are ignored because the consumer already fired.

Race is useful for latency-sensitive pipelines where you have redundant providers. It's also useful for speculative execution: try three different approaches in parallel and use whichever one finishes first.

The tradeoff is cost. You're running three agents to use one result. If your agents are calling expensive APIs or burning significant tokens, the race pattern may not be worth it. Use it when latency matters more than cost.
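
In scheduler terms, race is "wait for the first completion." A plain asyncio sketch of the pattern -- with one deliberate difference: here the losers are cancelled to save cost, whereas Mentiko's losing providers run to completion and their events are simply ignored:

```python
import asyncio

async def provider(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stand-in for the provider's latency
    return f"{name}: response:ready"

async def race() -> str:
    tasks = [
        asyncio.create_task(provider("provider-a", 0.050)),
        asyncio.create_task(provider("provider-b", 0.005)),
        asyncio.create_task(provider("provider-c", 0.030)),
    ]
    # Return as soon as any provider finishes; cancel the rest.
    done, pending = await asyncio.wait(tasks,
                                       return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()

winner = asyncio.run(race())
```

The cancel-the-losers variant is worth considering when your runtime supports it, since it directly addresses the cost tradeoff above.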

Semaphores: Limiting Concurrent Agents

Fan-out is great until you hit an API rate limit. If you fan out 20 agents that all call the same external API, you'll get rate-limited in seconds. Semaphores limit how many agents can run simultaneously.

{
  "name": "bulk-enrichment",
  "concurrency": {
    "max_parallel": 5,
    "queue_strategy": "fifo"
  },
  "agents": [
    {
      "name": "splitter",
      "prompt": "Split the input records into individual enrichment tasks.",
      "triggers": ["chain:start"],
      "emits": ["record:ready"]
    },
    {
      "name": "enricher",
      "prompt": "Enrich this record with data from the external API.",
      "triggers": ["record:ready"],
      "emits": ["record:enriched"],
      "instances": "dynamic"
    },
    {
      "name": "aggregator",
      "prompt": "Combine all enriched records into the final dataset.",
      "triggers": ["record:enriched"],
      "collect": "all",
      "emits": ["chain:complete"]
    }
  ]
}

The chain-level concurrency.max_parallel setting ensures at most 5 enricher instances run at the same time. When the splitter produces 100 record:ready events, the first 5 enrichers start immediately and the rest queue. As each enricher completes, the next queued one starts. FIFO ordering ensures queued records start processing in the order they arrived.

This is essential for working with rate-limited APIs, resource-constrained infrastructure, or any external system that can't handle unbounded parallelism.
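
The same guarantee in plain Python is an asyncio.Semaphore: create as many tasks as you like, but only max_parallel of them can be inside the guarded section at once. A sketch -- the peak counter exists only to prove the bound holds:

```python
import asyncio

MAX_PARALLEL = 5
running = 0
peak = 0

async def enricher(record: str, sem: asyncio.Semaphore) -> str:
    global running, peak
    async with sem:  # at most MAX_PARALLEL enrichers past this point
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.005)  # stand-in for the external API call
        running -= 1
    return f"{record}: enriched"

async def bulk_enrich(n: int) -> list[str]:
    sem = asyncio.Semaphore(MAX_PARALLEL)
    # All n records queue up, but only 5 are in flight at any moment.
    return await asyncio.gather(*(enricher(f"record-{i}", sem)
                                  for i in range(n)))

enriched = asyncio.run(bulk_enrich(20))
```

Twenty tasks exist from the start; the semaphore, not the task count, is what keeps the external API seeing at most five concurrent calls.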

Avoiding Race Conditions in Shared State

When multiple agents run concurrently and write to shared state -- a file, a database record, a shared directory -- you get race conditions. Agent A reads a file, Agent B reads the same file, both modify it, and whoever writes last silently overwrites the other's changes.

Mentiko's file-based event system sidesteps most of this. Each agent writes to its own event file. No two agents write to the same file. The synthesizer or collector reads from multiple event files, but it's the only writer of its own output.

When you do need shared state -- a running tally, a coordination flag, a shared configuration -- use explicit locking:

{
  "name": "inventory-updater",
  "prompt": "Update the inventory count for this item.",
  "triggers": ["sale:completed"],
  "emits": ["inventory:updated"],
  "lock": {
    "key": "inventory-{item_id}",
    "timeout_ms": 5000
  }
}

The lock block ensures only one instance of this agent can hold the lock for a given item at a time. If another instance tries to acquire the same lock, it waits up to 5 seconds. This serializes access to the shared resource without serializing the entire chain.

Lock on the narrowest scope possible. Locking on inventory serializes all inventory updates across all items. Locking on inventory-{item_id} only serializes updates to the same item -- different items can update concurrently.
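
The narrow-scope point is easy to demonstrate: keyed locks serialize only the writers that share a key. A sketch of per-item locking in plain asyncio -- the sleep is a deliberate yield point where an unlocked read-modify-write would lose updates:

```python
import asyncio
from collections import defaultdict

# One lock per key, created lazily: "inventory-{item_id}" scoping means
# updates to the same item serialize while different items run concurrently.
locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
inventory = {"sku-1": 10, "sku-2": 10}

async def update_inventory(item_id: str, delta: int) -> None:
    async with locks[f"inventory-{item_id}"]:
        current = inventory[item_id]
        await asyncio.sleep(0)  # yield point: a race would strike here
        inventory[item_id] = current + delta

async def run_sales() -> None:
    # Ten concurrent sales per item; without the lock, concurrent
    # read-modify-writes could silently overwrite each other.
    await asyncio.gather(*(update_inventory(sku, -1)
                           for sku in list(inventory) for _ in range(10)))

asyncio.run(run_sales())
```

Both counts land exactly at zero. Swap the keyed lock for a single global one and the result is the same, but sku-1 and sku-2 updates would needlessly take turns.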

Choosing the Right Pattern

Here's the decision framework:

  • Independent tasks with no shared output: Fan-out, let them run free.
  • Independent tasks whose results need combining: Fan-out with fan-in collector.
  • Redundant providers where latency matters: Race pattern, take the first result.
  • High-volume parallel work against rate-limited APIs: Semaphore with queue.
  • Parallel agents writing to shared state: Explicit locking on the narrowest key.
  • Partial results acceptable: Fan-in with collect_min and timeout.

Most production chains combine several of these. A data pipeline might fan-out to 50 enrichment agents, semaphore them to 5 at a time, fan-in with a timeout that accepts partial results, and lock on the output database table during the final write. The patterns compose cleanly because they operate at different levels -- execution scheduling, result collection, and resource access.

Start with the simplest pattern that meets your latency requirement. Add semaphores when you hit rate limits. Add timeouts when you need reliability guarantees. Add locking when you have shared mutable state. Each pattern adds a small amount of configuration complexity and a large amount of operational confidence.

Build your first concurrent chain with Mentiko's chain builder, or explore five foundational chain patterns for more building blocks.
