
Why We Built Mentiko

Mentiko Team

We didn't start with a pitch deck. We started with a problem.

Our team was building AI-powered tools -- agents that could research, write, analyze, and review. Each agent worked well on its own. But making them work together was a mess.

We had bash scripts calling Python scripts calling APIs. Cron jobs that broke silently. Logs scattered across five terminal windows. No way to know which agent failed at 3am or why.

So we built a thing. A small thing. A chain runner that read a JSON file, launched agents in order, and watched for completion events. It was 200 lines of bash.

It worked. Surprisingly well.
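The shape of that script is easy to picture. Here is a minimal sketch of the idea -- hypothetical, not Mentiko's actual code: it assumes a `chain.json` holding an ordered list of agent commands, agents that signal completion by dropping a `.done` event file, and `jq` installed for JSON parsing. (The real script also needs timeouts and error handling; this omits both for brevity.)

```shell
#!/usr/bin/env bash
# Toy chain runner: read a JSON chain, launch agents in order,
# watch for completion events. Illustrative sketch only.
set -euo pipefail

EVENT_DIR="events"
mkdir -p "$EVENT_DIR"

# Demo chain definition with two toy "agents" -- each writes its own
# completion event when finished. (Real agents would be research/write/
# review scripts; these just echo into a notes file.)
cat > chain.json <<'EOF'
{"agents": [
  {"name": "research", "command": "echo researched > notes.txt; touch events/research.done"},
  {"name": "write",    "command": "echo drafted >> notes.txt; touch events/write.done"}
]}
EOF

count=$(jq '.agents | length' chain.json)
for i in $(seq 0 $((count - 1))); do
  name=$(jq -r ".agents[$i].name" chain.json)
  cmd=$(jq -r ".agents[$i].command" chain.json)
  rm -f "$EVENT_DIR/$name.done"

  echo "[chain] launching $name"
  bash -c "$cmd" &   # launch the agent in the background

  # Watch for the agent's completion event before starting the next one.
  while [ ! -f "$EVENT_DIR/$name.done" ]; do sleep 0.2; done
  echo "[chain] $name completed"
done
wait
```

Sequencing falls out of the event files: the runner won't launch `write` until `research.done` exists, so `notes.txt` always reads `researched` then `drafted`.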

From script to system

That 200-line script became our daily workhorse. We defined chains for everything: content research, code review, competitive analysis, customer ticket triage. Each chain was a JSON file we could git-commit, diff, and share.
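A chain definition might look something like this -- the field names here are illustrative, not Mentiko's actual schema, but they convey why a JSON file is so easy to commit, diff, and share:

```json
{
  "name": "content-research",
  "schedule": "0 9 * * 1",
  "agents": [
    { "name": "research", "command": "./agents/research.sh" },
    { "name": "draft",    "command": "./agents/draft.sh" },
    { "name": "review",   "command": "./agents/review.sh" }
  ]
}
```

Renaming a step or reordering the pipeline is a one-line diff in code review, not a change buried in orchestration logic.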

But we kept hitting the same gaps:

  • "Which agent is running right now?" -- we needed monitoring
  • "Run this chain every Monday" -- we needed scheduling
  • "My API keys are in plain text" -- we needed a secrets vault
  • "The new hire can't see my chains" -- we needed multi-tenancy

Every gap led to another 200 lines. Then a web UI. Then a visual builder. Then real-time monitoring.

At some point, we stopped building tools and started building a platform.

Why not use an existing tool?

We looked at everything:

LangChain / LangGraph: Great for building individual agents. But orchestrating a chain of agents means writing Python state machines. Our agents weren't all Python -- some were bash scripts, some were Claude Code sessions, some were custom binaries. LangGraph wanted everything in Python.

CrewAI: Clean API, but per-execution pricing made our daily chains expensive. And again: Python-only. Our researcher agent was a shell script that called multiple APIs. Square peg, round hole.

Temporal / Airflow: Built for data pipelines and microservices, not AI agents. The abstraction didn't fit. We didn't need durable execution or saga patterns. We needed "launch this CLI tool, watch for a file, launch the next one."

Build our own orchestration in Node.js: We tried. 3,000 lines, 14 dependencies, bugs for weeks. Then we replaced it with 200 lines of bash and it was more reliable. That was the moment we realized: the orchestration layer should be simple. Radically simple. The complexity belongs in the agents, not the plumbing.

The design decisions

Every platform encodes opinions. Here are ours:

File-based events over message queues. When something goes wrong, you should be able to cat the event file and see what happened. No RabbitMQ, no Redis, no Kafka. Just files.
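The debugging story this buys is concrete. With a toy event (the filename and fields below are made up for illustration, not Mentiko's real event schema), the 3am postmortem is one command:

```shell
# An agent records its outcome as a plain JSON file on disk --
# no broker, no queue, nothing to connect to. (Illustrative schema.)
mkdir -p events
cat > events/research-2024-06-03.json <<'EOF'
{"agent": "research", "status": "failed", "exit_code": 2,
 "error": "API rate limit exceeded", "at": "2024-06-03T03:12:45Z"}
EOF

# When something breaks overnight, the investigation starts with cat:
cat events/research-2024-06-03.json
```

Everything downstream of that choice stays simple too: `grep` is your event search, `ls -t` is your event timeline, and a tarball of the events directory is your audit log.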

PTY sessions over sandboxed execution. Agents need to run real tools -- git, npm, python, terraform. Sandboxed execution breaks this. Our agents run in real terminal sessions with full access to the workspace.

JSON chain definitions over code. Your workflow should be data, not logic. Data is diffable, versionable, portable. Logic is opaque and language-specific.

Flat-rate pricing over per-execution. We ran hundreds of chains per day while building this. Per-execution pricing would have punished us for using our own tool. Flat-rate means you experiment freely.

Self-hosted over multi-tenant SaaS. Your API keys, your data, your infrastructure. We give you the software. You control everything else.

Where we are now

Mentiko is in invite-only early access. Teams are running chains for content, code review, research, support, and monitoring. The platform handles scheduling, monitoring, error recovery, and multi-tenancy.

We're not done. The decision flow -- AI-assisted team decisions with execution plans -- is our most ambitious feature yet. The marketplace for shared chains and agents is just starting. And we're working on making the visual builder even more powerful.

But the core works. Agents trigger agents through events. Chains run on schedule. Monitoring tells you what's happening. And the whole thing is yours to control.

Why this matters

AI agents are moving from demos to production. The gap isn't in agent intelligence -- Claude and GPT-5.4 are already capable enough. The gap is in coordination. How do you make multiple agents work together, reliably, on a schedule, with monitoring and error handling?

That's what we built. Not because we thought it would be a product. Because we needed it ourselves.

The best tools start that way.


Want to see what we built? Try the quick-start or join the waitlist for early access.
