Why AI Agents Need Real PTY Sessions, Not Sandboxed Code Execution

Most agent platforms run your code in a sandbox: a locked-down container with no durable filesystem, no network, and no persistent state. You get a Python interpreter, maybe a handful of pre-installed libraries, and a string back. For toy demos, this works. For production workflows where agents need to actually do things on real infrastructure, it's a dead end.

Mentiko gives agents real PTY sessions. Here's why that matters and what it actually means.

How most platforms run agent code

The typical model: your agent generates code (usually Python), the platform spins up an ephemeral container, runs it, captures stdout, and returns a string. The container is destroyed. No state persists between executions.
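To make that model concrete, here is a minimal sketch of the execute-and-discard loop. It uses a fresh interpreter process and a throwaway directory as a stand-in for the ephemeral container, so it provides none of a real sandbox's isolation; run_ephemeral is a hypothetical name, not any platform's API.

```python
import subprocess
import sys
import tempfile


def run_ephemeral(code: str) -> str:
    """Run a snippet in a fresh interpreter and scratch directory, return a string."""
    with tempfile.TemporaryDirectory() as scratch:
        done = subprocess.run(
            [sys.executable, "-c", code],
            cwd=scratch,            # files the snippet creates land here...
            capture_output=True,
            text=True,
            timeout=30,
        )
    # ...and the scratch directory is deleted as soon as the run finishes.
    return done.stdout if done.returncode == 0 else done.stderr


if __name__ == "__main__":
    print(run_ephemeral("open('state.txt', 'w').write('42'); print('saved')"))
    # A second run starts from scratch: the file written above is gone.
    print(run_ephemeral("import os; print(os.path.exists('state.txt'))"))
```

The only thing that crosses the boundary is the returned string, which is exactly the limitation the rest of this post is about.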

This inherits from the "code interpreter" pattern that ChatGPT popularized. It's safe. It's simple. And it makes sense when the agent's job is to do math or generate a chart.

But real-world automation doesn't look like that. A DevOps agent needs to SSH into a server, run diagnostics, and apply a fix. A deployment agent needs to run git pull, npm install, and pm2 restart. A data pipeline agent needs to invoke dbt run, check the output, and trigger downstream jobs. None of these work in a sandbox with no durable filesystem, no network access, and no persistent state.

What a PTY session actually is

PTY stands for pseudoterminal. It's the mechanism that makes terminal emulators work. When you open a terminal window on your machine, you're not directly connected to your shell -- there's a PTY in between.

A PTY is a pair: a master side and a slave side. The slave side looks like a real terminal to the process attached to it. The master side is what reads and writes to that terminal. Your terminal emulator (iTerm, Alacritty, whatever) holds the master side. Your shell (bash, zsh) is attached to the slave side and thinks it's talking to a real terminal.

This is what makes interactive programs work. vim needs a terminal. top needs a terminal. An interactive ssh login expects one. Any program that positions the cursor, draws a full-screen interface, or reads input character by character needs a PTY. A plain stdin/stdout pipe isn't enough -- programs detect they're not attached to a terminal (typically with an isatty() check) and change their behavior: they drop colors, buffer output differently, or refuse to start.
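You can observe that detection directly. The sketch below uses only Python's standard pty and subprocess modules (nothing Mentiko-specific) to run the same child program twice, once with its output on a pipe and once on the slave side of a PTY. The helper names are made up for illustration, and it assumes a Unix-like system.

```python
import os
import pty
import subprocess
import sys

# Child program: prints whether its own stdout is a terminal.
CHILD = [sys.executable, "-c", "import sys; print(sys.stdout.isatty())"]


def run_with_pipe() -> str:
    """Run the child with stdout connected to an ordinary pipe."""
    done = subprocess.run(CHILD, stdout=subprocess.PIPE, text=True, check=True)
    return done.stdout.strip()


def run_with_pty() -> str:
    """Run the child with stdin/stdout/stderr on the slave side of a PTY."""
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(CHILD, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd)
    os.close(slave_fd)  # the parent keeps only the master side
    chunks = []
    try:
        while True:
            try:
                data = os.read(master_fd, 1024)
            except OSError:  # Linux reports EIO once the slave side is closed
                break
            if not data:  # other platforms report a plain EOF
                break
            chunks.append(data)
    finally:
        os.close(master_fd)
        proc.wait()
    return b"".join(chunks).decode().strip()


if __name__ == "__main__":
    print("pipe:", run_with_pipe())
    print("pty: ", run_with_pty())
```

The pipe run reports False and the PTY run reports True, even though the child program is identical in both cases.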

When we say Mentiko gives agents real PTY sessions, we mean agents drive the master side of a PTY through Mentiko's orchestration layer. They can type commands, read output (including ANSI escape codes, prompts, and interactive elements), send control characters (Ctrl+C, Ctrl+D), and interact with programs that expect a human at the keyboard.
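Control characters are a good example of what the master side buys you. In the sketch below (standard library only, Unix-like systems, default terminal settings assumed), the parent writes the single byte 0x03 to the master, and the terminal's line discipline turns it into SIGINT for the foreground process, the same thing that happens when you press Ctrl+C. The function name and the "ready" handshake are illustrative choices, not part of Mentiko.

```python
import os
import pty
import signal


def interrupt_with_ctrl_c() -> int:
    """Start `sleep` on a PTY, send Ctrl+C, and return the signal that killed it."""
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: the PTY slave is now its controlling terminal.
        try:
            os.execvp("sh", ["sh", "-c", "echo ready; exec sleep 30"])
        finally:
            os._exit(127)  # only reached if exec fails

    try:
        seen = b""
        while b"ready" not in seen:  # wait until the child is up and running
            seen += os.read(master_fd, 1024)
        os.write(master_fd, b"\x03")  # the byte a keyboard sends for Ctrl+C
        _, status = os.waitpid(pid, 0)
    finally:
        os.close(master_fd)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else 0


if __name__ == "__main__":
    killed_by = interrupt_with_ctrl_c()
    print("terminated by SIGINT:", killed_by == signal.SIGINT)
```

Over a plain pipe, that byte would simply arrive as data on the child's stdin; it is the terminal layer that turns it into a signal.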

The agent's experience is identical to yours when you open a terminal. Same shell, same filesystem, same tools, same environment variables. No translation layer, no sandbox restrictions, no API abstraction over basic operations.

Why this matters in practice

The gap between "execute this Python snippet" and "use a real terminal" is enormous.

With a real PTY session, an agent can:

Run git status, read the output, decide to git stash, switch branches, apply a patch, and push -- all in one session, maintaining shell state between commands.
SSH into a remote server, tail -f a log in one session while running a test in another, and correlate the output.
Start a Docker container, docker exec into it, run migrations, verify they worked, and exit -- interactively, exactly like a human would.
Use npm, pip, cargo, kubectl, terraform, ansible -- any CLI tool that's installed. No pre-approved list, no SDK wrappers, no "supported integrations."

The terminal is the universal interface. Every tool a developer uses has a CLI. By giving agents a terminal, you give them access to every tool without building a specific integration for each one.

The security balance

The naive objection is "but the agent could rm -rf /!" Yes, it could. The same way a junior developer with SSH access could. The answer isn't to take away the terminal -- it's to control what the terminal can access.

Mentiko handles this with workspace isolation. Each agent runs in its own workspace -- a directory with defined boundaries. The PTY session starts in that workspace, and the agent operates within the permissions of the workspace user. Same security model as giving a contractor access to one project directory on a shared server.

You can tighten this further. Workspaces can be Docker containers with network policies, read-only filesystem mounts, or resource limits. The security boundary is at the workspace level, not at the "can the agent run code at all" level.
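As a rough illustration of those knobs, here is how a locked-down workspace container could be expressed as a docker run command line. The flags are standard Docker CLI options, but the function, its defaults, and the /workspace mount point are assumptions made for this sketch; it is not Mentiko's actual launcher.

```python
def docker_workspace_argv(
    host_path: str,
    image: str,
    *,
    network: str = "none",      # "none" disables networking entirely
    read_only: bool = True,     # read-only root filesystem inside the container
    memory: str = "512m",       # memory limit
    cpus: str = "1.0",          # CPU limit
    shell: str = "/bin/bash",
) -> list[str]:
    """Build a `docker run` command for an interactive, constrained workspace."""
    argv = [
        "docker", "run", "--rm",
        "--interactive", "--tty",   # allocate a terminal for the shell
        "--network", network,
        "--memory", memory,
        "--cpus", cpus,
    ]
    if read_only:
        argv.append("--read-only")
    # Only the workspace directory is mounted from the host; it stays writable.
    argv += ["--volume", f"{host_path}:/workspace", "--workdir", "/workspace"]
    argv += [image, shell]
    return argv


if __name__ == "__main__":
    # The image name is an arbitrary example.
    print(" ".join(docker_workspace_argv("/opt/deploy/my-app", "node:20")))
```

The agent still gets a full shell inside the container. What changes is what that shell can reach: one mounted directory, no network, and capped memory and CPU.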

This is a fundamentally different philosophy from sandboxing. Sandboxes say "deny everything by default, allow specific things." Workspaces say "allow everything within this boundary, deny things outside it." The sandbox approach is safer but crippling. The workspace approach is practical -- agents can get work done while staying contained.

Session persistence

Here's something sandboxes can't do: persist.

When a sandbox runs your code and returns, everything is gone. Environment variables, installed packages, files created, shell history -- all destroyed. The next execution starts from scratch. If your agent needs to pip install pandas to do its job, it installs it every single time. If it set up a git repo and made three commits, those are gone.

PTY sessions in Mentiko persist. An agent's terminal session stays alive between chain steps. If Agent A installs a dependency and Agent B needs it, it's already there. If an agent is halfway through a long-running process and gets interrupted, it can resume. Working directory, environment variables, shell history, running processes -- all survive between invocations.
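The difference is easy to demonstrate with a single long-lived shell. The sketch below keeps one /bin/sh alive behind a PTY and sends it commands one at a time; ShellSession and its marker scheme are hypothetical helpers written for this post, and the code assumes a Unix-like system and commands that run to completion.

```python
import os
import pty
import re
import uuid


class ShellSession:
    """One long-lived shell behind a PTY; its state survives between run() calls."""

    def __init__(self, shell: str = "/bin/sh") -> None:
        self.pid, self.master_fd = pty.fork()
        if self.pid == 0:
            try:
                os.execv(shell, [shell])
            finally:
                os._exit(127)  # only reached if exec fails

    def run(self, command: str) -> tuple[int, str]:
        """Run one command and return (exit status, output)."""
        tag = uuid.uuid4().hex
        # Each marker is printed by joining two arguments, so the terminal's echo
        # of this command line (where they are space-separated) never matches.
        line = (
            f"printf '%s%s\\n' BEGIN_ {tag}; {command}; "
            f"printf '%s%s:%s\\n' END_ {tag} $?\n"
        )
        os.write(self.master_fd, line.encode())
        t = tag.encode()
        pattern = re.compile(
            rb"BEGIN_" + t + rb"\r?\n(.*?)END_" + t + rb":(\d+)\r?\n", re.DOTALL
        )
        buf = b""
        while True:
            match = pattern.search(buf)
            if match:
                break
            buf += os.read(self.master_fd, 4096)
        output = match.group(1).decode(errors="replace").replace("\r\n", "\n")
        return int(match.group(2)), output

    def close(self) -> None:
        os.close(self.master_fd)  # hanging up the terminal ends the shell
        os.waitpid(self.pid, 0)


if __name__ == "__main__":
    session = ShellSession()
    session.run("cd / && export DEPLOY_TARGET=staging")
    # A later, separate call still sees the directory and the variable.
    print(session.run('echo "$PWD $DEPLOY_TARGET"'))
    session.close()
```

The second call sees the working directory and the exported variable from the first because both went to the same shell process. A fresh interpreter per call cannot provide that.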

An agent can start a development server, run tests against it, observe the results, make changes, and re-test -- across multiple steps in a chain, maintaining the server process the entire time. Try doing that in a sandbox that resets between executions.

Session persistence also enables iterative debugging. When an agent runs a command and it fails, the error context is right there in the terminal history. The agent can scroll up, understand what happened, and try a different approach. In a sandbox, each attempt is isolated -- the agent has to reconstruct context from the error string alone.

Multi-agent terminal sharing

This is where PTY sessions unlock something truly new.

In Mentiko, agents can read each other's terminal output. Not through event files or message passing (though those work too) -- through the actual PTY session. Agent B can watch Agent A's terminal in real time, the same way you'd watch a colleague's screen during pair programming.

A code-writing agent works in one terminal while a review agent watches the same PTY, reading the code as it's written and flagging issues in real time. A deployment agent runs commands while a monitoring agent watches the output for error patterns. A test agent runs the suite while a debugging agent watches for failures and immediately starts investigating.

This is multiplexed through the PTY master -- Mentiko's orchestration layer holds the master side and exposes read access to multiple agents. The writing agent has write access. Observing agents get read-only access. It's tmux for AI agents.
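A toy version of that fan-out fits in a few lines. In this sketch, one object owns the master file descriptor, observers get a handle that can only read what has been broadcast, and only the owner can write to the terminal. PtyBroadcaster and Observer are names invented for this example; it shows the general pattern, not Mentiko's orchestration code.

```python
import os
import pty
import subprocess


class Observer:
    """Read-only view of a terminal's output."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def read(self) -> bytes:
        return bytes(self._buf)


class PtyBroadcaster:
    """Owns the PTY master; copies every chunk of output to all observers."""

    def __init__(self, argv: list[str]) -> None:
        self.master_fd, slave_fd = pty.openpty()
        self.proc = subprocess.Popen(
            argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd
        )
        os.close(slave_fd)
        self._observers: list[Observer] = []

    def subscribe(self) -> Observer:
        observer = Observer()
        self._observers.append(observer)
        return observer

    def write(self, data: bytes) -> None:
        """Type into the terminal. Only the holder of this object can do so."""
        os.write(self.master_fd, data)

    def pump(self) -> bool:
        """Forward one chunk of output; return False once the terminal is closed."""
        try:
            chunk = os.read(self.master_fd, 4096)
        except OSError:  # Linux reports EIO when the child side has closed
            chunk = b""
        if not chunk:
            return False
        for observer in self._observers:
            observer._buf.extend(chunk)
        return True

    def close(self) -> None:
        self.proc.wait()
        os.close(self.master_fd)


if __name__ == "__main__":
    terminal = PtyBroadcaster(["sh", "-c", "echo deploying; echo done"])
    reviewer, monitor = terminal.subscribe(), terminal.subscribe()
    while terminal.pump():
        pass
    terminal.close()
    print(reviewer.read() == monitor.read())
```

Both observers end up with byte-for-byte identical output, and neither holds anything that would let it type into the terminal.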

When sandboxed execution is fine

We're not saying sandboxes are useless. They're the right choice for a set of tasks that don't need system access:

Pure computation (math, data transformation, statistics)
API calls through an SDK (no CLI needed)
Text processing (parsing, formatting, summarization)
Code generation where you just need the output string, not execution

If your agent's job is "take this JSON, transform it, return the result" -- a sandbox is simpler, safer, and perfectly adequate. You don't need a PTY session to parse a CSV.

The problem is when platforms offer only sandbox execution and market it as "agent code execution." It's like offering a calculator and calling it a computer.

How Mentiko implements it

Mentiko uses the node-pty library (the same PTY binding that powers VS Code's terminal) to allocate a real pseudoterminal for each agent session. The orchestration layer -- written in bash -- holds the master file descriptors and manages the lifecycle.

Each chain definition specifies workspace configuration:

{
  "name": "deploy-pipeline",
  "workspace": {
    "type": "local",
    "path": "/opt/deploy/my-app",
    "shell": "/bin/bash",
    "env": {
      "NODE_ENV": "production",
      "DEPLOY_TARGET": "staging"
    }
  },
  "agents": [
    {
      "name": "deployer",
      "prompt": "Pull latest, install deps, run migrations, restart.",
      "triggers": ["chain:start"],
      "emits": ["deploy:complete"],
      "session": { "persist": true }
    },
    {
      "name": "verifier",
      "prompt": "Run smoke tests against the deployed service.",
      "triggers": ["deploy:complete"],
      "emits": ["chain:complete"],
      "session": { "watch": ["deployer"] }
    }
  ]
}

The workspace type can be local (a directory on the host), ssh (a remote machine), or docker (an isolated container). All three give the agent a real PTY -- the difference is where that terminal lives. The session.persist flag keeps the PTY alive between steps. The session.watch array lets an agent read another agent's terminal output.
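If you generate chain definitions programmatically, a small sanity check catches the most likely mistakes before a chain runs. The validator below is a hypothetical helper, not part of Mentiko. It assumes the field names shown in the example above, that chain:start is a built-in event, and that every other trigger must be emitted by some agent in the chain.

```python
import json

VALID_WORKSPACE_TYPES = {"local", "ssh", "docker"}
BUILT_IN_EVENTS = {"chain:start"}  # assumption based on the example chain


def validate_chain(raw: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    chain = json.loads(raw)
    problems = []

    workspace_type = chain.get("workspace", {}).get("type")
    if workspace_type not in VALID_WORKSPACE_TYPES:
        problems.append(f"unknown workspace type: {workspace_type!r}")

    agents = chain.get("agents", [])
    names = {agent["name"] for agent in agents}
    emitted = BUILT_IN_EVENTS | {e for a in agents for e in a.get("emits", [])}

    for agent in agents:
        # session.watch must point at agents defined in the same chain.
        for target in agent.get("session", {}).get("watch", []):
            if target not in names:
                problems.append(f"{agent['name']} watches unknown agent {target!r}")
        # A trigger nobody emits means the agent would never start.
        for trigger in agent.get("triggers", []):
            if trigger not in emitted:
                problems.append(f"{agent['name']} waits on unemitted event {trigger!r}")
    return problems


if __name__ == "__main__":
    example = """
    {"workspace": {"type": "local"},
     "agents": [
       {"name": "deployer", "triggers": ["chain:start"], "emits": ["deploy:complete"]},
       {"name": "verifier", "triggers": ["deploy:complete"],
        "session": {"watch": ["deployer"]}}
     ]}
    """
    print(validate_chain(example))
```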

This is the entire configuration. No SDK to learn, no executor plugins to install, no "supported languages" list to check. If it runs in a terminal, it runs in Mentiko.

Want to see PTY sessions in action? Build your first agent chain in five minutes or join the waitlist to get your own instance.