
Building Internal Tools with Agent Chains

Mentiko Team

Every company has them. The bash script that provisions new employee accounts but breaks if someone's last name has an apostrophe. The Google Sheet that three people know how to use for quarterly planning. The Slack bot that was written by an engineer who left two years ago and nobody wants to touch. The manual checklist that someone copies from Confluence every time there's an incident.

Internal tools are the duct tape holding organizations together. They work until they don't, and when they break, nobody's job is to fix them because they were never anyone's job to build in the first place.

Agent chains are a better way to build internal tooling. They're readable (the chain definition is a JSON file, not a 400-line bash script), maintainable (update a prompt, not spaghetti code), and observable (every execution produces event logs). Here's how to think about replacing your internal tool sprawl with orchestrated agent chains.

The anatomy of an internal tool problem

Internal tools share common patterns that make them fragile:

Tribal knowledge. The tool works because Sarah knows that you have to run the prep script first, then wait 30 seconds, then run the main script with the --legacy flag because the API changed but nobody updated the default. None of this is documented.

Brittle integrations. The tool calls three APIs, scrapes two web pages, and reads from a shared drive. When any of these changes -- a new API version, a page redesign, a drive migration -- the tool breaks silently.

No error handling. The script either works or it doesn't. When it fails, you get a stack trace that means nothing to the person trying to use it. There's no retry logic, no fallback, no graceful degradation.

Single point of failure. It runs on Dave's laptop. Or on a VM that nobody remembers the password for. Or in a cron job on a server that's three OS versions behind.

Agent chains address all of these. The chain definition is the documentation. Agent prompts are human-readable descriptions of what each step does. The orchestrator handles retries, error routing, and execution environment management. And it runs on infrastructure, not on Dave's laptop.

Pattern 1: Employee onboarding automation

The classic internal tool nightmare. New hire starts Monday, and someone needs to:

  • Create accounts in 8 systems (Google Workspace, Slack, GitHub, Jira, AWS, VPN, the wiki, the time tracking tool)
  • Add them to the right teams and channels based on their role
  • Generate and send credentials
  • Create their first-week onboarding checklist
  • Notify their manager and buddy

Most companies handle this with a mix of manual steps, a half-working script, and an IT person who has all the admin passwords memorized.

Here's the agent chain version:

{
  "chain": "employee-onboarding",
  "trigger": "manual",
  "input": {
    "employee_name": "string",
    "email": "string",
    "role": "string",
    "department": "string",
    "manager_email": "string",
    "start_date": "string"
  },
  "agents": [
    {
      "name": "AccountProvisioner",
      "prompt": "Create accounts for the new employee in all required systems. Use the role and department to determine which systems and permissions. Engineering roles get GitHub and AWS. All roles get Google Workspace, Slack, and Jira. Create accounts using the provided APIs and collect all credentials.",
      "secrets": ["GOOGLE_ADMIN_TOKEN", "SLACK_ADMIN_TOKEN", "GITHUB_ORG_TOKEN", "JIRA_ADMIN_TOKEN", "AWS_IAM_KEY"],
      "on_failure": "pause_and_alert"
    },
    {
      "name": "TeamConfigurator",
      "prompt": "Based on the employee's department and role, add them to the appropriate Slack channels, GitHub teams, Jira projects, and Google Groups. Use the department-to-team mapping in the config. Don't add to optional channels -- let them self-select later.",
      "secrets": ["SLACK_ADMIN_TOKEN", "GITHUB_ORG_TOKEN", "JIRA_ADMIN_TOKEN"]
    },
    {
      "name": "CredentialPackager",
      "prompt": "Package all generated credentials into a secure onboarding email. Include: login URLs, temporary passwords (flagged for immediate change), 2FA setup instructions, VPN configuration file. Format as a clean HTML email. Do NOT include credentials in plain text in logs or event files.",
      "output_action": "decision_flow",
      "decision_config": {
        "reviewer_role": "it_admin",
        "prompt": "Review the onboarding package before sending. Verify all accounts were created and credentials are correct."
      }
    },
    {
      "name": "NotificationSender",
      "prompt": "Send the credential package to the new employee's email. Send a summary to their manager with the start date and onboarding checklist link. Post a welcome message to the team's Slack channel. Create the first-week checklist in Jira assigned to the new employee.",
      "secrets": ["SMTP_CREDENTIALS", "SLACK_BOT_TOKEN", "JIRA_API_TOKEN"]
    }
  ]
}

The decision flow gate before sending credentials is important. An IT admin reviews the package before it goes out. This catches edge cases: maybe the employee needs a non-standard permission, maybe an account creation failed silently, maybe the role mapping was wrong.

When this chain breaks -- and it will, because APIs change -- the failure is visible, traceable, and fixable. The AccountProvisioner step failed because the GitHub API returned a 422? You can see exactly what happened, update the prompt, and re-run. You don't need to debug a 400-line script.

Pattern 2: Incident response runbook

Incident response is a manual process at most companies. Someone declares an incident, a responder opens the runbook, and they work through the steps. The problem: steps get skipped under pressure, the runbook is outdated, and coordination between responders is ad hoc.

An agent chain can automate the mechanical parts of incident response while keeping humans in control of decisions:

{
  "chain": "incident-response",
  "trigger": "webhook:pagerduty",
  "agents": [
    {
      "name": "IncidentClassifier",
      "prompt": "Analyze the alert payload. Classify severity (SEV1-SEV4) based on: affected service, blast radius (percentage of users), data integrity risk, and revenue impact. Identify the likely affected systems and pull recent deployment history for those systems."
    },
    {
      "name": "ContextGatherer",
      "prompt": "Gather diagnostic context for the incident. Pull: last 30 minutes of error logs for affected services, recent deployments (last 4 hours), current resource utilization (CPU, memory, connections), dependency health checks, and any related alerts in the last hour. Summarize findings."
    },
    {
      "name": "RunbookExecutor",
      "prompt": "Based on the incident classification and context, identify the appropriate runbook. Execute the diagnostic steps from the runbook. For each step, report what you found. If any diagnostic step suggests a root cause, flag it. Do NOT execute remediation steps -- only diagnostics.",
      "on_failure": "continue_with_warning"
    },
    {
      "name": "IncidentBriefer",
      "prompt": "Compile all findings into an incident brief. Include: severity classification, affected systems, timeline of events, diagnostic results, suspected root cause (if identified), recommended next actions. Post to the incident Slack channel and update the PagerDuty incident with the brief.",
      "secrets": ["SLACK_BOT_TOKEN", "PAGERDUTY_API_KEY"]
    }
  ]
}

The chain does in 2 minutes what usually takes 15-20 minutes of manual investigation. The on-call engineer gets a formatted brief with diagnostic data already gathered, instead of starting from scratch.

The critical design choice: the chain only runs diagnostics, not remediation. It doesn't restart services or roll back deployments automatically. Those actions need human judgment. The chain prepares the context; the human makes the call.

Pattern 3: Vendor evaluation and procurement

Every team evaluates vendors. The process is always the same: gather requirements, research options, compare features, check pricing, review security posture, make a recommendation. It takes weeks because it's nobody's primary job.

{
  "chain": "vendor-evaluation",
  "trigger": "manual",
  "input": {
    "category": "string",
    "requirements": "array",
    "budget_range": "string",
    "security_requirements": "string"
  },
  "agents": [
    {
      "name": "MarketResearcher",
      "prompt": "Research the vendor landscape for the given category. Identify the top 5-8 vendors. For each, gather: product name, company size, founding year, primary use case, notable customers, pricing model. Focus on vendors that match the stated requirements."
    },
    {
      "name": "FeatureComparer",
      "prompt": "For each vendor from the research phase, evaluate against the stated requirements. Create a feature matrix: requirement vs. vendor, scored as full support, partial support, or not supported. Note any requirements that no vendor fully supports."
    },
    {
      "name": "SecurityReviewer",
      "prompt": "For the top 3 vendors from the feature comparison, evaluate security posture. Check: SOC 2 certification, data residency options, encryption standards, SSO support, audit logging, GDPR compliance, and any published security incidents. Score each vendor against the stated security requirements."
    },
    {
      "name": "RecommendationWriter",
      "prompt": "Synthesize the research, feature comparison, and security review into a recommendation document. Recommend a primary vendor and a runner-up. Include: summary comparison table, detailed pros/cons for the top 3, pricing analysis against budget, implementation timeline estimate, and risks. Format as a document ready for stakeholder review."
    }
  ]
}

An evaluation that takes 2-3 weeks of someone's part-time attention gets a solid first draft in 20 minutes. The team reviews the output, adjusts based on their domain knowledge and internal politics (which agents can't assess), and has a recommendation ready.

Pattern 4: Report generation from multiple sources

The weekly status report. The monthly metrics deck. The quarterly business review. Someone spends hours pulling data from Jira, Google Analytics, Salesforce, and the finance system, then formats it into slides.

{
  "chain": "weekly-status-report",
  "schedule": "0 8 * * 1",
  "agents": [
    {
      "name": "DataPuller",
      "prompt": "Pull last week's data from: Jira (tickets completed, bugs filed, sprint velocity), Google Analytics (traffic, conversion rates, top pages), Stripe (revenue, churn, new subscriptions), and GitHub (PRs merged, deploy count, incident count). Output structured JSON.",
      "secrets": ["JIRA_API_TOKEN", "GA_SERVICE_ACCOUNT", "STRIPE_SECRET_KEY", "GITHUB_TOKEN"]
    },
    {
      "name": "TrendAnalyzer",
      "prompt": "Compare this week's data to the previous 4 weeks. Identify: metrics trending up, metrics trending down, anomalies (anything more than 2 standard deviations from the 4-week average), and any correlations between metrics (e.g., deploy count and incident count)."
    },
    {
      "name": "NarrativeWriter",
      "prompt": "Write a concise weekly status update. Lead with the 3 most important things that happened. Include key metrics with trend arrows. Flag anything that needs attention. Keep it under 500 words. Write for an executive audience -- no jargon, no vanity metrics. Format as Markdown suitable for Slack and email."
    },
    {
      "name": "Distributor",
      "prompt": "Post the status update to the #weekly-status Slack channel. Send via email to the distribution list. Save a copy to the shared drive in the Weekly Reports folder with the date in the filename.",
      "secrets": ["SLACK_BOT_TOKEN", "SMTP_CREDENTIALS", "GDRIVE_SERVICE_ACCOUNT"]
    }
  ]
}

Every Monday at 8 AM, the report is written and distributed. Nobody had to spend Friday afternoon assembling it.

Pattern 5: Customer feedback synthesis

Support tickets, NPS surveys, app store reviews, social media mentions, sales call notes. Customer feedback comes from everywhere and goes nowhere because nobody has time to read all of it and synthesize patterns.

{
  "chain": "feedback-synthesis",
  "schedule": "0 9 * * 5",
  "agents": [
    {
      "name": "FeedbackCollector",
      "prompt": "Collect this week's customer feedback from: Zendesk (new tickets, tag distribution, sentiment), Delighted (NPS responses), App Store and Play Store reviews, Twitter mentions, and Gong call summaries. Pull raw text and metadata for each source."
    },
    {
      "name": "ThemeExtractor",
      "prompt": "Analyze all collected feedback and extract recurring themes. Group by: feature requests, bug reports, praise, confusion/UX issues, and competitive mentions. For each theme, count occurrences and provide 2-3 representative quotes. Rank themes by frequency and apparent business impact."
    },
    {
      "name": "InsightWriter",
      "prompt": "Write a weekly Voice of Customer report. Top 5 themes with evidence. New issues this week vs. recurring. Sentiment trend (improving, stable, declining). Specific product recommendations based on the data. Write for a product manager audience -- actionable, not academic."
    }
  ]
}

Pattern 6: Access review and compliance

SOC 2 and SOX compliance require periodic access reviews. Who has access to what systems, and is that access still appropriate? Most companies handle this with a spreadsheet that gets updated (maybe) once a quarter.

An agent chain can automate the entire review cycle: pull current access lists from every system, cross-reference with HR data (who's still employed, what role are they in), flag inappropriate access (a departed employee still has GitHub access, a marketing person has production database credentials), and generate the audit report that your compliance team needs.

The chain runs monthly. Findings go through a decision flow where the system owner approves or revokes flagged access. The audit trail is generated automatically. What used to be a quarterly fire drill becomes a background process.
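Following the same conventions as the chains above, a sketch of this chain might look like the following. The agent names, schedule, secret names, and decision flow config here are illustrative assumptions, not a prescribed implementation:

{
  "chain": "access-review",
  "schedule": "0 9 1 * *",
  "agents": [
    {
      "name": "AccessCollector",
      "prompt": "Pull current user and permission lists from every connected system: Google Workspace, Slack, GitHub, Jira, and AWS. Output a normalized list of (user, system, access level) entries.",
      "secrets": ["GOOGLE_ADMIN_TOKEN", "SLACK_ADMIN_TOKEN", "GITHUB_ORG_TOKEN", "JIRA_ADMIN_TOKEN", "AWS_IAM_KEY"]
    },
    {
      "name": "AccessAuditor",
      "prompt": "Cross-reference the access list with HR data. Flag: accounts belonging to departed employees, access that does not match the user's current role, and any access outside the role-to-system mapping in the config.",
      "secrets": ["HRIS_API_TOKEN"]
    },
    {
      "name": "ReviewRouter",
      "prompt": "For each flagged item, prepare an approve-or-revoke decision with the supporting evidence attached.",
      "output_action": "decision_flow",
      "decision_config": {
        "reviewer_role": "system_owner",
        "prompt": "Approve or revoke each flagged access item."
      }
    },
    {
      "name": "AuditReporter",
      "prompt": "Compile the review outcomes into the audit report format the compliance team needs, including who reviewed each item and when. Save to the compliance folder on the shared drive.",
      "secrets": ["GDRIVE_SERVICE_ACCOUNT"]
    }
  ]
}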

Why agent chains beat traditional internal tools

The comparison comes down to three things:

Readability. A chain definition is a JSON file with human-readable agent prompts. A bash script is... a bash script. When something breaks, anyone can read the chain and understand what it's supposed to do. You don't need to be the person who wrote it.

Adaptability. When an API changes, you update the agent's prompt. You don't refactor code, update dependencies, or fix parsing logic. When requirements change (add a new system to the onboarding chain, add a new data source to the report), you add an agent. The change is additive, not surgical.

Observability. Every agent execution produces event files. You can see exactly what each agent received, what it produced, how long it took, and whether it succeeded. When the onboarding chain fails, you know which step failed and why. When the report chain produces wrong numbers, you can trace back to the data pull and see what the API returned.
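As a concrete illustration, an event file for one agent execution might look something like this. The field names and layout are hypothetical, meant only to show the kind of information each execution records:

{
  "chain": "employee-onboarding",
  "agent": "AccountProvisioner",
  "status": "failed",
  "started_at": "2025-03-03T08:00:12Z",
  "duration_ms": 4170,
  "input_ref": "events/employee-onboarding/run-042/input.json",
  "error": "GitHub API returned 422 when creating the org membership"
}

With records like this, answering "which step failed and why" is a matter of reading one file, not reproducing the failure.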

Getting started

Pick the internal process that annoys your team the most. The one where someone says "I wish this was automated" every time they do it. Map it as a sequence of steps. Each step becomes an agent. The inputs and outputs between steps become the event contract.

Start simple. A 2-3 agent chain that automates the most painful part of the process. Run it manually for a week. Once you trust it, add a schedule or a webhook trigger.
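A minimal starter chain, with placeholder names and prompts to adapt to your own process, might be as small as this:

{
  "chain": "my-first-chain",
  "trigger": "manual",
  "input": {
    "request": "string"
  },
  "agents": [
    {
      "name": "Gatherer",
      "prompt": "Collect the inputs the process needs and output them as structured JSON."
    },
    {
      "name": "Writer",
      "prompt": "Turn the gathered data into the document or message the process produces. Flag anything that looks wrong instead of guessing."
    }
  ]
}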

The goal isn't to automate everything on day one. It's to replace the most fragile, most annoying, most time-consuming internal tool with something that works reliably and doesn't require tribal knowledge to operate.


For more use cases, see how teams use agent chains for engineering, DevOps, and support. Build your first chain with the 5-minute tutorial.
