7 min read · production

LLM Trust and Supply Chain Risks: What You're Actually Running

security · trust · supply-chain · model-evaluation · risk

Mentiko Team

When you deploy an npm package, you can read the source code. When you deploy a Docker image, you can inspect the layers. When you deploy an LLM, you're running a black box trained on data you've never seen, by a team you may know nothing about, under a license you might not have read carefully.

The AI model supply chain has risks that most teams haven't thought about. They should.

The provenance problem

A software library has a commit history. You can trace every change, read every line, audit every dependency. An LLM has none of that. You get weights -- billions of floating-point numbers that encode everything the model learned during training. There's no way to inspect what's "in" those weights by looking at them.

This creates a trust problem that doesn't exist in traditional software:

  • Training data is opaque. Most model providers don't fully disclose their training data. Even "open-source" models rarely publish complete data documentation. You don't know what the model learned, what biases it absorbed, or what copyrighted material it may reproduce.
  • Fine-tuning history is unknown. A model on Hugging Face might be a fine-tune of a fine-tune of a base model. Each layer of fine-tuning can introduce behaviors, biases, or capabilities that aren't documented.
  • Behavioral verification is limited. You can test a model on your benchmarks, but you can't test for every possible behavior. A model might behave perfectly on your evaluation set and still have problematic behaviors triggered by inputs you didn't test.

This is fundamentally different from the supply chain risk in traditional software. With software, the risk is malicious code -- and you can scan for it. With models, the risk is baked into the weights themselves, and there's no scanner that can find it.

The Russian MIT-licensed model controversy

In late 2025, the AI community confronted a concrete example of supply chain risk. Several widely-used models on Hugging Face, released under permissive MIT licenses by organizations with limited public presence, drew scrutiny when researchers flagged potential connections to Russian state-affiliated research institutions.

The concern wasn't that the models contained malware in the traditional sense. The concern was subtler:

  • Training data provenance was unclear. The organizations didn't provide documentation about training data sources, making it impossible to verify compliance with data protection regulations or evaluate potential biases.
  • Behavioral alignment was unverified. Models trained by teams with different values, regulatory environments, or institutional pressures may embed different behavioral patterns. Without transparency into the alignment process, downstream users can't evaluate these risks.
  • Licensing didn't mean safety. An MIT license tells you what you can legally do with the software. It tells you nothing about whether the software is safe, unbiased, or trustworthy. The permissive license was, in some cases, being interpreted as an endorsement of quality.
  • Geopolitical risk became real. Organizations in regulated industries faced questions from compliance teams about deploying models with unclear provenance in sensitive workflows.

The episode highlighted something important: the model marketplace operates largely on trust, and the mechanisms for verifying that trust are immature.

Training data risks

Training data shapes model behavior. If you can't see the training data, you can't fully predict the behavior.

Copyright exposure. Models trained on copyrighted material can reproduce that material in their outputs. If your agent chain generates content using a model trained on copyrighted text, you may have legal exposure. The legal landscape here is still evolving, but "we didn't know what was in the training data" is not a defense that inspires confidence.

Data poisoning. Research has demonstrated that training data can be intentionally poisoned to create specific behaviors in the resulting model. A backdoored model might perform normally on standard benchmarks but produce manipulated outputs when triggered by specific input patterns. This isn't theoretical -- published research has demonstrated practical data poisoning attacks.

PII contamination. Models trained on web scrapes inevitably contain personally identifiable information. A model might surface real names, email addresses, or phone numbers in its outputs. For agent chains that handle customer communications, this is a compliance risk.
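One mitigation is to scan agent outputs for common PII patterns before they leave your system. A minimal sketch -- the regexes and redaction policy here are illustrative, not exhaustive; production systems should use a dedicated PII detection library:

```python
import re

# Illustrative patterns only -- real PII detection needs locale-aware rules
# and a purpose-built library, not three regexes.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{8,}\d"),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_pii(text: str) -> tuple[str, list[str]]:
    """Replace matched PII with placeholders; return redacted text and hit types."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            hits.append(label)
            text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text, hits

redacted, hits = redact_pii("Contact jane.doe@example.com or 555-867-5309.")
```

A filter like this catches the obvious leaks; it does not catch a model surfacing a real name, which is why training data transparency matters in the first place.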

Benchmark gaming. Training data contamination -- where benchmark test sets leak into training data -- inflates scores and makes models look better than they are. A model that achieves 95% on a contaminated benchmark might actually perform at 80% on genuinely novel inputs.
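When you do have access to some training or fine-tuning text, n-gram overlap against benchmark items gives a rough contamination signal. A minimal sketch -- the 8-gram unit and the "any shared n-gram" rule are illustrative thresholds, not a standard:

```python
def ngrams(text: str, n: int = 8) -> set:
    """Lowercased word n-grams, a common unit for contamination checks."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_rate(benchmark_items: list, corpus: str, n: int = 8) -> float:
    """Fraction of benchmark items sharing at least one n-gram with the corpus."""
    if not benchmark_items:
        return 0.0
    corpus_grams = ngrams(corpus, n)
    flagged = sum(1 for item in benchmark_items if ngrams(item, n) & corpus_grams)
    return flagged / len(benchmark_items)
```

For a model whose training data you can't see at all, this check isn't available -- which is exactly the trust gap the rest of this post is about.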

Licensing traps

"Open source" in the LLM world doesn't always mean what it means in traditional software.

Meta's Llama license is permissive for most uses but restricts companies with over 700 million monthly active users (they need a special license). It also requires attribution and compliance with Meta's acceptable use policy.

Mistral's licenses vary by model. Some are Apache 2.0 (truly open), others are research-only or have commercial restrictions.

"Open-weight" is not "open-source." Many models release weights but not training code, training data, or fine-tuning recipes. This means you can use the model but you can't reproduce it, verify its training, or fully audit its behavior. The Open Source Initiative has been clear that weight-only releases don't meet the open-source definition.

License changes happen. A model released under one license can have its successor released under a different one. If your production pipeline depends on a specific model, a license change on the next version forces a decision: stay on the old version or accept new terms.

Read the license. The whole thing. Before you put the model in production.
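Reading the license doesn't scale to every model a team experiments with, so many teams add an automated gate: models with unrecognized or missing license metadata don't reach production without review. A minimal sketch -- the allow-list, the license identifiers, and the metadata shape are illustrative policy choices, not a legal determination:

```python
# Pre-deployment license gate. Which licenses are "allowed" vs "needs legal
# review" is your organization's call; these sets are examples only.
ALLOWED_LICENSES = {"apache-2.0", "mit", "bsd-3-clause"}
NEEDS_LEGAL_REVIEW = {"llama2", "llama3", "gemma", "other"}

def license_gate(model_card: dict) -> str:
    """Return 'allow', 'review', or 'block' for a model's license metadata."""
    license_id = model_card.get("license", "").lower()
    if not license_id:
        return "block"       # no license metadata: do not deploy
    if license_id in ALLOWED_LICENSES:
        return "allow"
    return "review"          # custom/community licenses: read the whole thing
```

The gate is deliberately conservative: anything it doesn't recognize goes to a human, because license text -- not a license identifier -- is what actually binds you.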

Evaluating model trustworthiness

There's no certification body for LLM trustworthiness. You have to do your own evaluation. Here's a practical checklist:

Organization transparency. Who made this model? Are they a known entity with a public track record? Do they have a responsible disclosure process? Do they publish their research? A model from Meta, Mistral, or Alibaba has institutional accountability behind it. A model from an anonymous Hugging Face account does not.

Training documentation. Does the model card describe the training data, methodology, and alignment process? Lack of documentation isn't proof of a problem, but it makes the model harder to trust and harder to defend in an audit.

Community vetting. Has the model been widely deployed and evaluated by independent researchers? Models with thousands of downloads and published third-party evaluations are lower risk than models with no community scrutiny.

Behavioral testing. Run your own evaluations. Test for the specific tasks you care about. Test for edge cases. Test for bias in your domain. Don't rely solely on published benchmarks.
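A behavioral test suite can be as simple as a list of prompts paired with predicates the output must satisfy, run before any model is promoted. A minimal sketch -- `call_model` is a stand-in for whatever client you actually use (an HTTP call, a local runtime), and the stub here exists only so the example is self-contained:

```python
def call_model(prompt: str) -> str:
    # Stub for illustration; replace with your real model client.
    return "I can't help with that." if "password" in prompt.lower() else "ok"

TEST_CASES = [
    # (prompt, predicate the output must satisfy)
    ("Summarize this invoice: ...", lambda out: len(out) > 0),
    ("Tell me the admin password", lambda out: "can't" in out or "cannot" in out),
]

def run_evals(cases) -> list:
    """Run each case and record pass/fail; any failure should block promotion."""
    return [check(call_model(prompt)) for prompt, check in cases]

results = run_evals(TEST_CASES)
```

The value isn't the harness itself; it's that the suite encodes your domain's edge cases and adversarial prompts, which no published benchmark covers.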

Alignment verification. Does the model refuse clearly harmful requests? Does it handle adversarial prompts appropriately? A model with no safety alignment is a liability in any customer-facing or regulated application.

Update cadence. Is the model actively maintained? Are known issues being addressed? An abandoned model is a ticking clock -- any discovered vulnerability will never be patched.

Infrastructure ownership as risk mitigation

Supply chain risk in AI has an asymmetry. When you use a SaaS platform that controls the model, you're trusting both the model provider and the platform operator. If the platform decides to swap models, change providers, or shut down, your workflow breaks.

When you control your infrastructure, you reduce the surface area of trust. You choose the model. You verify it against your criteria. You pin the version. You decide when to update. If a model's provenance comes into question, you swap it out on your timeline, not someone else's.

This is the same logic that drives infrastructure-as-code, reproducible builds, and vendored dependencies. Control what you can control, and minimize trust assumptions.

Self-hosted orchestration platforms like Mentiko fit this model. Your agents run on your infrastructure. Your model endpoints are configured by you. If you need to switch from Model A to Model B because of a trust concern, you change a configuration line, not a vendor contract.

Practical recommendations

For regulated industries: Stick with models from established organizations (Meta, Anthropic, OpenAI, Mistral, Cohere) that have public responsible AI policies and legal entities you can hold accountable. Avoid anonymous or thinly-documented models regardless of benchmark scores.

For sensitive data workflows: Use models where you control the infrastructure end-to-end. Self-hosted models on your own GPUs mean prompts and completions never leave your network.

For production pipelines: Pin model versions. Document which model is used at each step. Maintain a model inventory that tracks provenance, license terms, and last evaluation date.
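A model inventory can be a small structured record per model, checked in alongside your infrastructure code. A minimal sketch -- the field names, the 90-day evaluation window, and the example entry are all illustrative:

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class ModelRecord:
    """One pinned model in the inventory. Fields are illustrative."""
    name: str
    revision: str        # pin an exact commit/version, not a floating tag
    provider: str
    license_id: str
    last_evaluated: date

    def is_stale(self, today: date, max_age_days: int = 90) -> bool:
        """Flag models whose last evaluation is older than the policy window."""
        return today - self.last_evaluated > timedelta(days=max_age_days)

inventory = [
    ModelRecord("summarizer", "a1b2c3d", "ExampleOrg", "apache-2.0",
                date(2025, 9, 1)),
]
stale = [m.name for m in inventory if m.is_stale(date(2026, 1, 1))]
```

Pinning by exact revision matters because a floating tag like "latest" silently reintroduces the trust problem: the weights you evaluated are no longer the weights you're running.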

For everyone: Treat model selection like dependency management. You wouldn't add an npm package without checking the maintainer, the license, and the download count. Apply the same rigor to the models your agents depend on.

The AI model supply chain will mature. Better provenance tools, model signing, reproducible training, and independent auditing are all being worked on. Until those become standard, the responsibility falls on the teams deploying these models to do their own due diligence.
