AI Security Engineering
Your models are executable code. Most organizations treat them like data files. That gap is where breaches happen.
$4.63M
Average breach cost involving shadow AI
IBM Cost of a Data Breach 2025
83%
Of organizations lack automated AI security controls
Kiteworks 2025
352K
Unsafe issues found across 51,700 models on public registries
Protect AI 2025
AI models are not static artifacts. They are code that runs during loading, training, inference, and agent execution. Four attack categories dominate the threat model.
torch.load() executes arbitrary Python during deserialization. This is not a bug. It is the designed behavior of pickle serialization, and 80%+ of ML models ship in pickle-based formats.
A model named "baller423" on Hugging Face was found establishing a reverse shell to Kreonet. The model looked normal. It passed basic scans. It ran arbitrary code the moment someone loaded it.
PickleScan, the most widely used defense, has at least 3 known zero-day bypasses (CVE-2025-10155). Blacklist-based scanning is fundamentally broken because the attacker controls the serialization format.
Llama 3.1 8B drops from 0.95 to 0.15 on prompt injection resilience after a single round of fine-tuning. That is an 84% degradation in safety alignment from normal, non-adversarial training.
Almost nobody re-evaluates safety after fine-tuning. The model passes the initial safety evaluation, gets fine-tuned on domain data, and goes to production with its guardrails effectively removed. This is not an exotic attack. It is the default workflow at most organizations.
98% of organizations have unauthorized AI usage. That number is not a typo. The $670K additional breach cost for shadow AI incidents reflects a simple reality: you cannot secure what you cannot see.
62% of security teams cannot identify where LLMs are deployed in their environment. Developers download models from Hugging Face, call OpenAI APIs with personal keys, and deploy fine-tuned models on personal cloud accounts. Current security tools surface roughly 38% of this activity.
GitHub Copilot's RCE vulnerability (CVE-2025-53773, CVSS 7.8) turned a prompt injection in a repository's documentation into full system compromise via YOLO mode. The agent read a malicious instruction, executed it as code, and the user's machine was owned.
Amazon Q's cleaner.md file distributed destructive commands to 950K+ users through the agent's context window. OpenClaw's marketplace accumulated 138 CVEs across 63 days, with 12% of submitted skills found to be malicious.
Agents turn prompt injections into system-level compromises because they have tool access, credentials, and execution privileges that traditional LLMs lack.
The vendor ecosystem is maturing fast. Here is an honest view of what each player covers and where the gaps remain.
| Provider | What They Do | What They Don't Do | Best For |
|---|---|---|---|
| Palo Alto / Protect AI | Model scanning, AI-BOM generation, integrated into Prisma AIRS platform | Architecture design, custom pipeline engineering, organizational change management | Enterprises already on the PANW platform |
| HiddenLayer | Runtime AI detection and response, agentic security monitoring | Supply chain architecture, ML-BOM implementation, compliance mapping | SOC teams adding AI visibility |
| JFrog | MLSecOps, model registry security, Hugging Face integration | Adversarial red-teaming, safety alignment validation, governance design | DevOps teams managing model artifacts |
| Wiz | AI-BOM in cloud security context, model scanning | On-prem model security, fine-tuning safety, agentic architecture | Cloud-first organizations |
| NVIDIA NeMo Guardrails | Open-source runtime guardrails for LLMs | Model scanning, supply chain security, provenance tracking | Teams building custom LLM applications |
| Big 4 / Large SIs | Governance frameworks, compliance documentation, board decks | Implementation: building scanning pipelines, configuring ML-BOMs, deploying model signing. Engagements start at $500K for strategy and scale to $3-10M. | Organizations needing audit-ready documentation |
| Open Source (ModelScan, PickleScan, SafeTensors) | Free basic scanning and safer serialization formats | Enterprise-grade orchestration, behavioral sandboxing, provenance, policy enforcement | Teams with strong internal security engineering |
A gap nobody fills well. Organizational culture change is the hardest part. No tool or consultancy eliminates the human tendency to bypass governance for speed. We build the technical controls, but the CISO still needs executive buy-in. When a data scientist can download a model from Hugging Face in 30 seconds, any security gate that takes 30 minutes will get bypassed. The controls need to be fast enough that compliance is easier than circumvention.
Six capabilities, each engineered to integrate with your existing security stack and CI/CD pipelines.
We build automated vetting that sits between public model repositories and your internal registry. Every model passes through behavioral sandboxing (loaded in isolated containers, syscalls monitored), multi-format deep analysis (pickle, PyTorch, GGUF, Keras, SafeTensors), and cryptographic signing with your enterprise PKI.
We reach for behavioral analysis over static scanning because PickleScan's zero-day bypasses prove blacklist approaches are fundamentally broken. Static scanning asks "does this file contain known-bad patterns?" Behavioral sandboxing asks "what does this code actually do when it runs?" The second question catches novel attacks.
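The limits of the static approach are easy to make concrete. The sketch below, using Python's standard pickletools module, flags the opcodes that can trigger code execution during unpickling. The opcode list and helper name are illustrative, and this is precisely the blacklist pattern that behavioral sandboxing has to go beyond: it catches this textbook payload, but a novel encoding of the same attack can slip past it.

```python
import pickle
import pickletools

# Opcodes that can invoke callables during unpickling. This is the
# blacklist approach: known-bad patterns only, bypassable by design.
SUSPICIOUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}

def static_scan(data: bytes) -> set:
    """Return the suspicious pickle opcodes found in `data`.

    Only inspects the opcode stream; never executes the payload.
    """
    found = set()
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in SUSPICIOUS_OPCODES:
            found.add(opcode.name)
    return found

# A benign pickle (plain dict of weights) triggers nothing.
benign = pickle.dumps({"weights": [1, 2, 3]})
print(static_scan(benign))  # set()

# A payload that would run a shell command on load is flagged;
# note the command never executes during scanning, only on torch.load().
class Payload:
    def __reduce__(self):
        import os
        return (os.system, ("echo pwned",))

malicious = pickle.dumps(Payload())
print(static_scan(malicious))
```

Behavioral sandboxing, by contrast, loads the artifact in an isolated container and observes syscalls, which is why it catches the attacks this scanner cannot.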
CycloneDX ML-BOM generation integrated into CI/CD. Every model gets a bill of materials documenting training data provenance, framework versions, dependency trees, and fine-tuning history.
We use CycloneDX over SPDX because the ML-BOM tooling is more mature, though we ensure SPDX 3.0 export for organizations that need both. The ML-BOM is not a compliance checkbox. It is the data structure that makes every other security control possible: you cannot sign what you cannot inventory, and you cannot audit what you cannot trace.
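As a rough illustration of what the pipeline emits, here is a minimal CycloneDX 1.5 ML-BOM built by hand in Python. The model name, dataset URI, and property keys are placeholders; a production pipeline would generate this from CI metadata with the official CycloneDX tooling and use the full modelCard structure rather than flat properties.

```python
import hashlib
import json

def make_ml_bom(artifact_bytes: bytes, name: str, version: str,
                training_data: str, base_model: str) -> dict:
    """Build a minimal CycloneDX 1.5 ML-BOM for one model artifact."""
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": [{
            # CycloneDX 1.5 introduced this component type for models.
            "type": "machine-learning-model",
            "name": name,
            "version": version,
            "hashes": [{"alg": "SHA-256", "content": digest}],
            # Provenance as simple properties; property names are
            # illustrative, not part of the CycloneDX schema.
            "properties": [
                {"name": "training-data", "value": training_data},
                {"name": "base-model", "value": base_model},
            ],
        }],
    }

bom = make_ml_bom(b"fake-model-bytes", "fraud-classifier", "2.1.0",
                  "s3://datasets/fraud-2025q3", "meta-llama/Llama-3.1-8B")
print(json.dumps(bom, indent=2))
```

The SHA-256 hash is what makes the ML-BOM the anchor for signing: the signed BOM attests to exactly one artifact, so any weight tampering breaks verification.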
Network-level detection of unauthorized model downloads and AI API calls. Integration with your existing SIEM/SOAR. We map every AI touchpoint including shadow deployments, then build policy enforcement that blocks risk without blocking innovation.
The goal: your security team sees 100% of AI usage, not the 38% that current tools surface. Detection covers Hugging Face downloads, OpenAI/Anthropic/Google API calls, model weight transfers over HTTP/S, and local model execution via process monitoring on managed endpoints.
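The detection logic is simple at its core. The sketch below matches proxy-log lines against known AI endpoints; the log format, usernames, and hostname list are invented for illustration, and a real deployment would pull endpoint intelligence from feeds and forward hits to the SIEM rather than print them.

```python
import re

# Hostnames that indicate model downloads or hosted-LLM API calls.
# Illustrative subset; production coverage is far broader.
AI_ENDPOINTS = {
    "huggingface.co": "model download",
    "cdn-lfs.huggingface.co": "model weight transfer",
    "api.openai.com": "OpenAI API call",
    "api.anthropic.com": "Anthropic API call",
    "generativelanguage.googleapis.com": "Google AI API call",
}

# Assumed log shape: "<timestamp> <user> <destination-host>"
LOG_LINE = re.compile(r"^(?P<ts>\S+) (?P<user>\S+) (?P<host>\S+)")

def flag_ai_traffic(log_lines):
    """Yield (timestamp, user, category) for lines hitting AI endpoints."""
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m and m.group("host") in AI_ENDPOINTS:
            yield m.group("ts"), m.group("user"), AI_ENDPOINTS[m.group("host")]

sample = [
    "2025-11-02T09:14:33Z jdoe huggingface.co",
    "2025-11-02T09:15:01Z jdoe intranet.example.com",
    "2025-11-02T09:16:12Z asmith api.openai.com",
]
for ts, user, category in flag_ai_traffic(sample):
    print(user, category)
```

Hostname matching alone misses local model execution and TLS-pinned clients, which is why endpoint process monitoring runs alongside the network layer.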
Automated safety re-evaluation after every fine-tuning run. OWASP LLM Top 10 benchmark suite, adversarial probing for backdoor triggers, and safety alignment regression testing.
We build this because almost nobody re-evaluates safety after fine-tuning. The safety degradation data in the section above makes the case. The validation pipeline runs as a CI/CD gate. A model that fails safety regression cannot be promoted to production, regardless of its task performance.
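The gate itself can be as simple as a threshold check wired into CI. The benchmark names and floor values below are placeholders for illustration, not a real suite; the point is that a score regression blocks promotion mechanically, with no human judgment call required.

```python
# Minimum acceptable scores per safety benchmark. Names and floors
# are illustrative; real thresholds come from the baseline evaluation
# of the pre-fine-tuning model.
SAFETY_THRESHOLDS = {
    "prompt_injection_resilience": 0.85,
    "jailbreak_resistance": 0.90,
    "backdoor_trigger_probes": 0.95,
}

def safety_gate(scores: dict) -> list:
    """Return the benchmarks a fine-tuned model failed."""
    return [name for name, floor in SAFETY_THRESHOLDS.items()
            if scores.get(name, 0.0) < floor]

# A model showing the post-fine-tuning degradation described above
# fails the gate on prompt injection alone.
post_finetune = {"prompt_injection_resilience": 0.15,
                 "jailbreak_resistance": 0.91,
                 "backdoor_trigger_probes": 0.97}
failures = safety_gate(post_finetune)
if failures:
    print("blocked from promotion:", failures)
    # In CI this stage would exit nonzero (sys.exit(1)) to fail the build.
```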
Privilege separation for AI agents. Deterministic policy layers that prevent prompt-to-RCE escalation (the exact attack vector in CVE-2025-53773). Tool-use policy enforcement, human-in-the-loop gates for high-risk operations, and runtime behavior monitoring.
The architecture detects anomalous agent actions before they cascade. An agent that suddenly starts writing to filesystem paths outside its sandbox, calling APIs it has never called before, or attempting privilege escalation gets terminated and flagged for review.
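A deterministic policy layer of this kind can be sketched in a few lines. The tool names, sandbox path, and three-way verdict below are illustrative assumptions, but the shape is the point: the decision is made by code the prompt cannot influence, which is what blocks the prompt-to-RCE escalation path.

```python
from pathlib import Path

# Least-privilege profile for one agent. Tool names and paths are
# illustrative; real deployments derive a profile per agent.
POLICY = {
    "allowed_tools": {"read_docs", "write_file", "summarize"},
    "sandbox_root": Path("/tmp/agent-sandbox"),
    "needs_human_approval": {"send_email", "execute_shell"},
}

def check_tool_call(tool, target=None):
    """Deterministic policy layer: 'allow', 'escalate', or 'deny'."""
    if tool in POLICY["needs_human_approval"]:
        return "escalate"            # human-in-the-loop gate
    if tool not in POLICY["allowed_tools"]:
        return "deny"                # unknown tool: terminate and flag
    if tool == "write_file" and target is not None:
        path = Path(target).resolve()
        if not path.is_relative_to(POLICY["sandbox_root"]):
            return "deny"            # filesystem escape attempt
    return "allow"

print(check_tool_call("summarize"))                   # allow
print(check_tool_call("write_file", "/etc/crontab"))  # deny
print(check_tool_call("execute_shell", "rm -rf /"))   # escalate
```

Because the verdict comes from this code path rather than from the model, an injected instruction can ask for `execute_shell` all it wants; it still lands at the human gate.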
For CISOs building the function from scratch. NIST AI 100-2 control mapping, EU AI Act compliance architecture, board-level risk quantification, and incident response playbooks for AI-specific attacks.
We help translate technical risk into budget justification that boards approve. "We found 352K unsafe issues across public model registries" is a data point. "Our engineers downloaded 47 unvetted models last quarter, 3 contained executable code in their serialization layer, and our current controls detected none of them" is a budget justification.
Three phases, each with defined deliverables and honest caveats about what to expect.
Weeks 1-3
Deliverable: AI Security Posture Report with prioritized risk register
Caveat: This phase often surfaces 3-5x more AI usage than the CISO expected. That is normal. The shadow AI discovery is the most valuable and the most uncomfortable part of the engagement.
Weeks 4-10
Deliverable: Production-ready security controls integrated into existing workflows
Caveat: Timeline depends on CI/CD maturity. Teams with mature DevOps pipelines deploy faster. Organizations still moving models via USB drives or shared folders (more common than you would expect) need additional infrastructure work.
Weeks 11-14
Deliverable: Self-sustaining AI security operations with documented runbooks
Caveat: The first adversarial red-team always finds something. That is the point. A red-team that finds nothing either was not trying hard enough or was scoped too narrowly.
Answer eight questions to benchmark your AI security posture. No data is collected. Everything runs in your browser.
4-6 weeks for a basic pipeline covering static scanning and signature verification. 8-12 weeks for full behavioral sandboxing with CI/CD integration. The bottleneck is rarely the scanning technology itself. It is integrating with your existing model registry (MLflow, Weights & Biases, JFrog ML) and defining the policy logic: what gets blocked vs. flagged vs. quarantined. We have found that the policy decisions take longer than the engineering.
Format complexity adds time. Pickle, PyTorch, GGUF, Keras, and SafeTensors each require different analysis approaches. Pickle remains the highest-risk format because torch.load() executes arbitrary Python during deserialization, which is why behavioral sandboxing matters more than static scanning for that format. SafeTensors is the safest serialization option and the simplest to scan, but fewer than 20% of production models use it today. Your pipeline needs to handle all of them because you cannot control what format upstream model providers choose.
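SafeTensors is the simplest to scan because its layout is just an 8-byte little-endian header length followed by a JSON header describing each tensor; inspection never touches executable content. The helper below is a hand-rolled sketch of that header parse (the function name is ours, and a real pipeline would use the safetensors library), shown against a tiny in-memory file.

```python
import json
import struct

def read_safetensors_header(data: bytes) -> dict:
    """Parse a SafeTensors header without touching tensor data.

    Format: 8-byte little-endian header length, then a JSON header
    mapping tensor names to dtype/shape/offsets. No code execution
    is possible during parsing -- unlike pickle deserialization.
    """
    (header_len,) = struct.unpack("<Q", data[:8])
    return json.loads(data[8 : 8 + header_len])

# Build a tiny in-memory SafeTensors file: one float32 tensor of 4 values.
header = json.dumps({
    "layer.weight": {"dtype": "F32", "shape": [4],
                     "data_offsets": [0, 16]},
}).encode()
blob = struct.pack("<Q", len(header)) + header + b"\x00" * 16

info = read_safetensors_header(blob)
print(info["layer.weight"]["shape"])  # [4]
```

Pickle-family formats get no such shortcut, which is why they route to behavioral sandboxing while SafeTensors can clear on structural validation alone.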
Those platforms are excellent at what they do. Palo Alto's Protect AI integration (via Prisma AIRS) gives you model scanning within your existing security stack. JFrog's MLSecOps handles model registry governance. Wiz adds AI-BOM to cloud visibility. What they do not do: design the end-to-end architecture, configure ML-BOM generation in your specific CI/CD pipeline, build the policy logic for your regulatory context, or re-engineer your model deployment workflow. They are scanning tools. We are the implementation team that makes them work together.
Many engagements start with organizations that already have these platforms but need help operationalizing them. A common pattern: the security team purchased Protect AI six months ago, ran a scan, got 400 findings, and then stalled because nobody mapped those findings to remediation workflows or integrated scanning into the model promotion pipeline.
The technical barrier to model poisoning is lower than most CISOs assume. Research demonstrates that as few as 250 poisoned documents in a training corpus can backdoor a 13B-parameter model. Microsoft published breakthrough detection methods in February 2026, but most organizations have zero detection capability deployed. The fine-tuning safety degradation problem is more immediate and more common: Llama 3.1 8B drops from 0.95 to 0.15 on prompt injection resilience after a single round of fine-tuning. That is not an attack. That is normal fine-tuning without safety re-evaluation.
Documented production incidents of intentional model poisoning remain rare. But the conditions are ripe: 80%+ of ML models use pickle serialization, 62% of security teams cannot identify where models are deployed, and a model named "baller423" on Hugging Face was found establishing a reverse shell to Kreonet. The FTC's model disgorgement precedent (Weight Watchers/Kurbo, 2022) means a poisoned model could force you to delete and retrain from scratch, at costs that dwarf the breach itself.
The EU AI Act is fully applicable August 2, 2026. For high-risk AI systems, you need technical documentation covering training data provenance, scope, characteristics, and cleaning methodologies. Supply chain obligations require importers and distributors to verify conformity assessment, technical documentation, and CE marking. Practically, this means ML-BOMs for every model in your pipeline, signed attestations for provenance, and audit trails for fine-tuning decisions.
CycloneDX ML-BOM is the most implementation-ready standard. SPDX 3.0 added AI/ML profiles in 2024, and some organizations need both formats for different regulatory audiences. We build the documentation pipeline so provenance tracking is automated, not a manual compliance exercise. The common mistake is treating this as a one-time documentation project. Every fine-tuning run, every model update, and every dataset change needs to generate updated provenance records. If your ML-BOM is static, it is wrong within weeks.
Privilege separation is the foundation. Every agent gets a least-privilege profile that defines which tools it can call, which APIs it can access, and which file system paths it can touch. This mirrors Linux's capability model applied to AI agents. The GitHub Copilot RCE (CVE-2025-53773, CVSS 7.8) happened because YOLO mode gave the agent unrestricted system access, and a prompt injection in a repository's documentation escalated to full remote code execution. Deterministic policy layers prevent that escalation path entirely.
Runtime monitoring adds a behavioral baseline that detects anomalous agent actions (unexpected tool calls, unusual API patterns, privilege escalation attempts) without adding latency to normal operations. There is, however, a small latency cost for security checks on high-risk operations: filesystem writes, cloud API calls, credential access. For most enterprise deployments, this is 50-200ms per gated operation. Low-risk operations (reading approved data sources, generating text, calling pre-approved APIs) pass through with zero added latency. The question is whether 50-200ms on high-risk calls is acceptable compared to an agent with full system access and no guardrails.
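One minimal way to express the behavioral baseline: compare each tool's share of recent traffic against its share during a trusted burn-in period, and always flag tools never seen at baseline. The counts, tool names, and spike factor below are illustrative assumptions.

```python
from collections import Counter

# Tool-call counts observed during the agent's trusted burn-in period.
# Illustrative numbers; real baselines are built per agent.
BASELINE = Counter({"read_docs": 1200, "summarize": 800, "search": 400})

def is_anomalous(tool, recent_calls, spike_factor=10.0):
    """Flag tools never seen at baseline, or seen far above their
    historical share of traffic."""
    if tool not in BASELINE:
        return True                  # novel tool call: always flag
    baseline_share = BASELINE[tool] / sum(BASELINE.values())
    recent_share = recent_calls[tool] / max(sum(recent_calls.values()), 1)
    return recent_share > spike_factor * baseline_share

recent = Counter({"read_docs": 5, "delete_records": 3})
print(is_anomalous("delete_records", recent))  # True: never seen at baseline
print(is_anomalous("read_docs", recent))       # False: within normal share
```

The check is a dictionary lookup and two divisions, which is why the monitoring layer itself adds effectively no latency; the 50-200ms cost comes only from the gated enforcement actions on high-risk calls.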
AI security incidents require different forensics than network intrusions. For model-level attacks (poisoning, backdoors), the response sequence is: isolate the model from production, verify the integrity of the training pipeline, check for data exfiltration through model outputs (models can encode stolen data in their weights and leak it via carefully crafted prompts), and determine whether you need to retrain from a known-clean checkpoint.
For agentic AI incidents, you also need to trace every tool call and action the agent took, verify the integrity of its memory and context window (prompt injection can persist across sessions if context is stored), and check for lateral movement via the agent's permissions. Generic IR processes do not cover model-level forensics because the artifacts are different. You are not analyzing network logs and memory dumps. You are analyzing model weights, training data provenance, fine-tuning histories, and agent action logs. We build playbooks specific to these scenarios, including evidence preservation procedures for model weights (which can be hundreds of gigabytes), chain-of-custody documentation for training data, and communication templates for regulators who may require model disgorgement.
The technical foundations behind this solution, published as detailed whitepapers.
WP-91
ML-BOMs, model scanning, cryptographic signing, shadow AI detection, and confidential computing for enterprise ML pipelines.
WP-18
Multi-layer AI validation, adversarial robustness testing, and NIST AI RMF compliance frameworks.
WP-89
2025 breach analysis, neuro-symbolic guardrails, and constitutional AI safety architecture for production systems.
WP-93
Data poisoning detection, provenance tracking, and sovereign AI infrastructure for high-assurance environments.
62% of security teams cannot identify where AI models are deployed in their own environment.
Most organizations discover their AI security gaps after an incident. We help you find them before one happens.