AI Security Engineering
Your models are executable code. Most organizations treat them like data files. That gap is where breaches happen.
$4.63M
Average breach cost involving shadow AI
IBM Cost of a Data Breach 2025
83%
Of organizations lack automated AI security controls
Kiteworks 2025
352K
Unsafe issues found across 51,700 models on public registries
Protect AI 2025
AI models are not static artifacts. They are code that runs during loading, training, inference, and agent execution. Four attack categories dominate the threat model.
torch.load() executes arbitrary Python during deserialization. This is not a bug. It is the designed behavior of pickle serialization, and 80%+ of ML models ship in pickle-based formats.
A model named "baller423" on Hugging Face was found establishing a reverse shell to Kreonet. The model looked normal. It passed basic scans. It ran arbitrary code the moment someone loaded it.
PickleScan, the most widely used defense, has at least 3 known zero-day bypasses (CVE-2025-10155). Blacklist-based scanning is fundamentally broken because the attacker controls the serialization format.
Llama 3.1 8B drops from 0.95 to 0.15 on prompt injection resilience after a single round of fine-tuning. That is an 84% degradation in safety alignment from normal, non-adversarial training.
Almost nobody re-evaluates safety after fine-tuning. The model passes the initial safety evaluation, gets fine-tuned on domain data, and goes to production with its guardrails effectively removed. This is not an exotic attack. It is the default workflow at most organizations.
98% of organizations have unauthorized AI usage. That number is not a typo. The $670K additional breach cost for shadow AI incidents reflects a simple reality: you cannot secure what you cannot see.
62% of security teams cannot identify where LLMs are deployed in their environment. Developers download models from Hugging Face, call OpenAI APIs with personal keys, and deploy fine-tuned models on personal cloud accounts. Current security tools surface roughly 38% of this activity.
GitHub Copilot's RCE vulnerability (CVE-2025-53773, CVSS 7.8) turned a prompt injection in a repository's documentation into full system compromise via YOLO mode. The agent read a malicious instruction, executed it as code, and the user's machine was owned.
Amazon Q's cleaner.md file distributed destructive commands to 950K+ users through the agent's context window. OpenClaw's marketplace accumulated 138 CVEs across 63 days, with 12% of submitted skills found to be malicious.
Agents turn prompt injections into system-level compromises because they have tool access, credentials, and execution privileges that traditional LLMs lack.
The vendor ecosystem is maturing fast. Here is an honest view of what each player covers and where the gaps remain.
| Provider | What They Do | What They Don't Do | Best For |
|---|---|---|---|
| Palo Alto / Protect AI | Model scanning, AI-BOM generation, integrated into Prisma AIRS platform | Architecture design, custom pipeline engineering, organizational change management | Enterprises already on the PANW platform |
| HiddenLayer | Runtime AI detection and response, agentic security monitoring | Supply chain architecture, ML-BOM implementation, compliance mapping | SOC teams adding AI visibility |
| JFrog | MLSecOps, model registry security, Hugging Face integration | Adversarial red-teaming, safety alignment validation, governance design | DevOps teams managing model artifacts |
| Wiz | AI-BOM in cloud security context, model scanning | On-prem model security, fine-tuning safety, agentic architecture | Cloud-first organizations |
| NVIDIA NeMo Guardrails | Open-source runtime guardrails for LLMs | Model scanning, supply chain security, provenance tracking | Teams building custom LLM applications |
| Big 4 / Large SIs | Governance frameworks, compliance documentation, board decks | Implementation: building scanning pipelines, configuring ML-BOMs, deploying model signing. Engagements start at $500K for strategy and scale to $3-10M. | Organizations needing audit-ready documentation |
| Open Source (ModelScan, PickleScan, SafeTensors) | Free basic scanning and safer serialization formats | Enterprise-grade orchestration, behavioral sandboxing, provenance, policy enforcement | Teams with strong internal security engineering |
A gap nobody fills well. Organizational culture change is the hardest part. No tool or consultancy eliminates the human tendency to bypass governance for speed. We build the technical controls, but the CISO still needs executive buy-in. When a data scientist can download a model from Hugging Face in 30 seconds, any security gate that takes 30 minutes will get bypassed. The controls need to be fast enough that compliance is easier than circumvention.
Six capabilities, each engineered to integrate with your existing security stack and CI/CD pipelines.
We build automated vetting that sits between public model repositories and your internal registry. Every model passes through behavioral sandboxing (loaded in isolated containers, syscalls monitored), multi-format deep analysis (pickle, PyTorch, GGUF, Keras, SafeTensors), and cryptographic signing with your enterprise PKI.
We reach for behavioral analysis over static scanning because PickleScan's zero-day bypasses prove blacklist approaches are fundamentally broken. Static scanning asks "does this file contain known-bad patterns?" Behavioral sandboxing asks "what does this code actually do when it runs?" The second question catches novel attacks.
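The limits of the static approach are easy to make concrete. The sketch below, using Python's standard pickletools module, flags the opcodes that can trigger code execution during unpickling. The opcode list and helper name are illustrative, and this is precisely the blacklist pattern that behavioral sandboxing has to go beyond: it catches this textbook payload, but a novel encoding of the same attack can slip past it.

```python
import pickle
import pickletools

# Opcodes that can invoke callables during unpickling. This is the
# blacklist approach: known-bad patterns only, bypassable by design.
SUSPICIOUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}

def static_scan(data: bytes) -> set:
    """Return the suspicious pickle opcodes found in `data`.

    Only inspects the opcode stream; never executes the payload.
    """
    found = set()
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in SUSPICIOUS_OPCODES:
            found.add(opcode.name)
    return found

# A benign pickle (plain dict of weights) triggers nothing.
benign = pickle.dumps({"weights": [1, 2, 3]})
print(static_scan(benign))  # set()

# A payload that would run a shell command on load is flagged;
# note the command never executes during scanning, only on torch.load().
class Payload:
    def __reduce__(self):
        import os
        return (os.system, ("echo pwned",))

malicious = pickle.dumps(Payload())
print(static_scan(malicious))
```

Behavioral sandboxing, by contrast, loads the artifact in an isolated container and observes syscalls, which is why it catches the attacks this scanner cannot.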
CycloneDX ML-BOM generation integrated into CI/CD. Every model gets a bill of materials documenting training data provenance, framework versions, dependency trees, and fine-tuning history.
We use CycloneDX over SPDX because the ML-BOM tooling is more mature, though we ensure SPDX 3.0 export for organizations that need both. The ML-BOM is not a compliance checkbox. It is the data structure that makes every other security control possible: you cannot sign what you cannot inventory, and you cannot audit what you cannot trace.
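As a rough illustration of what the pipeline emits, here is a minimal CycloneDX 1.5 ML-BOM built by hand in Python. The model name, dataset URI, and property keys are placeholders; a production pipeline would generate this from CI metadata with the official CycloneDX tooling and use the full modelCard structure rather than flat properties.

```python
import hashlib
import json

def make_ml_bom(artifact_bytes: bytes, name: str, version: str,
                training_data: str, base_model: str) -> dict:
    """Build a minimal CycloneDX 1.5 ML-BOM for one model artifact."""
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": [{
            # CycloneDX 1.5 introduced this component type for models.
            "type": "machine-learning-model",
            "name": name,
            "version": version,
            "hashes": [{"alg": "SHA-256", "content": digest}],
            # Provenance as simple properties; property names are
            # illustrative, not part of the CycloneDX schema.
            "properties": [
                {"name": "training-data", "value": training_data},
                {"name": "base-model", "value": base_model},
            ],
        }],
    }

bom = make_ml_bom(b"fake-model-bytes", "fraud-classifier", "2.1.0",
                  "s3://datasets/fraud-2025q3", "meta-llama/Llama-3.1-8B")
print(json.dumps(bom, indent=2))
```

The SHA-256 hash is what makes the ML-BOM the anchor for signing: the signed BOM attests to exactly one artifact, so any weight tampering breaks verification.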
Network-level detection of unauthorized model downloads and AI API calls. Integration with your existing SIEM/SOAR. We map every AI touchpoint including shadow deployments, then build policy enforcement that blocks risk without blocking innovation.
The goal: your security team sees 100% of AI usage, not the 38% that current tools surface. Detection covers Hugging Face downloads, OpenAI/Anthropic/Google API calls, model weight transfers over HTTP/S, and local model execution via process monitoring on managed endpoints.
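The detection logic is simple at its core. The sketch below matches proxy-log lines against known AI endpoints; the log format, usernames, and hostname list are invented for illustration, and a real deployment would pull endpoint intelligence from feeds and forward hits to the SIEM rather than print them.

```python
import re

# Hostnames that indicate model downloads or hosted-LLM API calls.
# Illustrative subset; production coverage is far broader.
AI_ENDPOINTS = {
    "huggingface.co": "model download",
    "cdn-lfs.huggingface.co": "model weight transfer",
    "api.openai.com": "OpenAI API call",
    "api.anthropic.com": "Anthropic API call",
    "generativelanguage.googleapis.com": "Google AI API call",
}

# Assumed log shape: "<timestamp> <user> <destination-host>"
LOG_LINE = re.compile(r"^(?P<ts>\S+) (?P<user>\S+) (?P<host>\S+)")

def flag_ai_traffic(log_lines):
    """Yield (timestamp, user, category) for lines hitting AI endpoints."""
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m and m.group("host") in AI_ENDPOINTS:
            yield m.group("ts"), m.group("user"), AI_ENDPOINTS[m.group("host")]

sample = [
    "2025-11-02T09:14:33Z jdoe huggingface.co",
    "2025-11-02T09:15:01Z jdoe intranet.example.com",
    "2025-11-02T09:16:12Z asmith api.openai.com",
]
for ts, user, category in flag_ai_traffic(sample):
    print(user, category)
```

Hostname matching alone misses local model execution and TLS-pinned clients, which is why endpoint process monitoring runs alongside the network layer.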
Automated safety re-evaluation after every fine-tuning run. OWASP LLM Top 10 benchmark suite, adversarial probing for backdoor triggers, and safety alignment regression testing.
We build this because almost nobody re-evaluates safety after fine-tuning. The safety degradation data in the section above makes the case. The validation pipeline runs as a CI/CD gate. A model that fails safety regression cannot be promoted to production, regardless of its task performance.
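The gate itself can be as simple as a threshold check wired into CI. The benchmark names and floor values below are placeholders for illustration, not a real suite; the point is that a score regression blocks promotion mechanically, with no human judgment call required.

```python
# Minimum acceptable scores per safety benchmark. Names and floors
# are illustrative; real thresholds come from the baseline evaluation
# of the pre-fine-tuning model.
SAFETY_THRESHOLDS = {
    "prompt_injection_resilience": 0.85,
    "jailbreak_resistance": 0.90,
    "backdoor_trigger_probes": 0.95,
}

def safety_gate(scores: dict) -> list:
    """Return the benchmarks a fine-tuned model failed."""
    return [name for name, floor in SAFETY_THRESHOLDS.items()
            if scores.get(name, 0.0) < floor]

# A model showing the post-fine-tuning degradation described above
# fails the gate on prompt injection alone.
post_finetune = {"prompt_injection_resilience": 0.15,
                 "jailbreak_resistance": 0.91,
                 "backdoor_trigger_probes": 0.97}
failures = safety_gate(post_finetune)
if failures:
    print("blocked from promotion:", failures)
    # In CI this stage would exit nonzero (sys.exit(1)) to fail the build.
```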
Privilege separation for AI agents. Deterministic policy layers that prevent prompt-to-RCE escalation (the exact attack vector in CVE-2025-53773). Tool-use policy enforcement, human-in-the-loop gates for high-risk operations, and runtime behavior monitoring.
The architecture detects anomalous agent actions before they cascade. An agent that suddenly starts writing to filesystem paths outside its sandbox, calling APIs it has never called before, or attempting privilege escalation gets terminated and flagged for review.
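A deterministic policy layer of this kind can be sketched in a few lines. The tool names, sandbox path, and three-way verdict below are illustrative assumptions, but the shape is the point: the decision is made by code the prompt cannot influence, which is what blocks the prompt-to-RCE escalation path.

```python
from pathlib import Path

# Least-privilege profile for one agent. Tool names and paths are
# illustrative; real deployments derive a profile per agent.
POLICY = {
    "allowed_tools": {"read_docs", "write_file", "summarize"},
    "sandbox_root": Path("/tmp/agent-sandbox"),
    "needs_human_approval": {"send_email", "execute_shell"},
}

def check_tool_call(tool, target=None):
    """Deterministic policy layer: 'allow', 'escalate', or 'deny'."""
    if tool in POLICY["needs_human_approval"]:
        return "escalate"            # human-in-the-loop gate
    if tool not in POLICY["allowed_tools"]:
        return "deny"                # unknown tool: terminate and flag
    if tool == "write_file" and target is not None:
        path = Path(target).resolve()
        if not path.is_relative_to(POLICY["sandbox_root"]):
            return "deny"            # filesystem escape attempt
    return "allow"

print(check_tool_call("summarize"))                   # allow
print(check_tool_call("write_file", "/etc/crontab"))  # deny
print(check_tool_call("execute_shell", "rm -rf /"))   # escalate
```

Because the verdict comes from this code path rather than from the model, an injected instruction can ask for `execute_shell` all it wants; it still lands at the human gate.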
For CISOs building the function from scratch. NIST AI 100-2 control mapping, EU AI Act compliance architecture, board-level risk quantification, and incident response playbooks for AI-specific attacks.
We help translate technical risk into budget justification that boards approve. "We found 352K unsafe issues across public model registries" is a data point. "Our engineers downloaded 47 unvetted models last quarter, 3 contained executable code in their serialization layer, and our current controls detected none of them" is a budget justification.
Three phases, each with defined deliverables and honest caveats about what to expect.
Weeks 1-3
Deliverable: AI Security Posture Report with prioritized risk register
Caveat: This phase often surfaces 3-5x more AI usage than the CISO expected. That is normal. The shadow AI discovery is the most valuable and the most uncomfortable part of the engagement.
Weeks 4-10
Deliverable: Production-ready security controls integrated into existing workflows
Caveat: Timeline depends on CI/CD maturity. Teams with mature DevOps pipelines deploy faster. Organizations still moving models via USB drives or shared folders (more common than you would expect) need additional infrastructure work.
Weeks 11-14
Deliverable: Self-sustaining AI security operations with documented runbooks
Caveat: The first adversarial red-team always finds something. That is the point. A red-team that finds nothing either was not trying hard enough or was scoped too narrowly.
Answer eight questions to benchmark your AI security posture. No data is collected. Everything runs in your browser.
4-6 weeks for a basic pipeline covering static scanning and signature verification. 8-12 weeks for full behavioral sandboxing with CI/CD integration. The bottleneck is rarely the scanning technology itself. It is integrating with your existing model registry (MLflow, Weights & Biases, JFrog ML) and defining the policy logic: what gets blocked vs. flagged vs. quarantined. We have found that the policy decisions take longer than the engineering.
Format complexity adds time. Pickle, PyTorch, GGUF, Keras, and SafeTensors each require different analysis approaches. Pickle remains the highest-risk format because torch.load() executes arbitrary Python during deserialization, which is why behavioral sandboxing matters more than static scanning for that format. SafeTensors is the safest serialization option and the simplest to scan, but fewer than 20% of production models use it today. Your pipeline needs to handle all of them because you cannot control what format upstream model providers choose.
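SafeTensors is the simplest to scan because its layout is just an 8-byte little-endian header length followed by a JSON header describing each tensor; inspection never touches executable content. The helper below is a hand-rolled sketch of that header parse (the function name is ours, and a real pipeline would use the safetensors library), shown against a tiny in-memory file.

```python
import json
import struct

def read_safetensors_header(data: bytes) -> dict:
    """Parse a SafeTensors header without touching tensor data.

    Format: 8-byte little-endian header length, then a JSON header
    mapping tensor names to dtype/shape/offsets. No code execution
    is possible during parsing -- unlike pickle deserialization.
    """
    (header_len,) = struct.unpack("<Q", data[:8])
    return json.loads(data[8 : 8 + header_len])

# Build a tiny in-memory SafeTensors file: one float32 tensor of 4 values.
header = json.dumps({
    "layer.weight": {"dtype": "F32", "shape": [4],
                     "data_offsets": [0, 16]},
}).encode()
blob = struct.pack("<Q", len(header)) + header + b"\x00" * 16

info = read_safetensors_header(blob)
print(info["layer.weight"]["shape"])  # [4]
```

Pickle-family formats get no such shortcut, which is why they route to behavioral sandboxing while SafeTensors can clear on structural validation alone.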
Those platforms are excellent at what they do. Palo Alto's Protect AI integration (via Prisma AIRS) gives you model scanning within your existing security stack. JFrog's MLSecOps handles model registry governance. Wiz adds AI-BOM to cloud visibility. What they do not do: design the end-to-end architecture, configure ML-BOM generation in your specific CI/CD pipeline, build the policy logic for your regulatory context, or re-engineer your model deployment workflow. They are scanning tools. We are the implementation team that makes them work together.
Many engagements start with organizations that already have these platforms but need help operationalizing them. A common pattern: the security team purchased Protect AI six months ago, ran a scan, got 400 findings, and then stalled because nobody mapped those findings to remediation workflows or integrated scanning into the model promotion pipeline.
The technical barrier to model poisoning is lower than most CISOs assume. Research demonstrates that as few as 250 poisoned documents in a training corpus can backdoor a 13B-parameter model. Microsoft published breakthrough detection methods in February 2026, but most organizations have zero detection capability deployed. The fine-tuning safety degradation problem is more immediate and more common: Llama 3.1 8B drops from 0.95 to 0.15 on prompt injection resilience after a single round of fine-tuning. That is not an attack. That is normal fine-tuning without safety re-evaluation.
Documented production incidents of intentional model poisoning remain rare. But the conditions are ripe: 80%+ of ML models use pickle serialization, 62% of security teams cannot identify where models are deployed, and a model named "baller423" on Hugging Face was found establishing a reverse shell to Kreonet. The FTC's model disgorgement precedent (Weight Watchers/Kurbo, 2022) means a poisoned model could force you to delete and retrain from scratch, at costs that dwarf the breach itself.
The EU AI Act is fully applicable August 2, 2026. For high-risk AI systems, you need technical documentation covering training data provenance, scope, characteristics, and cleaning methodologies. Supply chain obligations require importers and distributors to verify conformity assessment, technical documentation, and CE marking. Practically, this means ML-BOMs for every model in your pipeline, signed attestations for provenance, and audit trails for fine-tuning decisions.
CycloneDX ML-BOM is the most implementation-ready standard. SPDX 3.0 added AI/ML profiles in 2024, and some organizations need both formats for different regulatory audiences. We build the documentation pipeline so provenance tracking is automated, not a manual compliance exercise. The common mistake is treating this as a one-time documentation project. Every fine-tuning run, every model update, and every dataset change needs to generate updated provenance records. If your ML-BOM is static, it is wrong within weeks.
Privilege separation is the foundation. Every agent gets a least-privilege profile that defines which tools it can call, which APIs it can access, and which file system paths it can touch. This mirrors Linux's capability model applied to AI agents. The GitHub Copilot RCE (CVE-2025-53773, CVSS 7.8) happened because YOLO mode gave the agent unrestricted system access, and a prompt injection in a repository's documentation escalated to full remote code execution. Deterministic policy layers prevent that escalation path entirely.
Runtime monitoring adds a behavioral baseline that detects anomalous agent actions (unexpected tool calls, unusual API patterns, privilege escalation attempts) without adding latency to normal operations. There is, however, a small latency cost for security checks on high-risk operations: filesystem writes, cloud API calls, credential access. For most enterprise deployments, this is 50-200ms per gated operation. Low-risk operations (reading approved data sources, generating text, calling pre-approved APIs) pass through with zero added latency. The question is whether 50-200ms on high-risk calls is acceptable compared to an agent with full system access and no guardrails.
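One minimal way to express the behavioral baseline: compare each tool's share of recent traffic against its share during a trusted burn-in period, and always flag tools never seen at baseline. The counts, tool names, and spike factor below are illustrative assumptions.

```python
from collections import Counter

# Tool-call counts observed during the agent's trusted burn-in period.
# Illustrative numbers; real baselines are built per agent.
BASELINE = Counter({"read_docs": 1200, "summarize": 800, "search": 400})

def is_anomalous(tool, recent_calls, spike_factor=10.0):
    """Flag tools never seen at baseline, or seen far above their
    historical share of traffic."""
    if tool not in BASELINE:
        return True                  # novel tool call: always flag
    baseline_share = BASELINE[tool] / sum(BASELINE.values())
    recent_share = recent_calls[tool] / max(sum(recent_calls.values()), 1)
    return recent_share > spike_factor * baseline_share

recent = Counter({"read_docs": 5, "delete_records": 3})
print(is_anomalous("delete_records", recent))  # True: never seen at baseline
print(is_anomalous("read_docs", recent))       # False: within normal share
```

The check is a dictionary lookup and two divisions, which is why the monitoring layer itself adds effectively no latency; the 50-200ms cost comes only from the gated enforcement actions on high-risk calls.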
AI security incidents require different forensics than network intrusions. For model-level attacks (poisoning, backdoors), the response sequence is: isolate the model from production, verify the integrity of the training pipeline, check for data exfiltration through model outputs (models can encode stolen data in their weights and leak it via carefully crafted prompts), and determine whether you need to retrain from a known-clean checkpoint.
For agentic AI incidents, you also need to trace every tool call and action the agent took, verify the integrity of its memory and context window (prompt injection can persist across sessions if context is stored), and check for lateral movement via the agent's permissions. Generic IR processes do not cover model-level forensics because the artifacts are different. You are not analyzing network logs and memory dumps. You are analyzing model weights, training data provenance, fine-tuning histories, and agent action logs. We build playbooks specific to these scenarios, including evidence preservation procedures for model weights (which can be hundreds of gigabytes), chain-of-custody documentation for training data, and communication templates for regulators who may require model disgorgement.
The technical foundations behind this solution, published as detailed whitepapers.
WP-91
ML-BOMs, model scanning, cryptographic signing, shadow AI detection, and confidential computing for enterprise ML pipelines.
WP-18
Multi-layer AI validation, adversarial robustness testing, and NIST AI RMF compliance frameworks.
WP-89
2025 breach analysis, neuro-symbolic guardrails, and constitutional AI safety architecture for production systems.
WP-93
Data poisoning detection, provenance tracking, and sovereign AI infrastructure for high-assurance environments.
62% of security teams cannot identify where AI models are deployed in their own environment.
Most organizations discover their AI security gaps after an incident. We help you find them before one happens.