Biosecurity AI Safety for Pharma & Biotech

Three Attack Vectors Your Current Safety Layer Cannot Stop

Refusal training, RLHF alignment, and structural-alert filters were designed for a world where attacks looked like "design me a nerve agent." The 2025 attack surface is subtler, more automated, and operates below the level these defenses monitor.

Reward Inversion (The MegaSyn Pattern)

A generative chemistry model optimizes for a reward function. In drug discovery, that function scores for therapeutic properties. Flip the sign, and the same model optimizes for lethality. The MegaSyn experiment required changing a single Python config value. Most pharma generative pipelines built on REINVENT 4, AutoDesigner, or custom reward-shaped models have the identical architectural vulnerability: the reward function is a configuration parameter, not a hardcoded constraint.

Why current defenses miss it: Toxicophore filters (Chemistry42's 460+ MCFs, Chemaxon structural alerts) catch known toxic substructures in the output. They do not constrain the optimization objective. A model optimizing toward the CWA manifold can generate structures that pass every known-toxicophore check precisely because they contain none of the catalogued substructures.
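One way to move the objective from "configuration parameter" toward "constraint" is to validate the reward configuration before any run starts. The sketch below is illustrative only: the `RewardTerm` schema, the `PENALIZE_ONLY` component names, and `validate_reward_config` are hypothetical, not the config format of REINVENT 4 or any other platform.

```python
from dataclasses import dataclass

# Hypothetical component names: scores that may only ever reduce the reward.
PENALIZE_ONLY = frozenset({"predicted_toxicity", "cwa_proximity"})


@dataclass(frozen=True)
class RewardTerm:
    component: str  # name of the scoring component (assumed schema)
    weight: float   # positive = rewarded, negative = penalized


class UnsafeRewardConfig(ValueError):
    """Raised when a config drops or rewards a penalize-only component."""


def validate_reward_config(terms: list[RewardTerm]) -> None:
    """Refuse to start a run whose objective omits or rewards a hazard score."""
    present = {t.component for t in terms}
    missing = PENALIZE_ONLY - present
    if missing:
        raise UnsafeRewardConfig(f"missing penalty terms: {sorted(missing)}")
    for t in terms:
        if t.component in PENALIZE_ONLY and t.weight >= 0:
            raise UnsafeRewardConfig(
                f"{t.component!r} must have a negative weight, got {t.weight}"
            )
```

A check like this would sit in the job launcher rather than in the editable config, so that changing one value is no longer enough to change the direction of optimization.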

Homology-Guided Beam Search (GeneBreaker)

GeneBreaker does not ask a biology model for "a pathogen." It asks for a protein homologous to a benign reference that happens to be structurally similar to a Select Agent protein. An LLM agent orchestrates bioinformatic tools, uses PathoLM and log-probability heuristics to guide beam search, and scores candidates with BLAST. The attack achieved a success rate of up to 60% on Evo 2-40B across 6 viral categories, with demonstrated structural and sequence fidelity on SARS-CoV-2 spike protein and HIV-1 envelope protein.

Why current defenses miss it: Keyword-based safety filters and refusal training look for explicit requests. Homology attacks never mention the target pathogen. The request looks like legitimate comparative genomics research until you analyze the functional properties of the generated sequence.

Malicious Fine-Tuning and Relearning Recovery

For any open-weight model running on-premise: 10-50 fine-tuning examples and a few hundred dollars of GPU time strip safety alignment and restore pre-training biological capability to near-frontier levels (arXiv 2508.03153). For models that have undergone machine unlearning (RMU): benign relearning on loosely related public data (medical articles, biology textbooks) can jog the model back toward pre-unlearning performance (CMU/ICLR 2025). As of 2025, the accurate description of an unlearned model is not "the knowledge is gone" but "the knowledge is deeply obfuscated."

Why current defenses miss it: RLHF refusal is a behavioral constraint, not a capability constraint. It teaches the model to refuse, not to forget. MFT removes the refusal while preserving the capability. Even unlearning (a capability constraint) is partially reversible. Defense requires multiple independent layers, not a single technique.

The 2026 Regulatory Vacuum

The US executive framework that pharma compliance teams planned against through 2024 has been rescinded. The EU framework keeps tightening. A pharma with EU operations must comply with the EU standard regardless of US posture. ISO 42001 certification increasingly serves as the baseline insurers and partners expect.

Framework	Status (April 2026)	What It Requires
EU AI Act (GPAI)	Enforcing Aug 2026	Systemic-risk assessment, adversarial testing, incident reporting for GPAI models used in biology. Penalties: €15M / 3% turnover.
EU AI Act (High-Risk)	Enforcing Aug 2026	Risk management system, data governance, human oversight, accuracy/robustness. Penalties: €35M / 7% turnover for prohibited practices.
ISO/IEC 42001:2023	Active, voluntary	AI management system with controls proportionate to risk. For CBRN-adjacent AI: elimination controls required, not just administrative. Increasingly expected by insurers.
NIST AI 600-1	Published July 2024	GenAI Risk Profile explicitly names CBRN as 1 of 12 unique risks. Maps to AI RMF functions (Govern, Map, Measure, Manage).
FDA Draft Guidance	Draft, Jan 2025	Context-specific credibility assessment for AI in drug/biological product development. Final guidance expected 2026.
US EO Framework	Rescinded	EO 14110 (AI safety) rescinded Jan 2025. EO 14081 (Bioeconomy) rescinded Mar 2025. EO 14292 (bio research safety) issued May 2025, but its 90-day implementation deadline passed without a replacement framework.
BIOSECURE Act	Active 2026	Restricts US federal contracts with certain foreign biotech companies. Creates new supply-chain compliance obligations for anyone in the federal funding ecosystem.

Who Does What Today

A reference for internal conversations. Every row is honest about gaps, including the ones we cannot close ourselves.

Category	Examples	What They Do	What They Miss
Frontier Labs	Anthropic (ASL-3), OpenAI	Model-level CBRN evaluations, constitutional classifiers, refusal training at the API boundary	Cannot protect your internal fine-tuned models, generative chemistry pipelines, or RAG workflows. ASL-3 protects Claude, not your REINVENT instance.
GenChem Platforms	Chemistry42, REINVENT 4, Schrödinger	Structural-alert filtering (toxicophores, PAINS, reactive groups), ADMET scoring, physics-based docking	Filter outputs, not objectives. Cannot detect latent-space proximity to CWA manifold. REINVENT's reward function is a config file with the MegaSyn vulnerability.
DNA Screening	IGSC, SecureDNA, IBBIS	Homology-based screening against Select Agent lists. SecureDNA adds cryptographic hashing. Post-Paraphrase Project patches deployed late 2025.	Screening happens after you place the order. No visibility into what your generative models propose internally. Functional-prediction still limited for novel scaffolds.
Academic / CAIS	CAIS (WMDP), CMU, Stanford	Publish benchmarks (WMDP), develop unlearning techniques (RMU, UIPE), run evaluations	Do not deploy, integrate, maintain, or certify. Research outputs need engineering to become operational controls.
Big 4 / Large SIs	Deloitte, Accenture, EY, KPMG	AI governance frameworks, policy writing, risk assessments, ISO 42001 gap analysis on paper	Implement governance, not technical controls. Will not build a latent-space critic, run relearning attacks, or integrate SAE feature ablation into your MLOps. Engagements run $500K-$5M+ and deliver documents, not deployed systems.
In-House ML Teams	Your pharma's AI/ML group	Domain expertise, model training, pipeline engineering, deep knowledge of your specific data and workflows	Rarely have specialist background in adversarial robustness, LLM unlearning, topological data analysis for manifold detection, or CBRN-specific threat modeling. Not their job.

Honest gaps we cannot close either: If your R&D leadership does not want biosecurity reviews slowing iteration, no technical layer will stick. If an adversary exfiltrates weights AND has a curated bioweapons dataset, capability can be rebuilt regardless of unlearning. Unknown-unknown threats (capabilities not yet enumerated in WMDP) remain outside any benchmark's reach. Upstream data poisoning requires cooperation we cannot compel.

What We Build

Five capabilities, each addressing a specific gap in the current defense landscape. We sit on top of whatever stack you already run. Not a product. A custom build per engagement.

Generative Chemistry Safety Middleware

Intercepts SMILES, SELFIES, and graph outputs from your generative pipeline before they reach the researcher. Not a filter on known bad structures. A latent-space proximity scorer that measures distance to the chemical weapons agent manifold using topological data analysis.

Technical choices: We reach for persistent homology (Vietoris-Rips filtration) to characterize the CWA region of latent space because it is robust to the coordinate transforms that defeat simpler distance metrics. Combined with activity-cliff detection for borderline candidates. Every intercept produces an ISO 42001 audit log entry.
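To make "every intercept produces an audit log entry" concrete, here is a minimal sketch of a tamper-evident, hash-chained log record. ISO 42001 does not prescribe a log format, so every field name here is an assumption, as are the function names `make_audit_entry` and `verify_chain`. The sketch also assumes the scorer returns a distance (smaller means closer to the flagged region) and that anything under `min_distance` is blocked.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_audit_entry(prev_hash: str, model_id: str, structure: str,
                     distance: float, min_distance: float) -> dict:
    """Build one log entry for a middleware intercept (hypothetical schema)."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_id": model_id,
        # Store a digest rather than the structure, so the log is not a hazard list.
        "structure_sha256": hashlib.sha256(structure.encode()).hexdigest(),
        "distance": distance,
        "min_distance": min_distance,
        "action": "blocked" if distance < min_distance else "released",
        "prev_hash": prev_hash,
    }
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["entry_hash"] = hashlib.sha256(payload).hexdigest()
    return entry


def verify_chain(entries: list[dict], genesis: str = "0" * 64) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = genesis
    for e in entries:
        body = {k: v for k, v in e.items() if k != "entry_hash"}
        payload = json.dumps(body, sort_keys=True).encode()
        if e["prev_hash"] != prev:
            return False
        if hashlib.sha256(payload).hexdigest() != e["entry_hash"]:
            return False
        prev = e["entry_hash"]
    return True
```

Chaining each entry to the previous hash means an auditor can confirm that no intercept was silently edited or deleted after the fact.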

Knowledge-Gap Engineering for Biology LLMs

RMU + SAE feature ablation + UIPE applied to your specific biology model. We target the capability circuits that enable pathogen-related generation while preserving the therapeutic-discovery capabilities your researchers need daily.

Technical choices: SAE (Sparse Autoencoder) feature identification locates the sparse features in the model's internal activations that are most responsible for CBRN-relevant generation. Ablation is surgical: we verify that therapeutic performance benchmarks hold within 2% of pre-intervention baselines. Monthly re-certification catches relearning drift. This is not set-and-forget.
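The "within 2%" check can be expressed as a simple regression gate. This is a sketch under one stated assumption: that 2% means a relative drop against the pre-intervention score. The function name and benchmark names are illustrative.

```python
def regressions(baseline: dict[str, float], post: dict[str, float],
                max_rel_drop: float = 0.02) -> dict[str, float]:
    """Return benchmarks whose score fell by more than max_rel_drop (relative)."""
    failed = {}
    for name, base in baseline.items():
        if base <= 0:
            raise ValueError(f"baseline for {name!r} must be positive")
        if name not in post:
            raise KeyError(f"no post-intervention score for {name!r}")
        drop = (base - post[name]) / base
        if drop > max_rel_drop:
            failed[name] = drop
    return failed


# Hypothetical benchmark scores before and after an ablation pass.
before = {"admet_auc": 0.80, "binding_r2": 0.50}
after = {"admet_auc": 0.79, "binding_r2": 0.48}
failed = regressions(before, after)  # only binding_r2 exceeds the 2% budget
```

An empty result means the ablation pass stayed inside the performance budget; any entry is a reason to narrow the ablated feature set before release.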

Biosecurity Red-Team on Demand

Quarterly adversarial testing covering the full 2025-2026 attack surface: GeneBreaker-style homology attacks against your biology models, SMILES-prompting jailbreaks against your chemistry pipelines, malicious fine-tuning simulation on your open-weight models, and relearning recovery tests on unlearned systems.

Deliverable: Written report mapped to NIST AI 600-1 and the AI RMF functions it builds on (Govern, Map, Measure, Manage). Each finding scored by exploitability, impact, and remediation difficulty. Not a penetration test report format. A controls-gap analysis that your ISO auditor can read directly.

Pre-Synthesis In-House Screening

Moves the DNA screening checkpoint from your vendor (post-order) to your pipeline (pre-order). Integrates with SecureDNA's cryptographic protocol and adds functional-prediction scoring that catches AI-paraphrased variants homology alone misses.

Why this matters: The Paraphrase Project (Microsoft/Twist/IDT, Science 2025) generated thousands of AI-paraphrased ricin variants that slipped past every commercial screen. Patches are deployed, but your compliance posture improves measurably when you screen before the sequence enters your ELN, not after your vendor flags an order.

Compliance Evidence Package

Maps all technical controls to ISO 42001, NIST AI RMF, EU AI Act GPAI obligations, NIH DURC policy, and ISO 20688-2:2024. The deliverable is a control matrix your compliance team can hand directly to an ISO auditor, an EU notified body, or a cyber-liability insurer. Not a policies-and-procedures document. Evidence that technical controls are deployed, tested, and continuously validated.

Insurance relevance: Cyber-liability insurers (Munich Re Specialty, November 2025 onwards) are raising premiums or excluding "AI-generated harm" for companies running open-weight models without documented risk controls. This package is what your risk team needs to answer the underwriting questionnaire.

How an Engagement Works

Four phases. Realistic timelines. Explicit about what each phase cannot achieve.

Pipeline Manifold Audit

3-4 weeks

Map every generative model in your pipeline: chemistry (REINVENT, Chemistry42, custom), biology (Evo 2, ESM-3, fine-tuned Llama), protein design (RFdiffusion, ProteinMPNN). For each model: characterize the latent space, identify CWA-adjacent regions, assess reward-function manipulability, test refusal boundaries, evaluate weight-access controls.

Limitation: Audit identifies vulnerabilities. It does not fix them. A pharma that wants the audit report for insurance purposes but does not commit to remediation will have a documented liability.

Defense Layer Build

8-12 weeks

Build and integrate the specific defense layers identified in the audit: safety middleware for chemistry pipelines, knowledge-gap engineering for biology models, pre-synthesis screening integration. Each component deployed into your existing MLOps infrastructure, not a parallel system.

Limitation: Knowledge-gap engineering on a 70B-parameter model requires significant GPU time. Budget $50K-$150K in compute for a full RMU + SAE ablation pass, depending on model size. SAE-targeted ablation reduces this compared with full-model unlearning but does not eliminate it.

Adversarial Red-Team

3-4 weeks

Full-spectrum attack simulation against the deployed defense layers. GeneBreaker homology attacks, SMILES-prompting variants, MFT simulation (on a sandboxed copy), relearning recovery attempts on unlearned models. Document what breaks, what holds, and what requires monitoring.

Limitation: Red-team tests known attack classes. Novel attacks (unknown-unknowns) require ongoing monitoring and quarterly re-assessment. A passing red-team does not mean "secure." It means "robust against the current state-of-the-art adversarial techniques."

Certification and Continuous Monitoring

2-3 weeks + ongoing retainer

Compile the compliance evidence package. Map controls to ISO 42001, NIST AI 600-1, EU AI Act GPAI obligations. Establish the monthly re-certification cadence: relearning attacks, middleware performance validation, new-threat integration. Hand off to your compliance team with runbooks.

Ongoing: $8K-$15K/month retainer covers monthly re-certification, quarterly red-team refresh, and threat-intelligence integration (new papers, new attack techniques, regulatory updates).
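For planning purposes, the phase durations and retainer above add up as follows. This is back-of-envelope arithmetic that assumes the four phases run strictly one after another with no overlap, and a full 12 months of retainer.

```python
# (min_weeks, max_weeks) per phase, taken from the phase descriptions above.
PHASE_WEEKS = {
    "Pipeline Manifold Audit": (3, 4),
    "Defense Layer Build": (8, 12),
    "Adversarial Red-Team": (3, 4),
    "Certification and Continuous Monitoring": (2, 3),
}

RETAINER_PER_MONTH = (8_000, 15_000)  # USD, low and high end


def total_weeks(phases: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low and high estimates, assuming sequential phases."""
    low = sum(lo for lo, _ in phases.values())
    high = sum(hi for _, hi in phases.values())
    return low, high


def annual_retainer(per_month: tuple[int, int], months: int = 12) -> tuple[int, int]:
    """Low and high retainer cost over the given number of months."""
    return per_month[0] * months, per_month[1] * months


print(total_weeks(PHASE_WEEKS))             # (16, 23)
print(annual_retainer(RETAINER_PER_MONTH))  # (96000, 180000)
```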

Questions Pharma Compliance Teams Ask Us

Can machine unlearning actually remove dangerous knowledge from a biology LLM?

Partially, and the honest answer matters. RMU (Representation Misdirection for Unlearning) can reduce a model's WMDP-Bio score from 75% to near random chance (26%). But the relearning research from CMU (ICLR 2025) demonstrated that unlearned models can be jogged back toward pre-unlearning performance using loosely related data like public medical articles.

UIPE (ACL 2025) improves durability by also removing knowledge closely related to the forgetting targets, and SAE feature ablation targets specific capability circuits. We treat unlearning as one defense layer with a monthly re-certification cycle. Every 30 days, we run relearning attacks against the unlearned model. If recovery exceeds a threshold, we re-apply the unlearning pass with updated parameters.

This is not a set-and-forget solution. It is a continuous maintenance commitment, typically 2-3 engineering days per monthly cycle.
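The "recovery exceeds a threshold" decision can be sketched as the fraction of the unlearning gap that a relearning attack wins back. The 25% trigger below is an illustrative assumption, not a figure from any standard or from the research cited above, and the function names are hypothetical.

```python
def recovery_fraction(baseline: float, unlearned: float, relearned: float) -> float:
    """Share of the baseline-to-unlearned gap regained after a relearning attack."""
    gap = baseline - unlearned
    if gap <= 0:
        raise ValueError("unlearned score must be below the baseline score")
    return max(0.0, (relearned - unlearned) / gap)


def needs_reunlearning(baseline: float, unlearned: float, relearned: float,
                       threshold: float = 0.25) -> bool:
    """True when the relearning attack recovers more than the allowed share."""
    return recovery_fraction(baseline, unlearned, relearned) > threshold


# WMDP-Bio style scores: 0.75 before unlearning, 0.26 after (from the text above);
# 0.40 is a hypothetical result after one month's relearning attack.
flag = needs_reunlearning(0.75, 0.26, 0.40)
```

With those numbers the attack regains 0.14 of a 0.49 gap, roughly 29%, so the model would be flagged for another unlearning pass.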

What does biosecurity AI safety cost for a mid-size pharma?

A full engagement covering manifold audit, safety middleware build, knowledge-gap engineering, red-team, and compliance evidence package runs in the range of $180K-$450K depending on the number of models in scope, whether they are open-weight or API-based, and the regulatory jurisdictions you operate in. The ongoing red-team and re-certification retainer is typically $8K-$15K per month.

For context: EU AI Act non-compliance penalties for GPAI providers reach €15M or 3% of global turnover. A single biosecurity incident that makes headlines will cost multiples of the engagement in reputational damage, regulatory scrutiny, and insurance premium increases. The engagement is insurance with a deliverable.

We already use Claude with ASL-3 protections. Do we still need biosecurity controls on our own models?

Yes. Anthropic's ASL-3 constitutional classifiers protect the Claude API boundary. They monitor inputs and outputs for a defined class of CBRN-relevant generations. This is valuable and represents the strongest commercial posture available.

But ASL-3 does not protect your internal fine-tuned biology models (Evo 2, ESM-3, or a custom protein diffusion model), your generative chemistry pipelines (REINVENT, Chemistry42), your retrieval-augmented workflows where a biology model pulls from internal databases, or the outputs of any open-weight model running on your own infrastructure.

If a researcher fine-tunes an open-weight model on internal data for a legitimate drug-discovery task, ASL-3 has no visibility into that model's outputs. The GeneBreaker attack works on Evo 2, not Claude. Your biosecurity posture needs to cover the full pipeline, not just the frontier API you call for text generation.

How do you handle the open-weights problem when we run models on-premise for IP reasons?

This is the hardest problem in biosecurity AI safety, and we are honest about the residual risk. A model whose weights are accessible to anyone with file-system access can be maliciously fine-tuned with 10-50 examples and a few hundred dollars of GPU time (arXiv 2508.03153). No amount of alignment survives MFT.

Our approach has three layers. First, knowledge-gap engineering (RMU + SAE ablation) removes dangerous capabilities from the weights before deployment, making MFT recovery harder. Second, inference-time safety middleware intercepts outputs regardless of the model's internal state. Third, operational controls: weight-file integrity monitoring, access logging, and anomaly detection on generation patterns.
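The weight-file integrity monitoring in the third layer can start as simply as a hash manifest compared on a schedule. A minimal sketch using only the Python standard library follows; the function names and the directory layout are assumptions, and a production deployment would also sign the manifest and store it away from the weights.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte weight shards fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(weights_dir: Path) -> dict[str, str]:
    """Map each file's relative path to its SHA-256 digest."""
    return {
        p.relative_to(weights_dir).as_posix(): sha256_file(p)
        for p in sorted(weights_dir.rglob("*"))
        if p.is_file()
    }


def diff_manifest(expected: dict[str, str], actual: dict[str, str]) -> dict[str, list[str]]:
    """Report files that changed, disappeared, or appeared since approval."""
    shared = expected.keys() & actual.keys()
    return {
        "modified": sorted(k for k in shared if expected[k] != actual[k]),
        "missing": sorted(expected.keys() - actual.keys()),
        "unexpected": sorted(actual.keys() - expected.keys()),
    }
```

A modified shard or an unexpected adapter file next to approved weights is the kind of signal that should open an investigation, since fine-tuning leaves exactly these traces.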

The residual risk we cannot eliminate: if an adversary exfiltrates weights AND has access to a curated bioweapons dataset, they can rebuild capability. No consultant can prevent this. What we can do is make it measurably harder, make attempts more likely to be detected, and ensure your documented controls satisfy ISO 42001 and EU AI Act due-diligence requirements.

Does pre-synthesis in-house screening replace our DNA vendor's screening?

No. It complements it. Your DNA synthesis vendor (Twist, IDT, GenScript) runs IGSC Harmonized Screening Protocol v3.0 and, increasingly, ISO 20688-2:2024-compliant checks. As of late 2025, vendors have patched the specific AI-paraphrase vulnerability the Microsoft Paraphrase Project exposed.

But screening happens after you place the order. That creates two problems: a failed screen means wasted time and a compliance flag on your account, and you have no visibility into what your internal generative models are proposing before the order goes out.

In-house pre-synthesis screening catches problematic sequences at generation time, before they enter your electronic lab notebook, before a researcher decides to order them, and before your vendor's screening triggers an investigation. We integrate with SecureDNA's cryptographic hashing protocol and add a functional-prediction layer that catches the class of AI-paraphrased variants that homology alone misses. Think of it as moving the checkpoint upstream from the vendor to the pipeline.

Your Generative Chemistry Pipeline Is One Config Change Away from Designing Weapons