A striking editorial image showing a video conference grid where most participant faces are subtly glitching or dissolving into digital artifacts, while one real human face looks on — conveying the core premise of the Arup attack where one real person sat among synthetic identities.
Artificial Intelligence · Cybersecurity · Deepfakes

A Deepfake CFO Stole $25 Million on a Zoom Call. Here's Why Your Company Could Be Next.

Ashutosh Singhal · April 24, 2026 · 14 min read

I was on a call with a prospective client — a CFO at a mid-size manufacturing firm — when he said something that stopped me cold.

"We already verify identity on video calls. We can see each other's faces."

I asked him if he'd heard about what happened to Arup. He hadn't. So I told him: in February 2024, a finance employee at Arup — the global engineering firm behind the Sydney Opera House — joined a video conference with his CFO and several senior executives. They discussed a confidential transaction. The CFO instructed him to wire funds. He made 15 transfers totaling $25.6 million across five bank accounts. Every face on that call was fake. Every voice was synthetic. The CFO was an AI-generated deepfake. So were the other executives. The employee was the only real human in the room.

The line went quiet for about ten seconds. Then he said, "That can't be real."

It is. And it's the reason I've spent the last stretch of time rethinking everything we build at Veriprajna — because the Arup breach didn't just expose a cybersecurity gap. It exposed a trust architecture problem that most companies haven't even begun to confront.

The Night I Realized "Seeing Is Believing" Is Dead

I first read the forensic analysis of the Arup breach late one evening, sitting in my home office with a cup of chai that went cold before I finished the second page. What struck me wasn't the dollar amount — though $25.6 million is staggering. It was the elegance of the attack. There was no malware. No credential theft. No unauthorized database access. Arup's digital infrastructure was never breached at all.

The attackers didn't hack the system. They hacked the human.

When the CFO's face and voice can be perfectly fabricated, the traditional signals of trust are broken. Not weakened — broken.

They spent months scraping publicly available video of Arup executives from YouTube, conference talks, and corporate recordings. They trained Generative Adversarial Networks — two neural networks that compete against each other, one generating fake content, the other trying to detect it, iterating millions of times until the fakes are indistinguishable from reality — to create what forensic experts call "high-fidelity synthetic twins." Not just faces. Speech patterns. Intonations. The way someone pauses before answering a question.
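That adversarial loop can be miniaturized into a toy, one-dimensional example. The sketch below — plain NumPy, nothing like the models behind the Arup attack — has a "generator" that learns a single offset and a "discriminator" that is a simple logistic classifier, iterating exactly as described: one faking, one detecting, until the fakes drift toward the real distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
REAL_MEAN, STD = 3.0, 0.5          # the "real data" distribution

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = 0.0                        # generator: shifts its noise by theta
w, b = 0.1, 0.0                    # discriminator: D(x) = sigmoid(w*x + b)
lr = 0.05

for step in range(2000):
    real = rng.normal(REAL_MEAN, STD, 64)
    fake = rng.normal(0.0, STD, 64) + theta

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    for x, y in ((real, 1.0), (fake, 0.0)):
        grad_z = sigmoid(w * x + b) - y      # cross-entropy gradient w.r.t. logit
        w -= lr * np.mean(grad_z * x)
        b -= lr * np.mean(grad_z)

    # Generator step: move theta so the discriminator mistakes fakes for real.
    grad_z = sigmoid(w * fake + b) - 1.0     # generator wants D(fake) -> 1
    theta -= lr * np.mean(grad_z * w)
```

After training, the generator's offset sits near the real mean: the discriminator can no longer tell the two apart, which is exactly the equilibrium the attackers exploit at vastly larger scale.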

Then they sent a spear-phishing email from the "CFO" requesting help with a confidential transaction. The employee was skeptical. Good instincts. But the attackers had a second move: they invited him to a live video call where multiple familiar faces confirmed the request in real time.

His skepticism dissolved. Of course it did. What rational person doubts the evidence of their own eyes when four colleagues are looking back at them on screen?

How Do You Deepfake an Entire Boardroom?

A diagram explaining the video injection attack pipeline — how synthetic video bypasses detection by feeding directly into conferencing software's data stream, contrasted with a simpler presentation attack.

This is the question my team kept circling back to. We'd seen single-person deepfakes before — a cloned voice here, a face-swapped video there. But a multi-participant live video conference? That felt like a leap.

It turns out the technical barriers have collapsed faster than most security teams realize.

The attackers used a technique called video injection rather than a simpler "presentation attack" (where someone holds a screen in front of a camera). Injection attacks use virtual camera software to feed synthetic video directly into the conferencing software's data stream. Whether the platform is Zoom or Teams, the application treats the AI-generated feed as if it were coming from a physical webcam. There's no screen border to detect, no depth anomaly to flag. Research shows injection attacks targeting identity verification providers increased by 255% in 2023, while face-swap attacks rose by 704%.

I remember sitting in a team meeting where one of our engineers demonstrated a real-time face swap using open-source tools. It took him about forty minutes to set up. The result wasn't perfect — there was a slight flicker around the jawline — but on a compressed Zoom feed? You wouldn't notice. And that was with free software and no training data. The Arup attackers had months of preparation and, presumably, resources.

My CTO looked at me across the table and said, "We need to stop thinking about this as a cybersecurity problem. This is an epistemology problem. How does anyone know what's real?"

He was right. And that realization reshaped how I think about everything we build.

Why Does Your "AI Strategy" Make This Worse?

Here's the part that most coverage of the Arup breach misses entirely: the way most companies have adopted AI actually increases their vulnerability to this kind of attack.

I'm talking about the "LLM wrapper" — the dominant enterprise AI architecture right now. You take a public API from OpenAI or Anthropic, wrap a thin software layer around it, connect it to some business processes, and call it your AI strategy. It's fast to deploy. It's cheap. And it's fundamentally inadequate for anything that matters.

Three reasons.

First, data egress. In a wrapper-based deployment, your most sensitive data — financial spreadsheets, internal memos, executive communications — leaves your corporate perimeter to be processed by a third-party cloud. Even if the provider promises not to train on it, the data exists in an external environment subject to the US CLOUD Act, opaque sub-processor relationships, and potential model-based exfiltration. You're sending the exact kind of information an attacker would need to build convincing deepfakes of your executives outside your walls.

Second, the reliability gap. LLMs are probabilistic. They predict the most likely next word based on statistical patterns, not grounded understanding of your corporate reality. When an AI agent reports a price, approves a discount, or interprets a policy, it's generating a plausible answer — not retrieving a verified fact. In high-stakes environments, that gap between "plausible" and "true" is where fraud lives.

Third — and this one haunts me — the "unembodied advisor" problem. For engineering firms like Arup, a text-based LLM wrapper generates advice without any integrated feedback loops to verify physical or biological safety. In structural engineering or chemistry, a minor change in a calculation can lead to a catastrophically different outcome. A wrapper operating on semantic distance rather than the laws of physics can't identify these critical deviations. It doesn't know what it doesn't know.

I wrote about this architectural vulnerability in depth in the interactive version of our research — the core argument is that wrappers create an illusion of intelligence while leaving the organization structurally exposed.

What Would Have Actually Stopped the Arup Attack?

A defense stack diagram showing the three complementary detection/verification layers — physiological detection (heartbeat analysis), behavioral biometrics (keystroke/mouse patterns), and cryptographic provenance (C2PA) — and how they work together as a multi-layered identity verification system.

This is the question I kept asking myself. Not "what should Arup have done differently" — that's Monday-morning quarterbacking. But: what architecture would make this kind of attack fail?

The answer isn't a single technology. It's a stack. And it starts with abandoning the idea that visual confirmation equals identity verification.

The Heartbeat You Can't Fake

One of the most fascinating detection approaches I've encountered analyzes something called "heartbeat-induced" changes in facial color. Technologies like Intel's FakeCatcher monitor micro-variations in skin tone — invisible to the human eye — that correspond to cardiovascular activity. A living human face subtly changes color with each heartbeat. A deepfake doesn't. Or if it does, the timing is wrong.

When I first learned about this, I thought it sounded like science fiction. Then I watched a demo where the system correctly identified a high-quality deepfake that had fooled every person in the room. The synthetic face had perfect skin texture, perfect lip sync, perfect eye movement. But no pulse.

A deepfake can replicate your face, your voice, and your mannerisms. It cannot replicate your heartbeat.
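For the curious, the core signal-processing idea can be sketched in a few lines. This is a toy remote-photoplethysmography (rPPG) example on synthetic data — not FakeCatcher's pipeline, whose internals aren't public: a live face's green-channel trace carries a periodic pulse component; a synthetic face's trace is just noise.

```python
import numpy as np

FPS, SECONDS = 30, 10
t = np.arange(FPS * SECONDS) / FPS
rng = np.random.default_rng(1)

def green_channel_trace(bpm=None):
    """Mean green-channel intensity of the face region, per frame.
    A live face modulates this trace at the heart rate; these numbers
    are synthetic stand-ins for a real video pipeline."""
    trace = 120.0 + rng.normal(0, 0.05, t.size)            # sensor noise
    if bpm is not None:
        trace += 0.3 * np.sin(2 * np.pi * (bpm / 60) * t)  # pulse signal
    return trace

def pulse_estimate(trace):
    """Dominant frequency in the plausible heart-rate band, plus a
    crude peak-to-mean 'confidence' score for that band."""
    spectrum = np.abs(np.fft.rfft(trace - trace.mean()))
    freqs = np.fft.rfftfreq(trace.size, d=1 / FPS)
    band = (freqs >= 0.7) & (freqs <= 3.0)                 # 42-180 bpm
    bpm = 60 * freqs[band][np.argmax(spectrum[band])]
    confidence = spectrum[band].max() / spectrum[band].mean()
    return bpm, confidence

live_bpm, live_conf = pulse_estimate(green_channel_trace(bpm=72))
fake_bpm, fake_conf = pulse_estimate(green_channel_trace(bpm=None))
```

The live trace yields a sharp spectral peak at 72 bpm; the "deepfake" trace has no dominant peak in the cardiac band, and the confidence score collapses.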

The Way You Type Is Your Signature

Behavioral biometrics is the layer that excites me most, because it's nearly impossible to forge. Your keystroke dynamics — the speed, rhythm, and pressure of your typing — create a recognizable pattern unique to you. So do your mouse movements, your swipe speed on mobile, even the way you navigate between applications.

Imagine building a behavioral baseline for every senior executive. During a video call, the system continuously monitors whether the "CFO" typing in the chat behaves like the real CFO. If the typing cadence deviates from the historical profile while an unusual financial request is being made, the system flags it automatically. No human judgment required.
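A toy version of that check — with hypothetical timing numbers, and a far cruder statistic than production behavioral-biometric systems use, which model dwell times, digraph latencies, and pressure curves:

```python
import statistics

# Hypothetical historical inter-keystroke intervals (ms) for the real CFO.
baseline = [112, 98, 105, 120, 101, 117, 109, 95, 114, 103]

def cadence_anomaly(session, history):
    """How many standard deviations the session's mean typing interval
    sits from the historical mean — a minimal stand-in for a full
    behavioral-biometric profile."""
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)
    return abs(statistics.mean(session) - mu) / sigma

genuine_session = [108, 115, 99, 111, 104]    # matches the usual rhythm
impostor_session = [182, 205, 190, 171, 198]  # a noticeably different cadence
```

The genuine session scores near zero deviations; the impostor session scores far outside any reasonable threshold, and a real system would flag it the moment an unusual financial request appeared alongside it.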

This is what continuous authentication looks like — not a one-time password at login, but an ongoing, invisible verification that the person you're talking to is who they claim to be.

Cryptographic Proof That Video Is Real

Instead of only trying to detect fakes, we need to start verifying authenticity at the source. The C2PA standard — Coalition for Content Provenance and Authenticity — embeds cryptographic metadata at the moment of video capture: the device, time, location, and a tamper-evident chain of custody. If a video feed in a Teams or Zoom call lacks these credentials, it should be treated with the same suspicion as an unsigned software package.
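The shape of the check can be sketched with standard-library crypto. This is a deliberate simplification — real C2PA manifests use X.509 certificate chains and COSE signatures, not a shared HMAC key — but the logic is the same: bind the frame hash, device, and timestamp into a signed, tamper-evident manifest at capture time, and refuse to trust any feed that can't present one.

```python
import hashlib, hmac, json

# Stand-in for a per-device signing key; real C2PA uses certificate
# chains rather than a shared secret.
DEVICE_KEY = b"camera-hardware-key"

def sign_capture(frame_bytes, device_id, timestamp):
    """Emitted at the moment of capture: a manifest binding the frame
    to a device and time, plus a tamper-evident signature."""
    manifest = {
        "frame_sha256": hashlib.sha256(frame_bytes).hexdigest(),
        "device": device_id,
        "ts": timestamp,
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    return manifest, hmac.new(DEVICE_KEY, payload, hashlib.sha256).hexdigest()

def verify_feed(frame_bytes, manifest, signature):
    """A frame without a valid, matching manifest is treated as untrusted."""
    payload = json.dumps(manifest, sort_keys=True).encode()
    expected = hmac.new(DEVICE_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False                 # manifest altered after signing
    return hashlib.sha256(frame_bytes).hexdigest() == manifest["frame_sha256"]
```

An injected synthetic frame fails the hash check; a manifest edited after signing fails the signature check. Either way, the feed is flagged before anyone wires a dollar.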

This is a mindset shift. We've spent years asking "is this fake?" The better question is: "can this prove it's real?"

The Architecture We're Actually Building

A layered architecture diagram showing the Neuro-Symbolic Sandwich — the three-layer stack where deterministic symbolic logic layers encase the neural LLM, with labeled data flows showing how inputs are sanitized and outputs are verified against real databases.

At Veriprajna, we've been calling our approach Deep AI — not because it's a marketing term, but because it describes a fundamentally different relationship between an organization and its AI infrastructure. Instead of "AI-as-a-service" through public APIs, we build "AI-as-infrastructure" within the organization's own secure environment.

Three pillars.

The first is infrastructure ownership. We deploy full inference stacks — Private Enterprise LLMs — directly into the client's Virtual Private Cloud or on-premises Kubernetes clusters. Sensitive data never leaves the perimeter. This isn't just a security measure; it creates bespoke model assets that belong to the client. Their intelligence stays sovereign.

The second is what we call Private RAG 2.0 — Retrieval-Augmented Generation that's natively integrated with internal security. If an employee doesn't have permission to view a document in SharePoint, the AI won't retrieve it to answer their question. This sounds obvious, but most RAG implementations treat the knowledge base as a flat pool. Ours respects the same access controls that govern the rest of the organization.
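In code, the principle is simple: filter by ACL before retrieval ever happens. A minimal sketch — the documents, group names, and deliberately naive keyword matcher (standing in for vector search) are all illustrative:

```python
# Permission-aware retrieval: the caller's group memberships filter
# the corpus BEFORE any relevance ranking happens.
CORPUS = [
    {"id": "q3-forecast", "acl": {"finance"},
     "text": "confidential q3 revenue forecast and margin targets"},
    {"id": "benefits-faq", "acl": {"finance", "engineering"},
     "text": "benefits enrollment faq for all staff"},
]

def retrieve(query, user_groups, corpus=CORPUS):
    # ACL check first: documents the caller can't open are never candidates.
    visible = [d for d in corpus if d["acl"] & user_groups]
    q_terms = set(query.lower().split())
    return [d["id"] for d in visible if q_terms & set(d["text"].split())]
```

An engineer asking about the revenue forecast gets nothing back — not a refusal that leaks the document's existence, just an empty result, because the document was never in their candidate pool.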

The third — and the one I'm most proud of — is the Neuro-Symbolic Sandwich. We encase the neural network (the LLM, with its creative language capabilities) between two layers of deterministic, symbolic logic. The bottom layer sanitizes inputs to prevent prompt injection before they reach the model. The top layer intercepts the model's output and executes it through rigid, pre-defined functions — querying a SQL database, checking an ERP system, retrieving a verified price. When the AI reports a number, it's pulling a fact, not predicting one.

The Neuro-Symbolic Sandwich ensures that when AI reports a price or an authorization status, it's retrieving a deterministic value from a database — not predicting one based on token probability.
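Here is the sandwich in miniature. The model (a stub here) may only propose a whitelisted tool call; the symbolic layers sanitize what goes in and execute what comes out against a system of record. The function names, injection filter, and price database are all illustrative, not our production code:

```python
import re

PRICE_DB = {"SKU-1001": 49.99}   # stand-in for the ERP/SQL system of record

def sanitize(user_input):
    """Bottom layer: reject obvious injection patterns before the model
    ever sees them (production filters are far more extensive)."""
    if re.search(r"(?i)ignore (all )?previous instructions", user_input):
        raise ValueError("rejected by input sanitization layer")
    return user_input

def mock_llm(prompt):
    """Stand-in for the neural layer: it may only PROPOSE a tool call;
    it never states a price itself."""
    sku = re.search(r"SKU-\d+", prompt)
    return {"tool": "get_price", "args": {"sku": sku.group() if sku else ""}}

TOOLS = {"get_price": lambda sku: PRICE_DB[sku]}  # rigid, whitelisted functions

def answer(user_input):
    call = mock_llm(sanitize(user_input))
    if call["tool"] not in TOOLS:                 # top layer: verify the action
        raise ValueError("model proposed an unapproved action")
    return TOOLS[call["tool"]](**call["args"])    # the fact comes from the DB
```

The number the user sees came out of `PRICE_DB`, not out of token probabilities — and an injection attempt dies at the bottom layer before the model ever reads it.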

I've had people tell me this is overengineered. "Just use GPT with good prompts," an investor said to me once, with the confidence of someone who has never been responsible for a wire transfer. I think about the Arup employee — a competent professional who did everything that seemed reasonable — and I know that "good enough" prompts are not good enough when the stakes are measured in millions.

For the full technical breakdown of this architecture, including the neuro-symbolic design patterns and RBAC-aware retrieval, see our detailed research paper.

What Happens When the CIO Becomes Personally Liable?

There's a legal dimension to the Arup breach that most technologists aren't tracking, and it should terrify every CIO and CTO reading this.

Courts increasingly follow the "Impostor Rule" for wire transfer fraud: losses should be borne by the party in the best position to have prevented the fraud. In the Arup case, while the employee was deceived, the firm's failure to implement multi-channel verification for high-value transactions could be seen as the primary point of failure.

CIOs and CTOs are corporate officers with fiduciary duties. As deepfake-enabled fraud becomes a known and documented risk — and after Arup, it is definitively known — failure to implement deepfake-aware controls could result in personal liability if a company is sued by shareholders for negligence. This isn't hypothetical. The California Consumer Privacy Act, the EU AI Act, and frameworks like NIST's AI Risk Management Framework are all converging on the expectation that organizations will have specific, documented defenses against synthetic media attacks.

I've started asking CIOs a simple question in every meeting: "If an attacker deepfaked your CEO on a video call tomorrow and someone wired $10 million, could you demonstrate to a court that you had reasonable safeguards in place?"

The silence that follows tells me everything.

Can't We Just Train People to Spot Deepfakes?

People ask me this constantly, and I understand the instinct. It's the cheapest solution. Just teach everyone what to look for — the flickering jawline, the weird ear, the slightly off lighting.

Here's the problem: detection by human eye is an arms race you've already lost. The artifacts that were detectable in 2023 deepfakes are largely absent in 2025 deepfakes. The technology improves faster than human perception adapts. And on a compressed video call with mediocre lighting and intermittent bandwidth — which describes most corporate Zoom calls — even current-generation deepfakes are functionally invisible.

Training helps, but not in the way most people think. The goal isn't to make employees into deepfake detectors. It's to build what I call a culture of empowered skepticism — rewarding people who challenge suspicious requests, even when those requests appear to come from the CEO. The Arup employee's initial instinct was to be skeptical of the phishing email. That instinct was correct. It was overridden by the social proof of a video call with familiar faces.

The fix is procedural, not perceptual. High-value transactions require out-of-band verification: a direct call to a pre-verified phone number, a pre-agreed authentication code shared through a separate channel, or dual authorization from someone who wasn't on the original call. Video conferencing can no longer be the gold standard for identity authentication in financial transactions. Period.
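Codified as policy, the rule is almost embarrassingly small — which is the point. The threshold and parameter names below are illustrative:

```python
def authorize_wire(amount_usd, requested_on_video_call,
                   oob_callback_confirmed=False, second_approver=None):
    """Procedural circuit-breaker (illustrative rules): above the
    threshold, the request needs an out-of-band callback to a
    pre-verified number AND dual authorization from someone who was
    not on the original call."""
    HIGH_VALUE_THRESHOLD = 100_000
    if amount_usd < HIGH_VALUE_THRESHOLD:
        return True
    # requested_on_video_call is deliberately ignored here: seeing a
    # face on screen carries zero authorization weight.
    return bool(oob_callback_confirmed and second_approver)
```

Run the Arup scenario through it: a $25.6 million request backed by nothing but a video call is denied, no matter how many familiar faces were on screen.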

The $25 Million Blueprint

I keep coming back to something that bothers me about how the Arup story is usually told. It's framed as a cautionary tale — "look how sophisticated the bad guys are getting." And that's true, but it's incomplete.

The deeper lesson is architectural. Arup's digital systems were fine. Their firewalls held. Their encryption worked. The attack succeeded because the organization's trust architecture — the set of assumptions about how identity is verified and decisions are authorized — hadn't evolved to account for a world where synthetic media is cheap, convincing, and real-time.

Most organizations I talk to are in the same position. They've invested heavily in perimeter defense while leaving the human layer — the layer that actually authorizes the wire transfers, approves the contracts, signs off on the engineering specifications — protected by nothing more than the assumption that faces and voices are hard to fake.

That assumption died in a Hong Kong conference room in February 2024. The question is whether your organization will update its trust architecture before or after it pays its own $25 million tuition.

The Arup breach wasn't a cybersecurity failure. It was a trust architecture failure — and most organizations haven't updated theirs since the era when faces couldn't be faked.

I'm not hedging on this. The organizations that move now — deploying sovereign AI infrastructure, implementing behavioral biometrics, demanding cryptographic provenance for video feeds, and building procedural circuit-breakers into every high-value decision — will define the next era of enterprise security. The ones that wait will become case studies.

The cost of a deepfake that can fool your finance team is dropping toward zero. The cost of being fooled is not.
