
Your CFO's Face Costs $50 to Fake. Your Wire Controls Weren't Built for That.

By Ashutosh Singhal | June 18, 2026 | 13 min read

The first time I tried to fool one of these detection tools, it took me an afternoon and about fifty dollars.

I pulled public conference talks of a colleague off YouTube, fed them through a face-swap model running on a consumer graphics card, and pointed the output at a virtual camera. Then I joined a test video call as someone I am not. The "deepfake detection" plugin we were evaluating sat there, green light on, perfectly happy. It had been built to catch someone holding a printed photo up to a webcam. I had handed it a clean synthetic feed straight into the data stream, and it never looked at the right thing.

That afternoon is the whole problem with enterprise deepfake detection in one scene. The tools are real, some of them are good, and almost none of them protect you from the attack that's actually emptying corporate bank accounts. The defense that works isn't a tool at all. It's the thing I spent the next year convincing CFOs they needed and couldn't buy off a marketplace — the reason we eventually built a layered deepfake defense practice at Veriprajna instead of reselling somebody's plugin.

Let me tell you how I got there, because I started out believing the opposite.

I thought this was a detection problem. It is not.

When I first looked at corporate deepfake fraud, I did what every technologist does: I assumed it was an accuracy problem waiting for a better model. Buy the best detector, wire it into Zoom, done. Catch the fake face, stop the fraud.

So I went looking for the best detector. And the deeper I read, the more the floor dropped out from under that plan.

The number that broke my assumption came out of Purdue's 2025 benchmark work: detection tools that advertise 96 to 99 percent accuracy in the lab fall to 50 to 65 percent in real-world production. Read that again with a CFO's brain. Fifty percent is a coin flip. Sixty-five percent means roughly one in three fakes walks through. You are being asked to stake a wire transfer, sometimes an eight-figure wire transfer, on a probabilistic alert that's wrong somewhere between a third and half of the time.

A detector that's right two times in three is a great research result and a catastrophic control. You cannot put a coin flip in the authorization path for money that doesn't come back.

I sat with that for a while. The instinct in security is always to chase the better model. But there's no accuracy number that makes "the machine thinks this face is probably real" an acceptable basis for moving money. Even 99 percent wouldn't be — because the attacker only needs to win once, and they get to pick the day.
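To put numbers on "the attacker only needs to win once," here is a minimal sketch. It assumes each attempt is independent and that a headline accuracy figure can be read as a per-fake catch rate. Both are simplifications, and the function name is mine, not anything from the Purdue work.

```python
def p_at_least_one_miss(catch_rate: float, attempts: int) -> float:
    """Chance that at least one of `attempts` fakes gets past a detector.

    Assumption: attempts are independent and each fake is caught with
    probability `catch_rate`.
    """
    if not 0.0 <= catch_rate <= 1.0:
        raise ValueError("catch_rate must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # All fakes are caught with probability catch_rate ** attempts,
    # so at least one slips through with the complement.
    return 1.0 - catch_rate ** attempts


if __name__ == "__main__":
    for rate in (0.65, 0.99):
        for n in (1, 10, 100):
            p = p_at_least_one_miss(rate, n)
            print(f"catch rate {rate:.0%}, {n:>3} attempts: {p:.1%} chance of a miss")
```

Under those assumptions, a detector that catches 99 percent of fakes still lets at least one through about 63 percent of the time over 100 attempts.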

That was the first thing I got wrong, and admitting it changed everything we built afterward.

How does a $25.6 million deepfake call actually work?

[Figure: Two panels: a presentation attack caught by the camera vs an injection attack bypassing it.]

The case that everyone in this field studies is Arup, the global engineering firm, in February 2024. A finance employee in Hong Kong joined a video call with the company's CFO and several other senior executives, discussed a confidential transaction, and over the following days executed fifteen wire transfers totaling $25.6 million to five Hong Kong bank accounts.

Every person on that call except the victim was synthetic.

What gets me about Arup isn't the technology. It's how ordinary the path was. The attackers harvested public video and audio of the executives — YouTube, conference recordings, LinkedIn — and trained generative models to reproduce not just faces but intonation and micro-expression. The cost of the training data was zero, because we all publish our executives' faces ourselves. The model training ran on consumer hardware for under fifty dollars.

Then came the part most defenses miss. The employee was skeptical at first — good instinct — so the attackers escalated from email to a video call, because seeing familiar faces is exactly what overrides skepticism. And they injected the synthetic video using virtual-camera software, tools like OBS VirtualCam or the open-source Deepfake Offensive Toolkit, feeding fabricated frames directly into the conferencing stream.

This is the distinction I now open every CISO conversation with, because it's the one that determines whether your money is safe:

A presentation attack holds something in front of a camera. An injection attack bypasses the camera entirely. Liveness checks catch the first and never see the second.

Most "deepfake detection" products on the market (the iProov-style biometric liveness checks built for identity onboarding, the tools that ask you to turn your head or follow a light) are designed for presentation attacks. They're genuinely good at it; iProov's Flashmark technology is NIST-certified for exactly that job. But an injection attack hands the conferencing app a synthetic feed that looks like legitimate hardware input, and the liveness logic downstream is happy. Injection attacks rose 255 percent in 2023 for a reason. They're how you beat the defenses that companies just bought.

No malware. No stolen credentials. No breached network. The only thing compromised at Arup was trust in what a person saw and heard on a screen.

Why won't a single vendor just stop this?

Once I understood the attack, I started mapping the vendor landscape to see who could stop it. I built a spreadsheet — modality, platform integration, what each tool is actually good for, and crucially, where each one breaks. It got long fast, and the conclusion was uncomfortable for anyone hoping to write one purchase order.

Reality Defender does real-time multimodal monitoring inside Zoom, but its server-side analysis adds round-trip latency to every frame, and because it inspects content rather than the camera path, its injection-attack coverage is thin — a clean synthetic feed still reads as legitimate input. Pindrop is excellent at voice — it documented the 1,300 percent surge in deepfake fraud and is the reason I take audio seriously — but it doesn't analyze the video stream at all. GetReal Security correlates biometric, behavioral, and context signals during a live call, which is closer to the right shape, but it's a newer entrant on a $17.5 million Series A with a limited track record at enterprise scale. Beyond Identity's RealityCheck verifies the webcam feed comes from physical hardware — directly relevant to injection — but it's device-level and doesn't look at content. Adaptive Security runs deepfake simulation training for employees, which matters, but training is not a control; it doesn't block anything.

I could keep going. The point is the shape, not the catalog: video coverage here, audio there, liveness in a third place, device attestation in a fourth, and the gaps between them are exactly where a competent attacker lives.

And then there's the consultancy answer. The Big Four and large integrators will happily run a deepfake engagement — for $500,000 to $5 million — and hand you a governance framework and a board deck. No detection tooling. They'll recommend the vendors above; they rarely build or integrate anything. I've read those deliverables. They are not wrong. They are just not a defense.

No single vendor covers video, audio, behavior, and the human process. Someone has to architect the seams between them. That someone is usually nobody, which is why the seams are where the money leaves.

The control that costs nothing and stops everything

[Figure: Out-of-band verification flow: a wire instruction must clear a pre-registered callback before funds release.]

Here's where my thinking flipped completely.

I'd spent weeks deep in detection accuracy, vendor modalities, injection-versus-presentation taxonomy. And the answer to "what would have stopped Arup" turned out to have nothing to do with any of it.

A mandatory out-of-band verification policy: any financial instruction above a defined threshold must be confirmed through a pre-registered callback number or a separate encrypted channel before execution. Not a number the caller gives you — a number you already had, stored before any of this started. The deepfaked CFO can be flawless. The injection can be undetectable. It doesn't matter, because the money doesn't move on the strength of the call. It moves after a treasury analyst dials a number taped inside their own desk drawer and hears the real person say yes.

This control costs nothing to implement. It's effective against every variant of synthetic media fraud — present, future, video, voice, whatever the attackers build next — because it doesn't try to detect the fake. It removes the video call from the authorization path entirely.
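Here is a minimal sketch of that policy as a release gate. The threshold, the directory contents, and every name in it are hypothetical placeholders, not a real treasury system. The one design point it illustrates is that the number offered on the call is never consulted.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy threshold; a real value comes from treasury policy.
CALLBACK_THRESHOLD_USD = 10_000

# Hypothetical directory, populated BEFORE any request arrives.
PRE_REGISTERED_CALLBACKS = {"cfo": "+1-555-0100"}


@dataclass(frozen=True)
class WireInstruction:
    requester_id: str
    amount_usd: float
    # Whatever number the caller supplied. Stored for the audit trail only.
    number_offered_on_call: Optional[str] = None


def callback_number_for(instruction: WireInstruction) -> Optional[str]:
    """Return the pre-registered number to dial, or None below the threshold."""
    if instruction.amount_usd <= CALLBACK_THRESHOLD_USD:
        return None
    try:
        return PRE_REGISTERED_CALLBACKS[instruction.requester_id]
    except KeyError:
        # No number on file means no way to verify, so fail closed.
        raise LookupError("no pre-registered callback on file; do not release")


def may_release(instruction: WireInstruction, confirmed_via: Optional[str]) -> bool:
    """Funds move only if confirmation came through the pre-registered number."""
    required = callback_number_for(instruction)
    if required is None:
        return True
    return confirmed_via == required
```

A flawless deepfake that supplies its own "call me back on this line" number gets nowhere: `may_release` compares against the directory entry and ignores `number_offered_on_call` entirely.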

I'll be honest about why this was hard for me to accept. I'm a technologist. I wanted the answer to be a model. The idea that the highest-ROI intervention in a cutting-edge AI threat is a phone-call policy from 1995 felt like a letdown. It took watching detection numbers fail in testing — over and over, the green light on a fake — before I stopped resisting the unglamorous truth.

Detection layers add confidence. Process controls add certainty. The mistake is buying the first and skipping the second.

So that's the spine of what we build now: process first, detection as defense-in-depth on top. Detection tools earn their place flagging anomalies and buying you a moment of doubt. They do not earn the right to be the only thing between an attacker and your treasury.

The bill nobody saw coming: your insurance lapsed

For a long time the budget conversation with CFOs went nowhere. The loss is hypothetical until it happens, the spend is real today, and somewhere in the back of everyone's mind sat the assumption that cyber insurance would cover it anyway.

Then, in January 2026, that assumption quietly died.

Standard cyber policies now explicitly exclude "AI-generated intermediaries." D&O, E&O, and employment-practices policies are adding broad AI exclusions. The deepfake fraud that was your insurer's problem in 2024 is your problem in 2026 — unless you bought a separate deepfake endorsement, which runs $500 to $3,000 a year and which almost nobody has. If the Arup attack happens to you now, the $25.6 million is uninsured. Full stop.

That single change did more to move budget than every loss statistic I'd ever cited. It reframed the whole thing. This isn't "spend money to maybe avoid a hypothetical." It's "the coverage you were counting on is gone, and the liability landed on your desk." Courts are increasingly finding employers negligent for the absence of specific deepfake controls, and the old "impostor rule" puts the loss on whoever was best positioned to prevent the fraud. After January 2026, that's you.

What the regulators are about to require

There are two dates I tell every board to paint on the wall, because they convert this from a judgment call into a deadline.

August 2, 2026: the EU AI Act's Article 50 transparency obligations for deepfake content take effect, with penalties for breaching them of up to €15 million or 3 percent of global annual turnover (the Act's headline €35 million or 7 percent ceiling applies to prohibited practices). If you operate anywhere near the EU, the era of treating synthetic-media governance as optional ends on that date.

And since December 2023, the SEC has required material cybersecurity incidents to be disclosed on Form 8-K within four business days of the company determining the incident is material. A $25-million-plus deepfake fraud almost certainly clears the materiality bar. So picture the actual sequence: you discover the fraud, you conclude it's material, and now a four-business-day clock is running on a public filing describing exactly how your wire controls failed, while the money is already gone. The disclosure rule turns a private loss into a public one with your investors reading along.
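For a feel of how short that window is, here is a rough sketch of the date arithmetic. It assumes the clock starts on the day the company determines the incident is material, skips weekends, and leaves holidays to the caller. The function name and the example dates are mine, and none of this is filing advice.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int,
                      holidays: frozenset = frozenset()) -> date:
    """Return the date `days` business days after `start`.

    Weekends are skipped, as are any dates in `holidays`
    (an assumption: the caller supplies the holiday calendar).
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        # weekday() is 0-4 for Monday-Friday.
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


if __name__ == "__main__":
    determined = date(2026, 3, 5)  # hypothetical materiality determination, a Thursday
    print(add_business_days(determined, 4))  # 2026-03-11, the following Wednesday
```

A determination on a Thursday puts the filing due the next Wednesday, with a weekend in the middle.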

There's a harder, quieter tension underneath the compliance story, and it's where the consultancy frameworks tend to wave their hands. The behavioral biometrics that make continuous authentication actually work (keystroke dynamics, mouse patterns, the signals GetReal-style tools correlate) are precisely the data that triggers Illinois's Biometric Information Privacy Act. BIPA produced 107-plus class actions in 2025; Clearview AI settled one for $51.75 million. GDPR Article 9 treats biometric data as a special category needing explicit consent. So your best detection signal is also your biggest privacy liability if you deploy it without obtaining proper consent from your employees. Solving the security problem the careless way creates a litigation problem. Mapping each control to BIPA, GDPR, the SEC rule, and the ISO/IEC 30107 and CEN/TS 18099 testing standards isn't paperwork. It's how you avoid trading one eight-figure exposure for another.

Can't I just train employees to spot the fakes?

People always ask me whether employee awareness training solves this. Just teach everyone to spot the fakes.

I wish. Human accuracy at spotting deepfakes runs around 50 percent, about what you'd get from chance, and the fakes get better every month. We are, as a species, bad at this. The Arup employee wasn't careless; they were skeptical enough to push back initially. The video call is what overcame their judgment, because faces and voices we recognize bypass the analytical brain. Training helps people pause and invoke the process. It does not turn anyone into a reliable human detector, and any program sold on that premise is selling comfort.

The other question I get is whether this is overblown — a couple of scary headlines. It isn't. US deepfake fraud losses tripled from roughly $360 million in 2024 to $1.1 billion in 2025. The average enterprise incident now runs around $680,000, and CEO-fraud campaigns are hitting hundreds of companies a day. Synthetic identity kits sell for about five dollars on the dark web; a fake video starts at fifty. The economics have inverted — it now costs almost nothing to attack and a fortune to be unprepared.

And it's about to get worse in a specific way. Nearly half of security professionals expect agentic AI — systems that chain together reconnaissance, deepfake generation, and social engineering without a human in the loop — to be a top attack vector by the end of 2026. The fifty-dollar afternoon I spent fooling a detector becomes a fully automated pipeline that runs against hundreds of targets while the operator sleeps.

What we actually build

So when an enterprise comes to us, we don't show up with a product to sell. We're vendor-neutral by design — we don't resell any of the tools I named, which is the only honest way to tell a client that the $500K consultancy framework and the slick Zoom plugin are both, on their own, insufficient.

We start with the process layer, because it's the highest-ROI intervention and it doesn't require buying anything: out-of-band verification workflows, dual authorization above thresholds, the callback discipline that makes detection accuracy almost irrelevant to whether the money is safe. Then we architect the detection stack around your actual conferencing environment, picking the right combination of modalities from the twenty-odd vendors on that spreadsheet instead of a single-platform pitch, and closing the injection-attack seam the liveness tools leave open. We map every control to the regulations that now bear on you. And we red-team it: we run the synthetic attack against your existing defenses to find the gap before a criminal does. It's the same afternoon exercise I ran at the start, turned into a service. If you want the full architecture, it's laid out on our deepfake defense page.

I'll leave you with what I tell every board once the demo lands and the room goes quiet.

The control that survives every version of this — the deepfaked face, the injected feed, whatever they build next — is the pre-registered number a treasury analyst dials before the money moves. It is the least sophisticated thing in your security stack, and after January 2026 it is the only thing standing between a synthetic face and an uninsured loss.
