The Problem
In February 2024, an employee at the global engineering firm Arup joined a video call with the company's CFO and several senior executives. Every face looked right. Every voice sounded right. The CFO ordered 15 wire transfers to five different bank accounts. The employee complied. Total loss: $25.6 million.
Here is the terrifying part. None of those people were real. Every executive on that call was an AI-generated deepfake — a synthetic replica built from publicly available YouTube videos and conference footage. The attackers spent months harvesting video and audio of real Arup leaders. They trained AI models to replicate not just faces, but specific speech patterns and micro-expressions. The result was a cast of "high-fidelity synthetic twins" convincing enough to fool a trained finance professional in a live meeting.
No malware was used. No passwords were stolen. No database was breached. Arup's digital infrastructure remained fully intact throughout the entire incident. The attackers did not hack the company's systems — they hacked the employee's trust in what they saw and heard. The fraud was only discovered when the employee later contacted the real CFO's office in London. No such meeting had ever taken place.
This was not a failure of your firewalls. It was a failure of the assumption that seeing is believing.
Why This Matters to Your Business
The $25.6 million loss at Arup is dramatic, but the real threat is structural. Your organization likely relies on video calls to confirm identity and authorize high-value decisions every day. That assumption is now broken.
The numbers tell the story of a rapidly escalating threat:
- Injection attacks — where attackers feed fake video directly into conferencing software like Zoom or Teams — increased 255% in 2023.
- Face-swap attacks — where an attacker's live webcam feed is replaced frame-by-frame with a deepfake — rose 704% in the same period.
- The cost to generate a convincing deepfake has dropped to approximately $15 in tooling and about 45 minutes of effort.
Beyond the direct financial hit, your leadership faces growing personal exposure. CIOs and CTOs now face a higher standard of fiduciary care when it comes to deepfake-aware controls. Failure to implement reasonable security procedures could result in personal liability if shareholders or clients sue for negligence. Courts increasingly follow the "Impostor Rule," which places losses on the party best positioned to have prevented the fraud.
Your compliance obligations are expanding too. The EU AI Act, the NIST AI Risk Management Framework, and standards like ISO/IEC 30107-3 for biometric attack detection all now expect organizations to defend against synthetic media threats. If your verification processes still treat a video call as proof of identity, you have a gap that regulators and plaintiff attorneys will find.
This is not a technology problem for your IT team. It is a financial controls problem for your board.
What's Actually Happening Under the Hood
To understand why this attack worked, think of it like a perfect counterfeit bill. A counterfeiter does not need to rob the mint — they just need to produce a bill good enough that the cashier accepts it. The Arup attackers did the same thing with human faces and voices.
They used two types of AI models working together. Generative Adversarial Networks (GANs) — systems where two AI models compete to create and detect fakes — handled real-time face swapping. One model generates a fake face; the other tries to spot it. After millions of rounds, the generator produces faces that even the detector cannot distinguish from real ones. Diffusion models — AI that learns to build clear images from random noise — ensured the deepfake face did not flicker or distort during movement. This "temporal consistency" is critical because your eye is extremely sensitive to visual glitches.
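The adversarial dynamic described above can be reduced to a toy sketch. The example below is purely illustrative: it collapses "faces" to a single number and "learning" to a running average, but it shows the core loop in which the discriminator learns what real data looks like while the generator chases whatever the discriminator currently accepts.

```python
import random

random.seed(0)

REAL_MEAN = 5.0  # "real" data lives here, abstracted to one number

disc_mean = 0.0  # discriminator's current notion of what "real" looks like
gen_mean = 0.0   # generator's current output

for step in range(200):
    # A genuine training sample (real faces, in the toy: numbers near 5.0)
    real = REAL_MEAN + random.uniform(-0.5, 0.5)
    # Discriminator update: refine its model of real data...
    disc_mean += 0.05 * (real - disc_mean)
    # ...generator update: move output toward what the discriminator accepts.
    gen_mean += 0.05 * (disc_mean - gen_mean)
```

After enough rounds the generator's output is statistically indistinguishable from the real data, which is exactly the property that makes GAN-produced faces hard to flag.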
The most important technical detail is how the fake video reached the call. The attackers did not hold a screen up to a camera. They used a "video injection attack," feeding synthetic video packets directly into the conferencing software's data stream. The application treated this digital feed as if it came from a real camera. Standard liveness checks — the kind that look for depth and physical borders — cannot detect this method because there is no physical artifact to analyze.
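One weak but cheap defensive signal against injection is checking whether the declared video source is a known virtual-camera driver rather than physical hardware. The sketch below assumes a hypothetical list of driver names; real device enumeration would use an OS-level API, and a capable attacker can spoof the name, so treat this as one signal among many, not a control.

```python
# Hypothetical driver-name fragments; a real deployment would enumerate
# devices through the operating system and maintain a vetted allowlist.
KNOWN_VIRTUAL_DRIVERS = {"obs virtual camera", "manycam", "v4l2loopback"}

def flag_suspect_feed(device_name: str) -> bool:
    """Return True when the video source looks injected rather than physical.
    This is a weak heuristic: injected feeds can report spoofed names."""
    name = device_name.lower()
    return any(driver in name for driver in KNOWN_VIRTUAL_DRIVERS)
```

A positive result should trigger escalation, never an automatic block, since virtual cameras also have legitimate uses.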
This is the specific failure mode: your conferencing tools trust whatever video stream they receive. They have no way to verify whether that stream comes from a real camera pointed at a real person.
What Works (And What Doesn't)
Let's start with three approaches that do not solve this problem:
Standard multi-factor authentication. The Arup attackers bypassed MFA entirely because the attack did not target login credentials. They targeted human judgment on a live call. Your MFA protects your accounts, not your eyes.
Generic AI chatbots built on public APIs. These "LLM wrapper" tools — thin software layers that send your data to a third-party cloud for processing — are purely probabilistic. They predict the most likely next word rather than verify facts against your actual records. They cannot detect deepfakes, and they introduce new risks by sending your sensitive financial data outside your network.
Traditional phishing training. Most training programs prepare employees for suspicious emails, not live video calls featuring convincing replicas of their boss and five colleagues. The Arup employee was initially skeptical of the phishing email. The video call was specifically designed to overcome that skepticism.
Here is what actually works — a layered defense that verifies identity through signals AI cannot yet fake:
1. Input verification — Behavioral biometrics as a silent guardian. Before any high-value transaction, your system should continuously analyze how the person interacts with their technology. Keystroke speed, mouse movement patterns, and touchscreen pressure create a unique behavioral profile for each executive. These neuromotor patterns are nearly impossible to forge. If the "CFO" on your call is asking for a $25 million transfer while their typing behavior deviates from their historical profile, your system should flag the interaction automatically.
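A minimal sketch of the flagging logic, assuming keystroke timing as the only signal: compare a live session's inter-key intervals against the executive's historical profile and escalate when the deviation is statistically large. Production systems combine many such features; the threshold and sample data here are hypothetical.

```python
from statistics import mean, stdev

def keystroke_zscore(profile_intervals, session_intervals):
    """Deviation of a live session's mean inter-key timing (seconds)
    from an executive's historical profile, in standard deviations."""
    mu, sigma = mean(profile_intervals), stdev(profile_intervals)
    return abs(mean(session_intervals) - mu) / sigma

# Hypothetical historical profile for one executive
profile = [0.11, 0.12, 0.10, 0.13, 0.11, 0.12, 0.10, 0.12]
session = [0.22, 0.25, 0.21, 0.24]  # live session: noticeably slower typist

FLAG_THRESHOLD = 3.0  # illustrative policy value
if keystroke_zscore(profile, session) > FLAG_THRESHOLD:
    print("behavioral mismatch: route to out-of-band verification")
```

The point is not the specific statistic but the architecture: the check runs silently in the background and fails closed into human verification.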
2. Processing — Out-of-band confirmation through a separate channel. Video calls can no longer serve as the final word on identity for financial transactions. Every high-value instruction should require independent verification through a pre-verified phone number, an encrypted messaging platform like Signal, or a pre-agreed verification code shared through a non-digital channel. Require a second approver who was not on the original video call.
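The policy above can be expressed as a simple gate. This sketch assumes a hypothetical dollar threshold and models participants as a set of names; the key invariant is that high-value releases require both out-of-band confirmation and a second approver who was not on the original call.

```python
HIGH_VALUE_THRESHOLD = 50_000  # hypothetical policy threshold, in dollars

def may_release_funds(amount, call_participants, second_approver, oob_confirmed):
    """Gate a wire transfer. Below the threshold it proceeds normally;
    above it, require (1) confirmation through a separate, pre-verified
    channel and (2) a second approver who was NOT on the video call."""
    if amount < HIGH_VALUE_THRESHOLD:
        return True
    return oob_confirmed and second_approver not in call_participants
```

Run against the Arup scenario, this gate fails the transfer twice over: everyone "approving" was on the call, and no out-of-band confirmation with the real CFO's office ever took place.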
3. Output — Cryptographic provenance and physiological verification. The C2PA standard embeds cryptographic metadata at the moment of video capture, creating a tamper-evident chain of custody for every frame. If a video feed in a Teams or Zoom call lacks these credentials, treat it with the same suspicion you would give an unsigned contract. Pair this with physiological signal analysis — technology that monitors heartbeat-induced changes in facial color invisible to the human eye. Synthetic video lacks these biological signals.
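The tamper-evidence idea can be sketched with a per-frame signature check. Note this is a deliberate simplification: real C2PA provenance uses X.509 certificate chains and signed manifests, not a shared secret, and the key name below is hypothetical. The sketch only shows the verification shape — any altered or substituted frame fails the check.

```python
import hashlib
import hmac

# Simplified stand-in for C2PA signing; real implementations use
# certificate chains, not a device-shared secret.
CAPTURE_KEY = b"device-provisioned-secret"  # hypothetical per-device key

def sign_frame(frame_bytes: bytes) -> bytes:
    """Signature attached at the moment of capture."""
    return hmac.new(CAPTURE_KEY, frame_bytes, hashlib.sha256).digest()

def frame_is_authentic(frame_bytes: bytes, signature: bytes) -> bool:
    """Verify the frame still matches its capture-time signature."""
    return hmac.compare_digest(sign_frame(frame_bytes), signature)

frame = b"\x00" * 64            # stand-in for raw frame data
tag = sign_frame(frame)
tampered = b"\x01" + frame[1:]  # an injected or substituted frame
```

An injected synthetic stream carries no valid capture-time signature, which is precisely the gap this layer closes.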
The critical advantage for your compliance and audit teams: this layered architecture creates a verifiable trail at every step. Your security assessment and hardening practice can map these controls directly to the NIST AI Risk Management Framework's four core functions (Govern, Map, Measure, Manage) and to ISO/IEC 30107-3 certification requirements for biometric attack detection.
For organizations in financial services, this architecture also connects directly to your existing regulatory risk and litigation readiness obligations. Every verification event is logged. Every override is documented. When auditors or regulators ask how you protect against synthetic media fraud, you can show them the logic trail rather than pointing to a training slide deck.
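What "every verification event is logged" looks like in practice is a structured, append-only record per check. The field names below are illustrative, not drawn from any specific standard; the point is that each control emits machine-readable evidence auditors can query.

```python
import datetime
import json

def log_verification_event(transaction_id, check, outcome, actor):
    """Build one append-only audit record for a verification step.
    Field names are illustrative, not from any specific standard."""
    event = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "transaction_id": transaction_id,
        "check": check,      # e.g. "out_of_band_callback", "provenance_check"
        "outcome": outcome,  # "pass" / "fail" / "override"
        "actor": actor,      # who performed or overrode the check
    }
    return json.dumps(event)
```

Writing these records to write-once storage is what turns "we have controls" into a logic trail you can hand to a regulator.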
You can read the full technical analysis for the complete architectural specification, or explore the interactive version for a visual walkthrough of the defense framework.
Key Takeaways
- The Arup deepfake attack stole $25.6 million through a fake video call — no malware, no passwords stolen, no systems breached.
- Face-swap attacks rose 704% in 2023, and creating a convincing deepfake now costs about $15 and 45 minutes.
- Video calls can no longer serve as identity verification for high-value financial transactions — every major transfer needs out-of-band confirmation through a separate channel.
- Behavioral biometrics — analyzing how someone types and moves their mouse — can detect imposters that deepfake technology cannot yet mimic.
- CIOs and CTOs face growing personal liability if they fail to implement deepfake-aware controls before a breach occurs.
The Bottom Line
The Arup breach proved that AI-generated video is now good enough to fool trained professionals on live calls. Your defense must move beyond what people can see and hear to what technology can measure — behavioral patterns, physiological signals, and cryptographic proof of authenticity. Ask your AI vendor: if a deepfaked executive appeared on a video call and requested a wire transfer, which specific layer in your system would catch it, and can you show me the audit log?