An audio waveform carrying a hidden watermark passes through a transcoder and emerges clean, its mark erased.


Audio Watermarking Was Supposed to Survive the Internet. Ours Died at the First Transcoder.

Ashutosh Singhal · May 16, 2026 · 14 min read

We embedded a watermark in a track, ran it through every test we had, and watched it survive. MP3 compression at three bitrates — survived. A round-trip through a consumer player — survived. Then we handed the file to a distribution partner whose ingest chain re-encoded everything to Opus before it reached listeners, and the mark was gone. Not degraded. Gone. The song came out the other end clean, unmarked, indistinguishable from a file that had never carried any provenance at all.

That was the morning I understood that audio provenance — the ability to prove where a piece of audio came from and that it is what it claims to be — is not a watermarking problem. It's a survival problem. A watermark that lives in your lab and dies in the real content journey is worse than no watermark, because it tells you you're protected when you aren't.

I run a small team that builds audio licensing, watermarking and provenance pipelines for media companies. I want to tell you how we got the product wrong, what the EU AI Act is about to force on everyone in this business, and why the hard part was never the algorithm.

The bet I backed that fell apart in six weeks

When we started, the pitch was clean and I believed it: generative audio tools were legally radioactive, so be the safe alternative. If you used Suno or Udio for a commercial track, you were renting a lawsuit. We would build the licensed, defensible path instead — commissioned voices, clean chain of title, zero copyright exposure. I sold that story to my own team and to the first buyers I talked to.

Then the ground moved. On October 30, 2025, Universal Music Group settled with Udio and signed a strategic agreement for a new licensed platform launching in 2026, trained on a UMG-approved corpus. Less than four weeks later, on November 25, Warner Music Group settled with Suno and announced a joint venture to build licensed, opt-in AI music. The "you're renting a lawsuit" argument I'd built the company around softened in a six-week window.

I spent a few days convinced we'd missed the market. We hadn't — but I had to stop reading the news as a vendor and start reading it as the rights-tech architect I'd spent years being. When I looked at the actual settlement terms, the buyer's pain hadn't disappeared. It had moved.

The licensed tools solved the lawsuit. They created a new prison: the walled garden.

Read the fine print on those settlements. In Udio's transition product, users cannot download or export the works they make — creations are locked on-service for consumption only. On Suno's new licensed models, only paid-tier subscribers can download off-platform, and those downloads are capped. A media company that needs an asset to ship across broadcast, streaming, social, cinema and in-game cannot use a tool that won't let the asset leave the building. The legal question was answered. The ownership and portability question was wide open.

So what is actually broken now?

Once I stopped defending the old thesis, the buyer's problem resolved into three questions, and none of them was "is this legal."

The first is portability. Walled-garden outputs can't travel, and most commercial use cases are broken before they start because the asset can't ship across every surface a campaign or a release touches.

The second is registrability, and it surprises people. The US Copyright Office has held since early 2025 that prompt-only outputs aren't copyrightable. So even a pristine, licensed AI track can be uncopyrightable, which means a competitor can free-ride on your AI jingle with impunity and you have zero downstream IP to defend. The Anthropic settlement in September 2025 came after a federal judge found that training on books was transformative but that downloading pirated copies to build a library was not fair use; it put a roughly $3,000-per-work number on the table that music-industry analysts treat as a floor, not a ceiling. The legal exposure didn't vanish with the settlements; it changed shape.

The third is detectability, and it has a date attached.

The clock: EU AI Act Article 50, August 2, 2026

Every other problem in this essay you could, in theory, choose to live with. This one has a date and a number, and it is what moves a buyer from interested to signing. Under Article 50 of the EU AI Act, effective August 2, 2026, providers of AI systems that generate synthetic audio must mark the output in a machine-readable format so it's detectable as artificially generated. The European Commission's first draft Code of Practice, published in January 2026, made the operational expectation explicit, and it's the part most people miss: metadata alone is not enough. The Code pushes a multi-layered approach that combines embedded metadata, such as C2PA manifests, with imperceptible watermarking. Penalties under Article 99 run up to €15 million or 3% of worldwide annual turnover, whichever is higher.

So here is the position a media company is in. You need a marking-and-watermarking pipeline operational before August 2 if you touch EU audiences. The watermark has to survive the real content journey — the one that killed ours. And metadata, the easy half, is the half that gets stripped first.

Why doesn't C2PA just solve this?

Most teams reach for C2PA content credentials and think they're done. C2PA is good: the RIAA, Roland and Avid (the maker of Pro Tools) are all members, so the audio standard-bearers are at the table. But most social platforms strip C2PA metadata the moment you upload. The credential you so carefully attached is gone by the time the track reaches the surface where provenance actually matters.

What does survive is soft binding: you embed a tiny identifier in the audio itself with a watermark, and that identifier points to a manifest stored in the cloud. The watermark survives the journey; the lookup restores the provenance on the other side. It's elegant, and it has traps I've watched teams fall into. If your client is in the EU and your manifest store is hosted in the US, GDPR's cross-border transfer rules now reach every manifest query. If the manifest holds creator identity, you owe privacy redaction under the C2PA spec, and almost nobody implements it. If the manifest store is offline when someone checks, the track has no provenance chain at all. The "easy" metadata half is where the legal and architectural landmines actually are.
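A minimal sketch of the soft-binding lookup and its failure modes, in Python. Everything here is hypothetical: `ManifestStore` is an in-memory stand-in for a cloud manifest repository, the `PERSONAL_FIELDS` keys are invented, and masking fields at query time is a simplification, not C2PA's own assertion-redaction mechanism. The point is the shape: a missing mark, an unreachable store and an unknown ID are three distinct outcomes, and a cross-border lookup is something the resolver should flag rather than hide.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Hypothetical assertion keys treated as personal data.
PERSONAL_FIELDS: Set[str] = {"creator_name", "creator_email"}


class ManifestStore:
    """In-memory stand-in for a cloud manifest repository (hypothetical interface)."""

    def __init__(self, region: str, manifests: Dict[str, Dict[str, str]], online: bool = True):
        self.region = region
        self.manifests = manifests
        self.online = online

    def fetch(self, manifest_id: str) -> Optional[Dict[str, str]]:
        if not self.online:
            raise ConnectionError("manifest store unreachable")
        return self.manifests.get(manifest_id)


@dataclass
class Resolution:
    status: str                                  # no_watermark | store_offline | unknown_id | resolved
    manifest: Optional[Dict[str, str]] = None
    cross_border: bool = False                   # lookup left the requester's region


def resolve(watermark_id: Optional[str], store: ManifestStore, requester_region: str) -> Resolution:
    """Turn a decoded watermark ID into provenance, keeping each failure mode distinct."""
    if watermark_id is None:
        return Resolution("no_watermark")
    cross_border = store.region != requester_region
    try:
        manifest = store.fetch(watermark_id)
    except ConnectionError:
        # The mark survived, but without the store there is still no provenance chain.
        return Resolution("store_offline", cross_border=cross_border)
    if manifest is None:
        return Resolution("unknown_id", cross_border=cross_border)
    masked = {k: ("[redacted]" if k in PERSONAL_FIELDS else v) for k, v in manifest.items()}
    return Resolution("resolved", masked, cross_border)
```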

And then there's the layer the buyer actually lives in: DDEX. In September 2025, Spotify and fifteen labels and distributors committed to a DDEX-based standard for AI disclosures in music credits — vocals, songwriting, production, the lot. Here's the problem I run into on every label engagement: the live DDEX ERN 4.3 delivery spec has no fields for that disclosure data. The extension is in draft. A label distributing through an aggregator like DistroKid or CD Baby inherits whatever the aggregator sends, and most aggregators aren't passing granular AI disclosure through yet. To be compliant with both Spotify's policy and Article 50 by August 2, that pipeline needs custom middleware. There is no setting to toggle.
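Because the delivery spec has no home for the data, the middleware ends up carrying it alongside the ERN message. A minimal sketch of that validation step, assuming a hypothetical three-role vocabulary and a JSON sidecar keyed by ISRC; none of these field names come from DDEX, Spotify or any aggregator.

```python
import json
from typing import Dict, List

# Hypothetical vocabulary. ERN 4.3 has no fields for this, so the record travels as a sidecar.
ROLES = ("vocals", "songwriting", "production")
ALLOWED = {"human", "ai_assisted", "ai_generated"}


def validate(disclosure: Dict[str, str]) -> List[str]:
    """Return every problem with a per-role disclosure; an empty list means it is complete."""
    errors = [f"missing {role}" for role in ROLES if role not in disclosure]
    errors += [f"invalid {role}: {value!r}" for role, value in disclosure.items()
               if role in ROLES and value not in ALLOWED]
    return errors


def sidecar(isrc: str, disclosure: Dict[str, str]) -> str:
    """Refuse to deliver an incomplete disclosure rather than silently sending none."""
    errors = validate(disclosure)
    if errors:
        raise ValueError(f"{isrc}: " + "; ".join(errors))
    record = {
        "isrc": isrc,
        "ai_disclosure": {role: disclosure[role] for role in ROLES},
        # A single machine-readable flag, derived from the per-role detail.
        "contains_ai": any(disclosure[role] != "human" for role in ROLES),
    }
    return json.dumps(record, sort_keys=True)
```

Failing loudly on an incomplete record is the design choice that matters: a pipeline that silently omits the disclosure looks compliant until someone audits it.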

The watermark that didn't survive — and the matrix that would have caught it

A pipeline diagram showing a watermark intact through DAW, WAV and DDEX, then lost at the Opus/AAC/MP3 transcoder.

Let me go back to the file that came out clean, because the fix isn't "use a better watermark." It's "stop trusting any single one."

The embedders are mostly free. Google's SynthID-Audio is baked into Lyria and NotebookLM, with a detector portal rolled out globally in November 2025 and over 10 billion pieces of content watermarked across modalities, but the audio detector is Google-controlled and only works on Google-generated output. Meta's AudioSeal is open-source under the MIT license, does sample-level localized detection, and is the strongest tool I've used for voice and speech; its robustness on music is weaker, though, and that gap is exactly where files die.

This is where the research-grade numbers earn their keep. XAttnMark, published at ICML 2025, holds 68% detection under waveform adversarial attacks where AudioSeal drops to 15%, and 91–94% detection through generative edits from tools like Stable Audio. AWARE, an October 2025 method, reports a 1.61% bit error rate even after a track is run through a neural vocoder — the kind of resynthesis a voice-cloning pipeline performs. None of these has commercial support. None of them is a product. They're inputs.

A watermark that survives MP3 and dies in Opus isn't a weak watermark. It's an untested one.

The discipline that would have saved us is unglamorous: a survival matrix. You take your specific ingest chain — DAW to WAV to DDEX delivery to a multi-bitrate transcoder spitting out Opus, AAC and MP3, to a CDN, to a player, to someone screen-recording it off a phone, to a TikTok re-upload that re-encodes the whole thing — and you run every candidate watermark through every stage and measure what's left. We hadn't tested the Opus leg because our lab didn't have one. The transcoder did.

And for anyone in broadcast, there's a worse stage: the analog gap. Radio and TV outputs get captured by a second device's microphone in a car or a kitchen. A watermark that dies in room reverb is useless for broadcast provenance. AudioSeal's autocorrelation approach tends to survive that; most LSB-style watermarks don't. You only know which by running it.
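A survival matrix is mostly bookkeeping: embed, push the file through each stage in order, detect after every stage, and record where each mark dies. The sketch below, in Python with only the standard library, shows that harness. The two watermarks are toy schemes (a 16-bit LSB mark and a keyed spread-spectrum mark), and the stages are crude stand-ins for a real chain, not Opus, AAC or MP3; a real matrix would call the actual encoders and the actual detectors you are evaluating at each stage. The toy strength `alpha` is deliberately exaggerated so the example behaves deterministically.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

Audio = List[float]
Bits = List[int]
Stage = Tuple[str, Callable[[Audio], Audio]]


def quantize(x: Audio, bits: int) -> Audio:
    """Round samples to a signed fixed-point grid, clipping to [-1, 1]."""
    scale = 2 ** (bits - 1)
    return [max(-1.0, min(1.0, round(s * scale) / scale)) for s in x]


def lsb_embed(x: Audio, payload: Bits) -> Audio:
    """Toy LSB mark: overwrite the lowest bit of each 16-bit sample with a payload bit."""
    return [((round(s * 32768) & ~1) | payload[i % len(payload)]) / 32768 for i, s in enumerate(x)]


def lsb_detect(y: Audio, n_bits: int) -> Bits:
    """Majority vote over every sample that carried each bit."""
    ones = [0] * n_bits
    for i, s in enumerate(y):
        ones[i % n_bits] += round(s * 32768) & 1
    per_bit = len(y) / n_bits
    return [1 if o > per_bit / 2 else 0 for o in ones]


def _pn(length: int, key: int) -> List[int]:
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(length)]


def ss_embed(x: Audio, payload: Bits, key: int = 7, alpha: float = 0.08) -> Audio:
    """Toy spread-spectrum mark: each bit flips the sign of a keyed +/-1 chip sequence."""
    chip = len(x) // len(payload)
    pn = _pn(len(x), key)
    return [s + alpha * (1 if payload[min(i // chip, len(payload) - 1)] else -1) * pn[i]
            for i, s in enumerate(x)]


def ss_detect(y: Audio, n_bits: int, key: int = 7) -> Bits:
    """Correlate each bit's segment with the same keyed chips; the sign gives the bit."""
    chip = len(y) // n_bits
    pn = _pn(len(y), key)
    return [1 if sum(y[i] * pn[i] for i in range(b * chip, (b + 1) * chip)) > 0 else 0
            for b in range(n_bits)]


def _noise(n: int, sigma: float = 0.02, seed: int = 42) -> Audio:
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]


# Crude stand-ins for a real ingest chain (hypothetical stage names).
STAGES: List[Stage] = [
    ("wav_16bit", lambda x: quantize(x, 16)),
    ("gain_-6dB", lambda x: [0.5 * s for s in x]),
    ("requant_8bit", lambda x: quantize(x, 8)),
    ("add_noise", lambda x: [s + n for s, n in zip(x, _noise(len(x)))]),
]


def survival_matrix(host: Audio, payload: Bits, marks: Dict[str, Tuple[Callable, Callable]],
                    stages: List[Stage]) -> Dict[str, List[Tuple[str, float]]]:
    """Bit accuracy of every mark after every stage, applied cumulatively down the chain."""
    matrix = {}
    for name, (embed, detect) in marks.items():
        signal = embed(host, payload)
        row = []
        for stage_name, stage in stages:
            signal = stage(signal)  # each stage sees the previous stage's output
            recovered = detect(signal, len(payload))
            row.append((stage_name, sum(a == b for a, b in zip(recovered, payload)) / len(payload)))
        matrix[name] = row
    return matrix


def first_failure(row: List[Tuple[str, float]], threshold: float = 0.9) -> Optional[str]:
    """Name of the first stage where the mark drops below threshold, or None if it survives."""
    return next((stage for stage, acc in row if acc < threshold), None)


def demo() -> Dict[str, List[Tuple[str, float]]]:
    host = [0.3 * math.sin(2 * math.pi * 440 * i / 16000) for i in range(16000)]  # 1 s test tone
    payload = [1, 0] * 8
    marks = {"lsb": (lsb_embed, lsb_detect), "spread_spectrum": (ss_embed, ss_detect)}
    return survival_matrix(host, payload, marks, STAGES)


if __name__ == "__main__":
    for name, row in demo().items():
        print(name, row, "dies at:", first_failure(row))
```

In this toy chain the LSB mark reads back perfectly after the 16-bit WAV stage and is lost at the first gain change, while the spread-spectrum mark survives every stage. That is the lab-versus-transcoder story in miniature: a test suite that stops at the stage a mark happens to survive tells you nothing about the next one.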

The fraud number nobody at the label said out loud

The first time I sat with a DSP's trust-and-safety lead, the conversation wasn't about the EU at all. It was about a number they'd been quietly absorbing.

Deezer's own data: AI-generated tracks rose from 18% of new uploads in June 2025 to 28% by September, more than a quarter of everything coming in the door. And 70% of the plays on AI-only tracks are fraudulent. Put those two numbers together and more than a quarter of new deliveries belong to a category in which most of the listening is fraud: not a spam problem at the edges but a structural leak in the royalty pool. Deezer now excludes those fraudulent streams from royalty payments and, as of January 2026, is licensing its patented detector to rival platforms. Spotify removed more than 75 million spam tracks in twelve months and introduced a 1,000-stream minimum before royalties trigger, which fraud rings simply game by making sure each of ten thousand AI tracks clears about 1,050 streams. Industry-wide, Beatdapp and Beatport estimate $2–3 billion a year in fraudulent royalty diversion. For a mid-tier DSP, that's a nine-figure hole in its own royalty pool.
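The threshold-gaming pattern leaves a visible shape in the data: too many tracks in one catalog parked just above the payout line. A minimal sketch of that check in Python; it is a hypothetical heuristic, not Spotify's, Deezer's or Beatdapp's method, and the 10% band above the threshold is an assumption.

```python
from typing import Dict


def threshold_cluster_share(streams_by_track: Dict[str, int], threshold: int = 1000,
                            band: float = 0.10) -> float:
    """Fraction of a catalog's tracks sitting in [threshold, threshold * (1 + band))."""
    if not streams_by_track:
        return 0.0
    upper = threshold * (1 + band)
    near = sum(1 for n in streams_by_track.values() if threshold <= n < upper)
    return near / len(streams_by_track)


# The pattern described above: ten thousand tracks, each nudged to about 1,050 streams.
farm = {f"track-{i}": 1_050 for i in range(10_000)}
organic = {"a": 12, "b": 340, "c": 980, "d": 4_700, "e": 61_000}

print(threshold_cluster_share(farm))      # 1.0
print(sum(farm.values()))                 # 10500000 royalty-eligible streams
print(threshold_cluster_share(organic))   # 0.0
```

An organic catalog spreads across orders of magnitude; a farmed one piles up in a narrow band. On its own this is only a flag for review, which is why the behavioral detection described next matters.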

This is where the DSP buyer and the label buyer turn out to want different things from the same pipeline. The label wants provenance and disclosure. The DSP wants to keep fraudulent AI floods out of its royalty pool — and that's a fingerprinting and behavioral-detection problem, not a watermarking one. Pex's Attribution Engine matches fingerprints against a registry in under five seconds and can even identify whether a track came from Suno or Udio. Beatdapp, which raised $17 million and partners with UMG and the MLC, catches the fraud behaviorally. Watermarking and fingerprinting aren't competitors; they answer different questions, and a real platform needs both.
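Fingerprinting answers a different question from watermarking: nothing is embedded, and the audio itself is reduced to a compact hash that is compared against a registry. A toy sketch of that lookup, loosely modelled on energy-difference hashing (one bit per frame boundary: did the energy rise?). The frame size, the `max_ber` threshold and the synthetic tracks are assumptions, and this is not how Pex's Attribution Engine works.

```python
import random
from typing import Dict, List, Optional, Tuple

Audio = List[float]


def fingerprint(x: Audio, frame: int = 256) -> List[int]:
    """Toy robust hash: 1 where a frame has more energy than the one before it."""
    energies = [sum(s * s for s in x[i:i + frame]) for i in range(0, len(x) - frame + 1, frame)]
    return [1 if b > a else 0 for a, b in zip(energies, energies[1:])]


def bit_error_rate(a: List[int], b: List[int]) -> float:
    n = min(len(a), len(b))
    return sum(x != y for x, y in zip(a[:n], b[:n])) / n if n else 1.0


def lookup(query: Audio, registry: Dict[str, List[int]],
           max_ber: float = 0.35) -> Optional[Tuple[str, float]]:
    """Best-matching registry entry, or None if nothing is close enough."""
    q = fingerprint(query)
    best = min(((tid, bit_error_rate(q, fp)) for tid, fp in registry.items()),
               key=lambda t: t[1], default=None)
    return best if best is not None and best[1] <= max_ber else None


def make_track(seed: int, n_frames: int = 400, frame: int = 256) -> Audio:
    """Synthetic stand-in for a recording: noise with a randomly varying envelope."""
    rng = random.Random(seed)
    out: Audio = []
    for _ in range(n_frames):
        amp = rng.uniform(0.05, 0.5)
        out.extend(amp * rng.uniform(-1.0, 1.0) for _ in range(frame))
    return out
```

A gain-scaled copy of a registered track produces the identical bit string, so it matches with a zero bit error rate. A production fingerprint would hash across frequency bands as well as time so it also survives codecs and noise.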

What about the ad agencies?

The agency buyer keeps me up more than any of the others, because the exposure is so concentrated and so few of them see it.

Suno's Pro and Premier plans explicitly do not include indemnification. An agency that drops an AI-generated jingle into a national spot is carrying the rights risk itself — and most haven't negotiated AI-specific indemnity into their client MSAs, even though the 4A's guidance now tells them to. A GC at one agency walked me through their tool's terms of service on a call and went quiet when we got to the indemnification section, because there wasn't one. If that jingle triggers a rights claim mid-flight, the agency eats the pulled campaign, the re-shoot, the media reschedule and the reputational damage.

And the voice itself is now property. Tennessee's ELVIS Act, effective July 2024, makes cloning an artist's voice without consent a misdemeanor with civil remedies. California's AB 2602 gives performers contract protections. The federal NO FAKES Act, advancing in the 119th Congress, would create a property right in digital voice replicas plus notice-and-takedown for platforms. Forty-eight states now have some form of deepfake or voice-cloning law. The chain of title on a voice isn't a nicety anymore; it's the thing that keeps the campaign from becoming a lawsuit.

What we actually build now

Five standards — SynthID, AudioSeal, XAttnMark, C2PA manifests, DDEX disclosures — converging into one multi-standard detection layer.

So we stopped trying to be the safe-generation alternative. I can't out-ship Google on an embedding algorithm or out-detect Beatdapp on fraud, and pretending otherwise was the original mistake. What no single vendor does — and what every media company I talk to is missing — is the integration.

A typical buyer has six to twelve disconnected systems: a digital asset manager, a media asset manager, the DAW, the rights-admin database, the DDEX distribution pipeline, a C2PA verifier, a fingerprint database, a fraud detector, an internal review queue. Each works. Nothing talks. The provenance chain breaks at every seam. What we design is the thing that makes them hum together — and the policy that goes with it. The hardest questions are never technical. Who gets the alert when a watermark turns up on a track uploaded to the distributor? What's the takedown SLA? How does the artist opt-out list get synced to the generator? Those answers don't live in a GitHub repo.
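Those answers don't come from code, but once a buyer has agreed them they can be written down as data the pipeline enforces, so that no alert arrives without an owner and a deadline. A minimal sketch in Python, assuming hypothetical event names, owners and SLAs throughout.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Tuple


@dataclass(frozen=True)
class Route:
    owner: str          # team that receives the alert
    sla: timedelta      # time allowed before the alert counts as missed


# Hypothetical policy: every owner and SLA below is an example a buyer has to decide for itself.
POLICY: Dict[str, Route] = {
    "watermark_on_distributor_upload": Route("rights-ops", timedelta(hours=4)),
    "c2pa_manifest_missing": Route("content-ops", timedelta(hours=24)),
    "artist_opt_out_changed": Route("ml-platform", timedelta(hours=1)),
}


def route(event: str, seen_at: datetime) -> Tuple[str, datetime]:
    """Return who owns an event and when their response is due."""
    rule = POLICY.get(event)
    if rule is None:
        # An alert with no owner is exactly the kind of seam where the chain breaks.
        raise KeyError(f"no owner defined for event {event!r}")
    return rule.owner, seen_at + rule.sla
```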

We build a multi-standard detection layer, because no buyer can bet on one watermark winning — it has to read SynthID, AudioSeal and XAttnMark outputs, cross-reference C2PA manifests, and match against DDEX disclosures. We build the takedown workflow, because the real question after the next "Heart on My Sleeve" moment — that 2023 fake Drake/Weeknd track that hit 20 million views before it was pulled — isn't "can we prove it's fake." In 2026 a convincing clone takes about ninety minutes to make. The question is how fast you can take it down and notify the artist, the label, the DSPs and the social platforms in parallel. The watermark is one input to that workflow, not the workflow.
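A minimal sketch of both pieces in Python, under loud assumptions: `synthid`, `audioseal`, `xattnmark`, `c2pa` and `ddex` are hypothetical adapter names standing in for whatever each vendor's detector or metadata reader returns, and `notify` is a placeholder for each party's real intake channel. The detection layer normalizes every source into the same `Signal` shape; the takedown step notifies all parties concurrently instead of one after another.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Hypothetical adapter names; each would wrap one vendor's detector or metadata reader.
WATERMARK_SOURCES = {"synthid", "audioseal", "xattnmark"}
DECLARATION_SOURCES = {"c2pa", "ddex"}


@dataclass(frozen=True)
class Signal:
    source: str
    says_ai: Optional[bool]   # None: nothing to report (no mark, no manifest, no field)
    confidence: float = 1.0


def verdict(signals: Sequence[Signal], threshold: float = 0.5) -> str:
    """Combine watermark detections with declared provenance into one finding."""
    detected = any(s.source in WATERMARK_SOURCES and s.says_ai and s.confidence >= threshold
                   for s in signals)
    declared_ai = any(s.source in DECLARATION_SOURCES and s.says_ai for s in signals)
    if detected and not declared_ai:
        return "undisclosed_ai"   # a mark says AI, but no manifest or DDEX record does
    if detected or declared_ai:
        return "ai_disclosed"
    return "no_ai_signal"         # not proof of human origin: the mark may not have survived


async def notify(party: str, track_id: str, finding: str) -> str:
    # Placeholder for a call to that party's intake endpoint (hypothetical).
    await asyncio.sleep(0)
    return f"{party}:{track_id}:{finding}"


async def fan_out(track_id: str, finding: str, parties: List[str]) -> List[str]:
    # All parties are notified concurrently; gather returns results in input order.
    return list(await asyncio.gather(*(notify(p, track_id, finding) for p in parties)))
```

The last verdict is the one to design around: `no_ai_signal` is not a clean bill of health, because a mark that died in a transcoder looks exactly like no mark at all.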

And for the clients who genuinely need licensed voice transformation — podcast localization, audiobook narration, accessibility, dubbing — we build a real voice bank, which is mostly an economics problem. A commissioned actor recording 45 minutes of clean singing across a few genre styles runs $8,000 to $18,000 depending on union status and buy-out scope. Covering a usable range of age, gender and accent is 45 to 75 actors — call it $360,000 to $1.35 million in capex before a dollar of client revenue. Which is exactly why the answer is almost never a general-purpose bank and almost always a targeted library: twenty actors in four languages for podcast localization beats a vanity catalog you'll never amortize. Scraped celebrity-voice models off the internet aren't an option — the ELVIS Act and AB 2602 make them a liability, not an asset. That's the engagement: the end-to-end audio licensing, watermarking and provenance pipeline, designed for the specific systems and the specific deadline a given buyer is facing.
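The capex range is simple multiplication: the low end is the fewest actors at the lowest rate, the high end the most actors at the highest. A short Python check, with one added assumption: that the targeted twenty-actor library falls in the same $8,000 to $18,000 per-actor range.

```python
from typing import Tuple


def voice_bank_capex(actors_lo: int, actors_hi: int, cost_lo: int, cost_hi: int) -> Tuple[int, int]:
    """Low/high capex: fewest actors at the cheapest rate, most actors at the dearest."""
    return actors_lo * cost_lo, actors_hi * cost_hi


print(voice_bank_capex(45, 75, 8_000, 18_000))   # (360000, 1350000): general-purpose bank
print(voice_bank_capex(20, 20, 8_000, 18_000))   # (160000, 360000): targeted library (assumed rates)
```

Under that assumption, the targeted library lands at $160,000 to $360,000, whose high end is the general-purpose bank's low end.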

People ask me whether the licensed Suno and Udio platforms will just make all of this unnecessary. They won't, for the same reason the walled garden created the opening: those tools solve generation, not the chain of custody around an asset that has to live in your library, ship to every surface, and stand up to a regulator's audit. They also ask whether they can wait and see what watermark standard wins. You can't — Article 50 doesn't wait for the standards war to resolve, which is the whole reason the detection layer has to be multi-standard from day one.

The track that came out of that transcoder unmarked is the one I think about. It had a real artist behind it and a real license, and for the length of one re-encode it became indistinguishable from a deepfake — no way to prove it was anything at all. On August 2, 2026, that gap stops being an engineering embarrassment and becomes a compliance failure with a €15 million number attached. Everyone in this business has that transcoder somewhere in their chain. The only question is whether they meet it before the regulator does, or after.