An editorial image showing a film director's hand physically guiding/sculpting a partially-rendered AI-generated scene, representing human intent governing machine output.
Artificial Intelligence · Marketing · Brand Strategy

I Watched Coca-Cola Spend Millions Teaching AI to Smile. The AI Couldn't.

Ashutosh Singhal · February 2, 2026 · 14 min read

I was sitting in my office late one evening in November when a colleague pinged me a link. "You need to see this." It was Coca-Cola's 2025 "Holidays Are Coming" ad — the one generated entirely by AI. I watched it twice. The first time, something felt wrong but I couldn't name it. The second time, I could.

The trucks were red. The snow glistened. The polar bears lumbered across the screen. And none of it mattered, because every smile in that commercial was dead behind the eyes.

That ad became the most important case study in our work at Veriprajna — not because it was bad, but because it was almost good. And "almost good" is where brands go to die. The Coca-Cola AI ad is the clearest signal I've seen that the era of what I call the "LLM Wrapper" — slapping a nice interface on top of a foundational model like Sora or Runway and calling it a production pipeline — is over for any brand that cares about its reputation. Trust in ads made entirely by AI sits at 13%. Co-created with humans? 48%. That gap isn't a rounding error. It's a chasm.

This essay is about what sits on the other side of that chasm: hybrid AI workflows, where human intent governs machine velocity. It's the approach we've been building at Veriprajna, and it's the only architecture I believe can protect brand equity in the age of synthetic media.

The Ad That Broke the Spell

Here's what most people missed about the Coca-Cola debacle. It wasn't cheap. It wasn't lazy. The production team reportedly generated over 70,000 video clips to assemble a single 30-second spot. Two studios — Secret Level and Silverside AI — were involved. Coca-Cola's head of generative AI publicly insisted the craftsmanship was "ten times better" than their previous AI attempt.

And the public still hated it.

The comments were brutal. "Soulless." "Dystopian." My personal favorite, dripping with the kind of anger only a betrayed fan can muster: "Coca-Cola is red because it's made from the blood of out-of-work artists."

I remember pulling up the ad frame-by-frame with my team, trying to articulate exactly what was failing. One of our designers pointed at the screen and said, "The truck has a different number of wheels in this shot than it did three seconds ago." She was right. We started counting. The cabin shape shifted between cuts. The chassis floated over the snow like a hovercraft — no suspension, no weight transfer, no friction.

But the real problem wasn't the trucks. It was the people. Or rather, the non-people.

Why Can't AI Smile?

This is the question that sent me down a research rabbit hole I'm still climbing out of. A genuine human smile isn't just a mouth shape. It involves an involuntary contraction of the orbicularis oculi — the muscle around the eye — creating what psychologists call the "Duchenne marker." It's the difference between a smile that reaches the eyes and one that stops at the lips. We're biologically wired to detect the difference, even if we can't consciously articulate it.

Diffusion models don't know this. They operate on pixel-level probability distributions, not anatomical rules. They've seen millions of images tagged "smile" and learned to reproduce the geometry of a smile. But they cannot reproduce the physics of one.

Generative models produce visually plausible but emotionally hollow content. We call this "Aesthetic Hallucination" — the image looks right, but it feels wrong.

That term — Aesthetic Hallucination — is something we coined at Veriprajna to describe this specific failure mode, and I think it's the most important concept for any brand leader to understand right now. It's not about resolution or rendering quality. It's about the gap between what looks real and what feels real. The Coca-Cola ad had beautiful textures. Snow that glistened. Light that bounced off chrome. And smiles that made your skin crawl.

A ByteDance Research study published in 2025 confirmed what we were seeing in practice: video generation models like Sora and Gen-3 do not learn Newtonian physics. They memorize visual transitions. They can reproduce the appearance of a truck driving because they've seen thousands of driving videos, but they don't understand suspension, friction, or weight. The researchers found a hierarchy of what these models get right: Color > Size > Velocity > Shape. Color is almost always accurate — hence the perfect Coca-Cola red. Shape is where things fall apart. The model ensures the truck is red in every frame but "forgets" how many wheels it has because it generates video in latent chunks without a unified 3D representation.

This is why the liquid in AI-generated beverage ads looks like mercury. The model nails the caramel color but has no concept of volume conservation. It doesn't know that liquid can't appear and disappear inside a glass.

What Does "Prompt and Pray" Actually Look Like?

A side-by-side comparison diagram contrasting the "Prompt & Pray" workflow (Coca-Cola's approach) with the "Human-in-the-Loop" workflow (Veriprajna's approach), showing why one fails and the other succeeds.

I want to be concrete about what the Coca-Cola workflow actually was, because understanding it explains why it failed.

The team typed prompts into generative video tools. The tools produced clips. The team watched thousands of those clips, hoping to find ones that looked coherent enough to cut together. This is what I call the "prompt and pray" methodology, and it's the dominant approach in what I consider the "wrapper era" of AI video production. You write a description of what you want. You hit generate. You cross your fingers.

Seventy thousand clips. For thirty seconds.

That number haunted me. It meant the creative process had been reduced to a curation task — sifting through an ocean of hallucinations to find the few that looked least wrong. The director wasn't directing. The director was filtering. There's a world of difference.

When the creators at Silverside AI were asked about the backlash, they compared it to the early resistance to CGI in Toy Story. I found this comparison almost offensively wrong. Toy Story used technology to tell a story that couldn't be told any other way — the inner life of toys. Coca-Cola used technology to retell a story that had already been told better with practical effects thirty years ago. The AI didn't add anything. It subtracted humanity.

The narrative shifted from "Coca-Cola is innovative" to "Coca-Cola is cheap." That's a brand equity catastrophe dressed up as a technology showcase.

I wrote about this dynamic in much more depth in the interactive version of our research, including the Toys 'R' Us case — where an AI-generated child actor triggered such visceral rejection that brand sentiment plummeted overnight.

Why Did Nike's AI Ad Win a Cannes Grand Prix?

This is the part of the story that gives me hope.

While brands were getting destroyed for AI-generated slop, Nike had already shown another way with "Never Done Evolving," released for its 50th anniversary. The concept: simulate a tennis match between 1999 Serena Williams and 2017 Serena Williams. It won a Grand Prix at Cannes. Universal acclaim. No backlash.

The difference wasn't budget. It was architecture.

Nike didn't ask an AI to imagine Serena. They fed a machine learning model real archival footage of her gameplay — years of it — and used it to analyze her speed, shot selection, and reactivity at different points in her career. The AI calculated possibilities based on reality. It was a time machine, not a fabrication engine. Stanford's "vid2player" technique generated behaviorally accurate player sprites based on domain knowledge of tennis physics. Then human compositors and editors ensured the visual fidelity and narrative pacing.

The AI generated the movements and the gameplay logic. Humans ensured it looked and felt like a Nike production.

This is the model. This is what works. And it's what we've been building toward at Veriprajna.

How Do You Use AI Without Losing Your Brand's Soul?

A three-phase pipeline diagram showing exactly how AI plays a different role in pre-production, production, and post-production, with the specific tools and techniques labeled at each stage.

I get asked this question constantly. Usually by CMOs who've seen the Coca-Cola headlines and are terrified of being next, but who also know they can't ignore AI entirely because their competitors won't.

My answer is always the same: don't let AI render the final pixel.

At Veriprajna, we've built what we call a Human-in-the-Loop architecture. It's not a philosophy. It's a literal production pipeline with human checkpoints at every layer. The principle is simple: human intent must govern machine execution. Not the other way around.

In practice, it breaks down into three phases, and the AI plays a different role in each.

In pre-production, AI is the dreamer. We use tools like Krea AI for real-time visualization — a designer sketches a layout and sees it rendered photorealistically in milliseconds. This cuts storyboarding costs by 60–80%. But nobody's committing to a final look. The director is "shooting" the commercial virtually, iterating on lighting and composition instantly, before a single camera rolls.

In production, humans capture what matters. For anything requiring emotional resonance — a face, a product interaction, a moment of genuine human connection — we film real talent. We use what I call the "Sandwich Method": film the hero elements (the actor, the product) on green screen or LED volumes, then use AI to generate high-fidelity backgrounds projected onto those LED walls. The actor interacts with real light from a synthetic scene. The emotion is real. The environment is generated.

In post-production, AI becomes the sculptor. This is where deep AI shines — not text-to-video generation, but video-to-video transformation. We composite real actors into synthetic environments. We apply consistent brand aesthetics using custom-trained LoRA (Low-Rank Adaptation) models — lightweight files trained on a brand's specific cinematography style. For a client like Nike, we'd train a LoRA on twenty years of their visual language. Every AI-generated frame feels like a Nike ad because the model has internalized those brand codes.
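For readers who want to see what "lightweight" means here, the sketch below shows the core LoRA idea in plain PyTorch: freeze a pretrained layer and learn a small low-rank correction on top of it. This is an illustration of the mechanism, not our production training code — the layer sizes, rank, and scaling values are placeholders, and in a real pipeline the adapter wraps the attention projections inside a video model's UNet rather than a bare linear layer.

```python
# Minimal sketch of the LoRA idea: freeze a pretrained weight matrix and learn
# a low-rank update on top of it. Sizes and hyperparameters are illustrative.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # the pretrained weights stay frozen
        self.lora_down = nn.Linear(base.in_features, rank, bias=False)
        self.lora_up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_up.weight)  # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Original behavior plus a small, trainable low-rank correction.
        return self.base(x) + self.scale * self.lora_up(self.lora_down(x))


# Example: adapt one frozen projection layer toward a "brand style" target.
frozen = nn.Linear(768, 768)
adapted = LoRALinear(frozen, rank=8, alpha=16.0)
trainable = sum(p.numel() for p in adapted.parameters() if p.requires_grad)
print(f"Trainable adapter parameters: {trainable}")  # tiny next to the full model
```

The reason this matters commercially: the adapter file is a few megabytes, so a brand's visual language can be versioned, audited, and swapped without retraining the underlying model.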

And we use ControlNet to lock the geometry. Instead of hoping a prompt preserves a product's exact shape, we feed the network a Canny edge map or depth map of the actual product. The AI generates around the exact silhouette. Lighting and backgrounds can be generative, but the product remains mathematically perfect — 94.2% structural integrity compared to the dice-roll of prompting alone.
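Here's a minimal sketch of what that geometry lock looks like in practice, using the open-source diffusers library as a stand-in for our production stack. The checkpoint names, file path, and prompt are illustrative assumptions; the point is the structure: the edge map of the real product conditions every generated frame, so the silhouette can't drift.

```python
# Hedged sketch of geometry locking with ControlNet: an edge map of the real
# product conditions generation, so lighting and backgrounds can vary while
# the product silhouette stays fixed. Checkpoints, path, and prompt are
# placeholders; assumes a CUDA-capable GPU.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# 1) Extract a Canny edge map from a real product photo (hypothetical path).
product = cv2.imread("product_shot.png")
gray = cv2.cvtColor(product, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)
edge_map = Image.fromarray(np.stack([edges] * 3, axis=-1))

# 2) Load a Canny-conditioned ControlNet alongside a base diffusion model.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# 3) The prompt drives mood and environment; the edge map locks the silhouette.
frame = pipe(
    "glass bottle on a snowy night street, warm holiday lighting, cinematic",
    image=edge_map,
    num_inference_steps=30,
).images[0]
frame.save("conditioned_frame.png")
```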

What Actually Causes the "Flickering Truck" Problem?

The technical term is temporal inconsistency, and it's the single biggest barrier to enterprise AI video. It's why the Coca-Cola truck changed shape between cuts. It's why AI-generated characters morph when they turn their heads. The model doesn't maintain a unified representation of an object across frames — it regenerates from scratch each time, and each regeneration is a new probabilistic roll.

We solve this with a metric called Video Consistency Distance (VCD), which we integrate into our fine-tuning process. VCD measures the frequency-domain distance between a conditioning image and the generated frames. By penalizing high VCD values during training, we force the model to prioritize coherence. Models fine-tuned this way achieve 95.22% subject consistency and 96.32% background consistency on standard benchmarks.
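To make the idea concrete, here's a toy approximation of that kind of frequency-domain penalty in PyTorch. It is not the exact VCD formulation — just a sketch of the principle: compare the log-magnitude spectrum of each generated frame against the conditioning frame, and treat the average deviation as a drift score that can be added to the training loss.

```python
# Illustrative approximation of a frequency-domain consistency penalty.
# A sketch of the idea behind VCD (compare generated frames against a
# conditioning frame in the frequency domain), not the published metric.
import torch


def frequency_consistency_distance(cond_frame: torch.Tensor,
                                    frames: torch.Tensor) -> torch.Tensor:
    """cond_frame: (C, H, W); frames: (T, C, H, W), values in [0, 1]."""
    # Log-magnitude spectrum of the conditioning image.
    cond_spec = torch.log1p(torch.fft.rfft2(cond_frame).abs())
    # Log-magnitude spectrum of every generated frame.
    frame_spec = torch.log1p(torch.fft.rfft2(frames).abs())
    # Mean spectral deviation across time: higher = more temporal drift.
    return (frame_spec - cond_spec.unsqueeze(0)).abs().mean()


# During fine-tuning this could be added to the loss to penalize drift:
#   loss = diffusion_loss + lambda_vcd * frequency_consistency_distance(cond, out)
video = torch.rand(16, 3, 64, 64)   # 16 generated frames (toy data)
reference = video[0]                # conditioning frame
print(frequency_consistency_distance(reference, video))
```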

For object permanence — the problem where a person walks behind a tree and the model forgets they exist — we anchor AI generation to 3D proxy scenes using NeRF (Neural Radiance Fields) integration. The AI "skins" a 3D blockout, combining the geometric logic of traditional CGI with the aesthetic flexibility of generative AI.

For the full technical breakdown of these pipelines, including our approaches to mode collapse and latent space manipulation, see our research paper.

The Argument I Keep Having

There's a conversation I've had probably fifty times in the last year. It usually starts with someone saying, "But the models will get better. In two years, Sora will be able to do all of this."

Maybe. Probably, even, for certain narrow tasks. But this argument misses the point entirely.

The question was never "Can AI generate a technically flawless video?" The question is "Should your brand's emotional identity be a function of a probability distribution?"

Even if the flickering trucks get fixed and the dead eyes learn to crinkle, you're still left with the trust problem. Forty-four percent of consumers are actively bothered by AI-generated content. NielsenIQ found that even polished AI ads cause a "negative halo effect" — viewers labeled them "annoying," "boring," and "confusing" even when the visual quality was high. The damage extends beyond the individual campaign to the brand itself.

Dove built an entire campaign — "The Code" — around rejecting AI distortion of human bodies. It was a massive brand equity win. They turned the threat into a differentiator. For categories like beauty, food, wellness, and luxury, "real" isn't a limitation. It's a premium.

The brands that win with AI don't use it to replace humanity. They use it to amplify stories they couldn't afford to tell before.

Heinz proved this brilliantly. They asked AI to generate images of "ketchup" and showed that every model defaulted to a Heinz bottle. They turned the AI's bias into proof of brand dominance. The hallucination was the feature. It was transparent, funny, and it worked because the brand was in on the joke rather than trying to fool anyone.

The Part Where I Admit What Keeps Me Up at Night

I'll be honest about something. The thing that worries me isn't that AI video will stay bad. It's that it'll get just good enough that lazy brands will settle for it, and the market will be flooded with content that's technically passable but emotionally vacant. The term people are already using is "AI slop" — high-volume, low-effort synthetic content that fills feeds without saying anything.

My fear is normalization. That consumers will stop expecting craft. That a generation of viewers will grow up thinking the plastic sheen and the dead eyes are just what ads look like.

We had a team meeting about this a few months ago that turned into a genuine argument. One of our engineers made the case that consumers will adapt — that the uncanny valley will shrink as exposure increases. Our creative director pushed back hard. "People didn't adapt to bad food just because fast food got everywhere," she said. "They developed a taste for quality. The same thing will happen here."

I think she's right. The data supports her. The backlash against Coca-Cola wasn't from a niche group of AI skeptics. It was mainstream. Consumers are developing a sixth sense for synthetic content, and the penalty for getting caught is steeper than the savings from cutting corners.

The next frontier — what researchers call "World Models" — will eventually give AI an understanding of physics, not just pixels. ByteDance estimates meaningful progress by 2026–2027. Until then, the hybrid workflow is the only safe bridge. It lets you harness the rendering power of today's AI while borrowing the physical and emotional intelligence that only human creators possess.

The Question That Actually Matters

Every enterprise leader I talk to asks the same question: "How much money can AI save us on production?"

It's the wrong question. It leads directly to the uncanny valley — to 70,000 generated clips and a 30-second ad that makes people feel nothing.

The right question is: "What stories can AI help us tell that we couldn't afford to tell before?"

Nike didn't save money with "Never Done Evolving." They spent plenty. But they created something impossible without AI — a match between two versions of the same athlete separated by eighteen years. That's not cost optimization. That's creative expansion.

Stop asking how AI can make your production cheaper. Start asking how it can make your storytelling braver.

The novelty phase is over. "Look what the AI made" doesn't impress anyone anymore. The new standard — the only standard that will matter in 2026 and beyond — is "Look what we made with AI." The emphasis lands squarely on the we.

The brands that understand this will build legends. The ones that don't will spend millions teaching an algorithm to smile and wonder why nobody smiles back.
